Data Engineer Resume
SUMMARY
- 5+ years of experience in statistical data analysis, model building, data cleaning, data acquisition and data visualization.
- Experience in importing and exporting data using Sqoop from HDFS to Relational Database Systems (RDBMS) and vice versa.
- Collected data from different sources like web servers and social media for storing in HDFS and analyzing the data using other Hadoop technologies.
- Worked extensively with dimensional modeling, data migration, data cleansing, data profiling, and ETL processes for data warehouses.
- Extensive experience in analyzing social media data and optimization of social media content.
- Strong knowledge of importing and exporting data into data visualization tools such as Tableau, SSRS, R, Python, MS Excel, etc. from various sources: JSON, CSV, MS Excel, txt, SPSS (.sav), Stata (.dta), using the RJSONIO, xlsx and foreign packages.
- Extensive R coding experience with the gdata, Hmisc, plyr, reshape2, stringr, timeDate, XML, splitstackshape, Caret, Rattle and CreditMetrics packages for data analysis in Mac, Unix and Windows environments.
- Expertise in R for data parsing, data manipulation and data preparation. Methods include: describing data contents, computing descriptive statistics, regex, split and combine, remap, merge, subset, reindex, melt and reshape.
- Experienced in acquiring, merging, cleaning, analyzing and mining structured, semi-structured and unstructured data sets for analysis on Big Data platforms such as Hadoop and Impala.
- Created real-time machine learning algorithms for fraud and error detection using R and SQL, and implemented the algorithms on real-time data flow tasks.
- Carried out data analysis on 300+ TB of data using MS SQL Server.
- Capable of writing machine-learning-based algorithms using the Spark MLlib library.
- Designed several self-serve BI data marts for non-technical users using SQL Server Reporting Services and SSAS OLAP cubes.
- Experience creating easy-to-interpret dashboards and reports using tools such as SSRS, Tableau, R, Python, MS Excel, Dundas Dashboard, etc.
- Created several types of visualizations such as linear, planar, volumetric, temporal, hierarchical, network, etc.
- Used agile methodology to execute recent projects and waterfall for older projects.
- Excellent troubleshooting and debugging abilities. Self-starter and a team player with analytical, communication and interpersonal skills with an aptitude for learning.
TECHNICAL SKILLS
Languages: R, Java - J2EE, PHP, T-SQL
Databases: Hive, Impala, MS SQL Server, Oracle, MySQL, PostgreSQL
Operating Systems: Linux/Unix, Windows XP/7/8.x
Data Analysis: Clustering, Classification, Regression analysis, predictive analysis, forecasting
Framework and Architecture: Hadoop MapReduce, SOA, Hibernate, GWT, Java EE, MVC
Methodologies/Tools/Technologies: Visual Studio 2008/2010, Cloud computing, Agile, Eclipse, GWT, Data modelling
BI Tools: Microsoft Business Intelligence, SSRS, SSAS, SSIS, Dundas Dashboard, Excel, Tableau
PROFESSIONAL EXPERIENCE
Confidential
Data Engineer
Responsibilities:
- Managed data coming from different sources and was involved in HDFS maintenance and loading of structured and unstructured data.
- Designed and modified database tables and used HBase queries to insert and fetch data from tables.
- Wrote programs to acquire data from various structured, semi-structured and unstructured sources, and performed data mining on the processed data.
- Created several Hadoop clusters ranging from 2 to 32 nodes on Amazon EC2 instances, for data analysis and performance benchmarking projects.
- Utilized factor analysis, logistic regression, decision trees and principal component analysis (PCA) to discover the relationships among a wide variety of socioeconomic factors and demographics.
- Performed data analysis on a cloud platform using Amazon AWS and Hadoop Hive.
- Created several dashboards, reports and OLAP cubes for audiences ranging from highly technical to non-technical, on data sets of 300+ terabytes.
- Used a k-nearest neighbours (k-NN) approach for real-time fraud and data error detection.
- Performed sentiment analysis on feeds from the company's Facebook page using R's sentiment package and the Naive Bayes algorithm.
- Designed, modelled and implemented enterprise BI, data analysis and data warehousing solutions using Microsoft BI suite.
- Created several dashboards, automated reports, self-serve BI data marts and OLAP cubes to perform statistical analysis for non-technical audiences.
- Developed various web-based systems on J2EE, PHP and GWT, and wrote APIs using Java and R to access data from RSS feeds, Facebook, Twitter and other web-based data sources.
- Performed statistical data analysis on 300+ terabytes of data, growing by 50+ gigabytes daily.
Confidential
Software Engineer
Responsibilities:
- Gained start-up experience working on end-to-end development and analysis.
- Analysed data and created daily, weekly, monthly and ad-hoc reports and dashboards.
- Created dynamic reports with live data feeds.
- Developed web applications based on SOA architecture integrated with data warehouse.
- Performed data analysis such as simple regression, time series analysis and forecasting.