
Data Engineer Resume


SUMMARY

  • 5+ years of experience in statistical data analysis, model building, data cleaning, data acquisition and data visualization.
  • Experience in importing and exporting data using Sqoop from HDFS to Relational Database Systems (RDBMS) and vice versa.
  • Collected data from different sources like web servers and social media for storing in HDFS and analyzing the data using other Hadoop technologies.
  • Worked extensively with Dimensional modeling, Data migration, Data cleansing, Data profiling, and ETL Processes features for data warehouses.
  • Extensive experience in analyzing social media data and optimization of social media content.
  • Strong knowledge of importing and exporting data into data visualization tools such as Tableau, SSRS, R, Python, and MS Excel from various sources (JSON, CSV, Excel, txt, SPSS .sav, Stata .dta) using the RJSONIO, xlsx, and foreign packages.
  • Proficient in R, using the gdata, Hmisc, plyr, reshape2, stringr, timeDate, XML, splitstackshape, caret, rattle, and CreditMetrics packages for data analysis in Mac, Unix, and Windows environments.
  • Expertise in R for data parsing, manipulation, and preparation, including describing data contents, computing descriptive statistics, regex matching, splitting and combining, remapping, merging, subsetting, reindexing, melting, and reshaping.
  • Experienced in acquiring, merging, cleaning, analyzing, and mining structured, semi-structured, and unstructured data sets for analysis on Big Data platforms such as Hadoop and Impala.
  • Created a real-time machine learning algorithm for fraud and error detection using R and SQL, and implemented it in real-time data-flow tasks.
  • Carried out data analysis on 300+ TB of data using MS SQL Server.
  • Capable of writing machine-learning algorithms using the Spark MLlib library.
  • Designed several self-serve BI data marts for non-technical users using SQL Server Reporting Services and SSAS OLAP cubes.
  • Experience creating easy-to-interpret dashboards and reports using tools such as SSRS, Tableau, R, Python, MS Excel, and Dundas Dashboard.
  • Created several types of visualizations, including linear, planar, volumetric, temporal, hierarchical, and network visualizations.
  • Used Agile methodology for recent projects and Waterfall for older projects.
  • Excellent troubleshooting and debugging abilities. Self-starter and team player with analytical, communication, and interpersonal skills and an aptitude for learning.
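The real-time fraud and error detection noted above was built in R and SQL; as an illustration of the k-nearest-neighbour idea behind that kind of detector, here is a minimal, self-contained Python sketch (the function name and toy data are illustrative assumptions, not the original implementation):

```python
import math

def knn_anomaly_scores(points, k=3):
    """Score each point by its distance to its k-th nearest neighbour.
    Larger scores suggest outliers (candidate fraud / data errors)."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(
            math.dist(p, q) for j, q in enumerate(points) if j != i
        )
        scores.append(dists[k - 1])  # distance to the k-th nearest neighbour
    return scores

# Toy data: a tight cluster of normal transactions plus one far-away outlier.
data = [(1.0, 1.0), (1.1, 0.9), (0.9, 1.1), (1.05, 1.0), (10.0, 10.0)]
scores = knn_anomaly_scores(data, k=2)
print(max(range(len(scores)), key=scores.__getitem__))  # → 4 (the outlier)
```

Points far from their neighbours get large scores, so flagging the top-scoring records gives a simple unsupervised detector; a production version would use indexed nearest-neighbour search rather than this O(n²) scan.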

TECHNICAL SKILLS

Languages: R, Java - J2EE, PHP, T-SQL

Databases: Hive, Impala, MS SQL Server, Oracle, MySQL, PostgreSQL

Operating Systems: Linux/Unix, Windows XP/7/8.x

Data Analysis: Clustering, Classification, Regression analysis, Predictive analysis, Forecasting

Framework and Architecture: Hadoop MapReduce, SOA, Hibernate, GWT, Java EE MVC

Methodologies/Tools/Technologies: Visual Studio 2008/2010, Cloud computing, Agile, Eclipse, GWT, Data modelling

BI Tools: Microsoft Business Intelligence, SSRS, SSAS, SSIS, Dundas Dashboard, Excel, Tableau

PROFESSIONAL EXPERIENCE

Confidential

Data Engineer

Responsibilities:

  • Responsible for managing data coming from different sources; involved in HDFS maintenance and loading of structured and unstructured data.
  • Designed and modified database tables and used HBase queries to insert and fetch data from tables.
  • Wrote programs to acquire data from various structured, semi-structured, and unstructured sources, and performed data mining on the processed data.
  • Created several Hadoop clusters, ranging from 2 to 32 nodes, on Amazon EC2 instances for data analysis and performance benchmarking projects.
  • Utilized factor analysis, logistic regression, decision trees, and principal component analysis (PCA) to discover relationships among a wide variety of socioeconomic and demographic factors.
  • Performed data analysis on a cloud platform using Amazon AWS and Hadoop Hive.
  • Created several dashboards, reports, and OLAP cubes, for audiences ranging from highly technical to non-technical, on data sets of 300+ terabytes.
  • Used a k-nearest-neighbour clustering approach for real-time fraud and data-error detection.
  • Performed sentiment analysis on feeds of the company's Facebook page using R's sentiment package and a Naive Bayes algorithm.
  • Designed, modelled and implemented enterprise BI, data analysis and data warehousing solutions using Microsoft BI suite.
  • Created several dashboards, automated reports, self-serve BI data marts and OLAP cubes to perform statistical analysis for non-technical audience.
  • Developed various web-based systems on J2EE, PHP, and GWT, and wrote APIs in Java and R to access data from RSS feeds, Facebook, Twitter, and other web-based data sources.
  • Performed statistical data analysis on 300+ terabytes of data, with 50+ gigabytes of data added daily.
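The sentiment analysis above used R's sentiment package with a Naive Bayes classifier. As a sketch of the underlying Naive Bayes calculation, here is a minimal Python version with add-one smoothing (all names and the toy training data are illustrative assumptions, not the original model):

```python
import math
from collections import Counter

def train_nb(docs):
    """docs: list of (tokens, label). Returns label counts, per-label word
    counts, and the vocabulary."""
    labels = Counter(label for _, label in docs)
    word_counts = {lab: Counter() for lab in labels}
    for tokens, lab in docs:
        word_counts[lab].update(tokens)
    vocab = {w for counts in word_counts.values() for w in counts}
    return labels, word_counts, vocab

def classify(tokens, labels, word_counts, vocab):
    """Pick the label maximizing log P(label) + sum log P(word | label),
    with add-one (Laplace) smoothing for unseen words."""
    total = sum(labels.values())
    best, best_lp = None, -math.inf
    for lab, n in labels.items():
        lp = math.log(n / total)  # log prior
        denom = sum(word_counts[lab].values()) + len(vocab)
        for w in tokens:
            lp += math.log((word_counts[lab][w] + 1) / denom)
        if lp > best_lp:
            best, best_lp = lab, lp
    return best

# Hypothetical labelled posts standing in for Facebook-feed training data.
train = [
    (["love", "great", "product"], "pos"),
    (["happy", "great", "service"], "pos"),
    (["terrible", "slow", "support"], "neg"),
    (["hate", "slow", "product"], "neg"),
]
model = train_nb(train)
print(classify(["great", "product"], *model))  # → pos
```

The same two steps (count word frequencies per class, then compare smoothed log-likelihoods) are what the R package performs at larger scale.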

Confidential

Software Engineer

Responsibilities:

  • Gained exposure to end-to-end development and analysis while working at a start-up.
  • Analysed data and created daily, weekly, monthly and ad-hoc reports and dashboards.
  • Created dynamic reports with live data feeds.
  • Developed web applications based on SOA, integrated with the data warehouse.
  • Performed data analysis such as simple regression, time series analysis, and forecasting.
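As a sketch of the simple-regression forecasting mentioned in the last bullet, here is an ordinary-least-squares trend fit with a one-step-ahead forecast in plain Python (toy series and function names are my own, not the original work):

```python
def fit_line(ys):
    """Ordinary least squares fit of y = a + b*t for t = 0 .. n-1.
    Returns (intercept a, slope b)."""
    n = len(ys)
    ts = range(n)
    mean_t = sum(ts) / n
    mean_y = sum(ys) / n
    b = sum((t - mean_t) * (y - mean_y) for t, y in zip(ts, ys)) / \
        sum((t - mean_t) ** 2 for t in ts)
    a = mean_y - b * mean_t
    return a, b

def forecast(ys, steps=1):
    """Extrapolate the fitted trend `steps` periods past the series end."""
    a, b = fit_line(ys)
    return a + b * (len(ys) - 1 + steps)

sales = [10.0, 12.0, 14.0, 16.0]   # perfectly linear toy series
print(forecast(sales, steps=1))    # → 18.0
```

Real forecasting work would add seasonality and residual diagnostics, but the trend-fit-then-extrapolate step shown here is the core of simple-regression forecasting.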
