
Data Engineer Resume


NJ

SUMMARY

  • Data Engineer with around 2 years of experience in Data Extraction, Data Modeling, Data Wrangling, Statistical Modeling, Data Mining, Machine Learning, and Data Visualization.
  • Comprehensive experience in Big Data processing using Hadoop and its ecosystem (MapReduce, Pig, Hive).
  • Extensively worked on Spark for analytics: installed it on top of Hadoop and built advanced analytical applications using Spark with Hive and SQL/Oracle/Snowflake.
  • Excellent programming skills at a high level of abstraction using Python.
  • Working knowledge of Amazon's Elastic Compute Cloud (EC2) for computational tasks and Simple Storage Service (S3) as a storage mechanism.
  • Experienced in working with in-memory processing frameworks such as Spark transformations, Spark SQL, MLlib, and Spark Streaming (a short PySpark sketch follows this list).
  • Proficient in Machine Learning algorithms and predictive modeling, including regression models and decision trees.
  • Proficient in statistical methodologies including Hypothesis Testing, ANOVA, Time Series, Principal Component Analysis, Factor Analysis, Cluster Analysis, and Discriminant Analysis.
  • Knowledge of Natural Language Processing (NLP) algorithms and Text Mining.
  • Worked in large-scale database environments such as Hadoop and MapReduce, with a working understanding of Hadoop clusters, nodes, and the Hadoop Distributed File System (HDFS).
  • Strong experience with Python (2.x, 3.x) for developing analytic models and solutions.
  • Working experience in the Hadoop ecosystem and the Apache Spark framework, including HDFS, MapReduce, HiveQL, Spark SQL, and PySpark.
  • Proficient in data visualization tools such as Tableau, Python Matplotlib, and R Shiny to create visually powerful and actionable interactive reports and dashboards.
  • Excellent Tableau developer with expertise in building and publishing customized interactive reports and dashboards with custom parameters and user filters using Tableau (9.x/10.x).
  • Experienced in Agile methodology and the Scrum process.
  • Strong business sense and the ability to communicate data insights to both technical and non-technical clients.
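
The in-memory Spark work above typically looks like the following minimal PySpark sketch, pairing a DataFrame transformation pipeline with its Spark SQL equivalent. The app name, the S3 path, and the region/amount columns are hypothetical placeholders, not details from this resume.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Hypothetical session and input path; column names (region, amount)
# are placeholders for illustration only.
spark = SparkSession.builder.appName("sales-analytics").getOrCreate()
orders = spark.read.parquet("s3://bucket/orders/")

# DataFrame transformations: filter, aggregate, sort.
by_region = (
    orders.filter(F.col("amount") > 0)
          .groupBy("region")
          .agg(F.sum("amount").alias("total_amount"))
          .orderBy(F.desc("total_amount"))
)
by_region.show()

# The same aggregation expressed as Spark SQL over a temporary view.
orders.createOrReplaceTempView("orders")
spark.sql("""
    SELECT region, SUM(amount) AS total_amount
    FROM orders
    WHERE amount > 0
    GROUP BY region
    ORDER BY total_amount DESC
""").show()
```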

TECHNICAL SKILLS

Languages: Python, R, C++, Java, SQL, JavaScript, HTML, CSS

Databases: MySQL, SQL Server, MongoDB

Tools: Tableau, SAS, Rapid Miner, TensorFlow, MapReduce/Hadoop, Power BI, JIRA, GitHub, Scikit-learn, AWS

Data Analytics & Machine Learning: Linear Regression, Logistic Regression, K-Means, KNN, Decision Trees, Cluster Analysis, Neural Networks, Naïve Bayes

Statistical Tools: Time Series, Regression Models, Principal Component Analysis, Hypothesis Testing, A/B Testing, Confidence Intervals, T-test, ANOVA, NLP

Operating System: Windows, Linux, Mac

Methodologies: Agile, SDLC, Waterfall

PROFESSIONAL EXPERIENCE

Confidential, NJ

Data Engineer

Responsibilities:

  • Utilized the Waterfall development methodology for requirements, planning, design, and deployment.
  • Worked on data cleaning and ensured data quality, consistency, and integrity using NumPy, SciPy, and Pandas (a minimal cleaning sketch follows this list).
  • Used Python collections for manipulating and looping through user-defined objects.
  • Worked with a combination of structured and unstructured data from multiple sources and automated the cleaning using Python scripts.
  • Utilized ODBC connectivity to Teradata via MS Excel to retrieve data automatically from the Teradata database.
  • Performed data analysis and logical and physical data modeling for OLAP systems.
  • Designed and deployed reports with drill-down, drill-through, and drop-down menu options, as well as parameterized and linked reports, using Tableau.
  • Proficient with Python 3.x, including NumPy, Scikit-learn, Pandas, Matplotlib, Seaborn, and NLP libraries.
  • Created PL/SQL packages and database triggers, developed user procedures, and prepared user manuals for the new programs.
  • Applied data warehousing principles such as fact tables, dimension tables, and dimensional data modeling (star schema and snowflake schema).
  • Wrote and optimized diverse SQL queries, with working knowledge of RDBMSs such as SQL Server and MySQL.
  • Performed data validation and cleansing of staged input records before loading into the data warehouse.
  • Working knowledge of SourceTree with GitHub.
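
A minimal sketch of the kind of Pandas/NumPy cleaning described in the second bullet above, under assumed inputs: the file name and the order_id/order_date/amount columns are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical input file and column names.
df = pd.read_csv("raw_records.csv")

# Normalize column names and strip whitespace from string columns.
df.columns = df.columns.str.strip().str.lower().str.replace(" ", "_")
for col in df.select_dtypes(include="object"):
    df[col] = df[col].str.strip()

# Coerce types, treating bad values as missing rather than failing.
df["order_date"] = pd.to_datetime(df["order_date"], errors="coerce")
df["amount"] = pd.to_numeric(df["amount"], errors="coerce")

# Integrity checks: drop exact duplicates and rows missing the key.
df = df.drop_duplicates().dropna(subset=["order_id"])

# Flag outliers: amounts more than 3 standard deviations from the mean.
zscores = np.abs((df["amount"] - df["amount"].mean()) / df["amount"].std())
df["amount_outlier"] = zscores > 3
```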

Confidential, NJ

Data Engineer (Intern)

Responsibilities:

  • Worked in Agile methodologies.
  • Collaborated with data engineers and the operations team to implement the ETL process; wrote and optimized SQL queries to perform data extraction to fit the analytical requirements.
  • Performed data analysis using Hive to retrieve data from the Hadoop cluster and SQL to retrieve data from Redshift.
  • Explored and analyzed customer-specific features using Spark SQL.
  • Performed data imputation using the Scikit-learn package in Python (see the preprocessing sketch after this list).
  • Responsible for ETL development with successful design, development, and integration of components within the Talend ETL platform and Java technology.
  • Participated in feature engineering such as feature-intersection generation, feature normalization, and label encoding with Scikit-learn preprocessing.
  • Created complex stored procedures, SSIS packages, triggers, cursors, tables, views, and other SQL joins and statements for applications.
  • Developed Tableau data visualizations using scatter plots, geographic maps, pie charts, bar charts, and density charts.
  • Designed and implemented recommender systems that used collaborative filtering to recommend courses to different customers, deployed to an AWS EMR cluster (see the ALS sketch after this list).
  • Utilized natural language processing (NLP) techniques to improve customer satisfaction.
  • Designed rich data visualizations to model data into human-readable form with Tableau and Matplotlib.
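
A minimal Scikit-learn sketch of the imputation and preprocessing bullets above, on a hypothetical toy frame (the age/income/segment columns are illustrative). LabelEncoder is shown to mirror the bullet's wording, though for feature columns OrdinalEncoder or OneHotEncoder is the more conventional choice.

```python
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Hypothetical toy data; column names are placeholders.
df = pd.DataFrame({
    "age": [34, None, 29, 41],
    "income": [72000, 58000, None, 91000],
    "segment": ["basic", "premium", "basic", "enterprise"],
})

# Impute missing numeric values with the column median.
imputer = SimpleImputer(strategy="median")
df[["age", "income"]] = imputer.fit_transform(df[["age", "income"]])

# Label-encode the categorical column, then normalize the numerics.
df["segment"] = LabelEncoder().fit_transform(df["segment"])
df[["age", "income"]] = StandardScaler().fit_transform(df[["age", "income"]])
print(df)
```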
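For the collaborative-filtering recommender, one common approach is ALS matrix factorization from Spark MLlib; the resume does not name a specific algorithm, so this is a hedged sketch with hypothetical user/course ratings.

```python
from pyspark.sql import SparkSession
from pyspark.ml.recommendation import ALS

# Hypothetical ratings: (user_id, course_id, rating) triples.
spark = SparkSession.builder.appName("course-recs").getOrCreate()
ratings = spark.createDataFrame(
    [(0, 10, 4.0), (0, 11, 2.0), (1, 10, 5.0), (1, 12, 3.0), (2, 11, 4.0)],
    ["user_id", "course_id", "rating"],
)

# ALS matrix factorization; coldStartStrategy="drop" avoids NaN
# predictions for users or items unseen during training.
als = ALS(
    userCol="user_id",
    itemCol="course_id",
    ratingCol="rating",
    rank=8,
    maxIter=10,
    coldStartStrategy="drop",
)
model = als.fit(ratings)

# Top-3 course recommendations per user.
model.recommendForAllUsers(3).show(truncate=False)
```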

Confidential

Data Engineer

Responsibilities:

  • Gathered data visualization requirements from the business users.
  • Wrote SQL queries to extract data from the sales data marts per the requirements.
  • Developed Tableau data visualizations using scatter plots, geographic maps, pie charts, bar charts, and density charts.
  • Tuned Oracle databases and application SQL, working closely with the development team on SQL tuning.
  • Developed containment scripts for data reconciliation using SQL and Python.
  • Performed data analysis and data profiling using complex SQL on various source systems, including MySQL and Teradata.
  • Evaluated data profiling, cleansing, integration, and extraction tools (e.g., Informatica).
  • Wrote user-defined functions (UDFs) in Hive to manipulate strings, dates, and other data.
  • Performed data cleaning, feature scaling, and feature engineering using the pandas and NumPy packages in Python (a short scaling sketch follows this list).
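
A short sketch of feature scaling with plain pandas/NumPy arithmetic, matching the last bullet; the frame and column names are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical sales-mart features; column names are placeholders.
df = pd.DataFrame({"units": [3, 7, 2, 9], "revenue": [30.0, 95.5, 18.2, 120.0]})

# Min-max scaling of every column to [0, 1].
scaled = (df - df.min()) / (df.max() - df.min())
print(scaled.round(3))

# Log transform for a skewed feature (log1p handles zeros safely).
df["log_revenue"] = np.log1p(df["revenue"])
```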
