Data Engineer Resume

Fairfax, VA


  • 7+ years of experience in Data Analysis, Decision Trees, Data Profiling, Data Integration, Migration and Metadata Management, Master Data Management and Configuration Management.
  • Experience in various phases of the Software Development life cycle (Analysis, Requirements gathering, Designing) with expertise in documenting various requirement specifications, functional specifications, Data Validation, Test Plans, Source to Target mappings, SQL Joins, Data Cleansing.
  • Experience in Text Analytics, developing different Statistical Machine Learning, Data Mining solutions to various business problems and generating data visualizations using R, Python.
  • Proficient in data mining tools like R, SAS, Python, SQL, Excel and Big Data Hadoop ecosystems; experienced in staff leadership and development.
  • Proficient in advising on the use of data for compiling personnel and statistical reports and preparing personnel action documents
  • Skilled at finding patterns within data, analyzing data and interpreting results.
  • Excellent knowledge of Perl and UNIX, with expertise in data modeling, database design and implementation on Oracle and AWS Redshift, database administration and performance tuning.
  • Proficient in data entry, data auditing, creating data reports & monitoring data for accuracy
  • Experienced in web search and data collection, web data mining, extracting data from websites, data entry and data processing.
  • Possess functional knowledge in the areas of business process study, requirement capture, analysis, project documentation and training
  • Highly competent at researching, visualizing and analyzing raw data to identify recommendations for meeting organizational challenges.
  • Experience in importing and exporting data using Sqoop from HDFS to RDBMS and vice versa
  • Expert in data flow between the primary database and various reporting tools, and in finding trends and patterns within datasets and providing recommendations accordingly.
  • Experienced with Excel pivot tables and VBA macros for various business scenarios; involved in data transformation using Pig scripts on AWS EMR and AWS RDS.
  • Ability to use dimensionality reduction techniques and regularization techniques.
  • Expertise in Data Reporting, Ad-hoc Reporting, Graphs, Scales, PivotTables and OLAP reporting.
  • Experienced in extracting and modeling datasets from a variety of data sources such as Hadoop (using Pig, Hive, Spark), Teradata and Snowflake for ad-hoc analysis.
  • Fair understanding of Agile methodology and practice.
  • Working knowledge of application design, architecture and development.
  • Experienced in complete SDLC and STLC with end-user interaction for functional specification, system analysis, and unit regression testing; participated in system integration testing
  • Experienced in working in a team environment to deliver on demand service; ability to deliver appropriate quality solutions under pressure; pro-active and strong analytical problem-solving skills
  • Participated in portfolio meetings; experienced in preparing high-level design documents, low-level design documents, and detailed technical design documents using case scenarios.


Big Data Tools & Frameworks: Apache Spark, Spark MLlib

Data Modeling tools: Erwin, ER/Studio, Star-Schema Modelling, Snowflake-Schema Modelling, FACT and dimension tables, Pivot Tables.

Open Source Libraries: Scikit-learn, Pandas, NumPy, Matplotlib

Data Mining: Data reduction, Clustering, Classification, Anomaly detection, Text mining

Machine Learning: Regression (Linear, Ridge, Lasso and Elastic Net), Classification (Decision Trees, Logistic Regression, Naïve Bayes, k-nearest neighbors), Ensemble methods (Bagging, AdaBoost, Functional Gradient Boosting), Clustering (K-means, Mixture Models)

Deep Learning: Artificial Neural Networks, Recurrent Neural Networks (Gated Recurrent Units, LSTMs), Convolutional Neural Networks

Programming Languages: Data structures, Algorithms, Python, R, TensorFlow, SQL, Spark


Confidential - Fairfax, VA

Data Engineer


  • Worked as a Data Scientist, extracting and preparing data according to business requirements.
  • Understood customer business use cases and translated them into analytical data applications and models, with a vision for how to implement them.
  • Set up storage and data analysis tools in Amazon Web Services cloud computing infrastructure.
  • Worked on data discovery, handling structured and unstructured data, cleaning and performing descriptive analysis, and preparing data sets.
  • Enhanced a traditional data warehouse based on a star schema, updated data models, performed data analytics and reporting using Tableau, and extracted data from MySQL and AWS into HDFS using Sqoop.
  • Worked with Machine learning algorithms like Regressions (linear, logistic), SVMs and Decision trees.
  • Extracted Text data from XML files and performed topic modeling on top of it.
  • Architected a solution on AWS to load data, create data models and run BI on it; developed Shell, Perl and Python scripts to automate and provide control flow for Pig scripts.
  • Developed MapReduce/Spark modules for machine learning and predictive analytics in Hadoop.
  • Extracted useful data from CSV files and performed analytics on it.
  • Merged and matched different data sets from different data sources.
  • Worked extensively with Python to optimize code for better performance.
  • Evaluated the performance of various topic modeling algorithms using text analytics/mining.
  • Used the Agile Scrum methodology to build the different phases of Software development life cycle.
  • Communicated findings to team members, leadership and the Director to ensure models were well understood and incorporated into business processes.

Environment: Python, R, Hadoop, Hive, Pig, AWS, Apache Spark, SQL Server 2014, Tableau Desktop, Microsoft Excel, PySpark, Linux, Azure

Confidential - South Plainfield, NJ

Data Engineer/Data Scientist


  • Performed data mining using state-of-the-art methods.
  • Extended the company's data with third-party sources of information when needed.
  • Enhanced data collection procedures to include information relevant for building analytic systems.
  • Processed, cleansed and verified the integrity of data used for analysis.
  • Performed ad-hoc analysis and presented results in a clear manner.
  • Created automated anomaly detection systems and continuously tracked their performance.
  • Strong command of data architecture and data modelling techniques.
  • Hands-on experience with data mining tools such as R and Python, depending on job requirements.
  • Worked with AWS services such as EMR, S3 and CloudWatch to run and monitor jobs on AWS.
  • Utilized NLP applications such as topic models and sentiment analysis to identify trends and patterns within massive data sets.
  • Knowledge of ML and statistical libraries (e.g. Scikit-learn, Pandas).
  • Knowledge of building predictive models to forecast risks for product launches and operations and to help predict workflow.
  • Coordinated with Data Architects on provisioning AWS EC2 infrastructure and deploying applications behind Elastic Load Balancing.
  • Experience with visualization technologies such as Tableau.
  • Drew inferences and conclusions, created dashboards and visualizations of processed data, and identified trends and anomalies.
  • Generated TLFs and summary reports, ensuring on-time, quality delivery.
  • Participated in client meetings, teleconferences and video conferences to keep track of project requirements, commitments made and the delivery thereof.
  • Solved analytical problems and effectively communicated methodologies and results.
  • Worked closely with internal stakeholders such as business teams, product managers, engineering teams, and partner teams.
  • Fostered a culture of continuous engineering improvement through mentoring, feedback and metrics.

Environment: R Programming, Python, Jupyter, SPSS, SQL Server 2014, SSRS, SSIS, SSAS, SQL Server Management Studio, Hadoop, AWS, Business Intelligence Development Studio, SAP Business Objects and Business Intelligence.


Data Scientist


  • Used Tableau to automatically generate reports. Worked with partially adjudicated insurance flat files, internal records, 3rd party data sources, JSON, XML and more.
  • Experienced in building models using Spark (PySpark, Spark SQL, Spark MLlib and Spark ML).
  • Experienced in cloud services such as AWS EC2, EMR, RDS and S3 to support big data tools, solve data storage issues and work on deployment solutions.
  • Worked with several R packages including knitr, dplyr, SparkR, Causal Infer, space-time.
  • Performed exploratory data analysis and data visualization using R and Tableau.
  • Implemented end-to-end systems for Data Analytics, Data Automation and integrated with custom visualization tools using R, Mahout, Hadoop and MongoDB.
  • Gathered the required data from multiple data sources and created datasets for use in analysis.
  • Extracted knowledge from notes using NLP (Python, NLTK, MLlib, PySpark).
  • Independently coded new programs and designed tables to load and test the programs effectively for the given POCs using Big Data/Hadoop.
  • Worked with BTEQ to submit SQL statements, import and export data, and generate reports in Teradata.
  • Built and optimized NLP and text analytics data mining pipelines to extract information.
  • Coded R functions to interface with the Caffe deep learning framework.
  • Worked in the Amazon Web Services cloud computing environment.
  • Interacted with other departments to understand and identify data needs and requirements, and worked with other members of the IT organization to deliver data visualization and reporting solutions to address those needs.
  • Performed EDA with univariate and bivariate analysis to understand the intrinsic and combined effects.
  • Designed data models and data flow diagrams using Erwin and MS Visio.
  • Established Data architecture strategy, best practices, standards, and roadmaps.
  • Performed data cleaning and imputation of missing values using R.
  • Developed, implemented and maintained the conceptual, logical and physical data models using Erwin for forward/reverse engineered databases.
  • Worked with the Hadoop ecosystem covering HDFS, HBase, YARN and MapReduce.
  • Created customized business reports and shared insights with management.
  • Took up ad-hoc requests from different departments and locations.
  • Used Hive to store data and perform data cleaning steps for huge datasets.
  • Created dashboards and visualizations on a regular basis using ggplot2 and Tableau.

Environment: R 3.0, Erwin 9.5, Tableau 8.0, MDM, Qlikview, MLLib, PL/SQL, HDFS, Teradata 14.1, JSON, HADOOP (HDFS), Map Reduce, PIG, Spark, R Studio, MAHOUT, JAVA, HIVE, AWS.

Confidential - Morristown, NJ

Data Analyst


  • Participated in projects migrating data from data warehouses on Oracle/DB2 to Teradata.
  • Designed, implemented and automated modeling and analysis procedures on existing and experimentally created data.
  • Implemented a job that reads electronic medical records, extracts the data into an Oracle database and generates an output.
  • Used Model Manager Option in Erwin to synchronize the data models in Model Mart approach.
  • Parsed data, producing concise conclusions from raw data in a clean, well-structured and easily maintainable format. Developed clustering models for customer segmentation using Python.
  • Developed data mapping documents between Legacy, Production, and User Interface Systems.
  • Reverse engineered the reports and identified the data elements (in the source system), dimensions, facts and measures required for reports.
  • Developed logical and physical data models using Erwin to design OLTP systems for different applications.
  • Worked with DBA group to create Best-Fit Physical Data Model with DDL from the Logical Data Model using Forward engineering.
  • Analyzed the data and provided insights about customers using Tableau.
  • Developed Data Migration and Cleansing rules for the Integration Architecture (OLTP, ODS, DW)
  • Used Teradata utilities such as FastExport and MultiLoad for handling various tasks.
  • Created dynamic linear models to perform trend analysis on customer transactional data in Python.
  • Developed an object model in UML for the conceptual data model using Enterprise Architect.
  • Created entity process association matrices using Zachman Framework, functional decomposition diagrams and data flow diagrams from business requirements documents.
  • Worked with the ETL team to document the transformation rules for data migration from OLTP to Warehouse environment for reporting purposes.
  • Generated comprehensive analytical reports by running SQL queries against current databases to conduct data analysis.
  • Generated ad-hoc reports using Crystal Reports 9 and SQL Server Reporting Services (SSRS).

Environment: Teradata SQL Assistant, Teradata Loading utilities (BTEQ, FastLoad, MultiLoad), Python, UNIX, Tableau, MS Excel, MS PowerPoint, Business Objects, Oracle

Confidential - Oldsmar, FL

Data Analyst


  • Applied Forward Elimination and Backward Elimination to data sets to identify the most statistically significant variables for data analysis.
  • Utilized Label Encoders and One-Hot Encoder in Python to create dummy variables for geographic locations and identify their impact on pre-acquisition and post-acquisition results using a two-sample paired t-test.
  • Utilized Spark, Scala, Hadoop, HBase, Cassandra, MongoDB, Kafka, Spark Streaming, MLlib, Python and a broad variety of machine learning methods including classification, regression and dimensionality reduction; used the resulting engine to increase user lifetime by 45% and triple user conversations for target categories.
  • Worked with ETL SQL Server Integration Services (SSIS) for data investigation and mapping to extract data and applied fast parsing and enhanced efficiency.
  • Developed Data Science content involving Data Manipulation and Visualization, Web Scraping, Machine Learning, Python programming, SQL, GIT and ETL for Data Extraction.
  • Built analytical systems and data structures, and gathered and manipulated data using statistical techniques.
  • Developed Spark/Scala and Python code for a regular expression (regex) project in the Hadoop/Hive environment with Linux/Windows for big data resources. Used the K-Means clustering technique to identify outliers and to classify unlabeled data.
  • Designed a suite of interactive dashboards that made it possible to scale and measure HR department statistics, which was not possible earlier, and scheduled and published reports.
  • Created data presentations to reduce biases and tell the true story of people, pulling millions of rows of data using SQL and performing exploratory data analysis.
  • Applied breadth of knowledge in programming (R, Python), Descriptive, Inferential, and Experimental Design statistics, advanced mathematics, and database functionality (SQL, Hadoop).
  • Migrated data from Heterogeneous Data Sources and legacy system (DB2, Access, Excel) to centralized SQL Server databases using SQL Server Integration Services (SSIS).
  • Applied descriptive and inferential statistics to various data attributes using SPSS to draw insights regarding products and services for patients.
  • Developed and utilized various machine learning algorithms such as Logistic Regression, Decision trees, Neural Network models, Hybrid recommendation model and NLP for data analysis.
  • Integrated SAS datasets into Excel using Dynamic Data Exchange, and used SAS to analyze data and produce statistical tables, listings and graphs for reports.
  • Performed data analysis on the datasets using Proc Print, Proc Sort, Proc Transpose, Proc Means, Proc Summary, Proc Tabulate, Proc Univariate and Proc Freq.
  • Performed data management such as merging, concatenating and interleaving of SAS datasets using MERGE, UNION and SET statements in the DATA step and PROC SQL.
  • Used SAS to read, write, import and export other data file formats, including delimited files, spreadsheets, and Microsoft Excel and Access tables.
  • Utilized data reduction techniques such as Factor analysis to identify most correlated values to underlying factors of the data and categorized the variable according to factors.
  • Handled importing data from various data sources, performed transformations using Hive and MapReduce, and loaded data into HDFS using HQL queries in Hadoop.
  • Performance tuning: analyzed requirements and fine-tuned stored procedures/queries to improve the performance of the application.
  • Created experimental design for validating correct implementation of RNN and custom TensorFlow optimizers.
  • Developed various Tableau 9.4 data models by extracting and using data from various source files, DB2, Excel, flat files and big data.
  • Interacted with Business Analysts, SMEs and other Data Architects to understand business needs and functionality for various project solutions.

Environment: R Programming, Python, Jupyter, SPSS, SQL Server 2014, SSRS, SSIS, SSAS, SQL Server Management Studio, TensorFlow, Hadoop, Business Intelligence Development Studio, SAP Business Objects and Business Intelligence.