
Sr. Data Engineer Resume

Bentonville, AR


  • Expertise in transforming business requirements into analytical models: designing algorithms, building models, and developing data mining, data acquisition, data preparation, data manipulation, feature engineering, machine learning, validation, visualization, and reporting solutions that scale across massive volumes of structured and unstructured data.
  • Proficient in managing the entire data science project life cycle and actively involved in all phases, including data acquisition, data cleaning, feature scaling, feature engineering, statistical modeling (Decision Trees, Regression Models, Neural Networks, Support Vector Machines (SVM), Clustering), dimensionality reduction using Principal Component Analysis and Factor Analysis, testing and validation using ROC plots and K-fold cross validation, and data visualization.
  • Experienced in R and Python (pandas, NumPy, scikit-learn) for statistical computing; also experienced with Spark MLlib, MATLAB, Excel, Minitab, SPSS, and SAS.
  • Experienced in implementing Service-Oriented Architecture (SOA) using Web Services and JMS (Java Message Service). Experienced in MVC (Model-View-Controller) architecture and J2EE design patterns such as Singleton and Factory.
  • Extensive experience in loading and analyzing large datasets with the Hadoop framework (MapReduce, HDFS, Pig, Hive, Flume, Sqoop, Spark, Impala, Scala) and NoSQL databases such as MongoDB, HBase, and Cassandra.
  • Strong experience in the analysis, design, development, testing, and implementation of Business Intelligence solutions using Data Warehouse/Data Mart design, ETL, BI, and client/server applications, and in writing ETL scripts using regular expressions and tools such as Informatica, Pentaho, and SyncSort.
  • Proficient in machine learning algorithms and predictive modeling, including Linear Regression, Logistic Regression, Naive Bayes, Decision Trees, Random Forests, Gradient Boosting, SVM, KNN, and K-means clustering.
  • Skilled in data parsing, manipulation, and preparation: describing data contents, computing descriptive statistics, regex, split and combine, remap, merge, subset, reindex, melt, and reshape. Good understanding of web design based on HTML5, CSS3, and JavaScript.
  • Hands-on experience with Big Data tools such as Hadoop, Spark, Hive, Pig, Impala, PySpark, and Spark SQL. Hands-on experience implementing LDA and Naive Bayes; skilled in Random Forests, Decision Trees, Linear and Logistic Regression, SVM, Clustering, Neural Networks, and Principal Component Analysis.
  • Good knowledge of Proofs of Concept (PoCs) and gap analysis; gathered the necessary data for analysis from different sources and prepared it for exploration using data munging.
  • Deep understanding of MapReduce with Hadoop and Spark. Good knowledge of the Big Data ecosystem: Hadoop 2.0 (HDFS, Hive, Pig, Impala) and Spark (Spark SQL, Spark MLlib, Spark Streaming).
  • Excellent performance in building and publishing customized interactive reports and dashboards with customized parameters and user filters, producing tables, graphs, and listings in Tableau.
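The modeling workflow described above (feature scaling, model fitting, K-fold cross validation, ROC-based evaluation) can be sketched as follows; this is a minimal illustration on synthetic data, not code from any specific engagement:

```python
# Minimal sketch of the workflow: scale features, fit a classifier,
# and score it with 5-fold cross validation using ROC AUC.
# The dataset here is synthetic and purely illustrative.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Pipeline keeps scaling inside each CV fold, avoiding leakage.
model = make_pipeline(StandardScaler(), LogisticRegression())
cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
print(round(scores.mean(), 3))
```

Putting the scaler inside the pipeline, rather than scaling once up front, is what makes the cross-validation estimate honest.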


Databases: Oracle, MySQL, SQLite, SQL Server 2014, HBase 1.2, MongoDB 3.2, Teradata, Netezza, Cassandra (relational and NoSQL)

Database Tools: PL/SQL Developer, Toad, SQL Loader.

Web Programming: HTML, CSS, XML, JavaScript.

Programming Languages: R, Python, SQL, Scala, UNIX, C

DWH BI Tools: DataStage 9.1/11.3, Tableau Desktop, D3.js

Machine Learning: Regression, Clustering, SVM, Decision Trees, Classification, Recommendation Systems, Association Rules, Survival Analysis, etc.

Data Visualization: QlikView, Tableau 9.4/9.2, ggplot2 (R), D3, Zeppelin

Big Data Framework: HDFS, MapReduce, Pig, Hive, Sqoop, Oozie, ZooKeeper, Flume, HBase, Spark, Storm, Impala, Kafka; AWS (EC2, S3, Redshift).

Technologies/Tools: Azure Machine Learning, SPSS, Rattle, Caffe, TensorFlow, Informatica, Elasticsearch, NiFi, Theano, Torch, Keras, NumPy.

Modeling Tools: Autosys, Control-M, Rational Suite, System Architect, Sentry, Toad Suite, ERwin, ER/Studio, PowerDesigner

Operating Systems: AIX, LINUX, UNIX, HP-UX, Windows; Azure, AWS; VMWare, EMC, Solaris

Methodologies: ER/AOM/Dimensional/MOLAP/ROLAP/HOLAP; IaaS/SaaS/PaaS; DevOps/Agile; Kimball/Inmon; TOGAF/DoDAF/FEAF/Zachman/ITIL; SOA/Microservices; RUP/UML/JEE/.NET


Confidential, Bentonville, AR

Sr. DATA Engineer


  • Designed and developed the architecture for a data services ecosystem spanning relational, NoSQL, and Big Data technologies. Extracted large volumes of data from Amazon Redshift, AWS, and Elasticsearch using SQL queries to create reports.
  • Involved in relational and dimensional data modeling, creating logical and physical database designs and ER diagrams with all related entities and their relationships, based on rules provided by the business manager, using ERwin r9.6.
  • Analysis of functional and non-functional categorized data elements for data profiling and mapping from source to target data environment. Developed working documents to support findings and assign specific tasks.
  • Responsible for requirements gathering, system analysis, design, development, testing, and deployment. Manipulated HTML5 and CSS3 with jQuery and provided dynamic functionality using AJAX, XML, and JSON. Configured DNS, DHCP, Apache, Sendmail, Bugzilla, CVS, and Samba on Linux servers across different Linux versions.
  • Accomplished financial tests to ensure compliance with CCAR, Basel, Dodd-Frank, and Sarbanes-Oxley using SQL, Oracle, SAS, DB2, Teradata, and MS Access; proficient with business intelligence tools such as SSIS, SSRS, TOAD, SAS Enterprise Guide, Teradata SQL Assistant, VBA, Tableau, and Actimize.
  • Involved in development of an enterprise social network application using Python, Twisted, and Cassandra; responsible for setting up a Python REST API framework using Django.
  • Utilized Spark, Scala, Hadoop, HBase, Cassandra, MongoDB, Kafka, Spark Streaming, MLlib, and Python with a broad variety of machine learning methods, including classification, regression, and dimensionality reduction.
  • Built a customer master data management system with Machine Learning driven probabilistic matching and rule based deterministic matching with input data from 11 channels.
  • Evaluated Big Data technologies and prototyped solutions to improve the data processing architecture. Performed data modeling, development, and administration of relational and NoSQL databases (BigQuery, Elasticsearch).
  • Used Kafka for live streaming data and performed analytics on it. Worked on Sqoop to transfer data between relational databases and Hadoop.
  • Troubleshot, debugged, and upgraded existing applications. Developed unit test plans and executed unit tests; provided support to automate system testing and user acceptance testing.
  • Developed MapReduce/Spark Python modules for machine learning & predictive analytics in Hadoop on AWS. Implemented a Python-based distributed random forest via Python streaming.
  • Used Pandas, NumPy, seaborn, SciPy, Matplotlib, Scikit-learn, NLTK in Python for developing various machine learning algorithms and utilized machine learning algorithms such as linear regression, multivariate regression, naive Bayes, Random Forests, K-means, & KNN for data analysis.
  • Participated in all phases of machine learning and data mining: data collection, data cleaning, model development, validation, and visualization.
  • Worked with SQL Server SSIS (Integration Services), SSAS (Analysis Services), and SSRS (Reporting Services). Used Informatica, SSIS, SPSS, and SAS to extract, transform, and load source data from transaction systems.
  • Developed Spark/Scala and Python code for a regular expression (regex) project in the Hadoop/Hive environment on Linux/Windows for Big Data resources. Worked with data investigation, discovery, and mapping tools to scan every data record from many sources.
  • Wrote scripts in Oracle, SQL Server, and Netezza databases to extract data for reporting and analysis. Imported and cleansed high-volume data from various sources such as DB2, Oracle, and flat files onto SQL Server.
  • Migrated the DataStage ETL code of Fixed Index Annuities of different admin systems to Oracle packages to streamline the process for better maintenance/ease of support with no impact to performance.
  • Developed automated data pipelines from various external data sources (web pages, APIs, etc.) to an internal data warehouse (SQL Server, AWS), then exported results to reporting tools.
  • Used Informatica PowerCenter for ETL (extraction, transformation, and loading) of data from heterogeneous source systems, and studied and reviewed the application of the Kimball data warehouse methodology, as well as the SDLC, across various industries' data-handling scenarios.
  • Connected Tableau to AWS Redshift to extract live data for real-time analysis; worked on normalization and de-normalization concepts and design methodologies such as the Ralph Kimball data warehouse methodology. Analyzed the Hadoop cluster and different Big Data analytic tools, including Pig, the HBase database, and Sqoop.
  • Used Spark DataFrames, Spark SQL, and Spark MLlib extensively; developed and designed POCs using Scala, Spark SQL, and the MLlib libraries.
  • Worked on ERwin for developing data model using star schema methodologies and collaborated with other data modeling team members to ensure design consistency and integrity.
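The MapReduce/Spark Python modules mentioned above can be sketched in the Hadoop Streaming idiom; this is an illustrative pure-Python word-count pair simulated locally, not the actual project code (in production the mapper and reducer would read stdin and be launched by the streaming jar):

```python
# Hadoop Streaming style mapper/reducer sketch, simulated locally.
# The regex tokenization mirrors the regex-extraction work described
# above; input lines here are made up for illustration.
import re
from itertools import groupby

def mapper(line):
    # Emit (token, 1) for every word-like token found by the regex.
    for token in re.findall(r"[a-z]+", line.lower()):
        yield token, 1

def reducer(pairs):
    # Hadoop delivers mapper output sorted by key; sum counts per key.
    for key, group in groupby(sorted(pairs), key=lambda kv: kv[0]):
        yield key, sum(count for _, count in group)

lines = ["spark and hive", "hive on hadoop"]
pairs = [kv for line in lines for kv in mapper(line)]
counts = dict(reducer(pairs))
print(counts["hive"])  # 2
```

The shuffle-and-sort step that Hadoop performs between the two phases is emulated here by `sorted(pairs)`.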

Confidential, Bothell, WA

DATA Scientist


  • Architected, designed, and developed business applications and data marts for reporting. Involved in different phases of the development life cycle, including analysis, design, coding, unit testing, integration testing, review, and release, per business requirements.
  • Developed Big Data solutions focused on pattern matching and predictive modeling. Worked on an Amazon Redshift/AWS solution to load data, create data models, and run BI on it.
  • Developed various operational Drill-through and Drill-down reports using SSRS. Generated periodic reports based on the statistical analysis of the data using SQL Server Reporting Services (SSRS)
  • Used advanced features of T-SQL in order to design and tune T-SQL to interface with the Database
  • Designed OLTP system environment and maintained documentation of Metadata. Used forward engineering approach for designing and creating databases for OLAP model.
  • Created PL/SQL packages and Database Triggers and developed user procedures and prepared user manuals for the new programs.
  • Loaded and transformed large sets of structured, semi structured and unstructured data using Hadoop/Big Data concepts. Developed Hive and MapReduce tools to design and manage HDFS data blocks and data distribution methods.
  • Worked closely with the ETL Developers in designing and planning the ETL requirements for reporting, as well as with business and IT management in the dissemination of project progress updates, risks, and issues.
  • Worked on AWS S3 bucket integration for application and development projects. Worked on AWS Redshift and RDS for implementing models and data on RDS and Redshift.
  • Created HBase tables to store various data formats of PII data coming from different portfolios. Implemented Forward engineering to create tables, views and SQL scripts and mapping documents.
  • Implemented Kafka High level consumers to get data from Kafka partitions and move into HDFS. Worked with MDM systems team with respect to technical aspects and generating reports. Used Impala to read, write and query the Hadoop data in HDFS or HBase or Cassandra.
  • Designed both 3NF data models for OLTP systems and dimensional data models using star and snowflake schemas.
  • Participated in Normalization /De-normalization, Normal Form and database design methodology. Expertise in using data modeling tools like MS Visio and Erwin Tool for logical and physical design of databases.
  • Involved in Planning, Defining and Designing data base using Erwin on business requirement and provided documentation.
  • Used partitioning and bucketing concepts in Hive and designed both managed and external Hive tables for optimized performance.
  • Participated in several facets of MDM implementations including Data Profiling, metadata acquisition and data migration.
  • Developed consumer-based features and applications using Python, Django, HTML, behavior-driven development (BDD), and pair programming.
  • Designed and developed components using Python with the Django framework. Implemented code in Python to retrieve and manipulate data.
  • Participated in all phases of data mining; data collection, data cleaning, developing models, validation, visualization and performed Gap analysis.
  • Derived insights from machine learning algorithms using SAS to analyze web log files and campaign data to recommend and improve promotional opportunities.
  • Performed data manipulation and aggregation from different sources using Nexus, Toad, BusinessObjects, Power BI, and Smart View.
  • Solved performance issues in Hive and Pig scripts with an understanding of joins, grouping, and aggregation and how they translate to MapReduce jobs; tuned Hive and Pig scripts to improve performance.
  • Extensively used Apache Sqoop for efficiently transferring bulk data between Apache Hadoop and relational databases (Oracle) for product level forecast. Extracted the data from Teradata into HDFS using Sqoop.
  • Developed TWS workflows for scheduling and orchestrating the ETL process. Performed functional, non-functional, and performance testing of key systems prior to cutover to AWS.
  • Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, manage and review data backups, manage and review Hadoop log files.
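The dimensional modeling described above (a fact table surrounded by dimension tables in a star schema) can be sketched with an in-memory SQLite database standing in for the warehouse; all table and column names here are illustrative, not from the actual engagement:

```python
# Star-schema sketch: one fact table, one dimension table, and the
# typical dimensional query pattern (join, then aggregate by a
# dimension attribute). SQLite stands in for the warehouse.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE fact_sales  (product_id INTEGER REFERENCES dim_product,
                              amount REAL);
    INSERT INTO dim_product VALUES (1, 'grocery'), (2, 'apparel');
    INSERT INTO fact_sales  VALUES (1, 10.0), (1, 5.0), (2, 20.0);
""")
rows = conn.execute("""
    SELECT d.category, SUM(f.amount)
    FROM fact_sales f JOIN dim_product d USING (product_id)
    GROUP BY d.category ORDER BY d.category
""").fetchall()
print(rows)  # [('apparel', 20.0), ('grocery', 15.0)]
```

Keeping descriptive attributes in the dimension and only keys and measures in the fact table is what makes this query pattern fast at warehouse scale.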

Confidential, Irving, TX

Data Engineer


  • Migrated the Django database from SQLite to MySQL to PostgreSQL with complete data integrity. Designed, developed, and deployed CSV parsing using a Big Data approach on AWS EC2.
  • Developed tools using Python 3.6/3.4.6, shell scripting, and XML to automate menial tasks. Interfaced with supervisors, artists, systems administrators, and production to ensure production deadlines were met. Developed frontend and backend modules using Python on Django, including the Tastypie web framework, using Git.
  • Supported various client projects and internal efforts involving AWS Engineering, managing the Innovation Center specifically the Big Data Platform & Analytics tools-platforms available on Hortonworks.
  • Administered and monitored multi Data center Cassandra cluster based on the understanding of the Cassandra Architecture.
  • Created an automated archive process to remove unused tables and ensure optimal database speed. Implemented a third-party data transformation process using Redshift, Lambda, S3, Kinesis, and EDI Exchange software, reducing integration time by a factor of 10.
  • Involved in general application development using Python, JavaScript, HTML/CSS, and AngularJS, with strong integration with Cloud Technologies.
  • Involved in the migration from SQLite3 to an Apache Cassandra database: Cassandra data model design, implementation, maintenance, and monitoring.
  • Configured various Big Data workflows to run on top of Hadoop, comprising heterogeneous jobs such as MapReduce; involved in evaluating existing server and virtualization environments for useful upgrade opportunities.
  • Designed Spark-based real-time data ingestion and real-time analytics. Wrote a Kafka producer in Scala to synthesize alarms; used Spark SQL to load JSON data, create a SchemaRDD, and load it into Hive tables, and handled structured data using Spark SQL.
  • Built Single Page Applications (SPA), Responsive Web Design (RWD) UI, Rich Restful Service Applications, and HTML Wireframes using HTML5 Grid Structures/Layouts, CSS3 Media Queries, Ajax, AngularJS and Bootstrap.
  • Built a new CI pipeline. Testing and deployment automation with Docker, Jenkins and Puppet. Utilized continuous integration and automated deployments with Jenkins and Docker.
  • Involved in development of Python APIs to dump the array structures in the Processor at the failure point for debugging, used Django APIs for database access.
  • Used the IBM DataStage ETL warehouse for filtering and visualizing raw data. Created server instances on AWS and installed Swagger for deploying microservices.
  • Used Amazon Web Services (AWS) for improved storage efficiency and fast access; worked on development and testing of many dashboard features using Python, Java, Bootstrap, CSS3, JavaScript, and jQuery.
  • Analyzed data and identified leading SaaS, PaaS, or IaaS solutions for clients. Worked on the front end utilizing Bootstrap and Angular.js for page design, and used advanced Python packages such as NumPy and SciPy for sophisticated numerical and scientific calculations.
  • Developed ETL (Extraction, Transformation and Loading) procedures and Data Conversion Scripts using Pre-Stage, Stage, Pre-Target and Target tables.
  • Developed web applications implementing MVT/MVC architecture using Django, Flask, Webapp2 and spring web application frameworks.
  • Implemented dynamic SQL Server work on the website using the SQL Developer tool. Experienced with continuous integration and automation using Jenkins; implemented Service-Oriented Architecture (SOA) using JMS for sending and receiving messages while creating web services.
  • Designed and implemented a remote upgrade system, mostly in Clojure. Deployed and monitored scalable infrastructure on Amazon Web Services (AWS) with configuration management using Puppet.
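The CSV parsing step mentioned above can be sketched with the stdlib alone; this is a minimal, hypothetical illustration (the real pipeline ran on AWS EC2 against much larger files, and the column names here are made up):

```python
# Sketch of a defensive CSV parsing step: valid records are typed
# and kept, malformed records are quarantined rather than crashing
# the load. Input data and column names are illustrative.
import csv
import io

raw = "id,amount\n1,10.5\n2,not_a_number\n3,4.0\n"

rows, bad = [], []
for rec in csv.DictReader(io.StringIO(raw)):
    try:
        rows.append({"id": int(rec["id"]), "amount": float(rec["amount"])})
    except ValueError:
        bad.append(rec)  # quarantine for later inspection

total = sum(r["amount"] for r in rows)
print(total, len(bad))  # 14.5 1
```

Quarantining bad records instead of failing fast is the usual trade-off when a nightly load must finish even if a few upstream rows are dirty.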


Data Engineer


  • Wrote complex PL/SQL queries, scripts, and stored procedures to support data integrity for large Oracle database applications. Analyzed, identified, and resolved data issues by creating complex scripts to resolve data conditions and anomalies.
  • Involved in the later phases of the Software Development Life Cycle, including code redesign, implementation, bug fixing, performance testing, penetration testing, debugging, and documentation.
  • Generated reports on script results to analyze requirements using the data visualization tool Tableau.
  • Involved in developing web applications using Django Framework to implement the model view control architecture.
  • Designed and coded Hibernate Plug-In for Spring ORM mapping and implemented HQLs by creating DAO, which connects to Oracle DB, to persist and retrieve data.
  • Implemented Spring Security against SQL injection and for user access privileges. Used various Java/J2EE design patterns such as DAO, DTO, and Singleton.
  • Involved in Developing a Restful service using Python Flask framework.
  • Added support for Amazon AWS S3 and RDS to host static/media files and the database into Amazon Cloud.
  • Worked on exposing a multi-threading factory to distribute learning-process back-testing across various worker processes.
  • Worked with Unix sockets in a client-server application framework, and on Linux server virtualization by creating Linux VMs for server consolidation.
  • Created an entire application using Python, Django, MySQL, and Linux. Created data pipelines using Apache Spark, a big-data processing and computing framework.
  • Developed the presentation layer using HTML, CSS, JavaScript, jQuery, and AJAX; used jQuery libraries for all client-side JavaScript manipulation.
  • Designed and created backend data access modules using PL/SQL stored procedures and Oracle.
  • Created PDF reports using Golang and XML documents to send to all customers at the end of the month.
  • Developed a fully automated continuous integration system using Git, Gerrit, Jenkins, MySQL and custom tools developed in Python and Bash.
  • Designed object model, data model, tables, constraints, necessary stored procedures, functions, triggers, and packages for Oracle Database.
  • Designed and created backend data access modules using PL/SQL stored procedures and Oracle; used SAX/DOM parsers for parsing data into the Oracle database.
  • Automated the existing scripts for performance calculations using Numpy and SQLAlchemy.
  • Interacted with QA to develop test plans from high-level design documentation.
  • Used AWS CloudWatch to perform monitoring, custom metrics, and file logging.
  • Participated in requirement gathering and worked closely with the architect in designing and modeling.
  • Worked on development of SQL and stored procedures on MySQL. Designed and developed horizontally scalable APIs using Python Flask.

Data Analyst



  • Worked with Data Warehouse team in developing Dimensional Model and analyzing the ER-Diagrams
  • Identified and analyzed stakeholders and subject areas.
  • Participated in Business Analysis, talking to business Users and determining the entities and attributes for Data Model.
  • Identified and determined physical attributes and their relationships through cross-analysis of functional areas.
  • Identified and analyzed source data coming from Oracle, SQL server and flat files.
  • Extensively used ERWIN to design and restructure Logical and Physical Data Models.
  • Evaluated and enhanced current data model as per the requirements
  • Performed forward and reverse engineering, applying DDLs to database in restructuring the existing data Model using ERWIN
  • Designed ETL specification documents to load the data in target using various transformations according to the business requirements.
  • Used Informatica PowerCenter for extracting, transforming, and loading data.
  • Performed Data profiling, Validation and Integration.
  • Created materialized views to improve performance and tuned the database design.
  • Involved in Data migration and Data distribution testing.
  • Performed testing, knowledge transfer and mentored other team members.
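The data profiling work listed above (null counts, distinct values per column ahead of migration) can be sketched with the stdlib alone; the sample records here are hypothetical:

```python
# Minimal column-profiling sketch: per-column null counts and
# distinct non-null values, the kind of summary used to validate
# source data before migration. Sample records are made up.
records = [
    {"city": "Dallas", "state": "TX"},
    {"city": None,     "state": "TX"},
    {"city": "Irving", "state": "TX"},
]

profile = {}
for col in records[0]:
    values = [r[col] for r in records]
    profile[col] = {
        "nulls": sum(v is None for v in values),
        "distinct": len({v for v in values if v is not None}),
    }
print(profile["city"])  # {'nulls': 1, 'distinct': 2}
```

A profile like this, compared before and after a load, is a quick integration check that no column was silently truncated or nulled out.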

Technical Environment: Informatica, Repository Manager, Workflow Manager, ERWIN 3.0, Oracle 10g/9i, Teradata, TOAD, UNIX and Shell scripting.
