
Sr. Data Engineer Resume

Dallas, TX


  • Over 9 years of experience as a Sr. Data Engineer, including designing, developing, and implementing data models for enterprise-level applications and systems.
  • Designed data pipelines that capture streaming web data as well as RDBMS source data.
  • Strong understanding of Software Development Lifecycle (SDLC) and various methodologies (Waterfall, Agile).
  • Extensive experience building on the Azure stack (including but not limited to Key Vault, ADW, Data Factory, Cosmos DB, Event Hubs, Stream Analytics).
  • Experience in importing and exporting data using Sqoop from HDFS to Relational Database Management Systems (RDBMS) and from RDBMS to HDFS.
  • Extensive experience with Hadoop, HDFS, Hive, Pig, MapReduce, and Spark.
  • Expertise in designing Star and Snowflake schemas for Data Warehouses using tools such as Erwin Data Modeler, PowerDesigner, and ER/Studio.
  • Responsible for data cleansing and transformation of time-series data with PySpark.
  • Extensive use of the open-source languages Perl, Python, and Scala.
  • Extensive experience in development of OLAP, OLTP, PL/SQL, Stored Procedures, Triggers, Functions, Packages, performance tuning and optimization for business logic implementation.
  • Experience in developing MapReduce programs using Apache Hadoop for analyzing big data as per requirements.
  • Good understanding of AWS, big data concepts and Hadoop ecosystem.
  • Experience in developing custom UDFs in Python to extend Hive and Pig Latin functionality.
  • Good working knowledge of real-time data integration using Kafka, Spark Streaming, and HBase.
  • Designed Data Marts with dimensional data modeling using star and snowflake schemas.
  • Extensive experience working with XML data and XML schema design.
  • Excellent knowledge and extensive use of NoSQL databases (HBase).
  • Expertise in SQL Server Analysis Services (SSAS) and SQL Server Reporting Services (SSRS).
  • Experience in setting up connections to different RDBMS databases such as Oracle and Teradata according to user requirements.
  • Knowledge in job workflow scheduling and monitoring tools like Oozie.
  • Excellent knowledge of Ralph Kimball's and Bill Inmon's approaches to Data Warehousing.
  • Good knowledge of Data Marts and Dimensional Data Modeling with the Ralph Kimball methodology using Analysis Services.
  • Experience in Data Scrubbing/Cleansing, Data Quality, Data Mapping, Data Profiling, and Data Validation in ETL.
  • Experience with Tableau in analysis and creation of dashboards and user stories.
  • Experience with SQL Server and T-SQL in constructing Temporary Tables, Table Variables, Triggers, user-defined functions, views, and Stored Procedures.
  • Strong experience in Data Analysis, Data Migration, Data Cleansing, Transformation, Integration, Data Import, and Data Export.
  • Experience in writing and executing unit, system, integration, and UAT scripts in data warehouse projects.
  • Experienced in writing UNIX shell scripts, with hands-on experience scheduling shell scripts using Control-M.
  • Experience in Batch processes, Import, Export, Backup, Database Monitoring tools and Application support.


Data Modeling Tools: Erwin 9.8/9.7, Sybase Power Designer, Oracle Designer, ER/Studio V17

Big Data tools: Hadoop 3.0, HDFS, Hive 2.3, Pig 0.17, Scala, HBase 1.2, Sqoop 1.4, Kafka 1.1, Oozie 4.3

Database Tools: Oracle 12c/11g, Teradata 15/14 and

ETL Tools: SSIS, Informatica v10.

Programming Languages: SQL, T-SQL, Python, UNIX shell scripting, PL/SQL.

Operating Systems: Microsoft Windows 10/8/7, UNIX, and Linux

Project Execution Methodologies: JAD, Agile, SDLC.


Confidential - Dallas, TX

Sr. Data Engineer


  • Implemented data pipeline infrastructure to support machine learning systems.
  • Responsibilities included full SDLC management: designing, analyzing, developing, testing, implementation, and application support.
  • Initiated and conducted JAD sessions inviting various teams to finalize the required data fields and their formats.
  • Created data pipelines in the cloud using Azure Data Factory.
  • Authored Python (PySpark) scripts for custom UDFs for row/column manipulations, merges, aggregations, stacking, data labeling, and all cleaning and conforming tasks.
  • Built and implemented a real-time streaming ETL pipeline using the Kafka Streams API.
  • Implemented Dimensional Modeling using Star and Snowflake Schemas, identifying Facts and Dimensions, and performed physical and logical data modeling using Erwin.
  • Created Data Dictionary and Data Mapping from Sources to the Target in MDM Data Model.
  • Created SQL scripts to find data quality issues and to identify keys, data anomalies, and data validation issues.
  • Created pipelines in ADF using Linked Services, Datasets, and Pipelines to extract, transform, and load data to and from different sources such as Azure SQL, Data Lake, Data Factory, Blob storage, Azure SQL Data Warehouse, and a write-back tool.
  • Performed Hive programming for applications that were migrated to big data using Hadoop.
  • Built and maintained SQL scripts, indexes, and complex queries for data analysis and extraction.
  • Assisted in overseeing compliance with the Enterprise Data Standards, data governance, and data quality.
  • Provided analytical reports to end users using Tableau integrated with Hadoop.
  • Converted SQL procedures in SQL Server to Spark SQL.
  • Replaced the existing MapReduce programs and Hive queries with Spark applications written in Scala.
  • Defined key control management using the Azure Key Vault API.
  • Implemented an Azure cloud solution using HDInsight, Event Hubs, Cosmos DB, Cognitive Services, and Key Vault.
  • Developed Scala applications on Hadoop and Spark SQL for high-volume and real-time data processing.
  • Designed and developed a Data Lake using Hadoop for processing raw and processed claims via Hive.
  • Built solutions for real-time data processing using Kafka mirroring, Kafka queues, Key Vault, Azure Data Warehouse, Power BI, Analysis Services, and Cosmos DB.
  • Loaded data into Hive Tables from Hadoop Distributed File System (HDFS) to provide SQL access on Hadoop data.
  • Preprocessed data using Pig scripts so that it could be used for data analysis.
  • Used Oozie to automate the process of loading data into HDFS.
  • Developed and automated multiple departmental reports using Tableau and MS Excel.
  • Transformed structured, semi-structured, and unstructured data and loaded it into HBase.
  • Moved data onto HDFS from the local system and vice versa.
  • Designed and developed T-SQL stored procedures to extract, aggregate, transform, and insert data.
  • Used SSRS to create customized, on-demand, and ad-hoc reports, and was involved in analyzing multi-dimensional reports in SSRS.

Environment: Erwin 9.8, Azure, ETL, Kafka 1.1, MDM, SQL, Hive 2.3, Python, PySpark, Big Data 3.0, Hadoop 3.0, Spark SQL, Tableau, Scala, Cosmos DB, HDInsight, MapReduce, HDFS, MS Excel, HBase 1.2, T-SQL, Oozie 4.3, Pig 0.17, SSRS.

Confidential - Nashville, TN

Data Engineer


  • Used the Agile Scrum methodology across the different phases of the software development lifecycle.
  • Implemented a proof of concept deploying this product in Amazon Web Services (AWS).
  • Transformed data moved onto HDFS into a single file using Pig scripts and Python.
  • Designed the Logical Data Model using Erwin with the entities and attributes for each subject area.
  • Developed MapReduce programs to parse the raw data, populate staging tables and store the refined data in partitioned tables in the EDW.
  • Designed new SAS programs by analyzing requirements, constructing workflow charts and diagrams, studying system capabilities and writing specifications.
  • Exported and imported data from RDBMS in different countries to Hadoop using Sqoop.
  • Created complex SQL queries using Views, Indexes, Triggers, Roles, Stored Procedures, and User Defined Functions. Worked with different methods of logging in SSIS.
  • Performed Verification, Validation, and Transformations on the Input data (Text files, XML files) before loading into target database.
  • Extracted the source data from Oracle tables, sequential files, and Excel sheets.
  • Loaded data into Hive Tables from Hadoop Distributed File System (HDFS) to provide SQL access on Hadoop data.
  • Extensively used Kafka along with HBase and Apache Hive.
  • Tested the ETL process both before and after data validation.
  • Created numerous Volatile, Global, Set, and Multiset tables on Teradata.
  • Generated parameterized queries for tabular reports using global variables, expressions, functions, and stored procedures in SSRS.
  • Created Hive queries that helped analysts spot emerging trends by comparing fresh data with EDW reference tables and historical metrics.
  • Developed and automated multiple departmental reports using Tableau Software.
  • Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from NoSQL and a variety of portfolios.
  • Developed and implemented data cleansing, data security, data profiling, and data monitoring processes.
  • Used MS Access to dump data and analyze it based on business needs.

Environment: Erwin 9.7, Agile, Python, Pig 0.17, AWS, MapReduce, SAS, SQL, SSIS, XML, Hadoop 3.0, Hive 2.3, Kafka 1.1, HBase 1.2, ETL, Teradata 15, SSRS, Access, Tableau.

Confidential - Houston, TX

Data Modeler/Data Engineer


  • Extensively performed Data analysis using Python Pandas.
  • Worked in a Scrum Agile process, writing stories with two-week iterations and delivering product for each iteration.
  • Extracted Mega Data from Amazon Redshift using SQL queries to create reports.
  • Handled importing data from various data sources, performed transformations using Hive and MapReduce, and loaded data into HDFS.
  • Worked on data integration and workflow applications on the SSIS platform and was responsible for testing all new and existing ETL data warehouse components.
  • Performed data analysis and data profiling using complex SQL on various source systems, including Oracle PL/SQL.
  • Enforced referential integrity in the OLAP and OLTP data models for consistent relationships between tables and efficient database design.
  • Created tables in Hive and loaded the structured data resulting from MapReduce jobs.
  • Implemented Star Schema methodologies in modeling and designing the logical data model into Dimensional Models.
  • Used SAP PowerDesigner as the data modeling tool for creating physical data models and for effective model management, including sharing and reusing model information.
  • Designed and implemented solutions using Teradata utilities such as FastExport and MultiLoad (MLOAD) for handling numerous tasks.
  • Moved data from Hive tables into HBase for real-time analytics.
  • Involved in Normalization/Denormalization techniques for optimum performance in relational and dimensional database environments.
  • Performed data validation by writing several complex SQL queries; involved in back-end testing and worked on data quality issues.
  • Used SQL Profiler for monitoring and troubleshooting performance issues in T-SQL code and stored procedures.
  • Performed UAT before the Production phase of the database components being built.
  • Manipulated, cleansed, and processed data using Excel and SQL. Responsible for loading, extracting, and validating client data.

Environment: SAP PowerDesigner, Agile, Python, Redshift, SQL, HDFS, Hive, MapReduce, SSIS, ETL, PL/SQL, Oracle, OLAP, OLTP, HBase, Excel, T-SQL, Teradata.

Confidential - Atlanta, GA

Data Analyst/Data Modeler


  • Performed Data Modeling, Database Design, and Data Analysis with the extensive use of ER/Studio.
  • Developed Python programs that read data from various Teradata sources and consolidate it into a single CSV file.
  • Developed normalized Logical and Physical database models for designing an OLTP application.
  • Involved in extensive data validation by writing several complex SQL queries; involved in back-end testing and worked on data quality issues.
  • Created a Data Mapping document after each assignment and wrote the transformation rules for each field as applicable.
  • Created data flow, process documents and ad-hoc reports to derive requirements for existing system enhancements.
  • Performed extensive data analysis to identify various data quality issues with the data coming from external systems
  • Synthesized and translated Business data needs into creative visualizations in Tableau
  • Introduced a Data Dictionary for the process, which simplified a lot of the work around the project.
  • Created Stored Procedures to transform the Data & worked extensively in T-SQL for various needs of the transformations while loading the data.
  • Created a number of standard and complex reports to analyze data using slice-and-dice, drill-down, and drill-through in SSRS.
  • Created Informatica mappings, wrote UNIX shell scripts, and modified PL/SQL scripts.
  • Developed complex Stored Procedures for SSRS (SQL Server Reporting Services) and created database objects like tables, indexes, etc.
  • Designed different types of Star schemas for detailed data marts and plan data marts in the OLAP
  • Worked extensively on shell scripts for running SSIS programs in batch mode on UNIX.
  • Wrote multiple SQL queries to analyze the data and presented the results using Excel and Crystal Reports.
  • Worked on SQL stored procedures, functions and packages in Oracle.

Environment: ER/Studio, Python, Teradata, OLTP, SQL, Tableau, SSRS, T-SQL, OLAP, PL/SQL, UNIX, SSIS, Oracle, Excel
