
Sr Data Engineer Resume

New York City, NY


  • Total of 10 years of work experience in the design, development, and maintenance of data engineering applications (Data Warehouse/ETL/Hadoop).
  • Industry experience with AWS Cloud services (EC2, S3, RDS, Redshift, CFT, Athena, Spectrum, Glue) and data migration to cloud.
  • Worked with Python packages such as pandas, NumPy, and json for data analysis.
  • Installation, configuration, and working knowledge of the open-source data ingestion tool Apache NiFi.
  • Experience in designing and deploying AWS Solutions using EC2, S3, RDS, Redshift, EBS, Elastic Load balancer (ELB).
  • Worked on data migration to Hadoop and hive query optimization.
  • Proficiency in Hadoop data formats like Avro/Parquet.
  • Working knowledge of Extract Transform Load (ETL) tools: Ab Initio, Informatica, and Pentaho PDI.
  • Exposure to Spark; implemented a proof of concept with Spark SQL.
  • Exposure to Tableau and its data visualization capabilities.
  • Working knowledge of the configuration management tool Ansible.
  • Sound database experience with Teradata (including its load utilities), Postgres, and Oracle, along with performance tuning.
  • Experience with the Control-M and Autosys scheduling tools and running ETL jobs through them.
  • Well conversant with Data Warehousing, Business Intelligence and Data modeling concepts.
  • Proficient in UNIX shell scripting, including creating scripts for ETL job execution and automation.
  • Experience with Iterative development methodologies such as Agile-Scrum and Kanban. Experience working with Jira.


Cloud Platform: AWS (EC2, S3, RDS, Redshift, CFT, Athena, Spectrum)

Big Data: Hadoop, Apache NiFi, Hue, Hive, Zookeeper, Spark, Elasticsearch

Integration Tools: Pentaho PDI 8.1, Informatica 9.x, Ab Initio 3.2/3.1/2.15

Languages/Scripting: Unix Shell Scripting, Python, PL/SQL

RDBMS: Teradata, Postgres, MySQL, Oracle 11g, SQL Server 2008

Source Control: Git, Bitbucket

Scheduling Tool: Control-M, Autosys

Domain Experience: Banking and credit cards, Insurance, Home loan and mortgage, Healthcare


Confidential, New York City, NY

Sr Data Engineer


  • Redesigned the data transformation pipeline, migrating it from Pentaho/MySQL to Pentaho/Redshift.
  • Fetched data from various upstream applications and made it available for reporting in Redshift.
  • Hands-on experience with Redshift Spectrum and AWS Athena query services for reading the data from S3.
  • Created Python and UNIX shell scripts while interacting with different AWS services.
  • Used Python Packages for processing JSON and HDF5 file formats.
  • Created AWS Glue job for archiving data from Redshift tables to S3 (online to cold storage) as per data retention requirements.
  • Working knowledge of Bitbucket (as the version control repository) and Bamboo (for CI/CD).

Environment: AWS (S3, Redshift, EC2, Spectrum, Athena, CFT, Glue), Pentaho Data Integration 8.1, Shell Scripting, Python, Bitbucket, Bamboo, MySQL.
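As an illustration of the JSON-processing work above, a minimal sketch of flattening nested upstream JSON into the columnar shape a Redshift table expects (the payload and field names here are hypothetical, not from the actual project):

```python
import json

# Hypothetical upstream payload: nested JSON records from a source application.
raw = '''
[
  {"id": 1, "customer": {"name": "A. Smith", "region": "NY"}, "amount": 120.5},
  {"id": 2, "customer": {"name": "B. Jones", "region": "VA"}, "amount": 75.0}
]
'''

def flatten(record):
    """Flatten one nested record into a flat row suitable for a warehouse table."""
    return {
        "id": record["id"],
        "customer_name": record["customer"]["name"],
        "region": record["customer"]["region"],
        "amount": record["amount"],
    }

rows = [flatten(r) for r in json.loads(raw)]
print(rows[0]["customer_name"])  # A. Smith
```

In practice rows like these would be staged to S3 and loaded into Redshift; the flattening step is what the Python scripts handled.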

Confidential, McLean, VA

Sr Data Engineer (AWS)


  • Developed the data ingestion pipeline in Apache NiFi.
  • Performed data analysis of large data sets using Python pandas, NumPy, and other data processing libraries.
  • Knowledge in managing S3 data layers and databases including Redshift and Postgres.
  • Created PL/pgSQL functions and upsert SQL in Postgres to implement business logic and custom validations.
  • Experience with different NiFi processors for building data flows/pipelines; used them for batch and stream processing use cases.
  • Worked with the Amazon Athena query service to analyze data in S3.
  • DevOps activities: spun up AWS resources/infrastructure for the team using Ansible playbooks and set up RDS instances using CFT.
  • Rehydrated EC2 machines to the latest release AMI every 60 days.
  • Worked on DR strategy: failing over from US-East to US-West and falling back to US-East.
  • Exposure to writing logs to Elasticsearch for analysis.

Environment: Apache NiFi, AWS services - (EC2, S3, RDS, Redshift, CFT, Athena), Postgres 9.5/9.6, ElasticSearch, Hive, Spark, Shell Scripting, Python, Ansible, Git
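The pandas/NumPy analysis bullet above can be sketched in a few lines; the dataset and column names are invented for illustration, assuming the usual profiling flow of dropping missing values and aggregating per group:

```python
import numpy as np
import pandas as pd

# Hypothetical ingested dataset: one row per event out of the ingestion pipeline.
df = pd.DataFrame({
    "region": ["east", "east", "west", "west", "west"],
    "latency_ms": [120.0, 95.0, 210.0, 180.0, np.nan],
})

# Typical profiling steps: drop rows with missing values, then aggregate per group.
clean = df.dropna(subset=["latency_ms"])
summary = clean.groupby("region")["latency_ms"].agg(["count", "mean"])
print(summary.loc["east", "mean"])  # 107.5
```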

Confidential, McLean, VA

Sr Data Engineer (Hadoop)


  • Copied files over to HDFS; once a file is on HDFS, data quality checks are enforced on the dataset.
  • Made the cleaned/validated dataset available to the data analyst group by loading it into Hive tables.
  • Experience transforming and analyzing data using HiveQL; implemented Hive external tables.
  • Implemented partitioning, dynamic partitions, and buckets in Hive for efficient data access.
  • Experience importing and exporting data using Sqoop between HDFS and RDBMS.
  • Registered data sets in the metadata registry.
  • Good understanding of Hadoop architecture, components in the Hadoop ecosystem, and the HDFS directory structure.

Environment: Hadoop, HDFS, Hive, Sqoop, Hue, Zookeeper, Shell Scripting, Python, Git
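The partitioning bullet above concerns how Hive dynamic partitioning lays records out on HDFS; a small sketch of that directory layout (table name, keys, and values here are hypothetical):

```python
def partition_path(base, record, partition_keys):
    """Build the HDFS directory a record lands in under Hive dynamic partitioning.

    Hive encodes each partition column as a key=value path segment, which is
    what lets queries prune whole directories when filtering on those columns.
    """
    parts = [f"{k}={record[k]}" for k in partition_keys]
    return "/".join([base] + parts)

row = {"dt": "2020-01-01", "region": "east", "amount": 42}
print(partition_path("/warehouse/sales", row, ["dt", "region"]))
# /warehouse/sales/dt=2020-01-01/region=east
```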

Confidential, McLean, VA

Sr Data warehouse Developer


  • Developed ETL mappings using different transform components.
  • Created shell scripts to run the ETL mappings in prod environment.
  • Implementation and validation of ETL production release.
  • Coded Teradata upsert SQL to implement SCD Type 2 logic.
  • Working knowledge of the Control-M scheduler for running ETL flows.
  • Worked with the data modeler to define the logical and physical data models and created ETL mapping documents.
  • Provided production support to L2/L3 teams for Control-M job failures caused by data/code issues.
  • Wrote BTEQ scripts in Teradata for loading PL and reporting tables.
  • Created reports in Tableau; exposure to a range of Tableau functionality.
  • Coded data quality rules in IDQ per requirements.
  • Ran the DQ rules against data to check accuracy and correctness.
  • Built scorecards for the customer in IDQ.

Environment: Ab Initio 3.2, Unix, Teradata, Oracle 11g, SQL Server 2008, Control-M, Informatica Data Quality (IDQ), Tableau
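The SCD Type 2 bullet above was implemented as Teradata upsert SQL; the close-and-insert logic behind it can be sketched in plain Python for illustration (the dimension table and attribute names are hypothetical):

```python
from datetime import date

# Hypothetical customer dimension held in memory; in the actual work this was
# a Teradata table maintained via upsert SQL.
dim = [
    {"cust_id": 1, "city": "NYC", "eff_from": date(2019, 1, 1),
     "eff_to": None, "current": True},
]

def scd2_upsert(dim, cust_id, city, as_of):
    """SCD Type 2: close the current row when an attribute changes, then insert a new version."""
    for row in dim:
        if row["cust_id"] == cust_id and row["current"]:
            if row["city"] == city:
                return dim  # attribute unchanged: no new version needed
            row["eff_to"] = as_of   # expire the old version
            row["current"] = False
    dim.append({"cust_id": cust_id, "city": city, "eff_from": as_of,
                "eff_to": None, "current": True})
    return dim

scd2_upsert(dim, 1, "Boston", date(2020, 6, 1))
print(len(dim), dim[0]["current"], dim[1]["city"])  # 2 False Boston
```

In SQL this is the familiar pair of an UPDATE that end-dates the current row followed by an INSERT of the new version.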

Confidential, Herndon, VA

Senior ETL Engineer


  • Defining the design approach and integration strategies within selected technology stack
  • Implemented Data parallelism using Multi-files, Partition and De-partition components to improve the overall performance.
  • Worked in a Sandbox environment while extensively interacting with EME to maintain version control on objects.
  • Implemented performance analysis and SQL query tuning in Oracle.
  • Scheduled code with Autosys and validated the jobs by running them in dev/test environments.

Environment: Ab Initio 3.0.3, Oracle 11g, SQL Server 2008, Unix, Autosys

Confidential, Phoenix, AZ

Data/ETL Engineer


  • Worked on ELT approach for data movement and loading into Data Warehouse.
  • Migration of Sybase stored procedures to Teradata. Extensive use of Teradata functions.
  • Operations performed in the database:
  • Coding of stored procedures in Teradata.
  • Creating DDL’s of target and staging table.
  • Perform data loads and validate the data for accuracy/error/rejects.
  • Performance improvement through tuning of Teradata stored procedures.
  • Worked on BTEQ scripts and other Teradata utilities for loading data into tables.

Environment: Ab Initio 2.15, Sybase, Teradata Utilities, BTEQ scripts, Unix

Confidential, Phoenix, AZ

Data Engineer / ETL Developer


  • Used Ab Initio date and string functions in transform rules.
  • Worked with COBOL copybooks and converted data from EBCDIC format into ASCII.
  • Worked with the Data Profiler to develop business validation rules for the required fields.
  • Created parameter sets (psets) for running of generic graphs with different values.
  • Improved the performance of graphs by using mechanisms like Check Pointing and Phasing.
  • Prepared wrapper scripts in UNIX for productionising ETL mappings/data flows.
  • Building the ETL functionality in Informatica and checking performance of application.
  • Migrated tags from the Dev to the QA region and checked out the code into different QA sandboxes.

Environment: Ab-Initio, Informatica, UNIX, Mainframe, Control-M, Flat files, CSV, Python

Confidential, Hartford, CT

Software Engineer


  • Determined which data values were valid or invalid for each data type.
  • Used generic Ab Initio validation functions for validation and cleansing of data.
  • Created an EBCDIC DML for reading mainframe data and optimized it for accurate interpretation of the data.
  • Developed shell scripts for automation and ease of migration between environments.

Environment: Ab-Initio, UNIX, Shell Scripting, Mainframe
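The EBCDIC-to-ASCII conversions mentioned above (here and in the Phoenix role) come down to a codec translation; a minimal sketch using Python's standard library, with illustrative byte values rather than real mainframe data:

```python
# EBCDIC (code page 037) bytes for the word "Hello", as they might arrive in a
# mainframe extract; the byte values here are illustrative test data.
ebcdic_bytes = b"\xc8\x85\x93\x93\x96"

# Python's stdlib ships the cp037 codec, so conversion is a decode/encode pair.
text = ebcdic_bytes.decode("cp037")
ascii_bytes = text.encode("ascii")
print(text)  # Hello
```

In the actual projects this translation was driven by an EBCDIC DML in Ab Initio rather than Python, but the character-set mapping being performed is the same.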
