Sr Data Engineer Resume
New York City, NY
SUMMARY:
- 10 years of work experience in the design, development, and maintenance of data engineering applications (data warehouse/ETL/Hadoop).
- Industry experience with AWS cloud services (EC2, S3, RDS, Redshift, CFT, Athena, Spectrum, Glue) and data migration to the cloud.
- Worked with Python packages such as pandas, NumPy, and json for data analysis.
- Installation, configuration, and working knowledge of the open-source data ingestion tool Apache NiFi.
- Experience in designing and deploying AWS solutions using EC2, S3, RDS, Redshift, EBS, and Elastic Load Balancer (ELB).
- Worked on data migration to Hadoop and Hive query optimization.
- Proficiency in Hadoop data formats such as Avro and Parquet.
- Working knowledge of Extract, Transform, Load (ETL) tools: Ab Initio, Informatica, and Pentaho PDI.
- Exposure to Spark; implemented a proof of concept with Spark SQL.
- Exposure to Tableau and its data visualization capabilities.
- Working knowledge of the configuration management tool Ansible.
- Sound database experience with Teradata (including its utilities), Postgres, and Oracle, along with performance tuning.
- Experience with the Control-M and Autosys scheduling tools and running ETL jobs through them.
- Well versed in data warehousing, business intelligence, and data modeling concepts.
- Proficient in UNIX shell scripting, including scripts for ETL job execution and automation.
- Experience with iterative development methodologies such as Agile Scrum and Kanban; experience working with Jira.
TECHNICAL SKILLS:
Cloud Platform: AWS (EC2, S3, RDS, Redshift, CFT, Athena, Spectrum)
Big Data: Hadoop, Apache NiFi, Hue, Hive, Zookeeper, Spark, Elasticsearch
Integration Tools: Pentaho PDI 8.1, Informatica 9.x, Ab Initio 3.2/3.1/2.15
Languages/Scripting: Unix Shell Scripting, Python, PL/SQL
RDBMS: Teradata, Postgres, MySQL, Oracle 11g, SQL Server 2008
Source Control: Git, Bitbucket
Scheduling Tool: Control-M, Autosys
Domain Experience: Banking and credit cards, Insurance, Home loan and mortgage, Healthcare
WORK EXPERIENCE:
Confidential, New York City, NY
Sr Data Engineer
Responsibilities:
- Redesigned the data transformation pipeline, migrating it from Pentaho/MySQL to Pentaho/Redshift.
- Fetched data from various upstream applications and made it available for reporting in Redshift.
- Hands-on experience with the Redshift Spectrum and AWS Athena query services for reading data from S3.
- Created Python and UNIX shell scripts that interact with different AWS services (see the Athena sketch below).
- Used Python packages for processing JSON and HDF5 file formats.
- Created an AWS Glue job for archiving data from Redshift tables to S3 (online to cold storage) per data retention requirements.
- Working knowledge of Bitbucket (as the version control repository) and Bamboo (for CI/CD).
Environment: AWS (S3, Redshift, EC2, Spectrum, Athena, CFT, Glue), Pentaho Data Integration 8.1, Shell Scripting, Python, Bitbucket, Bamboo, MySQL.
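A minimal sketch of the kind of Python interaction with the Athena query service mentioned above, using boto3; the database, table, query, and S3 result location are illustrative placeholders rather than project specifics.

    import boto3

    # Minimal sketch: submit an Athena query over data catalogued on S3.
    # Database, table, and result-bucket names below are placeholders.
    athena = boto3.client("athena", region_name="us-east-1")

    response = athena.start_query_execution(
        QueryString="SELECT load_date, COUNT(*) AS row_cnt FROM raw_events GROUP BY load_date",
        QueryExecutionContext={"Database": "reporting_db"},
        ResultConfiguration={"OutputLocation": "s3://example-athena-results/queries/"},
    )
    print("Started query:", response["QueryExecutionId"])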
Confidential, McLean, VA
Sr Data Engineer (AWS)
Responsibilities:
- Developed the data ingestion pipeline in Apache NiFi.
- Performed data analysis on large data sets using Python libraries such as pandas and NumPy.
- Managed S3 data layers and databases, including Redshift and Postgres.
- Created PL/pgSQL functions and upsert SQL in Postgres to implement business logic and custom validations (see the upsert sketch below).
- Used different NiFi processors to build data flows/pipelines for batch and stream processing use cases.
- Worked with the Amazon Athena query service to analyze data in S3.
- DevOps activities: provisioned AWS resources/infrastructure for the team using Ansible playbooks and set up RDS instances using CFT.
- Rehydrated EC2 instances to the latest release AMI every 60 days.
- Worked on the DR strategy: failing over from US-East to US-West and falling back to US-East.
- Exposure to writing logs to Elasticsearch for analysis.
Environment: Apache NiFi, AWS (EC2, S3, RDS, Redshift, CFT, Athena), Postgres 9.5/9.6, Elasticsearch, Hive, Spark, Shell Scripting, Python, Ansible, Git
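A minimal sketch of the Postgres upsert pattern referenced above (INSERT ... ON CONFLICT via psycopg2); the connection details, table, and column names are illustrative assumptions.

    import psycopg2

    # Minimal sketch of an upsert in Postgres; all names are placeholders.
    conn = psycopg2.connect(host="localhost", dbname="etl_db",
                            user="etl_user", password="secret")
    with conn, conn.cursor() as cur:
        cur.execute(
            """
            INSERT INTO customer_dim (customer_id, customer_name, updated_at)
            VALUES (%s, %s, NOW())
            ON CONFLICT (customer_id)
            DO UPDATE SET customer_name = EXCLUDED.customer_name,
                          updated_at    = NOW();
            """,
            (1001, "Jane Doe"),
        )
    conn.close()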
Confidential, McLean, VA
Sr Data Engineer (Hadoop)
Responsibilities:
- Copied files to HDFS; once a file landed in HDFS, data quality rules were enforced on the dataset.
- Made the cleansed/validated dataset available to the data analyst group by loading it into Hive tables.
- Transformed and analyzed data using HiveQL; implemented Hive external tables.
- Implemented partitioning, dynamic partitions, and buckets in Hive for efficient data access (see the sketch below).
- Imported and exported data between HDFS and RDBMS using Sqoop.
- Registered data sets in the metadata registry.
- Good understanding of Hadoop architecture, components of the Hadoop ecosystem, and the HDFS directory structure.
Environment: Hadoop, HDFS, Hive, Sqoop, Hue, ZooKeeper, Shell Scripting, Python, Git
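A minimal sketch of the Hive partitioning/bucketing setup described above, issued through the Hive CLI from Python; the table, columns, and HDFS locations are illustrative placeholders.

    import subprocess

    # Minimal sketch: partitioned, bucketed external table plus one partition.
    # Table, column, and HDFS path names are placeholders.
    hive_ql = """
    CREATE EXTERNAL TABLE IF NOT EXISTS clean_events (
        event_id STRING,
        event_payload STRING
    )
    PARTITIONED BY (load_date STRING)
    CLUSTERED BY (event_id) INTO 16 BUCKETS
    STORED AS PARQUET
    LOCATION '/data/clean/events';

    ALTER TABLE clean_events ADD IF NOT EXISTS
        PARTITION (load_date='2018-01-01')
        LOCATION '/data/clean/events/load_date=2018-01-01';
    """

    subprocess.run(["hive", "-e", hive_ql], check=True)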
Confidential, McLean, VA
Sr Data Warehouse Developer
Responsibilities:
- Developed ETL mappings using different transform components.
- Created shell scripts to run the ETL mappings in the production environment.
- Implemented and validated ETL production releases.
- Coded Teradata upsert SQL to implement SCD Type 2 logic (see the sketch below).
- Working knowledge of the Control-M scheduler for running ETL flows.
- Worked with the data modeler to define the logical and physical data models and created ETL mapping documents.
- Provided production support to the L2/L3 support teams for Control-M job failures caused by data or code issues.
- Wrote BTEQ scripts in Teradata for loading PL and reporting tables.
- Created reports in Tableau and gained exposure to its different features.
- Coded data quality rules in IDQ per requirements.
- Ran the DQ rules against data to check accuracy and correctness.
- Built customer scorecards in IDQ.
Environment: Ab Initio 3.2, Unix, Teradata, Oracle 11g, SQL Server 2008, Control-M, Informatica Data Quality (IDQ), Tableau
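A minimal sketch of the SCD Type 2 upsert pattern mentioned above, shown here through the Teradata SQL driver for Python (teradatasql); host, credentials, table, and column names are illustrative assumptions, and real SCD 2 logic would compare the full set of tracked attributes.

    import teradatasql

    # Minimal sketch of SCD Type 2: expire the current row when a tracked
    # attribute changes, then insert the new active version.
    # Host, credentials, table, and column names are placeholders.
    with teradatasql.connect(host="tdhost", user="etl_user", password="secret") as conn:
        with conn.cursor() as cur:
            # Close out the currently active row for the changed key.
            cur.execute(
                """
                UPDATE customer_dim
                SET end_date = CURRENT_DATE - 1, current_flag = 'N'
                WHERE customer_id = ? AND current_flag = 'Y'
                  AND customer_segment <> ?
                """,
                (1001, "PREMIUM"),
            )
            # Insert the new active version of the row.
            cur.execute(
                """
                INSERT INTO customer_dim
                    (customer_id, customer_segment, start_date, end_date, current_flag)
                VALUES (?, ?, CURRENT_DATE, DATE '9999-12-31', 'Y')
                """,
                (1001, "PREMIUM"),
            )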
Confidential, Herndon, VA
Senior ETL Engineer
Responsibilities:
- Defined the design approach and integration strategies within the selected technology stack.
- Implemented data parallelism using multifiles and partition/de-partition components to improve overall performance.
- Worked in a sandbox environment while interacting extensively with the EME to maintain version control on objects.
- Implemented performance analysis and SQL query tuning in Oracle (see the sketch below).
- Scheduled code with Autosys and validated the jobs by running them in the dev/test environments.
Environment: Ab Initio 3.0.3, Oracle 11g, SQL Server 2008, Unix, Autosys
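A minimal sketch of the query-plan inspection that typically accompanies Oracle SQL tuning, via cx_Oracle; the connection string, table, and predicate are illustrative placeholders.

    import cx_Oracle

    # Minimal sketch: generate and display an execution plan during tuning.
    # Connection string, table, and column names are placeholders.
    conn = cx_Oracle.connect("etl_user/secret@dbhost:1521/ORCLPDB1")
    cur = conn.cursor()

    cur.execute("EXPLAIN PLAN FOR SELECT * FROM customers WHERE region_id = 10")
    cur.execute("SELECT plan_table_output FROM TABLE(DBMS_XPLAN.DISPLAY())")
    for (line,) in cur:
        print(line)

    cur.close()
    conn.close()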
Confidential, Phoenix, AZ
Data/ETL Engineer
Responsibilities:
- Worked on an ELT approach for moving data and loading it into the data warehouse.
- Migrated Sybase stored procedures to Teradata; made extensive use of Teradata functions.
- Operations performed in the database:
  - Coded stored procedures in Teradata.
  - Created DDLs for the target and staging tables.
  - Performed data loads and validated the data for accuracy, errors, and rejects.
- Improved performance through tuning of Teradata stored procedures.
- Worked on BTEQ scripts and other Teradata utilities for loading data into tables.
Environment: Ab Initio 2.15, Sybase, Teradata Utilities, BTEQ scripts, Unix
Confidential, Phoenix, AZ
Data Engineer / ETL Developer
Responsibilities:
- Used Ab Initio date and string functions in transform rules.
- Worked with COBOL copybooks and converted data from EBCDIC format to ASCII (see the sketch below).
- Worked with the Data Profiler to develop business validation rules for the required fields.
- Created parameter sets (psets) for running generic graphs with different values.
- Improved the performance of graphs using mechanisms such as checkpointing and phasing.
- Prepared UNIX wrapper scripts for productionizing ETL mappings/data flows.
- Built ETL functionality in Informatica and checked application performance.
- Migrated tags from the Dev to the QA region and checked out the code into different QA sandboxes.
Environment: Ab Initio, Informatica, UNIX, Mainframe, Control-M, Flat files, CSV, Python
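A minimal sketch of the EBCDIC-to-ASCII conversion described above, using Python's built-in cp037 (EBCDIC) codec; the sample record bytes stand in for a fixed-width field defined by a COBOL copybook.

    # Minimal sketch of EBCDIC -> ASCII conversion with Python's cp037 codec.
    # The sample bytes are a placeholder for a copybook-defined record.
    ebcdic_record = b"\xc1\xc2\xc3\xf1\xf2\xf3"  # "ABC123" in EBCDIC (cp037)

    decoded = ebcdic_record.decode("cp037")   # EBCDIC bytes -> text
    ascii_record = decoded.encode("ascii")    # text -> ASCII bytes for downstream ETL

    print(decoded)        # ABC123
    print(ascii_record)   # b'ABC123'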
Confidential, Hartford, CT
Software Engineer
Responsibilities:
- Determined which data values were valid or invalid for a given data type.
- Used generic Ab Initio validation functions for validating and cleansing data.
- Created EBCDIC DML for reading mainframe data and optimized the DML for accurate interpretation of the data.
- Developed shell scripts for automation and ease of migration between environments.
Environment: Ab Initio, UNIX, Shell Scripting, Mainframe