Sr Data Engineer Resume
New York City, NY
SUMMARY:
- 10 years of work experience in the design, development, and maintenance of data engineering applications (data warehouse/ETL/Hadoop).
- Industry experience with AWS cloud services (EC2, S3, RDS, Redshift, CFT, Athena, Spectrum, Glue) and data migration to the cloud.
- Worked with Python packages such as pandas, NumPy, and json for data analysis.
- Installation, configuration, and working knowledge of the open-source data ingestion tool Apache NiFi.
- Experience in designing and deploying AWS solutions using EC2, S3, RDS, Redshift, EBS, and Elastic Load Balancer (ELB).
- Worked on data migration to Hadoop and Hive query optimization.
- Proficiency in Hadoop data formats such as Avro and Parquet.
- Working knowledge of Extract, Transform, Load (ETL) tools: Ab Initio, Informatica, and Pentaho PDI.
- Exposure to Spark; implemented a proof of concept with Spark SQL.
- Exposure to Tableau and its data visualization capabilities.
- Working knowledge of the configuration management tool Ansible.
- Sound database experience with Teradata (including its utilities), Postgres, and Oracle, along with performance tuning.
- Experience with the Control-M and Autosys scheduling tools and running ETL jobs through them.
- Well versed in data warehousing, business intelligence, and data modeling concepts.
- Proficient in UNIX shell scripting, including scripts for ETL job execution and automation.
- Experience with iterative development methodologies such as Agile Scrum and Kanban; experience working with Jira.
TECHNICAL SKILLS:
Cloud Platform: AWS (EC2, S3, RDS, Redshift, CFT, Athena, Spectrum)
Big Data: Hadoop, Apache NiFi, Hue, Hive, Zookeeper, Spark, Elasticsearch
Integration Tools: Pentaho PDI 8.1, Informatica 9.x, Ab Initio 3.2/3.1/2.15
Languages/Scripting: Unix Shell Scripting, Python, PL/SQL
RDBMS: Teradata, Postgres, MySQL, Oracle 11g, SQL Server 2008
Source Control: Git, Bitbucket
Scheduling Tool: Control-M, Autosys
Domain Experience: Banking and credit cards, Insurance, Home loan and mortgage, Healthcare
WORK EXPERIENCE:
Confidential, New York City, NY
Sr Data Engineer
Responsibilities:
- Redesigned the data transformation pipeline, migrating it from Pentaho/MySQL to Pentaho/Redshift.
- Fetched data from various upstream applications and made it available for reporting in Redshift.
- Hands-on experience with the Redshift Spectrum and AWS Athena query services for reading data from S3.
- Created Python and UNIX shell scripts that interact with different AWS services (see the Athena sketch below).
- Used Python packages for processing JSON and HDF5 file formats.
- Created an AWS Glue job for archiving data from Redshift tables to S3 (online to cold storage) per data retention requirements.
- Working knowledge of Bitbucket (as the version control repository) and Bamboo (for CI/CD).
Environment: AWS (S3, Redshift, EC2, Spectrum, Athena, CFT, Glue), Pentaho Data Integration 8.1, Shell Scripting, Python, Bitbucket, Bamboo, MySQL.
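A minimal sketch of the kind of Python interaction with the Athena query service mentioned above, using boto3; the database, table, query, and S3 result location are illustrative placeholders rather than project specifics.

    import boto3

    # Minimal sketch: submit an Athena query over data catalogued on S3.
    # Database, table, and result-bucket names below are placeholders.
    athena = boto3.client("athena", region_name="us-east-1")

    response = athena.start_query_execution(
        QueryString="SELECT load_date, COUNT(*) AS row_cnt FROM raw_events GROUP BY load_date",
        QueryExecutionContext={"Database": "reporting_db"},
        ResultConfiguration={"OutputLocation": "s3://example-athena-results/queries/"},
    )
    print("Started query:", response["QueryExecutionId"])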
Confidential, McLean, VA
Sr Data Engineer (AWS)
Responsibilities:
- Developed the data ingestion pipeline in Apache NiFi.
- Performed data analysis on large data sets using Python libraries such as pandas and NumPy.
- Managed S3 data layers and databases, including Redshift and Postgres.
- Created PL/pgSQL functions and upsert SQL in Postgres to implement business logic and custom validations (see the upsert sketch below).
- Used different NiFi processors to build data flows/pipelines for batch and stream processing use cases.
- Worked with the Amazon Athena query service to analyze data in S3.
- DevOps activities: provisioned AWS resources/infrastructure for the team using Ansible playbooks and set up RDS instances using CFT.
- Rehydrated EC2 instances to the latest release AMI every 60 days.
- Worked on the DR strategy: failing over from US-East to US-West and falling back to US-East.
- Exposure to writing logs to Elasticsearch for analysis.
Environment: Apache NiFi, AWS (EC2, S3, RDS, Redshift, CFT, Athena), Postgres 9.5/9.6, Elasticsearch, Hive, Spark, Shell Scripting, Python, Ansible, Git
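A minimal sketch of the Postgres upsert pattern referenced above (INSERT ... ON CONFLICT via psycopg2); the connection details, table, and column names are illustrative assumptions.

    import psycopg2

    # Minimal sketch of an upsert in Postgres; all names are placeholders.
    conn = psycopg2.connect(host="localhost", dbname="etl_db",
                            user="etl_user", password="secret")
    with conn, conn.cursor() as cur:
        cur.execute(
            """
            INSERT INTO customer_dim (customer_id, customer_name, updated_at)
            VALUES (%s, %s, NOW())
            ON CONFLICT (customer_id)
            DO UPDATE SET customer_name = EXCLUDED.customer_name,
                          updated_at    = NOW();
            """,
            (1001, "Jane Doe"),
        )
    conn.close()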
Confidential, McLean, VA
Sr Data Engineer (Hadoop)
Responsibilities:
- Copied files to HDFS; once a file landed in HDFS, data quality rules were enforced on the dataset.
- Made the cleansed/validated dataset available to the data analyst group by loading it into Hive tables.
- Transformed and analyzed data using HiveQL; implemented Hive external tables.
- Implemented partitioning, dynamic partitions, and buckets in Hive for efficient data access (see the sketch below).
- Imported and exported data between HDFS and RDBMS using Sqoop.
- Registered data sets in the metadata registry.
- Good understanding of Hadoop architecture, components of the Hadoop ecosystem, and the HDFS directory structure.
Environment: Hadoop, HDFS, Hive, Sqoop, Hue, ZooKeeper, Shell Scripting, Python, Git
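A minimal sketch of the Hive partitioning/bucketing setup described above, issued through the Hive CLI from Python; the table, columns, and HDFS locations are illustrative placeholders.

    import subprocess

    # Minimal sketch: partitioned, bucketed external table plus one partition.
    # Table, column, and HDFS path names are placeholders.
    hive_ql = """
    CREATE EXTERNAL TABLE IF NOT EXISTS clean_events (
        event_id STRING,
        event_payload STRING
    )
    PARTITIONED BY (load_date STRING)
    CLUSTERED BY (event_id) INTO 16 BUCKETS
    STORED AS PARQUET
    LOCATION '/data/clean/events';

    ALTER TABLE clean_events ADD IF NOT EXISTS
        PARTITION (load_date='2018-01-01')
        LOCATION '/data/clean/events/load_date=2018-01-01';
    """

    subprocess.run(["hive", "-e", hive_ql], check=True)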
Confidential, McLean, VA
Sr Data Warehouse Developer
Responsibilities:
- Developed ETL mappings using different transform components.
- Created shell scripts to run the ETL mappings in the production environment.
- Implemented and validated ETL production releases.
- Coded Teradata upsert SQL to implement SCD Type 2 logic (see the sketch below).
- Working knowledge of the Control-M scheduler for running ETL flows.
- Worked with the data modeler to define the logical and physical data models and created ETL mapping documents.
- Provided production support to the L2/L3 support teams for Control-M job failures caused by data or code issues.
- Wrote BTEQ scripts in Teradata for loading PL and reporting tables.
- Created reports in Tableau and gained exposure to its different features.
- Coded data quality rules in IDQ per requirements.
- Ran the DQ rules against data to check accuracy and correctness.
- Built customer scorecards in IDQ.
Environment: Ab Initio 3.2, Unix, Teradata, Oracle 11g, SQL Server 2008, Control-M, Informatica Data Quality (IDQ), Tableau
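A minimal sketch of the SCD Type 2 upsert pattern mentioned above, shown here through the Teradata SQL driver for Python (teradatasql); host, credentials, table, and column names are illustrative assumptions, and real SCD 2 logic would compare the full set of tracked attributes.

    import teradatasql

    # Minimal sketch of SCD Type 2: expire the current row when a tracked
    # attribute changes, then insert the new active version.
    # Host, credentials, table, and column names are placeholders.
    with teradatasql.connect(host="tdhost", user="etl_user", password="secret") as conn:
        with conn.cursor() as cur:
            # Close out the currently active row for the changed key.
            cur.execute(
                """
                UPDATE customer_dim
                SET end_date = CURRENT_DATE - 1, current_flag = 'N'
                WHERE customer_id = ? AND current_flag = 'Y'
                  AND customer_segment <> ?
                """,
                (1001, "PREMIUM"),
            )
            # Insert the new active version of the row.
            cur.execute(
                """
                INSERT INTO customer_dim
                    (customer_id, customer_segment, start_date, end_date, current_flag)
                VALUES (?, ?, CURRENT_DATE, DATE '9999-12-31', 'Y')
                """,
                (1001, "PREMIUM"),
            )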
Confidential, Herndon, VA
Senior ETL Engineer
Responsibilities:
- Defined the design approach and integration strategies within the selected technology stack.
- Implemented data parallelism using multifiles and partition/de-partition components to improve overall performance.
- Worked in a sandbox environment while interacting extensively with the EME to maintain version control on objects.
- Implemented performance analysis and SQL query tuning in Oracle (see the sketch below).
- Scheduled code with Autosys and validated the jobs by running them in the dev/test environments.
Environment: Ab Initio 3.0.3, Oracle 11g, SQL Server 2008, Unix, Autosys
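A minimal sketch of the query-plan inspection that typically accompanies Oracle SQL tuning, via cx_Oracle; the connection string, table, and predicate are illustrative placeholders.

    import cx_Oracle

    # Minimal sketch: generate and display an execution plan during tuning.
    # Connection string, table, and column names are placeholders.
    conn = cx_Oracle.connect("etl_user/secret@dbhost:1521/ORCLPDB1")
    cur = conn.cursor()

    cur.execute("EXPLAIN PLAN FOR SELECT * FROM customers WHERE region_id = 10")
    cur.execute("SELECT plan_table_output FROM TABLE(DBMS_XPLAN.DISPLAY())")
    for (line,) in cur:
        print(line)

    cur.close()
    conn.close()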
Confidential, Phoenix, AZ
Data/ETL Engineer
Responsibilities:
- Worked on an ELT approach for moving data and loading it into the data warehouse.
- Migrated Sybase stored procedures to Teradata; made extensive use of Teradata functions.
- Operations performed in the database:
  - Coded stored procedures in Teradata.
  - Created DDLs for the target and staging tables.
  - Performed data loads and validated the data for accuracy, errors, and rejects.
- Improved performance through tuning of Teradata stored procedures.
- Worked on BTEQ scripts and other Teradata utilities for loading data into tables.
Environment: Ab Initio 2.15, Sybase, Teradata Utilities, BTEQ scripts, Unix
Confidential, Phoenix, AZ
Data Engineer / ETL Developer
Responsibilities:
- Used Ab Initio date and string functions in transform rules.
- Worked with COBOL copybooks and converted data from EBCDIC format to ASCII (see the sketch below).
- Worked with the Data Profiler to develop business validation rules for the required fields.
- Created parameter sets (psets) for running generic graphs with different values.
- Improved the performance of graphs using mechanisms such as checkpointing and phasing.
- Prepared UNIX wrapper scripts for productionizing ETL mappings/data flows.
- Built ETL functionality in Informatica and checked application performance.
- Migrated tags from the Dev to the QA region and checked out the code into different QA sandboxes.
Environment: Ab Initio, Informatica, UNIX, Mainframe, Control-M, Flat files, CSV, Python
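A minimal sketch of the EBCDIC-to-ASCII conversion described above, using Python's built-in cp037 (EBCDIC) codec; the sample record bytes stand in for a fixed-width field defined by a COBOL copybook.

    # Minimal sketch of EBCDIC -> ASCII conversion with Python's cp037 codec.
    # The sample bytes are a placeholder for a copybook-defined record.
    ebcdic_record = b"\xc1\xc2\xc3\xf1\xf2\xf3"  # "ABC123" in EBCDIC (cp037)

    decoded = ebcdic_record.decode("cp037")   # EBCDIC bytes -> text
    ascii_record = decoded.encode("ascii")    # text -> ASCII bytes for downstream ETL

    print(decoded)        # ABC123
    print(ascii_record)   # b'ABC123'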
Confidential, Hartford, CT
Software Engineer
Responsibilities:
- Determined which data values were valid or invalid for a given data type.
- Used generic Ab Initio validation functions for validating and cleansing data.
- Created EBCDIC DML for reading mainframe data and optimized the DML for accurate interpretation of the data.
- Developed shell scripts for automation and ease of migration between environments.
Environment: Ab Initio, UNIX, Shell Scripting, Mainframe