
Data Engineer Resume


Beaverton, OR

OBJECTIVE

  • Goal-oriented and dedicated to high levels of customer satisfaction and meeting aggressive business goals. Passionate and motivated, with a drive for excellence and knowledge of Big Data and the Hadoop ecosystem.

SUMMARY

  • Overall 7+ years of experience in IT, with 4 years in the Hadoop ecosystem and 3 years in ETL and data work with tools such as Teradata, Informatica, and related automation tools.
  • Worked as a Business Systems Analyst, which involved tasks such as data validation, data checks, data manipulation, and making data available at the right grain through SQL queries; in essence, looking for the answers in the data that make a difference.
  • Strong technical background in the analysis of large datasets to understand complex dynamics and their impact on the business.
  • Performed activities such as developing, cleaning, and transforming data, descriptive and predictive analysis, and visualizing results to deliver actionable insights.
  • Comfortable with technologies such as Hive, Sqoop, Spark, Python, AWS, S3, EMR, Snowflake.
  • Worked in Agile as well as waterfall methodologies.

TECHNICAL SKILLS

Big Data: Hadoop, HDFS, MapReduce, Hive, Spark, AWS S3, EMR, EC2, Airflow

Tools: Jenkins, Quality Center, UFT (QTP), Selenium, Informatica, Autosys, Alteryx

Domain: Benefits, Finance, Digital

Languages: Python, Scala, C#, Java, HTML, SQL, JavaScript, VBScript, Shell Scripting

Databases: Oracle, Teradata, MySQL, SQL Server, Snowflake

Version Control: Bitbucket, GitHub

PROFESSIONAL EXPERIENCE

Confidential, Beaverton, OR

Data Engineer

Responsibilities:

  • Work with the Product team to understand requirements.
  • Developed data ingestion pipelines using Airflow and PySpark and loaded the data into Hive and Snowflake tables (see the illustrative sketch below).
  • Implemented workflows using Airflow for managing and scheduling Hadoop jobs.
  • Used GitHub/Bitbucket as the code repository and Jenkins for deployments to different environments.
  • Ingested data from API responses (JSON files) and parsed the JSON to store in S3/Hive tables.
  • Experience building real-time data ingestion pipelines using NSP (Confidential Streaming Platform).
  • Developed Jenkins pipelines for deployments.
  • Designed and developed Spark jobs for data ingestion and aggregation.
  • Used HDFS and AWS S3 as storage to build Hive tables.
  • Developed copy scripts to move aggregated/integrated table data to Snowflake and Athena.
  • Experience working with multiple data formats: Parquet, Avro, JSON, XML, CSV.
  • Used Confluence for project documentation.

Environment: Spark, AWS, EMR, S3, Hive, NSP, Kafka, Spark Streaming, Python, Airflow, Teradata, Presto, Athena, Snowflake
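A minimal PySpark sketch of the ingestion pattern described above: parse API JSON landed in S3 and persist it as a partitioned, Parquet-backed Hive table. The bucket, path, and table names are placeholder assumptions, not actual project values.

```python
# Minimal PySpark ingestion sketch (placeholder bucket/table names).
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("api_json_ingest")
    .enableHiveSupport()   # assumes a Hive metastore is configured
    .getOrCreate()
)

# Read raw API responses (JSON) from the landing zone in S3.
raw = spark.read.json("s3://example-landing-bucket/api_responses/")

# Light standardization before persisting: dedupe and stamp a load date.
curated = raw.dropDuplicates().withColumn("load_date", F.current_date())

# Persist as a partitioned, Parquet-backed Hive table.
(
    curated.write
    .mode("overwrite")
    .partitionBy("load_date")
    .format("parquet")
    .saveAsTable("analytics.api_events")
)
```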

Confidential, Beaverton, OR

Data Engineer/Snowflake Engineer

Responsibilities:

  • Work with the Product team to understand requirements.
  • Good experience with Airflow DAG creation.
  • Strong experience creating Hive and Spark jobs.
  • Monitor, support, and troubleshoot Airflow and Spark jobs.
  • Hands-on experience with Bitbucket and GitHub.
  • Hands-on experience using Jenkins.
  • Created efficient Spark jobs, using cache, coalesce, and repartition to improve performance (see the illustrative sketch below).
  • Good experience working in Agile.

Environment: Spark, AWS, EMR, S3, Hive, Python, Airflow, Teradata, Sqoop, Snowflake
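A small PySpark sketch of the cache/coalesce/repartition tuning mentioned above; the table name, column names, partition counts, and output paths are illustrative assumptions.

```python
# Spark tuning sketch: repartition for even work distribution, cache a reused
# DataFrame, and coalesce before writing to avoid many small output files.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("spark_tuning_example").getOrCreate()

orders = spark.table("analytics.orders")   # hypothetical source table

# Repartition on the aggregation key so work is spread evenly across executors.
orders = orders.repartition(200, "customer_id")

# Cache because the DataFrame feeds several downstream aggregations.
orders.cache()

daily_counts = orders.groupBy("order_date").count()
customer_counts = orders.groupBy("customer_id").count()

# Coalesce to a small number of partitions before writing out.
daily_counts.coalesce(10).write.mode("overwrite").parquet(
    "s3://example-bucket/aggregates/daily_counts/")
customer_counts.coalesce(10).write.mode("overwrite").parquet(
    "s3://example-bucket/aggregates/customer_counts/")

orders.unpersist()
```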

Confidential, Hillsboro, OR

Data Engineer/Snowflake Engineer

Responsibilities:

  • Develop, monitor, and troubleshoot Hive and Spark jobs on AWS EMR.
  • Worked on ingesting data from Teradata into Snowflake (see the illustrative sketch below).
  • Provided Snowflake user support.
  • Coordinated with report developers to refresh Tableau data sources.
  • Used GitHub/Bitbucket as the code repository.
  • Sent Airflow production job status reports to stakeholders.
  • Updated the Confluence page with job status.
  • Built Jenkins pipelines as part of the DevOps model.
  • Developed canary queries and integrated them with MCF to find issues before business users do.

Environment: Spark, AWS, EMR, S3, Hive, Python, Airflow, Teradata, Sqoop, Snowflake
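A hedged sketch of Teradata-to-Snowflake ingestion through Spark: read the source table over JDBC and write it with the Snowflake Spark connector. Hosts, credentials, and table names are placeholders, and it assumes the Teradata JDBC driver and Snowflake connector jars are available on the cluster classpath.

```python
# Teradata-to-Snowflake ingestion sketch (placeholder connection details).
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("td_to_snowflake").getOrCreate()

# Read a source table from Teradata over JDBC.
source = (
    spark.read.format("jdbc")
    .option("url", "jdbc:teradata://td-host.example.com/DATABASE=SALES")
    .option("driver", "com.teradata.jdbc.TeraDriver")
    .option("dbtable", "SALES.ORDERS")
    .option("user", "etl_user")
    .option("password", "********")
    .load()
)

# Write to Snowflake using the Spark connector (placeholder account/warehouse).
sf_options = {
    "sfURL": "example_account.snowflakecomputing.com",
    "sfUser": "etl_user",
    "sfPassword": "********",
    "sfDatabase": "ANALYTICS",
    "sfSchema": "PUBLIC",
    "sfWarehouse": "LOAD_WH",
}

(
    source.write.format("net.snowflake.spark.snowflake")
    .options(**sf_options)
    .option("dbtable", "ORDERS")
    .mode("overwrite")
    .save()
)
```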

Confidential, Beaverton, OR

Data Engineer

Responsibilities:

  • Monitored and troubleshot Hive and Spark jobs on Azure.
  • Developed wrapper scripts in Shell and Python (see the illustrative sketch below).
  • Coordinated with report developers to refresh Tableau data sources.
  • Used GitHub/Bitbucket as the code repository.
  • Sent Airflow production job status reports to stakeholders.
  • Prepared daily production job status reports.
  • Prepared failed-job status reports with reasons for failure.
  • Updated the Confluence page with job status.
  • Used Confluence for project documentation.
  • Built Jenkins pipelines as part of the DevOps model.
  • Developed canary queries and integrated them with MCF to find issues before business users do.
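A minimal sketch of the kind of Python wrapper script mentioned above: it wraps a spark-submit call, logs the outcome, and exits non-zero on failure so the scheduler can flag the run. The job path and submit options are hypothetical.

```python
#!/usr/bin/env python3
"""Wrapper around spark-submit: run the job, log the result, and exit non-zero
on failure so the scheduler can flag the run. Paths are hypothetical."""
import logging
import subprocess
import sys

logging.basicConfig(level=logging.INFO,
                    format="%(asctime)s %(levelname)s %(message)s")

SPARK_SUBMIT_CMD = [
    "spark-submit",
    "--master", "yarn",
    "--deploy-mode", "cluster",
    "/opt/jobs/daily_aggregates.py",   # hypothetical job script
]

def main() -> int:
    logging.info("Starting Spark job: %s", " ".join(SPARK_SUBMIT_CMD))
    result = subprocess.run(SPARK_SUBMIT_CMD, capture_output=True, text=True)
    if result.returncode != 0:
        # Keep only the tail of stderr so the log stays readable.
        logging.error("Spark job failed (rc=%s): %s",
                      result.returncode, result.stderr[-2000:])
        return result.returncode
    logging.info("Spark job completed successfully.")
    return 0

if __name__ == "__main__":
    sys.exit(main())
```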

Confidential, Beaverton, OR

Product Support Analyst

Responsibilities:

  • Collaborate with users and technical teams to understand requirements.
  • Monitored and troubleshot Hive and Spark jobs on AWS EMR.
  • Scheduled daily jobs in Airflow.
  • Developed DAGs for the daily production run (see the illustrative sketch below).
  • Developed wrapper scripts in Shell and Python.
  • Coordinated with report developers to refresh Tableau data sources.
  • Used GitHub/Bitbucket as the code repository.
  • Helped the Tableau team integrate with HiveServer2 for reporting.
  • Prepared daily production job status reports.
  • Prepared failed-job status reports with reasons for failure.
  • Updated the Confluence page with job status.
  • Used Confluence for project documentation.

Environment: Spark, AWS, EMR, S3, Hive, Python, Airflow, Teradata, Sqoop, Snowflake
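A minimal Airflow DAG sketch (assuming Airflow 2.x) of the daily production scheduling described above; the DAG id, schedule, and job commands are placeholder assumptions.

```python
# Daily production DAG sketch: two spark-submit steps chained in sequence.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator

default_args = {
    "owner": "data-eng",
    "retries": 1,
    "retry_delay": timedelta(minutes=10),
}

with DAG(
    dag_id="daily_production_run",      # placeholder DAG id
    default_args=default_args,
    schedule_interval="0 6 * * *",      # daily at 06:00
    start_date=datetime(2024, 1, 1),
    catchup=False,
) as dag:
    ingest = BashOperator(
        task_id="ingest_raw_data",
        bash_command="spark-submit /opt/jobs/ingest_raw.py",        # hypothetical job
    )
    aggregate = BashOperator(
        task_id="build_aggregates",
        bash_command="spark-submit /opt/jobs/build_aggregates.py",  # hypothetical job
    )

    # Aggregates run only after ingestion succeeds.
    ingest >> aggregate
```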

Confidential, MN

Hadoop Developer

Responsibilities:

  • Ingested data from Teradata, Oracle, SAP, and GSS JSON files into AWS S3.
  • Implemented ETL processes in Spark using DataFrames on EMR.
  • Implemented each ETL process with BOP and EOP quantities to avoid multiple query executions during reporting.
  • Created Airflow DAGs for backfill, history, and incremental loads.
  • Created efficient Spark jobs, using cache, coalesce, and repartition to improve performance.
  • Created HDFS staging tables to improve throughput when writing S3 Parquet files.
  • Developed unit test functions in Spark.
  • Built Jenkins pipelines as part of the DevOps model.
  • Developed a PyUnit framework for unit testing (see the illustrative sketch below).
  • Developed canary queries and integrated them with MCF to find issues before business users do.

Environment: CDH, Spark, AWS, EMR, S3, Hive, Python, Zookeeper, Oracle, Teradata, Sqoop
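A hedged sketch of a PyUnit-style (unittest) test for a Spark transformation, along the lines of the unit-testing framework mentioned above; the add_load_date transform under test is hypothetical.

```python
# PyUnit/unittest sketch for a small PySpark transform.
import unittest

from pyspark.sql import SparkSession
from pyspark.sql import functions as F


def add_load_date(df):
    """Hypothetical transform under test: tag each row with the load date."""
    return df.withColumn("load_date", F.current_date())


class AddLoadDateTest(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        # Local Spark session is enough for unit tests.
        cls.spark = (
            SparkSession.builder
            .master("local[2]")
            .appName("unit-tests")
            .getOrCreate()
        )

    @classmethod
    def tearDownClass(cls):
        cls.spark.stop()

    def test_adds_load_date_column(self):
        df = self.spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])
        result = add_load_date(df)
        self.assertIn("load_date", result.columns)
        self.assertEqual(result.count(), 2)


if __name__ == "__main__":
    unittest.main()
```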

Confidential

Informatica Developer

Responsibilities:

  • Handled batch processing for export/import functionality across various heterogeneous systems.
  • Designed and developed Informatica mappings to move data from SAP to Teradata.
  • Created unit test cases for testing Informatica mappings.
  • Involved in batch design.
  • Migrated code between environments using label queries and deployment groups.
  • Helped set up the development assurance process.
  • Created complex SQL queries for data validation as well as data analysis (see the illustrative sketch below).
  • Provided database administration for testing and development database servers, including creation of database objects such as tables, indexes, stored procedures, and views.
  • Wrote extensive SQL and PL/SQL procedures and functions.
  • Monitored database activity, performance, and size; increased size when required; and identified and resolved performance issues.
  • Tuned and optimized SQL queries using indexes and table partitioning.
  • Analyzed database tables and indexes and rebuilt indexes when fragmentation was found.
  • Planned backup and recovery using cold backups; responsible for building and deploying the DB utilities required in the QA environment.
  • Built the client restore/backup utility for QA clients to enable QA automation runs by bringing the clients to the desired application state.
  • Responsible for applying DB dumps into the development environment from production (typically based on customer tickets/requests).
  • Responsible for applying DB scripts/patches as part of hotfixes and version migrations.

Environment: Teradata R13, Informatica 9.1, UNIX, SAP BW, Oracle
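An illustrative sketch of the kind of data-validation SQL described above, driven from Python: a simple source-versus-target row-count reconciliation. The connection objects and table names are hypothetical; any DB-API-compliant connections to the source and target databases would work.

```python
# Source-vs-target row-count reconciliation (hypothetical connections/tables).

VALIDATION_TABLES = ["CUSTOMER", "ORDERS", "ORDER_ITEMS"]   # placeholder list


def row_count(conn, table):
    """Return the row count for a table via a plain DB-API cursor."""
    cur = conn.cursor()
    try:
        cur.execute(f"SELECT COUNT(*) FROM {table}")
        return cur.fetchone()[0]
    finally:
        cur.close()


def reconcile(source_conn, target_conn):
    """Compare row counts between source and target; return mismatched tables."""
    mismatches = []
    for table in VALIDATION_TABLES:
        src = row_count(source_conn, table)
        tgt = row_count(target_conn, table)
        status = "OK" if src == tgt else "MISMATCH"
        print(f"{table}: source={src} target={tgt} -> {status}")
        if src != tgt:
            mismatches.append(table)
    return mismatches
```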
