Data Engineer Resume

Beaverton, OR

OBJECTIVE

  • Goal-oriented, dedicated to high levels of customer satisfaction and to meeting aggressive business goals.
  • Passionate and motivated, with a drive for excellence and strong knowledge of Big Data and the Hadoop ecosystem.

SUMMARY

  • 7+ years of overall IT experience, with 4 years in the Hadoop ecosystem and 3 years in ETL and data work using Teradata, Informatica, and related automation tools.
  • Worked as a Business Systems Analyst on data validation, data checks, and manipulation, making data available at the right grain through SQL queries.
  • Focused on finding the answers in the data that make a difference.
  • Strong technical background in analyzing large datasets to understand complex dynamics and their impact on the business.
  • Performed development, data cleaning and transformation, descriptive and predictive analysis, and visualization of results to deliver actionable insights.
  • Comfortable with technologies such as Hive, Sqoop, Spark, Python, AWS S3, EMR, and Snowflake.
  • Worked in both Agile and Waterfall methodologies.

TECHNICAL SKILLS

Big Data: Hadoop, HDFS, MapReduce, Hive, Spark, AWS S3, EMR, EC2, Airflow

Tools: Jenkins, Quality Center, UFT (QTP), Selenium, Informatica, Autosys, Alteryx

Domain: Benefits, Finance, Digital

Languages: Python, Scala, C#, Java, HTML, SQL, JavaScript, VBScript, Shell Scripting

Databases: Oracle, Teradata, MySQL, SQL Server, Snowflake

Version Control: Bitbucket, GitHub

PROFESSIONAL EXPERIENCE

Confidential, Beaverton, OR

Data Engineer

Responsibilities:

  • Worked with the Product team to understand requirements.
  • Developed data ingestion pipelines using Airflow and PySpark, loading the data into Hive and Snowflake tables (see the sketch after this list).
  • Implemented workflows in Airflow for managing and scheduling Hadoop jobs.
  • Used GitHub/Bitbucket as the code repository and Jenkins for deployments to different environments.
  • Ingested data from API responses (JSON files) and parsed them into S3/Hive tables.
  • Built real-time data ingestion pipelines using NSP (Confidential Streaming Platform).
  • Developed Jenkins pipelines for deployments.
  • Designed and developed Spark jobs for data ingestion and aggregation.
  • Used HDFS and AWS S3 as storage for Hive tables.
  • Developed copy scripts to move aggregated/integrated table data to Snowflake and Athena.
  • Worked with multiple data formats: Parquet, Avro, JSON, XML, and CSV.
  • Used Confluence for project documentation.
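
A minimal sketch of the kind of ingestion pipeline described above, assuming Airflow 2.x and placeholder endpoint, bucket, and table names (in production the Spark step would typically run on EMR via spark-submit; it is inlined here only to keep the sketch self-contained):

from datetime import datetime, timedelta

import requests
from airflow import DAG
from airflow.operators.python import PythonOperator
from pyspark.sql import SparkSession

API_URL = "https://api.example.com/v1/orders"   # placeholder endpoint
S3_PATH = "s3://example-bucket/raw/orders/"     # placeholder landing location
HIVE_TABLE = "analytics.orders_raw"             # placeholder table name


def ingest_api_to_hive(**context):
    """Fetch JSON from the API, flatten it with Spark, write Parquet to S3,
    and register a Hive table over that location."""
    payload = requests.get(API_URL, timeout=60).json()

    spark = (SparkSession.builder
             .appName("orders_ingest")
             .enableHiveSupport()
             .getOrCreate())

    df = spark.createDataFrame(payload["records"])  # assumes a 'records' array of objects
    df.write.mode("overwrite").parquet(S3_PATH)

    spark.sql(f"CREATE TABLE IF NOT EXISTS {HIVE_TABLE} USING PARQUET LOCATION '{S3_PATH}'")
    spark.stop()


with DAG(
    dag_id="orders_api_ingest",
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    catchup=False,
    default_args={"retries": 2, "retry_delay": timedelta(minutes=10)},
) as dag:
    PythonOperator(task_id="ingest_api_to_hive", python_callable=ingest_api_to_hive)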

Environment: Spark, AWS, EMR, S3, Hive, NSP, Kafka, Spark Streaming, Python, Airflow, Teradata, Presto, Athena, Snowflake

Confidential, Beaverton, OR

Data Engineer/Snowflake Engineer

Responsibilities:

  • Worked with the Product team to understand requirements.
  • Strong experience creating Airflow DAGs.
  • Strong experience creating Hive and Spark jobs.
  • Monitored, supported, and troubleshot Airflow and Spark jobs.
  • Hands-on experience with Bitbucket and GitHub.
  • Hands-on experience with Jenkins.
  • Built efficient Spark jobs using cache, coalesce, and repartition to improve performance (see the sketch after this list).
  • Experienced in Agile delivery.
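
A minimal sketch of the cache/repartition/coalesce pattern mentioned above; the paths, partition counts, and column names are placeholders, not the actual production job:

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("perf_patterns").getOrCreate()

orders = spark.read.parquet("s3://example-bucket/curated/orders/")  # placeholder path

# Repartition on the aggregation key so the shuffle happens once, up front.
orders = orders.repartition(200, "customer_id")

# Cache a DataFrame that is reused by several downstream aggregates,
# so it is not recomputed from source for each action.
orders.cache()

daily = orders.groupBy("order_date").agg(F.sum("amount").alias("daily_amount"))
by_customer = orders.groupBy("customer_id").agg(F.count("*").alias("order_count"))

# Coalesce before writing to avoid thousands of tiny output files; unlike
# repartition, coalesce narrows partitions without another full shuffle.
daily.coalesce(16).write.mode("overwrite").parquet("s3://example-bucket/agg/daily/")
by_customer.coalesce(16).write.mode("overwrite").parquet("s3://example-bucket/agg/by_customer/")

orders.unpersist()
spark.stop()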

Environment: Spark, AWS, EMR, S3, Hive, Python, Airflow, Teradata, Sqoop, Snowflake

Confidential, Hillsboro, OR

Data Engineer/Snowflake Engineer

Responsibilities:

  • Developed, monitored, and troubleshot Hive and Spark jobs on AWS EMR.
  • Ingested data from Teradata into Snowflake (see the sketch after this list).
  • Provided Snowflake user support.
  • Coordinated with report developers to refresh Tableau data sources.
  • Used GitHub/Bitbucket as the code repository.
  • Sent Airflow production job status reports to stakeholders.
  • Updated the Confluence page with job status.
  • Built Jenkins pipelines as part of the DevOps model.
  • Developed canary queries and integrated them with MCF to catch issues before business users did.
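
A minimal sketch of a Teradata-to-Snowflake copy, assuming the Teradata JDBC driver and the spark-snowflake connector are available on the cluster; hosts, credentials, and table names are placeholders:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("td_to_snowflake").getOrCreate()

# Pull the source table from Teradata over JDBC.
source = (spark.read.format("jdbc")
          .option("url", "jdbc:teradata://td-host.example.com/DATABASE=sales")
          .option("driver", "com.teradata.jdbc.TeraDriver")
          .option("dbtable", "sales.daily_orders")
          .option("user", "svc_user")
          .option("password", "********")
          .load())

# Push the DataFrame into Snowflake with the spark-snowflake connector.
sf_options = {
    "sfURL": "example_account.snowflakecomputing.com",
    "sfUser": "svc_user",
    "sfPassword": "********",
    "sfDatabase": "ANALYTICS",
    "sfSchema": "SALES",
    "sfWarehouse": "LOAD_WH",
}

(source.write.format("net.snowflake.spark.snowflake")
       .options(**sf_options)
       .option("dbtable", "DAILY_ORDERS")
       .mode("overwrite")
       .save())

spark.stop()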

Environment: Spark, AWS, EMR, S3, Hive, Python, Airflow, Teradata, Sqoop, Snowflake

Confidential, Beaverton, OR

Data Engineer

Responsibilities:

  • Monitored and troubleshot Hive and Spark jobs on Azure.
  • Developed wrapper scripts in Shell and Python (see the sketch after this list).
  • Coordinated with report developers to refresh Tableau data sources.
  • Used GitHub/Bitbucket as the code repository.
  • Sent Airflow production job status reports to stakeholders.
  • Prepared daily production job status reports.
  • Prepared failed-job status reports with reasons for failure.
  • Updated the Confluence page with job status.
  • Used Confluence for project documentation.
  • Built Jenkins pipelines as part of the DevOps model.
  • Developed canary queries and integrated them with MCF to catch issues before business users did.
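
A minimal sketch of the kind of Python wrapper script used around Spark jobs: it runs spark-submit, logs the outcome, and exits non-zero on failure so the scheduler can alert. The job path and arguments are placeholders:

import logging
import subprocess
import sys

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger("spark_wrapper")


def run_spark_job(app_path, *app_args):
    """Submit a Spark application and return its exit code."""
    cmd = ["spark-submit", "--deploy-mode", "cluster", app_path, *app_args]
    log.info("Running: %s", " ".join(cmd))
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        log.error("Job failed (rc=%s): %s", result.returncode, result.stderr[-2000:])
    else:
        log.info("Job succeeded.")
    return result.returncode


if __name__ == "__main__":
    sys.exit(run_spark_job("s3://example-bucket/jobs/daily_refresh.py", "--run-date", "2021-01-01"))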

Confidential, Beaverton, OR

Product support Analyst

Responsibilities:

  • Collaborated with users and technical teams to understand requirements.
  • Monitored and troubleshot Hive and Spark jobs on AWS EMR.
  • Scheduled daily jobs in Airflow.
  • Developed DAGs for the daily production run.
  • Developed wrapper scripts in Shell and Python.
  • Coordinated with report developers to refresh Tableau data sources.
  • Used GitHub/Bitbucket as the code repository.
  • Helped the Tableau team integrate with HiveServer2 for reporting (see the sketch after this list).
  • Prepared daily production job status reports.
  • Prepared failed-job status reports with reasons for failure.
  • Updated the Confluence page with job status.
  • Used Confluence for project documentation.
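
A minimal sketch of connecting to HiveServer2 from Python with PyHive, the same endpoint the Tableau reports pointed at; host, port, and table names are placeholders:

from pyhive import hive

conn = hive.connect(host="hiveserver2.example.com", port=10000,
                    username="report_user", database="analytics")
try:
    cur = conn.cursor()
    # Quick sanity check that the reporting table is reachable and populated.
    cur.execute("SELECT COUNT(*) FROM analytics.daily_sales")
    print("daily_sales rows:", cur.fetchone()[0])
finally:
    conn.close()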

Environment: Spark, AWS, EMR, S3, Hive, Python, Airflow, Teradata, Sqoop, Snowflake

Confidential, MN

Hadoop Developer

Responsibilities:

  • Implemented ETL processes in Spark using DataFrames on EMR.
  • Implemented each ETL process with BOP and EOP quantities to avoid repeated query executions during reporting.
  • Created Airflow DAGs for backfill, history, and incremental loads.
  • Created efficient Spark jobs using cache, coalesce, and repartition to improve performance.
  • Created HDFS staging tables to improve throughput when writing Parquet files to S3.
  • Developed unit test functions for Spark jobs.
  • Built Jenkins pipelines as part of the DevOps model.
  • Developed a PyUnit framework for unit testing (see the sketch after this list).
  • Developed canary queries and integrated them with MCF to catch issues before business users did.
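
A minimal sketch of the PyUnit-style tests: a local SparkSession exercises a small transformation function; the add_total function below is a hypothetical stand-in for the real ETL logic, which is not shown here:

import unittest

from pyspark.sql import SparkSession
from pyspark.sql import functions as F


def add_total(df):
    """Example transformation under test: total = quantity * price."""
    return df.withColumn("total", F.col("quantity") * F.col("price"))


class AddTotalTest(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        cls.spark = (SparkSession.builder
                     .master("local[2]")
                     .appName("etl-unit-tests")
                     .getOrCreate())

    @classmethod
    def tearDownClass(cls):
        cls.spark.stop()

    def test_total_is_quantity_times_price(self):
        df = self.spark.createDataFrame([(2, 5.0), (3, 1.5)], ["quantity", "price"])
        result = {row.quantity: row.total for row in add_total(df).collect()}
        self.assertEqual(result, {2: 10.0, 3: 4.5})


if __name__ == "__main__":
    unittest.main()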

Environment: CDH, Spark, AWS, EMR, S3, Hive, Python, Zookeeper, Oracle, Teradata, Sqoop

Confidential

Informatica developer

Responsibilities:

  • Handled batch processing for export/import functionality across various heterogeneous systems.
  • Designed and developed Informatica mappings to move data from SAP to Teradata.
  • Created unit test cases for testing Informatica mappings.
  • Involved in batch design.
  • Migrated code between environments using label queries and deployment groups.
  • Helped set up the development assurance process.
  • Created complex SQL queries for data validation and data analysis (see the sketch after this list).
  • Provided database administration for test and development database servers.
  • Created database objects such as tables, indexes, stored procedures, and views.
  • Wrote extensive SQL and PL/SQL procedures and functions.
  • Monitored database activity, performance, and size; increased size when required; and identified and resolved performance issues.
  • Tuned and optimized SQL queries using indexes and table partitioning.
  • Analyzed database tables and indexes and rebuilt indexes when fragmented.
  • Planned backup and recovery using cold backups; responsible for building and deploying the DB utilities required in the QA environment.
  • Built a client restore/backup utility for QA clients, enabling QA automation runs by bringing clients to the desired application state.
  • Responsible for applying DB dumps from Production to the Development environment (typically based on customer tickets/requests).
  • Responsible for applying DB scripts/patches as part of hotfixes and version migrations.
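
A minimal sketch of the row-count style of data validation used after loads: it compares source and target counts over any DB-API 2.0 connection. The real systems were Teradata and the SAP/Oracle sources; sqlite3 appears below only to keep the sketch runnable:

import sqlite3


def row_count(conn, table):
    cur = conn.cursor()
    cur.execute(f"SELECT COUNT(*) FROM {table}")  # table names come from config, not user input
    return cur.fetchone()[0]


def validate_load(source_conn, target_conn, source_table, target_table):
    src = row_count(source_conn, source_table)
    tgt = row_count(target_conn, target_table)
    if src != tgt:
        raise AssertionError(f"Row count mismatch: {source_table}={src}, {target_table}={tgt}")
    print(f"OK: {src} rows in both {source_table} and {target_table}")


if __name__ == "__main__":
    # Stand-in in-memory databases so the sketch runs end to end.
    src = sqlite3.connect(":memory:")
    tgt = sqlite3.connect(":memory:")
    for conn in (src, tgt):
        conn.execute("CREATE TABLE orders (id INTEGER)")
        conn.executemany("INSERT INTO orders VALUES (?)", [(1,), (2,), (3,)])
    validate_load(src, tgt, "orders", "orders")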

Environment: Teradata R13, Informatica 9.1, Unix, SAP BW, Oracle
