Data Engineer Resume
Beaverton, OR
OBJECTIVE
- Goal-oriented, dedicated to high levels of customer satisfaction and to meeting aggressive business goals.
- Passionate and motivated, with a drive for excellence and strong knowledge of Big Data and the Hadoop ecosystem.
SUMMARY
- 7+ years of overall IT experience, including 4 years in the Hadoop ecosystem and 3 years in ETL and data work with Teradata, Informatica, and related automation tools.
- Worked as a Business Systems Analyst performing data validation, data quality checks, and data manipulation, making data available at the right grain through SQL queries.
- Focused on finding answers in the data that make a difference.
- Strong technical background in analyzing large datasets to understand complex dynamics and their impact on the business.
- Performed development, data cleaning and transformation, descriptive and predictive analysis, and visualization of results to deliver actionable insights.
- Comfortable with technologies such as Hive, Sqoop, Spark, Python, AWS S3, EMR, and Snowflake.
- Worked in both Agile and Waterfall methodologies.
TECHNICAL SKILLS
Big Data: Hadoop, HDFS, MapReduce, Hive, Spark, AWS S3, EMR, EC2, Airflow
Tools: Jenkins, Quality Center, UFT (QTP), Selenium, Informatica, Autosys, Alteryx
Domain: Benefits, Finance, Digital
Languages: Python, Scala, C#, Java, HTML, SQL, JavaScript, VBScript, Shell Scripting
Databases: Oracle, Teradata, MySQL, SQL Server, Snowflake
Version Control: Bitbucket, GitHub
PROFESSIONAL EXPERIENCE
Confidential, Beaverton, OR
Data Engineer
Responsibilities:
- Worked with the Product team to understand requirements.
- Developed data ingestion pipelines using Airflow and PySpark, loading the data into Hive and Snowflake tables (see the sketch below).
- Implemented workflows in Airflow for managing and scheduling Hadoop jobs.
- Used GitHub/Bitbucket as the code repository and Jenkins for deployments to different environments.
- Ingested data from API responses (JSON files) and parsed the JSON for storage in S3/Hive tables.
- Built real-time data ingestion pipelines using NSP (Confidential Streaming Platform).
- Developed Jenkins pipelines for deployments.
- Designed and developed Spark jobs for data ingestion and aggregation.
- Used HDFS and AWS S3 as storage for Hive tables.
- Developed copy scripts to move aggregated/integrated table data to Snowflake and Athena.
- Worked with multiple data formats: Parquet, Avro, JSON, XML, CSV.
- Used Confluence for project documentation.
Environment: Spark, AWS, EMR, S3, Hive, NSP, Kafka, Spark Streaming, Python, Airflow, Teradata, Presto, Athena, Snowflake
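A minimal sketch of the Airflow-plus-PySpark ingestion pattern described above. The DAG ID, S3 paths, and table names are illustrative placeholders, not the actual project code.

    # dags/ingest_orders.py - illustrative Airflow 2.x DAG; names are placeholders
    from datetime import datetime
    from airflow import DAG
    from airflow.operators.bash import BashOperator

    with DAG(
        dag_id="ingest_orders_daily",
        start_date=datetime(2023, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        # Submit the PySpark ingestion job to the EMR/YARN cluster
        ingest = BashOperator(
            task_id="spark_ingest",
            bash_command=(
                "spark-submit --master yarn --deploy-mode cluster "
                "s3://example-bucket/jobs/ingest_orders.py"
            ),
        )

    # jobs/ingest_orders.py - parse JSON API responses landed in S3 and load a Hive table
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("ingest_orders").enableHiveSupport().getOrCreate()

    raw = spark.read.json("s3://example-bucket/raw/orders/")   # JSON files from the API
    clean = raw.select("order_id", "status", "order_ts")       # keep only the needed columns
    clean.write.mode("overwrite").format("parquet").saveAsTable("analytics.orders_stg")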
Confidential, Beaverton, OR
Data Engineer/Snowflake Engineer
Responsibilities:
- Worked with the Product team to understand requirements.
- Created and maintained Airflow DAGs.
- Built Hive and Spark jobs.
- Monitored, supported, and troubleshot Airflow and Spark jobs.
- Used Bitbucket and GitHub for version control.
- Used Jenkins for builds and deployments.
- Tuned Spark jobs with caching, coalesce, and repartition to improve performance (see the sketch below).
- Worked in an Agile environment.
Environment: Spark, AWS, EMR, S3, Hive, Python, Airflow, Teradata, Sqoop, Snowflake
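A minimal illustration of the caching/coalesce/repartition tuning mentioned above. The DataFrame names, paths, and partition counts are placeholders chosen for the example.

    # Illustrative Spark tuning pattern: cache a reused DataFrame, repartition on the
    # join key before a wide join, and coalesce before writing to avoid many small files.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("tuning_sketch").getOrCreate()

    orders = spark.read.parquet("s3://example-bucket/curated/orders/")
    dims = spark.read.parquet("s3://example-bucket/curated/products/")

    orders = orders.repartition(200, "product_id")  # spread the join key across executors
    orders.cache()                                  # reused by both outputs below

    daily = orders.groupBy("order_date").count()
    joined = orders.join(dims, "product_id", "left")

    # coalesce shrinks the number of output files without a full shuffle
    joined.coalesce(20).write.mode("overwrite").parquet("s3://example-bucket/marts/orders_enriched/")
    daily.coalesce(1).write.mode("overwrite").parquet("s3://example-bucket/marts/orders_daily/")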
Confidential, Hillsboro, OR
Data Engineer/Snowflake Engineer
Responsibilities:
- Developed, monitored, and troubleshot Hive and Spark jobs on AWS EMR.
- Ingested data from Teradata into Snowflake (see the sketch below).
- Provided Snowflake user support.
- Coordinated with report developers to refresh Tableau data sources.
- Used GitHub/Bitbucket as the code repository.
- Sent Airflow production job status reports to stakeholders.
- Updated the Confluence page with job status.
- Built Jenkins pipelines as part of the DevOps model.
- Developed canary queries and integrated them with MCF to catch issues before business users were affected.
Environment: Spark, AWS, EMR, S3, Hive, Python, Airflow, Teradata, Sqoop, Snowflake
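A rough sketch of the Snowflake side of the Teradata-to-Snowflake ingestion described above, assuming the extracted data is first staged to S3 as Parquet and then loaded with COPY INTO. The stage, table, warehouse, and credential names are placeholders.

    # Illustrative Snowflake load step using the snowflake-connector-python package.
    # Assumes an external stage (@my_s3_stage) already points at the staged Parquet files.
    import os
    import snowflake.connector

    conn = snowflake.connector.connect(
        account=os.environ["SNOWFLAKE_ACCOUNT"],
        user=os.environ["SNOWFLAKE_USER"],
        password=os.environ["SNOWFLAKE_PASSWORD"],
        warehouse="LOAD_WH",
        database="ANALYTICS",
        schema="STAGING",
    )
    try:
        cur = conn.cursor()
        cur.execute(
            """
            COPY INTO STAGING.ORDERS
            FROM @my_s3_stage/orders/
            FILE_FORMAT = (TYPE = PARQUET)
            MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE
            """
        )
    finally:
        conn.close()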
Confidential, Beaverton, OR
Data Engineer
Responsibilities:
- Monitored and troubleshot Hive and Spark jobs on Azure.
- Developed wrapper scripts in Shell and Python (see the sketch below).
- Coordinated with report developers to refresh Tableau data sources.
- Used GitHub/Bitbucket as the code repository.
- Sent Airflow production job status reports to stakeholders.
- Prepared daily production job status reports.
- Prepared failed-job reports with reasons for failure.
- Updated the Confluence page with job status.
- Used Confluence for project documentation.
- Built Jenkins pipelines as part of the DevOps model.
- Developed canary queries and integrated them with MCF to catch issues before business users were affected.
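One possible shape of the Python wrapper scripts mentioned above: a small wrapper that runs a job command with retries and basic logging. The command, retry count, and wait time are illustrative assumptions, not the original scripts.

    #!/usr/bin/env python3
    # Illustrative wrapper: run a job command with retries and log the outcome.
    import logging
    import subprocess
    import sys
    import time

    logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")

    def run_with_retries(cmd, attempts=3, wait_seconds=60):
        for attempt in range(1, attempts + 1):
            logging.info("attempt %d/%d: %s", attempt, attempts, " ".join(cmd))
            result = subprocess.run(cmd)
            if result.returncode == 0:
                logging.info("job succeeded")
                return 0
            logging.warning("job failed with exit code %d", result.returncode)
            if attempt < attempts:
                time.sleep(wait_seconds)
        logging.error("job failed after %d attempts", attempts)
        return 1

    if __name__ == "__main__":
        # e.g. python wrapper.py hive -f /jobs/daily_refresh.hql
        sys.exit(run_with_retries(sys.argv[1:]))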
Confidential, Beaverton, OR
Product Support Analyst
Responsibilities:
- Collaborate with users and technical teams to understand requirements.
- Monitored and troubleshot Hive and Spark jobs on AWS EMR.
- Scheduled daily jobs in Airflow.
- Developed DAGs for the daily production run.
- Developed wrapper scripts in Shell and Python.
- Coordinated with report developers to refresh Tableau data sources.
- Used GitHub/Bitbucket as the code repository.
- Helped the Tableau team integrate with HiveServer2 for reporting (see the sketch below).
- Prepared daily production job status reports.
- Prepared failed-job reports with reasons for failure.
- Updated the Confluence page with job status.
- Used Confluence for project documentation.
Environment: Spark, AWS, EMR, S3, Hive, Python, Airflow, Teradata, Sqoop, Snowflake
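A minimal example of the HiveServer2 connectivity work mentioned above, using the PyHive package as one possible client for verifying the endpoint Tableau connects to. The host, port, user, and table names are placeholders, and PyHive itself is an assumption rather than a tool listed in this resume.

    # Illustrative HiveServer2 query via PyHive (pip install "pyhive[hive]").
    from pyhive import hive

    conn = hive.connect(host="hiveserver2.example.com", port=10000, username="etl_user")
    try:
        cur = conn.cursor()
        # Same endpoint Tableau points at; this just verifies the table is reachable.
        cur.execute("SELECT COUNT(*) FROM analytics.orders_daily")
        print("row count:", cur.fetchone()[0])
    finally:
        conn.close()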
Confidential, MN
Hadoop Developer
Responsibilities:
- Implemented ETL processes in Spark using DataFrames on EMR.
- Implemented each ETL process with BOP and EOP quantities to avoid repeated query execution during reporting.
- Created Airflow DAGs for backfill, history, and incremental loads.
- Tuned Spark jobs with caching, coalesce, and repartition to improve performance.
- Created HDFS staging tables to improve throughput when writing S3 Parquet files.
- Developed unit test functions for Spark jobs.
- Built Jenkins pipelines as part of the DevOps model.
- Developed a PyUnit framework for unit testing (see the sketch below).
- Developed canary queries and integrated them with MCF to catch issues before business users were affected.
Environment: CDH, Spark, AWS, EMR, S3, Hive, Python, Zookeeper, Oracle, Teradata, Sqoop
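A minimal sketch of PyUnit-style testing for a Spark transformation, in the spirit of the framework mentioned above: a unittest case that runs a small function through a local SparkSession. The transformation under test is an illustrative stand-in, not project code.

    # Illustrative unittest for a Spark transformation using a local SparkSession.
    import unittest
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col

    def add_total(df):
        # Transformation under test (illustrative): total = qty * price.
        return df.withColumn("total", col("qty") * col("price"))

    class AddTotalTest(unittest.TestCase):
        @classmethod
        def setUpClass(cls):
            cls.spark = SparkSession.builder.master("local[1]").appName("tests").getOrCreate()

        @classmethod
        def tearDownClass(cls):
            cls.spark.stop()

        def test_total_is_qty_times_price(self):
            df = self.spark.createDataFrame([(2, 5.0)], ["qty", "price"])
            rows = add_total(df).collect()
            self.assertEqual(rows[0]["total"], 10.0)

    if __name__ == "__main__":
        unittest.main()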
Confidential
Informatica Developer
Responsibilities:
- Handled batch processing for export/import functionality across various heterogeneous systems.
- Designed and developed Informatica mappings to move data from SAP to Teradata.
- Created unit test cases for testing Informatica mappings.
- Involved in batch design.
- Migrated code between environments using label queries and deployment groups.
- Helped set up the development assurance process.
- Created complex SQL queries for data validation and data analysis (see the sketch below).
- Provided database administration for testing and development database servers.
- Created database objects such as tables, indexes, stored procedures, and views.
- Wrote extensive SQL and PL/SQL procedures and functions.
- Monitored database activity, performance, and size; increased capacity when required and identified and resolved performance issues.
- Tuned and optimized SQL queries using indexes and table partitioning.
- Analyzed database tables and indexes, rebuilding indexes where fragmentation was found.
- Planned backup and recovery using cold backups; built and deployed the DB utilities required in the QA environment.
- Built the client restore/backup utility for QA clients to enable QA automation runs by bringing clients to the desired application state.
- Applied DB dumps from Production into the Development environment, typically based on customer tickets/requests.
- Applied DB scripts/patches as part of hotfixes and version migrations.
Environment: Teradata R13, Informatica 9.1, UNIX, SAP BW, Oracle
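One example of the kind of data validation SQL mentioned above: comparing source and target row counts for a load date after a mapping run. The connection objects, table names, and load_dt column are hypothetical placeholders standing in for whatever DB-API connections and schemas the project actually used.

    # Illustrative validation check: compare source and target row counts for a load date.
    VALIDATION_SQL = "SELECT COUNT(*) FROM {table} WHERE load_dt = DATE '{load_dt}'"

    def row_count(conn, table, load_dt):
        cur = conn.cursor()
        cur.execute(VALIDATION_SQL.format(table=table, load_dt=load_dt))
        return cur.fetchone()[0]

    def validate_load(src_conn, tgt_conn, src_table, tgt_table, load_dt):
        src = row_count(src_conn, src_table, load_dt)
        tgt = row_count(tgt_conn, tgt_table, load_dt)
        if src != tgt:
            raise AssertionError(f"row count mismatch for {load_dt}: source={src}, target={tgt}")
        return src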