Data Engineer Resume
Beaverton, OR
OBJECTIVE
- Goal-oriented, dedicated to high levels of customer satisfaction and to meeting aggressive business goals.
- Passionate and motivated, with a drive for excellence and strong knowledge of Big Data and the Hadoop ecosystem.
SUMMARY
- 7+ years of overall IT experience, including 4 years in the Hadoop ecosystem and 3 years in ETL and data work with Teradata, Informatica, and related automation tools.
- Worked as a Business Systems Analyst performing data validation, data quality checks, and data manipulation, making data available at the right grain through SQL queries.
- Focused on finding answers in the data that make a difference.
- Strong technical background in analyzing large datasets to understand complex dynamics and their impact on the business.
- Performed development, data cleaning and transformation, descriptive and predictive analysis, and visualization of results to deliver actionable insights.
- Comfortable with technologies such as Hive, Sqoop, Spark, Python, AWS S3, EMR, and Snowflake.
- Worked in both Agile and Waterfall methodologies.
TECHNICAL SKILLS
Big Data: Hadoop, HDFS, MapReduce, Hive, Spark, AWS S3, EMR, EC2, Airflow
Tools: Jenkins, Quality Center, UFT (QTP), Selenium, Informatica, Autosys, Alteryx
Domain: Benefits, Finance, Digital
Languages: Python, Scala, C#, Java, HTML, SQL, JavaScript, VBScript, Shell Scripting
Databases: Oracle, Teradata, MySQL, SQL Server, Snowflake
Version Control: Bitbucket, GitHub
PROFESSIONAL EXPERIENCE
Confidential, Beaverton, OR
Data Engineer
Responsibilities:
- Worked with the Product team to understand requirements.
- Developed data ingestion pipelines using Airflow and PySpark, loading the data into Hive and Snowflake tables (see the sketch below).
- Implemented workflows in Airflow for managing and scheduling Hadoop jobs.
- Used GitHub/Bitbucket as the code repository and Jenkins for deployments to different environments.
- Ingested data from API responses (JSON files) and parsed the JSON for storage in S3/Hive tables.
- Built real-time data ingestion pipelines using NSP (Confidential Streaming Platform).
- Developed Jenkins pipelines for deployments.
- Designed and developed Spark jobs for data ingestion and aggregation.
- Used HDFS and AWS S3 as storage for Hive tables.
- Developed copy scripts to move aggregated/integrated table data to Snowflake and Athena.
- Worked with multiple data formats: Parquet, Avro, JSON, XML, CSV.
- Used Confluence for project documentation.
Environment: Spark, AWS, EMR, S3, Hive, NSP, Kafka, Spark Streaming, Python, Airflow, Teradata, Presto, Athena, Snowflake
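A minimal sketch of the Airflow-plus-PySpark ingestion pattern described above. The DAG ID, S3 paths, and table names are illustrative placeholders, not the actual project code.

    # dags/ingest_orders.py - illustrative Airflow 2.x DAG; names are placeholders
    from datetime import datetime
    from airflow import DAG
    from airflow.operators.bash import BashOperator

    with DAG(
        dag_id="ingest_orders_daily",
        start_date=datetime(2023, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        # Submit the PySpark ingestion job to the EMR/YARN cluster
        ingest = BashOperator(
            task_id="spark_ingest",
            bash_command=(
                "spark-submit --master yarn --deploy-mode cluster "
                "s3://example-bucket/jobs/ingest_orders.py"
            ),
        )

    # jobs/ingest_orders.py - parse JSON API responses landed in S3 and load a Hive table
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("ingest_orders").enableHiveSupport().getOrCreate()

    raw = spark.read.json("s3://example-bucket/raw/orders/")   # JSON files from the API
    clean = raw.select("order_id", "status", "order_ts")       # keep only the needed columns
    clean.write.mode("overwrite").format("parquet").saveAsTable("analytics.orders_stg")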
Confidential, Beaverton, OR
Data Engineer/Snowflake Engineer
Responsibilities:
- Worked with the Product team to understand requirements.
- Created and maintained Airflow DAGs.
- Built Hive and Spark jobs.
- Monitored, supported, and troubleshot Airflow and Spark jobs.
- Used Bitbucket and GitHub for version control.
- Used Jenkins for builds and deployments.
- Tuned Spark jobs with caching, coalesce, and repartition to improve performance (see the sketch below).
- Worked in an Agile environment.
Environment: Spark, AWS, EMR, S3, Hive, Python, Airflow, Teradata, Sqoop, Snowflake
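A minimal illustration of the caching/coalesce/repartition tuning mentioned above. The DataFrame names, paths, and partition counts are placeholders chosen for the example.

    # Illustrative Spark tuning pattern: cache a reused DataFrame, repartition on the
    # join key before a wide join, and coalesce before writing to avoid many small files.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("tuning_sketch").getOrCreate()

    orders = spark.read.parquet("s3://example-bucket/curated/orders/")
    dims = spark.read.parquet("s3://example-bucket/curated/products/")

    orders = orders.repartition(200, "product_id")  # spread the join key across executors
    orders.cache()                                  # reused by both outputs below

    daily = orders.groupBy("order_date").count()
    joined = orders.join(dims, "product_id", "left")

    # coalesce shrinks the number of output files without a full shuffle
    joined.coalesce(20).write.mode("overwrite").parquet("s3://example-bucket/marts/orders_enriched/")
    daily.coalesce(1).write.mode("overwrite").parquet("s3://example-bucket/marts/orders_daily/")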
Confidential, Hillsboro, OR
Data Engineer/Snowflake Engineer
Responsibilities:
- Developed, monitored, and troubleshot Hive and Spark jobs on AWS EMR.
- Ingested data from Teradata into Snowflake (see the sketch below).
- Provided Snowflake user support.
- Coordinated with report developers to refresh Tableau data sources.
- Used GitHub/Bitbucket as the code repository.
- Sent Airflow production job status reports to stakeholders.
- Updated the Confluence page with job status.
- Built Jenkins pipelines as part of the DevOps model.
- Developed canary queries and integrated them with MCF to catch issues before business users were affected.
Environment: Spark, AWS, EMR, S3, Hive, Python, Airflow, Teradata, Sqoop, Snowflake
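A rough sketch of the Snowflake side of the Teradata-to-Snowflake ingestion described above, assuming the extracted data is first staged to S3 as Parquet and then loaded with COPY INTO. The stage, table, warehouse, and credential names are placeholders.

    # Illustrative Snowflake load step using the snowflake-connector-python package.
    # Assumes an external stage (@my_s3_stage) already points at the staged Parquet files.
    import os
    import snowflake.connector

    conn = snowflake.connector.connect(
        account=os.environ["SNOWFLAKE_ACCOUNT"],
        user=os.environ["SNOWFLAKE_USER"],
        password=os.environ["SNOWFLAKE_PASSWORD"],
        warehouse="LOAD_WH",
        database="ANALYTICS",
        schema="STAGING",
    )
    try:
        cur = conn.cursor()
        cur.execute(
            """
            COPY INTO STAGING.ORDERS
            FROM @my_s3_stage/orders/
            FILE_FORMAT = (TYPE = PARQUET)
            MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE
            """
        )
    finally:
        conn.close()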
Confidential, Beaverton, OR
Data Engineer
Responsibilities:
- Monitored and troubleshot Hive and Spark jobs on Azure.
- Developed wrapper scripts in Shell and Python (see the sketch below).
- Coordinated with report developers to refresh Tableau data sources.
- Used GitHub/Bitbucket as the code repository.
- Sent Airflow production job status reports to stakeholders.
- Prepared daily production job status reports.
- Prepared failed-job reports with reasons for failure.
- Updated the Confluence page with job status.
- Used Confluence for project documentation.
- Built Jenkins pipelines as part of the DevOps model.
- Developed canary queries and integrated them with MCF to catch issues before business users were affected.
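One possible shape of the Python wrapper scripts mentioned above: a small wrapper that runs a job command with retries and basic logging. The command, retry count, and wait time are illustrative assumptions, not the original scripts.

    #!/usr/bin/env python3
    # Illustrative wrapper: run a job command with retries and log the outcome.
    import logging
    import subprocess
    import sys
    import time

    logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")

    def run_with_retries(cmd, attempts=3, wait_seconds=60):
        for attempt in range(1, attempts + 1):
            logging.info("attempt %d/%d: %s", attempt, attempts, " ".join(cmd))
            result = subprocess.run(cmd)
            if result.returncode == 0:
                logging.info("job succeeded")
                return 0
            logging.warning("job failed with exit code %d", result.returncode)
            if attempt < attempts:
                time.sleep(wait_seconds)
        logging.error("job failed after %d attempts", attempts)
        return 1

    if __name__ == "__main__":
        # e.g. python wrapper.py hive -f /jobs/daily_refresh.hql
        sys.exit(run_with_retries(sys.argv[1:]))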
Confidential, Beaverton, OR
Product Support Analyst
Responsibilities:
- Collaborate with users and technical teams to understand requirements.
- Monitored and troubleshot Hive and Spark jobs on AWS EMR.
- Scheduled daily jobs in Airflow.
- Developed DAGs for the daily production run.
- Developed wrapper scripts in Shell and Python.
- Coordinated with report developers to refresh Tableau data sources.
- Used GitHub/Bitbucket as the code repository.
- Helped the Tableau team integrate with HiveServer2 for reporting (see the sketch below).
- Prepared daily production job status reports.
- Prepared failed-job reports with reasons for failure.
- Updated the Confluence page with job status.
- Used Confluence for project documentation.
Environment: Spark, AWS, EMR, S3, Hive, Python, Airflow, Teradata, Sqoop, Snowflake
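A minimal example of the HiveServer2 connectivity work mentioned above, using the PyHive package as one possible client for verifying the endpoint Tableau connects to. The host, port, user, and table names are placeholders, and PyHive itself is an assumption rather than a tool listed in this resume.

    # Illustrative HiveServer2 query via PyHive (pip install "pyhive[hive]").
    from pyhive import hive

    conn = hive.connect(host="hiveserver2.example.com", port=10000, username="etl_user")
    try:
        cur = conn.cursor()
        # Same endpoint Tableau points at; this just verifies the table is reachable.
        cur.execute("SELECT COUNT(*) FROM analytics.orders_daily")
        print("row count:", cur.fetchone()[0])
    finally:
        conn.close()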
Confidential, MN
Hadoop Developer
Responsibilities:
- Implemented ETL processes in Spark using DataFrames on EMR.
- Implemented each ETL process with BOP and EOP quantities to avoid repeated query execution during reporting.
- Created Airflow DAGs for backfill, history, and incremental loads.
- Tuned Spark jobs with caching, coalesce, and repartition to improve performance.
- Created HDFS staging tables to improve throughput when writing S3 Parquet files.
- Developed unit test functions for Spark jobs.
- Built Jenkins pipelines as part of the DevOps model.
- Developed a PyUnit framework for unit testing (see the sketch below).
- Developed canary queries and integrated them with MCF to catch issues before business users were affected.
Environment: CDH, Spark, AWS, EMR, S3, Hive, Python, Zookeeper, Oracle, Teradata, Sqoop
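A minimal sketch of PyUnit-style testing for a Spark transformation, in the spirit of the framework mentioned above: a unittest case that runs a small function through a local SparkSession. The transformation under test is an illustrative stand-in, not project code.

    # Illustrative unittest for a Spark transformation using a local SparkSession.
    import unittest
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col

    def add_total(df):
        # Transformation under test (illustrative): total = qty * price.
        return df.withColumn("total", col("qty") * col("price"))

    class AddTotalTest(unittest.TestCase):
        @classmethod
        def setUpClass(cls):
            cls.spark = SparkSession.builder.master("local[1]").appName("tests").getOrCreate()

        @classmethod
        def tearDownClass(cls):
            cls.spark.stop()

        def test_total_is_qty_times_price(self):
            df = self.spark.createDataFrame([(2, 5.0)], ["qty", "price"])
            rows = add_total(df).collect()
            self.assertEqual(rows[0]["total"], 10.0)

    if __name__ == "__main__":
        unittest.main()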
Confidential
Informatica Developer
Responsibilities:
- Handled batch processing for export/import functionality across various heterogeneous systems.
- Designed and developed Informatica mappings to move data from SAP to Teradata.
- Created unit test cases for testing Informatica mappings.
- Involved in batch design.
- Migrated code between environments using label queries and deployment groups.
- Helped set up the development assurance process.
- Created complex SQL queries for data validation and data analysis (see the sketch below).
- Provided database administration for testing and development database servers.
- Created database objects such as tables, indexes, stored procedures, and views.
- Wrote extensive SQL and PL/SQL procedures and functions.
- Monitored database activity, performance, and size; increased capacity when required and identified and resolved performance issues.
- Tuned and optimized SQL queries using indexes and table partitioning.
- Analyzed database tables and indexes, rebuilding indexes where fragmentation was found.
- Planned backup and recovery using cold backups; built and deployed the DB utilities required in the QA environment.
- Built the client restore/backup utility for QA clients to enable QA automation runs by bringing clients to the desired application state.
- Applied DB dumps from Production into the Development environment, typically based on customer tickets/requests.
- Applied DB scripts/patches as part of hotfixes and version migrations.
Environment: Teradata R13, Informatica 9.1, UNIX, SAP BW, Oracle
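One example of the kind of data validation SQL mentioned above: comparing source and target row counts for a load date after a mapping run. The connection objects, table names, and load_dt column are hypothetical placeholders standing in for whatever DB-API connections and schemas the project actually used.

    # Illustrative validation check: compare source and target row counts for a load date.
    VALIDATION_SQL = "SELECT COUNT(*) FROM {table} WHERE load_dt = DATE '{load_dt}'"

    def row_count(conn, table, load_dt):
        cur = conn.cursor()
        cur.execute(VALIDATION_SQL.format(table=table, load_dt=load_dt))
        return cur.fetchone()[0]

    def validate_load(src_conn, tgt_conn, src_table, tgt_table, load_dt):
        src = row_count(src_conn, src_table, load_dt)
        tgt = row_count(tgt_conn, tgt_table, load_dt)
        if src != tgt:
            raise AssertionError(f"row count mismatch for {load_dt}: source={src}, target={tgt}")
        return src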