
Data Engineer Resume


SUMMARY

  • 3 years of data engineering experience across cloud (AWS) and on-prem clusters.
  • Good knowledge of both ETL and ELT ingestion frameworks.
  • Experience with ingestion strategies such as Full Refresh, Incremental, and SCD Type 2.
  • Exposure to end-to-end (E2E) pipelines, from ingestion to consumption.
  • Experience in both batch and real-time data ingestion using Spark and Sqoop.
  • Worked on both migration projects (on-prem to cloud) and new initiatives building features from scratch.
  • Good knowledge of translating business requirements into SQL queries and building insights.
  • Experience ingesting from RDBMS systems such as MySQL, Postgres, and Oracle using Sqoop and Spark JDBC.
  • Implemented cleansing, data-quality checks, and business rules using shell scripts and Spark.
  • Implemented Spark DataFrame transformations to map business rules and applied actions on top of the transformations.
  • Experience working with sensitive data and data encryption.
  • Wrote complex SQL queries to build KPI tables using aggregations and analytical/window functions.
  • Strong knowledge of Hadoop YARN architecture and tools such as Sqoop, Hive, Spark, and Oozie.
  • Good knowledge of Git flow and branching strategies.
  • Knowledge of GitOps (CI/CD) pipelines, with some exposure to writing and executing CI/CD pipelines.
  • Good knowledge of the SDLC.
  • Experience with Agile processes; worked in both Scrum and Kanban methodologies.
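The SCD Type 2 strategy mentioned above can be sketched in plain Python. This is an illustrative merge of an incremental batch into a dimension table (function and column names such as `apply_scd2`, `is_current`, and `effective_from` are assumptions for the sketch; the production version would run as a Spark job):

```python
def apply_scd2(current_rows, incoming_rows, key, tracked_cols, load_date):
    """Merge an incremental batch into an SCD Type 2 dimension (sketch).

    current_rows: existing dimension rows, each a dict carrying the history
    columns 'is_current', 'effective_from', and 'effective_to'.
    incoming_rows: the new extract from the source system.
    """
    incoming_by_key = {r[key]: r for r in incoming_rows}
    result, changed_keys = [], set()
    for row in current_rows:
        new = incoming_by_key.get(row[key])
        if new is not None and row["is_current"] and any(
            row[c] != new[c] for c in tracked_cols
        ):
            # Close the current version of a changed record
            result.append(dict(row, is_current=False, effective_to=load_date))
            changed_keys.add(row[key])
        else:
            result.append(row)
    existing_keys = {r[key] for r in current_rows}
    for k, new in incoming_by_key.items():
        if k in changed_keys or k not in existing_keys:
            # Open a new current version for changed or brand-new keys
            result.append(dict(new, is_current=True,
                               effective_from=load_date, effective_to=None))
    return result
```

A Full Refresh strategy, by contrast, would simply truncate and reload the target; SCD Type 2 trades storage for a queryable change history.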

TECHNICAL SKILLS

On-Prem: Hadoop (CDH), Hive, YARN, HDFS, MapReduce, Sqoop, Oozie, Spark

AWS: S3, Lambda Functions, EC2, Airflow, Redshift, EMR, DynamoDB

Databases: MySQL, Postgres, Oracle

Data Warehouse: Hive, Redshift, Snowflake

Messaging: Kafka

No-SQL: HBase

Scheduler: Oozie, Airflow

Languages: Python (pandas), Core Java, SQL, Shell script

Version Control: Git, Bitbucket, Gitlab

CI/CD: Jenkins, Gitlab

Project Management: Jira, Confluence

PROFESSIONAL EXPERIENCE

Data Engineer

Confidential

Responsibilities:

  • Understanding requirements and creating low-level designs for the modules.
  • Writing Lambda functions in Python for use cases such as creating dynamic DAGs and event triggers.
  • Writing Spark applications using Spark Core and Spark SQL for data processing.
  • Writing SQL queries on Redshift to enrich (join), aggregate, and build KPI tables.
  • Creating custom UDFs in Redshift.
  • Creating scripts for infrastructure creation using the AWS CLI.
  • Creating Airflow DAGs for job orchestration.
  • Worked on migration of Hadoop components (Hive, Sqoop, and Spark) to the AWS platform.
  • Created configuration files for building ETL pipelines.
  • Writing Spark code for ingestion strategies such as Full Refresh, SCD Type 2, and Incremental.
  • Analyzing source data and preparing the target data model.
  • Performing unit and integration tests.
  • Creating documentation in Confluence and updating Jira stories on a timely basis.

Environment: PySpark, Redshift, Shell script, EMR, EC2, S3, Lambda Functions, Python, SQL, AWS CLI, Confluence.
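The event-trigger Lambda pattern above can be sketched as a minimal handler that parses an S3 put notification and decides which pipeline run to kick off. The prefix-to-pipeline mapping and the returned payload are assumptions for the sketch; a real handler would go on to call the Airflow REST API or write a trigger file rather than just returning the runs:

```python
import json

def lambda_handler(event, context):
    """Parse an S3 event and return the pipeline run(s) to trigger (sketch).

    In a real deployment this would invoke the orchestrator (e.g. Airflow);
    here it only extracts bucket/key pairs from the notification records.
    """
    runs = []
    for record in event.get("Records", []):
        s3 = record.get("s3", {})
        bucket = s3.get("bucket", {}).get("name")
        key = s3.get("object", {}).get("key")
        if bucket and key:
            # Map the landed file to a pipeline name by its top-level prefix
            pipeline = key.split("/", 1)[0]
            runs.append({"pipeline": pipeline,
                         "source": f"s3://{bucket}/{key}"})
    return {"statusCode": 200, "body": json.dumps(runs)}
```

Keeping the handler free of side effects at the top level makes it easy to unit test with a hand-built event dict before wiring up the S3 notification.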

Big Data Engineer

Confidential

Responsibilities:

  • Writing Sqoop jobs to ingest data from RDBMS systems such as Oracle.
  • Creating different types of Hive tables, including managed and external tables.
  • Analyzing data and business use cases and creating data models, including partitioning, bucketing, and skew handling.
  • Creating HQL files to build aggregated tables, perform data enrichment, and define UDFs.
  • Writing PySpark applications to read HDFS files and perform data cleanup.
  • Writing Spark code to perform complex transformations using Spark SQL and UDFs.
  • Submitting Spark applications in lower environments and analyzing the logs.
  • Monitoring and analyzing issues in production.
  • Loading data into HBase using Spark.
  • Writing Oozie workflows for job orchestration and scheduling.
  • Creating client-specific data marts to share with stakeholders.

Environment: Cloudera (CDH), Hadoop, Spark, Sqoop, Hive, HDFS, Oozie, HBase.
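The data-cleanup step mentioned above can be sketched in plain Python. The specific rules (trim whitespace, treat empty strings and literal "NULL" as missing, drop rows without the key column) are illustrative assumptions; the production version applied equivalent logic as PySpark DataFrame transformations before loading into Hive:

```python
def clean_rows(rows, key):
    """Apply simple cleanup rules to raw records before loading (sketch).

    - strip surrounding whitespace from string values
    - treat empty strings and the literal 'NULL' as missing (None)
    - drop rows that have no value for the primary/join key
    """
    cleaned = []
    for row in rows:
        out = {}
        for col, val in row.items():
            if isinstance(val, str):
                val = val.strip()
                if val == "" or val.upper() == "NULL":
                    val = None
            out[col] = val
        if out.get(key) is not None:
            cleaned.append(out)
    return cleaned
```

Centralizing rules like these in one function (or one DataFrame transformation) keeps the ingestion jobs consistent across source tables.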
