Data Engineer Resume
SUMMARY
- 3 years of data engineering experience across cloud (AWS) and on-prem clusters.
- Good knowledge of both ETL and ELT ingestion frameworks.
- Experience with different ingestion strategies such as full refresh, incremental, and SCD Type 2 (a sketch follows this summary).
- Exposure to end-to-end (E2E) pipelines, from ingestion through consumption.
- Experience with both batch and real-time data ingestion using Spark and Sqoop.
- Worked on both migration projects (on-prem to cloud) and new initiatives building features from scratch.
- Good knowledge of translating business requirements into SQL queries and building insights.
- Experience ingesting from RDBMS systems such as MySQL, Postgres, and Oracle using Sqoop and Spark JDBC.
- Implemented cleansing, data quality checks, and business rules using shell scripts and Spark.
- Implemented Spark DataFrame transformations to map business rules and applied actions on top of those transformations.
- Experience working with sensitive data and data encryption.
- Wrote complex SQL queries to build KPI tables using aggregations, analytical functions, and window functions.
- Strong knowledge of Hadoop YARN architecture and tools such as Sqoop, Hive, Spark, and Oozie.
- Good knowledge of Git flow and branching strategies.
- Knowledge of GitOps (CI/CD) practices, with some exposure to writing and executing CI/CD pipelines.
- Good knowledge of the SDLC.
- Experience with Agile processes; worked in both Scrum and Kanban methodologies.
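The following is a minimal, illustrative PySpark sketch of the SCD Type 2 merge pattern referenced above; the table names (warehouse.dim_customer, staging.customer_updates) and columns are hypothetical assumptions, not taken from any specific project.

    # SCD Type 2 sketch: expire changed rows and insert new current versions.
    # All table and column names here are hypothetical.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("scd2_sketch").getOrCreate()

    dim = spark.table("warehouse.dim_customer")        # existing dimension
    stg = spark.table("staging.customer_updates")      # incoming changes (customer_id, name)
    current = dim.filter(F.col("is_current") == True)
    history = dim.filter(F.col("is_current") == False)

    cond = F.col("d.customer_id") == F.col("s.customer_id")
    changed = (current.alias("d").join(stg.alias("s"), cond)
               .filter(F.col("d.name") != F.col("s.name")))

    # Close out the old versions of changed rows.
    expired = (changed.select("d.*")
               .withColumn("is_current", F.lit(False))
               .withColumn("end_date", F.current_date()))

    # Open new current versions from the staged records.
    new_versions = (changed.select("s.*")
                    .withColumn("is_current", F.lit(True))
                    .withColumn("effective_date", F.current_date())
                    .withColumn("end_date", F.lit(None).cast("date")))

    unchanged = current.join(changed.select(F.col("d.customer_id").alias("customer_id")),
                             "customer_id", "left_anti")
    result = (history.unionByName(unchanged)
              .unionByName(expired)
              .unionByName(new_versions))
    result.write.mode("overwrite").saveAsTable("warehouse.dim_customer_scd2_out")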
TECHNICAL SKILLS
On-Prem: Hadoop (CDH), Hive, YARN, HDFS, MapReduce, Sqoop, Oozie, Spark
AWS: S3, Lambda, EC2, Airflow, Redshift, EMR, DynamoDB
Databases: MySQL, Postgres, Oracle
Data Warehouse: Hive, Redshift, Snowflake
Messaging: Kafka
NoSQL: HBase
Scheduler: Oozie, Airflow
Languages: Python (pandas), Core Java, SQL, Shell scripting
Version Control: Git, Bitbucket, Gitlab
CI/CD: Jenkins, Gitlab
Project Management: Jira, Confluence
PROFESSIONAL EXPERIENCE
Data Engineer
Confidential
Responsibilities:
- Understanding requirements and creating low-level designs for the modules.
- Writing AWS Lambda functions in Python for use cases such as creating dynamic DAGs and event triggers (see the Lambda sketch after this entry).
- Writing Spark applications using Spark Core and Spark SQL for data processing.
- Writing SQL queries on Redshift to enrich (join), aggregate, and build KPI tables.
- Creating custom UDFs in Redshift.
- Creating scripts for infrastructure provisioning using the AWS CLI.
- Creating Airflow DAGs for job orchestration (see the DAG sketch after this entry).
- Worked on migrating Hadoop components (Hive, Sqoop, and Spark) to the AWS platform.
- Creating configuration files for building ETL pipelines.
- Writing Spark code for different ingestion strategies such as full refresh, SCD Type 2, and incremental loads.
- Analyzing source data and preparing target data models.
- Performing unit and integration testing.
- Creating documentation on Confluence and updating Jira stories on a timely basis.
Environment: PySpark, Redshift, Shell scripting, EMR, EC2, S3, Lambda, Python, SQL, AWS CLI, Confluence.
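The sketch below illustrates the kind of Python Lambda handler used for S3 event triggers; the bucket layout, the incoming/ prefix, and the copy step are hypothetical assumptions used only for illustration.

    # Hypothetical Lambda handler for S3 object-created events.
    import json
    import urllib.parse
    import boto3

    s3 = boto3.client("s3")

    def handler(event, context):
        # Each record describes one newly created S3 object.
        for record in event.get("Records", []):
            bucket = record["s3"]["bucket"]["name"]
            key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
            print(json.dumps({"bucket": bucket, "key": key}))
            # Illustrative downstream action: stage the file under a processing prefix.
            s3.copy_object(
                Bucket=bucket,
                CopySource={"Bucket": bucket, "Key": key},
                Key="incoming/" + key.split("/")[-1],
            )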
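A minimal Airflow DAG sketch for the kind of orchestration described above; the dag_id, schedule, commands, and file paths are hypothetical.

    # Minimal Airflow DAG sketch; dag_id, schedule, and commands are hypothetical.
    from datetime import datetime, timedelta
    from airflow import DAG
    from airflow.operators.bash import BashOperator

    default_args = {"owner": "data-eng", "retries": 1, "retry_delay": timedelta(minutes=5)}

    with DAG(
        dag_id="daily_kpi_pipeline",
        start_date=datetime(2023, 1, 1),
        schedule_interval="@daily",
        default_args=default_args,
        catchup=False,
    ) as dag:
        ingest = BashOperator(
            task_id="ingest_to_s3",
            bash_command="spark-submit s3://my-bucket/jobs/ingest.py --run-date {{ ds }}",
        )
        build_kpis = BashOperator(
            task_id="build_kpi_tables",
            bash_command="python /opt/jobs/run_redshift_sql.py --sql kpi_tables.sql",
        )
        ingest >> build_kpis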
Big Data Engineer
Confidential
Responsibilities:
- Writing Sqoop jobs to ingest data from RDBMS systems such as Oracle.
- Creating different types of Hive tables, including managed and external tables.
- Analyzing data and business use cases and creating data models, including partitioning, bucketing, and skew handling.
- Creating HQL files to build aggregated tables, data enrichment logic, and UDFs.
- Writing PySpark applications to read HDFS files and perform data cleanup (see the sketch after this entry).
- Writing Spark code to perform complex transformations using Spark SQL and UDFs.
- Submitting Spark applications in lower environments and analyzing the logs.
- Monitoring and analyzing issues in production.
- Loading data into HBase using Spark.
- Writing Oozie workflows for job orchestration and scheduling.
- Creating client-specific data marts to share with stakeholders.
Environment: Cloudera (CDH), Hadoop, Spark, Sqoop, Hive, HDFS, Oozie, HBase.
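A minimal PySpark sketch of the read-from-HDFS-and-clean-up pattern mentioned above; the HDFS path, column names, and target Hive table are hypothetical.

    # Hypothetical PySpark cleanup job: read raw HDFS files, clean, load to Hive.
    from pyspark.sql import SparkSession, functions as F

    spark = (SparkSession.builder
             .appName("hdfs_cleanup_sketch")
             .enableHiveSupport()
             .getOrCreate())

    raw = spark.read.option("header", True).csv("hdfs:///data/raw/orders/")

    clean = (raw
             .dropDuplicates(["order_id"])                   # drop duplicate records
             .filter(F.col("order_id").isNotNull())          # drop rows missing the key
             .withColumn("amount", F.col("amount").cast("double"))
             .withColumn("order_date", F.to_date("order_date", "yyyy-MM-dd")))

    # Load into a partitioned managed Hive table for downstream HQL aggregations.
    (clean.write
     .mode("overwrite")
     .partitionBy("order_date")
     .saveAsTable("analytics.orders_clean"))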