
Data Engineer Resume


SUMMARY

  • 3 years of data engineering experience across cloud (AWS) and on-prem clusters.
  • Good knowledge of both ETL and ELT ingestion frameworks.
  • Experience with ingestion strategies such as Full Refresh, Incremental, and SCD Type 2.
  • Exposure to end-to-end (E2E) pipelines, from ingestion to consumption.
  • Experience in both batch and real-time data ingestion using Spark and Sqoop.
  • Worked on both migration projects (on-prem to cloud) and new initiatives building features from scratch.
  • Good knowledge of translating business requirements into SQL queries and building insights.
  • Experience ingesting from RDBMS systems such as MySQL, Postgres, and Oracle using Sqoop and Spark JDBC.
  • Implemented cleansing, data-quality checks, and business rules using shell scripts and Spark.
  • Implemented Spark DataFrame transformations to map business rules and applied actions on top of the transformations.
  • Experience working with sensitive data and data encryption.
  • Wrote complex SQL queries to build KPI tables using aggregations and analytical/window functions.
  • Strong knowledge of Hadoop YARN architecture and tools such as Sqoop, Hive, Spark, and Oozie.
  • Good knowledge of Git flow and branching strategies.
  • Knowledge of GitOps (CI/CD) pipelines, with some exposure to writing and executing CI/CD pipelines.
  • Good knowledge of the SDLC.
  • Experience with Agile processes; worked in both Scrum and Kanban methodologies.
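The SCD Type 2 strategy mentioned above can be sketched in plain Python. This is an illustrative merge of an incremental batch into a dimension table (function and column names such as `apply_scd2`, `is_current`, and `effective_from` are assumptions for the sketch; the production version would run as a Spark job):

```python
def apply_scd2(current_rows, incoming_rows, key, tracked_cols, load_date):
    """Merge an incremental batch into an SCD Type 2 dimension (sketch).

    current_rows: existing dimension rows, each a dict carrying the history
    columns 'is_current', 'effective_from', and 'effective_to'.
    incoming_rows: the new extract from the source system.
    """
    incoming_by_key = {r[key]: r for r in incoming_rows}
    result, changed_keys = [], set()
    for row in current_rows:
        new = incoming_by_key.get(row[key])
        if new is not None and row["is_current"] and any(
            row[c] != new[c] for c in tracked_cols
        ):
            # Close the current version of a changed record
            result.append(dict(row, is_current=False, effective_to=load_date))
            changed_keys.add(row[key])
        else:
            result.append(row)
    existing_keys = {r[key] for r in current_rows}
    for k, new in incoming_by_key.items():
        if k in changed_keys or k not in existing_keys:
            # Open a new current version for changed or brand-new keys
            result.append(dict(new, is_current=True,
                               effective_from=load_date, effective_to=None))
    return result
```

A Full Refresh strategy, by contrast, would simply truncate and reload the target; SCD Type 2 trades storage for a queryable change history.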

TECHNICAL SKILLS

On-Prem: Hadoop (CDH), Hive, YARN, HDFS, MapReduce, Sqoop, Oozie, Spark

AWS: S3, Lambda Functions, EC2, Airflow, Redshift, EMR, DynamoDB

Databases: MySQL, Postgres, Oracle

Data Warehouse: Hive, Redshift, Snowflake

Messaging: Kafka

No-SQL: HBase

Scheduler: Oozie, Airflow

Languages: Python (pandas), Core Java, SQL, Shell script

Version Control: Git, Bitbucket, Gitlab

CI/CD: Jenkins, Gitlab

Project Management: Jira, Confluence

PROFESSIONAL EXPERIENCE

Data Engineer

Confidential

Responsibilities:

  • Understanding requirements and creating low-level designs for the modules.
  • Writing Lambda functions in Python for use cases such as creating dynamic DAGs and event triggers.
  • Writing Spark applications using Spark Core and Spark SQL for data processing.
  • Writing SQL queries on Redshift to enrich (join), aggregate, and build KPI tables.
  • Creating custom UDFs in Redshift.
  • Creating scripts for infrastructure creation using the AWS CLI.
  • Creating Airflow DAGs for job orchestration.
  • Worked on migration of Hadoop components (Hive, Sqoop, and Spark) to the AWS platform.
  • Created configuration files for building ETL pipelines.
  • Writing Spark code for ingestion strategies such as Full Refresh, SCD Type 2, and Incremental.
  • Analyzing source data and preparing the target data model.
  • Performing unit and integration tests.
  • Creating documentation in Confluence and updating Jira stories on a timely basis.

Environment: PySpark, Redshift, Shell script, EMR, EC2, S3, Lambda Functions, Python, SQL, AWS CLI, Confluence.
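The event-trigger Lambda pattern above can be sketched as a minimal handler that parses an S3 put notification and decides which pipeline run to kick off. The prefix-to-pipeline mapping and the returned payload are assumptions for the sketch; a real handler would go on to call the Airflow REST API or write a trigger file rather than just returning the runs:

```python
import json

def lambda_handler(event, context):
    """Parse an S3 event and return the pipeline run(s) to trigger (sketch).

    In a real deployment this would invoke the orchestrator (e.g. Airflow);
    here it only extracts bucket/key pairs from the notification records.
    """
    runs = []
    for record in event.get("Records", []):
        s3 = record.get("s3", {})
        bucket = s3.get("bucket", {}).get("name")
        key = s3.get("object", {}).get("key")
        if bucket and key:
            # Map the landed file to a pipeline name by its top-level prefix
            pipeline = key.split("/", 1)[0]
            runs.append({"pipeline": pipeline,
                         "source": f"s3://{bucket}/{key}"})
    return {"statusCode": 200, "body": json.dumps(runs)}
```

Keeping the handler free of side effects at the top level makes it easy to unit test with a hand-built event dict before wiring up the S3 notification.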

Big Data Engineer

Confidential

Responsibilities:

  • Writing Sqoop jobs to ingest data from RDBMS systems such as Oracle.
  • Creating different types of Hive tables, including managed and external tables.
  • Analyzing data and business use cases and creating data models, including partitioning, bucketing, and skew handling.
  • Creating HQL files to build aggregated tables, perform data enrichment, and define UDFs.
  • Writing PySpark applications to read HDFS files and perform data cleanup.
  • Writing Spark code to perform complex transformations using Spark SQL and UDFs.
  • Submitting Spark applications in lower environments and analyzing the logs.
  • Monitoring and analyzing issues in production.
  • Loading data into HBase using Spark.
  • Writing Oozie workflows for job orchestration and scheduling.
  • Creating client-specific data marts to share with stakeholders.

Environment: Cloudera (CDH), Hadoop, Spark, Sqoop, Hive, HDFS, Oozie, HBase.
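The data-cleanup step mentioned above can be sketched in plain Python. The specific rules (trim whitespace, treat empty strings and literal "NULL" as missing, drop rows without the key column) are illustrative assumptions; the production version applied equivalent logic as PySpark DataFrame transformations before loading into Hive:

```python
def clean_rows(rows, key):
    """Apply simple cleanup rules to raw records before loading (sketch).

    - strip surrounding whitespace from string values
    - treat empty strings and the literal 'NULL' as missing (None)
    - drop rows that have no value for the primary/join key
    """
    cleaned = []
    for row in rows:
        out = {}
        for col, val in row.items():
            if isinstance(val, str):
                val = val.strip()
                if val == "" or val.upper() == "NULL":
                    val = None
            out[col] = val
        if out.get(key) is not None:
            cleaned.append(out)
    return cleaned
```

Centralizing rules like these in one function (or one DataFrame transformation) keeps the ingestion jobs consistent across source tables.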
