Data Engineer Resume
TECHNICAL SKILLS
Languages: Python, SQL
Big Data: Spark, PySpark, Hive, Hadoop
Cloud: AWS
Databases: SQL
Tools & Others: Git, OOP, Machine Learning, Algorithms, Data Structures, Scrum/Agile, Windows
PROFESSIONAL EXPERIENCE
Confidential
Data Engineer
Responsibilities:
- Developed AWS Lambda functions to trigger Confidential, which performs data cataloging and masking operations.
- Built Spark data pipelines in Python, applying various optimization techniques.
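A masking step of the kind the first bullet describes could be sketched as a Lambda-style handler. This is a minimal illustration, not the actual Confidential integration; the field names ("ssn", "email") and masking rule are hypothetical.

```python
# Hypothetical set of fields to mask before cataloging.
SENSITIVE_FIELDS = {"ssn", "email"}

def mask_value(value: str) -> str:
    """Replace all but the last two characters with '*'."""
    if len(value) <= 2:
        return "*" * len(value)
    return "*" * (len(value) - 2) + value[-2:]

def lambda_handler(event, context=None):
    """Mask sensitive fields in each record of the event payload."""
    masked = []
    for record in event.get("records", []):
        masked.append({
            key: mask_value(str(val)) if key in SENSITIVE_FIELDS else val
            for key, val in record.items()
        })
    return {"records": masked}
```

In a real deployment the handler would be wired to an S3 or EventBridge trigger and hand the masked records to the cataloging service.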
Confidential
Data Engineer
Responsibilities:
- Wrote various RDD transformations and actions using PySpark.
- Developed a POC to handle patient data files arriving in S3 buckets: wrote code that cleanses and enriches the data, updates the previous day's file, and stores the result in another S3 bucket; deployed the code on AWS EC2 and scheduled the script to run automatically at a specified time.
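The cleanse / enrich / merge step of that POC could be sketched in plain Python as below. The S3 download/upload (boto3) and EC2 scheduling are omitted, and the record fields ("patient_id", "visits") are made-up examples.

```python
def cleanse(records):
    """Drop records missing a patient_id and strip whitespace from string values."""
    cleaned = []
    for rec in records:
        if not rec.get("patient_id"):
            continue
        cleaned.append({k: v.strip() if isinstance(v, str) else v
                        for k, v in rec.items()})
    return cleaned

def enrich(records):
    """Add a derived field to each record (hypothetical enrichment)."""
    for rec in records:
        rec["has_visits"] = int(rec.get("visits", 0)) > 0
    return records

def merge_with_previous(previous, todays):
    """Update the previous day's records with today's, keyed on patient_id."""
    merged = {rec["patient_id"]: rec for rec in previous}
    for rec in enrich(cleanse(todays)):
        merged[rec["patient_id"]] = rec
    return list(merged.values())
```

In the actual job, `previous` would be read from yesterday's S3 object and the merged result written back to the output bucket.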
Confidential
Project Engineer
Responsibilities:
- Worked with the Spark ecosystem using PySpark and Spark SQL on different formats such as text and CSV files.
- Created Spark jobs in PySpark to clean and process large volumes of data.
- Worked with EMR clusters and S3 in the AWS cloud.
- Worked with Snowflake to expose data through APIs for external vendors.
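A PySpark job of the shape described above might look like the following sketch. The row-level cleaning is plain Python so it can be tested without a cluster; the Spark wiring is shown in comments because it assumes a PySpark environment, and the S3 paths are placeholders.

```python
def clean_row(row: dict) -> dict:
    """Trim string values and normalise empty strings to None."""
    out = {}
    for key, val in row.items():
        if isinstance(val, str):
            val = val.strip() or None
        out[key] = val
    return out

# Spark wiring (requires PySpark, e.g. on an EMR cluster):
#
# from pyspark.sql import SparkSession
#
# spark = SparkSession.builder.appName("csv-clean").getOrCreate()
# df = spark.read.option("header", True).csv("s3://bucket/input/*.csv")
# cleaned = df.rdd.map(lambda r: clean_row(r.asDict())).toDF()
# cleaned.write.parquet("s3://bucket/output/")
```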
Confidential
Project Engineer
Responsibilities:
- Involved in creating Hive tables, and loading and analyzing data using Hive scripts.
- Implemented partitioning, dynamic partitions, and bucketing in Hive.
- Loaded the structured data resulting from MapReduce jobs into Hive tables.
- Developed mappings using Informatica to load data from sources such as relational tables into the target system.
- Developed Hive user-defined functions (UDFs) to extend the core functionality.
- Worked on building pipelines in Snowflake for extensive data aggregations.
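The partitioning and bucketing bullets could be illustrated by DDL of the following shape, built and run from Python via a Hive-enabled Spark session. The table and column names are invented for illustration, and the `spark.sql(...)` calls are commented out because they need a live session.

```python
# Hypothetical partitioned, bucketed Hive table.
CREATE_DDL = """
CREATE TABLE IF NOT EXISTS claims (
    claim_id STRING,
    amount DOUBLE
)
PARTITIONED BY (load_date STRING)
CLUSTERED BY (claim_id) INTO 8 BUCKETS
STORED AS ORC
"""

def insert_stmt(load_date: str) -> str:
    """Build a dynamic-partition INSERT for one load date."""
    return (
        "INSERT OVERWRITE TABLE claims PARTITION (load_date) "
        f"SELECT claim_id, amount, '{load_date}' AS load_date FROM staging_claims"
    )

# With a Hive-enabled SparkSession:
# spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
# spark.sql(CREATE_DDL)
# spark.sql(insert_stmt("2023-01-15"))
```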