Data Engineer Resume
TECHNICAL SKILLS
Languages: Python, SQL
Big Data: Spark, PySpark, Hive, Hadoop
Cloud: AWS
Databases: SQL
Tools & Others: Git, OOP, Machine Learning, Algorithms, Data Structures, Scrum/Agile, Windows
PROFESSIONAL EXPERIENCE
Confidential
Data Engineer
Responsibilities:
- Developed AWS Lambda functions to trigger Confidential, which performs data cataloging and masking operations.
- Built Spark data pipelines in Python, applying various optimization techniques.
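A masking step of the kind the first bullet describes could be sketched as a Lambda-style handler. This is a minimal illustration, not the actual Confidential integration; the field names ("ssn", "email") and masking rule are hypothetical.

```python
# Hypothetical set of fields to mask before cataloging.
SENSITIVE_FIELDS = {"ssn", "email"}

def mask_value(value: str) -> str:
    """Replace all but the last two characters with '*'."""
    if len(value) <= 2:
        return "*" * len(value)
    return "*" * (len(value) - 2) + value[-2:]

def lambda_handler(event, context=None):
    """Mask sensitive fields in each record of the event payload."""
    masked = []
    for record in event.get("records", []):
        masked.append({
            key: mask_value(str(val)) if key in SENSITIVE_FIELDS else val
            for key, val in record.items()
        })
    return {"records": masked}
```

In a real deployment the handler would be wired to an S3 or EventBridge trigger and hand the masked records to the cataloging service.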
Confidential
Data Engineer
Responsibilities:
- Wrote various RDD transformations and actions using PySpark.
- Developed a POC to handle patient data files arriving in S3 buckets: wrote code that cleanses and enriches the data, updates the previous day's file, and stores the result in another S3 bucket; deployed the code on AWS EC2 and scheduled the script to run automatically at a specified time.
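The cleanse / enrich / merge step of that POC could be sketched in plain Python as below. The S3 download/upload (boto3) and EC2 scheduling are omitted, and the record fields ("patient_id", "visits") are made-up examples.

```python
def cleanse(records):
    """Drop records missing a patient_id and strip whitespace from string values."""
    cleaned = []
    for rec in records:
        if not rec.get("patient_id"):
            continue
        cleaned.append({k: v.strip() if isinstance(v, str) else v
                        for k, v in rec.items()})
    return cleaned

def enrich(records):
    """Add a derived field to each record (hypothetical enrichment)."""
    for rec in records:
        rec["has_visits"] = int(rec.get("visits", 0)) > 0
    return records

def merge_with_previous(previous, todays):
    """Update the previous day's records with today's, keyed on patient_id."""
    merged = {rec["patient_id"]: rec for rec in previous}
    for rec in enrich(cleanse(todays)):
        merged[rec["patient_id"]] = rec
    return list(merged.values())
```

In the actual job, `previous` would be read from yesterday's S3 object and the merged result written back to the output bucket.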
Confidential
Project Engineer
Responsibilities:
- Worked with the Spark ecosystem using PySpark and Spark SQL on different formats such as text and CSV files.
- Created Spark jobs in PySpark to clean and process large volumes of data.
- Worked with EMR clusters and S3 in the AWS cloud.
- Worked with Snowflake to expose data through APIs for external vendors.
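A PySpark job of the shape described above might look like the following sketch. The row-level cleaning is plain Python so it can be tested without a cluster; the Spark wiring is shown in comments because it assumes a PySpark environment, and the S3 paths are placeholders.

```python
def clean_row(row: dict) -> dict:
    """Trim string values and normalise empty strings to None."""
    out = {}
    for key, val in row.items():
        if isinstance(val, str):
            val = val.strip() or None
        out[key] = val
    return out

# Spark wiring (requires PySpark, e.g. on an EMR cluster):
#
# from pyspark.sql import SparkSession
#
# spark = SparkSession.builder.appName("csv-clean").getOrCreate()
# df = spark.read.option("header", True).csv("s3://bucket/input/*.csv")
# cleaned = df.rdd.map(lambda r: clean_row(r.asDict())).toDF()
# cleaned.write.parquet("s3://bucket/output/")
```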
Confidential
Project Engineer
Responsibilities:
- Involved in creating Hive tables, and loading and analyzing data using Hive scripts.
- Implemented partitioning, dynamic partitions, and bucketing in Hive.
- Loaded the structured data resulting from MapReduce jobs into Hive tables.
- Developed mappings using Informatica to load data from sources such as relational tables into the target system.
- Developed Hive user-defined functions (UDFs) to extend the core functionality.
- Worked on building pipelines in Snowflake for extensive data aggregations.
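The partitioning and bucketing bullets could be illustrated by DDL of the following shape, built and run from Python via a Hive-enabled Spark session. The table and column names are invented for illustration, and the `spark.sql(...)` calls are commented out because they need a live session.

```python
# Hypothetical partitioned, bucketed Hive table.
CREATE_DDL = """
CREATE TABLE IF NOT EXISTS claims (
    claim_id STRING,
    amount DOUBLE
)
PARTITIONED BY (load_date STRING)
CLUSTERED BY (claim_id) INTO 8 BUCKETS
STORED AS ORC
"""

def insert_stmt(load_date: str) -> str:
    """Build a dynamic-partition INSERT for one load date."""
    return (
        "INSERT OVERWRITE TABLE claims PARTITION (load_date) "
        f"SELECT claim_id, amount, '{load_date}' AS load_date FROM staging_claims"
    )

# With a Hive-enabled SparkSession:
# spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
# spark.sql(CREATE_DDL)
# spark.sql(insert_stmt("2023-01-15"))
```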