Data Engineer Resume
PROFESSIONAL SUMMARY:
- 6+ years of experience in the IT industry, including 2+ years in Big Data application development: analyzing, designing, developing, testing and deploying EDW applications and their data sources as per requirements.
- Strong working exposure to Apache Spark, Scala, Hive, Hadoop, Pig, HBase, Sqoop and Python.
- Good experience executing multiple end-to-end Big Data development projects.
- Hands-on experience developing Spark applications using Scala.
- Experience troubleshooting jobs and addressing production issues such as data issues, environment issues, performance tuning and enhancements.
- Experience scheduling sequential and parallel jobs using Unix scripts and scheduling tools such as Autosys.
- Good understanding of all the phases of the Software Development Life Cycle (SDLC), including requirements gathering, specification documentation, design, construction, testing and maintenance.
- Extensive experience in Unit Testing, Functional Testing, User Acceptance Testing (UAT) and Performance Testing.
- Experience with AWS services such as S3, Glue, Redshift and Athena, including creating Glue jobs, debugging, monitoring and fixing job failures.
- Good hands-on experience with OpenText StreamServe.
- Experienced in the SDLC with a focus on design and development using Agile/Scrum methodology.
- Strong communication, organizational, interpersonal, problem-solving and analytical skills; proactive and hardworking, with the ability to meet tight schedules.
TECHNICAL SKILLS:
Big Data Technologies: Spark, Hadoop, Pig, Sqoop, Hive, HBase, Cassandra
Programming Languages: Scala, Python, Shell Scripting
NoSQL Databases: Hive (HQL), HBase, Cassandra
SQL Databases: MySQL, Oracle
Operating Systems: Linux, Windows
Code Check-in Tools: Git
Cloud: AWS
OpenText Products: StreamServe
PROFESSIONAL EXPERIENCE:
Confidential
Data Engineer
Environment: AWS, Python, PySpark, SQL, Azure DevOps, GIT
Responsibilities:
- As a Data Engineer, responsible for migrating data from CDL to AWS.
- Identified the STTM and mapped the source tables to the target tables using business use cases.
- The Midstream squad is responsible for creating 11 ESA reports by migrating data using AWS SDLF.
- Developed AWS Glue jobs to migrate the data along with the business logic.
- With PySpark as the primary data migration tool, created PySpark jobs that were deployed to AWS as Glue jobs and executed in the cloud.
- Performed data quality checks and UAT for the developed PySpark jobs.
- Performed end-to-end development, from the Dev environment through to the Prod environment.
- Analyzed the job history for the ~200 GB source dataset and made code-level changes to achieve optimal performance when migrating the data in live environments.
- Used Spark SQL and Spark optimization techniques such as broadcast joins and filtering data at the source level (see the sketch after this list).
- The migrated data was used to build Tableau dashboards for the business users.
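A minimal sketch of a Glue PySpark job of the kind described above, for illustration only: the S3 paths, table names, columns and filter values are hypothetical, and the snippet shows just the standard Glue job boilerplate, a source-level filter and a broadcast join.

```python
import sys
from awsglue.utils import getResolvedOptions
from awsglue.context import GlueContext
from awsglue.job import Job
from pyspark.context import SparkContext
from pyspark.sql.functions import broadcast, col

# Standard Glue job setup
args = getResolvedOptions(sys.argv, ["JOB_NAME"])
sc = SparkContext()
glue_context = GlueContext(sc)
spark = glue_context.spark_session
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read the source data, filtering at the source level so less data is scanned and shuffled
orders = (spark.read.parquet("s3://source-bucket/cdl/orders/")      # hypothetical path
          .filter(col("order_date") >= "2022-01-01"))               # hypothetical filter

# Broadcast the small lookup table to avoid a shuffle during the join
regions = spark.read.parquet("s3://source-bucket/cdl/regions/")     # hypothetical path
enriched = orders.join(broadcast(regions), "region_id", "left")

# Apply business logic here, then write to the target layer consumed by the reports
enriched.write.mode("overwrite").parquet("s3://target-bucket/esa/orders_enriched/")

job.commit()
```

Broadcasting suits a lookup table small enough to ship to every executor, and pushing the date filter down to the Parquet read keeps the volume of data moved across the cluster small.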
Confidential
Big Data Developer
Environment: Spark, Scala, HDFS, Shell Scripting, Hive, Autosys
Responsibilities:
- Worked on new enhancements on top of the existing functionality using Spark and Scala.
- Performed data validations for any missed objects during production downtime activities (see the sketch after this list).
- Identified the objects impacted by changes, if any, and resolved them.
- For any data load/refresh activities, identified the data load frequency, performed impact analysis and implementation routing, and promoted/demoted new tables as needed.
- Promoted the new changes to live environments using GDT deployment.
- Created/promoted the required AutoSys jobs to copy the data from the file system to Teradata and Hadoop.
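A minimal sketch of the kind of post-downtime data validation mentioned above. The actual work used Spark with Scala; this illustration is in PySpark, and the table and column names are hypothetical.

```python
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("post-downtime-validation")
         .enableHiveSupport()
         .getOrCreate())

# Hypothetical Hive tables standing in for the real source and target objects
source = spark.table("staging.orders")
target = spark.table("warehouse.orders")

# Row-count reconciliation between the two layers
src_count, tgt_count = source.count(), target.count()
print(f"source={src_count}, target={tgt_count}, diff={src_count - tgt_count}")

# Keys present in the source but missing from the target after the downtime window
missed = source.select("order_id").subtract(target.select("order_id"))
missed.show(20, truncate=False)
```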
Confidential
Spark Developer
Environment: Spark, Hive, HDFS, Sqoop
Responsibilities:
- Analyzed the requirements to develop the framework.
- Developed Spark applications using Scala APIs to process data in an in-memory, distributed manner.
- Performed transformations and actions using Spark core APIs (illustrated in the sketch below).
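A minimal illustration of the transformation/action distinction in Spark core. The role used the Scala APIs; the sketch below uses PySpark for consistency with the earlier examples, and the HDFS path and record layout are hypothetical.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("spark-core-sketch").getOrCreate()
sc = spark.sparkContext

# Hypothetical comma-delimited event file on HDFS
lines = sc.textFile("hdfs:///data/events/events.csv")

# Transformations are lazy: nothing executes until an action is called
counts = (lines.map(lambda line: line.split(","))
               .filter(lambda fields: len(fields) >= 2)
               .map(lambda fields: (fields[0], 1))
               .reduceByKey(lambda a, b: a + b))

# Actions trigger the distributed computation and return results to the driver
total_keys = counts.count()
top_keys = counts.takeOrdered(10, key=lambda kv: -kv[1])
print(total_keys, top_keys)
```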