Data Engineer Resume
PROFESSIONAL SUMMARY:
- 6+ years of experience in the IT industry, including 2+ years in Big Data application development: analyzing, designing, developing, testing and deploying EDW applications and their data sources as per requirements.
- Strong working exposure to Apache Spark, Scala, Hive, Hadoop, Pig, HBase, Sqoop and Python.
- Good experience executing multiple end-to-end Big Data development projects.
- Hands-on experience developing Spark applications using Scala.
- Experience troubleshooting jobs and addressing production issues such as data issues, environment issues, performance tuning and enhancements.
- Experience scheduling sequential and parallel jobs using Unix scripts and scheduling tools such as Autosys.
- Good understanding of all the phases of the Software Development Life Cycle (SDLC), including requirements gathering, specification documentation, design, construction, testing and maintenance.
- Extensive experience in Unit Testing, Functional Testing, User Acceptance Testing (UAT) and Performance Testing.
- Experience with AWS services such as S3, Glue, Redshift and Athena, including creating Glue jobs, debugging, monitoring and fixing job failures.
- Good hands-on experience with OpenText StreamServe.
- Experienced in the SDLC with a focus on design and development using Agile/Scrum methodology.
- Strong communication, organizational, interpersonal, problem-solving and analytical skills; proactive and hardworking, with the ability to meet tight schedules.
TECHNICAL SKILLS:
Big Data Technologies: Spark, Hadoop, Pig, Sqoop, Hive, HBase, Cassandra
Programming Languages: Scala, Python, Shell Scripting
NoSQL Databases: Hive (HQL), HBase, Cassandra
SQL Databases: MySQL, Oracle
Operating Systems: Linux, Windows
Code Check-in Tools: Git
Cloud: AWS
OpenText Products: StreamServe
PROFESSIONAL EXPERIENCE:
Confidential
Data Engineer
Environment: AWS, Python, PySpark, SQL, Azure DevOps, GIT
Responsibilities:
- As a Data Engineer, responsible for migrating data from CDL to AWS.
- Identified the STTM and mapped the source tables to the target tables using business use cases.
- The Midstream squad is responsible for creating 11 ESA reports by migrating data using AWS SDLF.
- Developed AWS Glue jobs to migrate the data along with the business logic.
- With PySpark as the primary data migration tool, created PySpark jobs that were deployed to AWS as Glue jobs and executed in the cloud.
- Performed data quality checks and UAT for the developed PySpark jobs.
- Performed end-to-end development, from the Dev environment through to the Prod environment.
- Analyzed the job history for the ~200 GB source dataset and made code-level changes to achieve optimal performance when migrating the data in live environments.
- Used Spark SQL and Spark optimization techniques such as broadcast joins and filtering data at the source level (see the sketch after this list).
- The migrated data was used to build Tableau dashboards for the business users.
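A minimal sketch of a Glue PySpark job of the kind described above, for illustration only: the S3 paths, table names, columns and filter values are hypothetical, and the snippet shows just the standard Glue job boilerplate, a source-level filter and a broadcast join.

```python
import sys
from awsglue.utils import getResolvedOptions
from awsglue.context import GlueContext
from awsglue.job import Job
from pyspark.context import SparkContext
from pyspark.sql.functions import broadcast, col

# Standard Glue job setup
args = getResolvedOptions(sys.argv, ["JOB_NAME"])
sc = SparkContext()
glue_context = GlueContext(sc)
spark = glue_context.spark_session
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read the source data, filtering at the source level so less data is scanned and shuffled
orders = (spark.read.parquet("s3://source-bucket/cdl/orders/")      # hypothetical path
          .filter(col("order_date") >= "2022-01-01"))               # hypothetical filter

# Broadcast the small lookup table to avoid a shuffle during the join
regions = spark.read.parquet("s3://source-bucket/cdl/regions/")     # hypothetical path
enriched = orders.join(broadcast(regions), "region_id", "left")

# Apply business logic here, then write to the target layer consumed by the reports
enriched.write.mode("overwrite").parquet("s3://target-bucket/esa/orders_enriched/")

job.commit()
```

Broadcasting suits a lookup table small enough to ship to every executor, and pushing the date filter down to the Parquet read keeps the volume of data moved across the cluster small.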
Confidential
Big Data Developer
Environment: Spark, Scala, HDFS, Shell Scripting, Hive, Autosys
Responsibilities:
- Worked on new enhancements on top of the existing functionality using Spark and Scala.
- Performed data validations for any missed objects during production downtime activities (see the sketch after this list).
- Identified the objects impacted by changes, if any, and resolved them.
- For any data load/refresh activities, identified the data load frequency, performed impact analysis and implementation routing, and promoted/demoted new tables as needed.
- Promoted the new changes to live environments using GDT deployment.
- Created/promoted the required AutoSys jobs to copy the data from the file system to Teradata and Hadoop.
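A minimal sketch of the kind of post-downtime data validation mentioned above. The actual work used Spark with Scala; this illustration is in PySpark, and the table and column names are hypothetical.

```python
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("post-downtime-validation")
         .enableHiveSupport()
         .getOrCreate())

# Hypothetical Hive tables standing in for the real source and target objects
source = spark.table("staging.orders")
target = spark.table("warehouse.orders")

# Row-count reconciliation between the two layers
src_count, tgt_count = source.count(), target.count()
print(f"source={src_count}, target={tgt_count}, diff={src_count - tgt_count}")

# Keys present in the source but missing from the target after the downtime window
missed = source.select("order_id").subtract(target.select("order_id"))
missed.show(20, truncate=False)
```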
Confidential
Spark Developer
Environment: Spark, Hive, HDFS, Sqoop
Responsibilities:
- Analyzed the requirements to develop the framework.
- Developed Spark applications using Scala APIs to process data in an in-memory, distributed manner.
- Performed transformations and actions using Spark core APIs (illustrated in the sketch below).
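A minimal illustration of the transformation/action distinction in Spark core. The role used the Scala APIs; the sketch below uses PySpark for consistency with the earlier examples, and the HDFS path and record layout are hypothetical.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("spark-core-sketch").getOrCreate()
sc = spark.sparkContext

# Hypothetical comma-delimited event file on HDFS
lines = sc.textFile("hdfs:///data/events/events.csv")

# Transformations are lazy: nothing executes until an action is called
counts = (lines.map(lambda line: line.split(","))
               .filter(lambda fields: len(fields) >= 2)
               .map(lambda fields: (fields[0], 1))
               .reduceByKey(lambda a, b: a + b))

# Actions trigger the distributed computation and return results to the driver
total_keys = counts.count()
top_keys = counts.takeOrdered(10, key=lambda kv: -kv[1])
print(total_keys, top_keys)
```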