Data Engineer Resume

PROFESSIONAL SUMMARY:

  • 6+ years of experience in the IT industry, including 2+ years in Big Data application development, covering analysis, design, development, testing and deployment of EDW applications and their data sources as per requirements.
  • Strong working exposure to Apache Spark, Scala, Hive, Hadoop, Pig, HBase, Sqoop and Python.
  • Good experience executing multiple end-to-end Big Data development projects.
  • Hands-on experience developing Spark applications using Scala.
  • Experience troubleshooting jobs and addressing production issues such as data issues, environment issues, performance tuning and enhancements.
  • Experience scheduling sequential and parallel jobs using Unix shell scripts and scheduling tools such as AutoSys.
  • Good understanding of all phases of the Software Development Life Cycle (SDLC), including requirements gathering, specification documentation, design, construction, testing and maintenance.
  • Extensive experience in Unit Testing, Functional Testing, User Acceptance Testing (UAT) and Performance Testing.
  • Experience with AWS services such as S3, Glue, Redshift and Athena, including creating Glue jobs, debugging, monitoring and fixing job failures.
  • Good hands-on experience with OpenText StreamServe.
  • Experienced in the SDLC with a focus on design and development using the Agile/Scrum methodology.
  • Strong communication, organizational, interpersonal, problem-solving and analytical skills; proactive and hardworking, with the ability to meet tight schedules.

TECHNICAL SKILLS:

Big Data Technologies: Spark, Hadoop, Pig, Sqoop, Hive, HBase, Cassandra

Programming Languages: Scala, Python, Shell Scripting

NoSQL Databases: Hive (HQL), HBase, Cassandra

SQL Databases: MySQL, Oracle

Operating Systems: Linux, Windows

Code Check-in Tools: Git

Cloud: AWS

OpenText Products: StreamServe

PROFESSIONAL EXPERIENCE:

Confidential

Data Engineer

Environment: AWS, Python, PySpark, SQL, Azure DevOps, GIT

Responsibilities:

  • As a Data Engineer, responsible for migrating data from CDL to AWS.
  • Identified the STTM and mapped the source tables to the target tables based on business use cases.
  • The Midstream squad is responsible for creating 11 ESA reports by migrating data using AWS SDLF.
  • Developed AWS Glue jobs to migrate the data along with the business logic.
  • With PySpark as the primary data migration tool, created PySpark jobs that were deployed to AWS as Glue jobs and executed in the cloud.
  • Performed data quality checks and UAT for the developed PySpark jobs.
  • Carried out development end to end, from the Dev environment through to the Prod environment.
  • For a source data set of about 200 GB, analyzed the job history and made code-level changes to achieve optimal performance when migrating the data in live environments.
  • Used Spark SQL and Spark optimization techniques such as broadcast joins and filtering data at the source level (see the sketch after this list).
  • The migrated data fed Tableau dashboards for business users.
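
The broadcast join and source-level filtering mentioned above follow a standard Spark pattern. The project itself used PySpark on AWS Glue; the sketch below shows the same pattern in Spark's Scala DataFrame API, and the bucket, table and column names are illustrative assumptions rather than project details.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.broadcast

    object MigrationJobSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("cdl-to-aws-migration-sketch")
          .getOrCreate()

        // Filter at the source so only the required rows are scanned
        // before any joins or shuffles happen (illustrative path/column).
        val source = spark.read
          .parquet("s3://example-bucket/cdl/source_table/")
          .filter("load_date >= '2022-01-01'")

        // Broadcast the small reference table so the join avoids
        // shuffling the large source data set (illustrative path/key).
        val lookup = spark.read
          .parquet("s3://example-bucket/cdl/lookup_table/")

        val enriched = source.join(broadcast(lookup), Seq("business_key"), "left")

        // Write the curated output to the target location used for reporting.
        enriched.write.mode("overwrite")
          .parquet("s3://example-bucket/esa/target_table/")

        spark.stop()
      }
    }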

Confidential

Big Data Developer

Environment: Spark, Scala, HDFS, Shell Scripting, Hive, Autosys

Responsibilities:

  • Worked on new enhancements on top of the existing functionality using Spark and Scala.
  • Performed data validations for any missed objects during production downtime activities (see the validation sketch after this list).
  • Identified any objects impacted by the changes and resolved them.
  • For any data load/refresh activity, identified the data load frequency, performed impact analysis, planned the implementation routing and promoted/demoted new tables where required.
  • Promoted the new changes to live environments using GDT deployment.
  • Created/promoted the required AutoSys jobs to copy data from the file system to Teradata and Hadoop.
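
A minimal sketch of the kind of post-downtime data validation described above, assuming Hive-backed source and target tables; the database and table names are illustrative, not the project's actual objects.

    import org.apache.spark.sql.SparkSession

    object DataValidationSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("post-downtime-validation-sketch")
          .enableHiveSupport()
          .getOrCreate()

        // Illustrative list of objects to reconcile after a downtime window.
        val tables = Seq("orders", "customers", "shipments")

        tables.foreach { t =>
          // Compare row counts between the staging copy and the refreshed target.
          val sourceCount = spark.table(s"staging_db.$t").count()
          val targetCount = spark.table(s"target_db.$t").count()

          if (sourceCount != targetCount)
            println(s"MISMATCH: $t source=$sourceCount target=$targetCount")
          else
            println(s"OK: $t rows=$sourceCount")
        }

        spark.stop()
      }
    }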

Confidential

Spark Developer

Environment: Spark, Hive, HDFS, Sqoop

Responsibilities:

  • Analyzed the requirements to develop the framework.
  • Developed Spark applications using the Scala API to process data in a distributed, in-memory manner.
  • Performed transformations and actions using the Spark Core APIs (illustrated in the sketch below).
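
A minimal Spark Core (RDD) sketch of the transformation/action style of processing described above; the input path and record layout are assumptions for illustration only.

    import org.apache.spark.{SparkConf, SparkContext}

    object SparkCoreSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("spark-core-sketch"))

        // Transformations (lazy): parse pipe-delimited records and aggregate by key.
        val totalsByKey = sc.textFile("hdfs:///data/input/")
          .map(_.split("\\|"))
          .filter(_.length >= 3)                          // drop malformed rows
          .map(fields => (fields(0), fields(2).toDouble)) // (key, amount)
          .reduceByKey(_ + _)

        // Action: triggers execution of the lineage and returns results to the driver.
        totalsByKey.take(10).foreach { case (key, total) =>
          println(s"$key -> $total")
        }

        sc.stop()
      }
    }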
