
Senior Data Engineer Resume


San Jose, CA

SUMMARY

  • 7+ years of experience in the IT industry with proven expertise in data lakes, data platforms, big data analytics, and development.
  • Working experience with the AWS Cloud Platform and the Cloudera Data Platform (via VMware Player on CentOS 7 Linux). Strong experience with the Cloudera and Hortonworks Hadoop distributions.
  • Experience in installing, configuring, managing, supporting, and monitoring Hadoop clusters across distributions and services such as Apache Hadoop, Spark, Cloudera, and AWS.
  • Worked on EC2, EMR, Data Pipeline, MSK, AWS Glue, CloudWatch, Lambda, Athena, and SageMaker.
  • Managed a 150-node cluster, installed Ambari, HDP, and HDF, and was responsible for upgrading them whenever required.
  • Worked with both Scala and Python; created frameworks for processing data pipelines with Spark and Spark SQL.
  • Experience with partitioning and bucketing concepts in Hive; designed both managed and external Hive tables to optimize performance (see the sketch after this list).
  • Experience with Git, Git Bash, and Bitbucket.
  • Experience working with build tools such as Maven and SBT.
  • Experienced in both Waterfall and Agile Development (SCRUM) methodologies.
  • Strong problem-solving and analytical skills, with the ability to make balanced, independent decisions.
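
A minimal sketch of the Hive table design described above, issued through Spark SQL with Hive support; the database, table, column names, and S3 location are hypothetical:

    from pyspark.sql import SparkSession

    # Hypothetical database, table, and column names, for illustration only.
    spark = (SparkSession.builder
             .appName("hive-table-design-sketch")
             .enableHiveSupport()
             .getOrCreate())

    spark.sql("CREATE DATABASE IF NOT EXISTS analytics")

    # Managed table: Hive owns both metadata and data; partitioning by event date
    # lets queries that filter on event_date prune partitions instead of scanning everything.
    spark.sql("""
        CREATE TABLE IF NOT EXISTS analytics.events_managed (
            user_id STRING,
            event_type STRING,
            payload STRING
        )
        PARTITIONED BY (event_date STRING)
        STORED AS PARQUET
    """)

    # External table: Hive tracks only metadata; dropping the table leaves the
    # underlying files in place, which suits data shared with other tools.
    spark.sql("""
        CREATE EXTERNAL TABLE IF NOT EXISTS analytics.events_external (
            user_id STRING,
            event_type STRING,
            payload STRING
        )
        PARTITIONED BY (event_date STRING)
        STORED AS PARQUET
        LOCATION 's3://example-bucket/warehouse/events/'
    """)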

PROFESSIONAL EXPERIENCE

Confidential

Senior Data Engineer

Responsibilities:

  • Writing PySpark scripts for daily workloads based on the business requirements (a minimal example follows this list).
  • Scheduling them in Autosys across the DEV, SIT, UAT, and PROD environments.
  • Working with EC2, S3, EMR, Data Pipeline, RDS, and Redshift.
  • Creating and scheduling data pipelines in Databricks and Snowflake.
  • Creating data pipelines using the EMR and Data Pipeline services on AWS.
  • Setting up monitoring and alerting using the CloudWatch and CloudTrail AWS services.
  • Fine-tuning Spark and Hive jobs whenever required using various performance-optimization techniques.
  • Committing code to Git repositories and merging branches whenever required.
  • Continuously testing and improving the architecture to increase performance and reduce run time.
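
A minimal sketch of one such daily PySpark workload; the S3 paths, column names, and business rules are hypothetical:

    import sys

    from pyspark.sql import SparkSession, functions as F

    # Hypothetical paths and columns, for illustration only.
    RAW_PATH = "s3://example-bucket/raw/transactions/"
    CURATED_PATH = "s3://example-bucket/curated/daily_summary/"

    run_date = sys.argv[1]  # e.g. "2021-06-01", passed in by the Autosys job

    spark = SparkSession.builder.appName(f"daily-summary-{run_date}").getOrCreate()

    # Read only the raw partition for the run date.
    raw = spark.read.parquet(RAW_PATH).where(F.col("load_date") == run_date)

    # Apply the business rules: drop invalid records, then aggregate per customer.
    summary = (raw
               .where(F.col("amount") > 0)
               .groupBy("customer_id")
               .agg(F.sum("amount").alias("total_amount"),
                    F.count("*").alias("txn_count")))

    # Overwrite only that day's output directory so re-runs are idempotent.
    summary.write.mode("overwrite").parquet(f"{CURATED_PATH}load_date={run_date}/")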

Confidential, San Jose, CA

Hadoop Developer

Responsibilities:

  • Worked as a Spark expert and performance optimizer.
  • Experienced with SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
  • Well experienced in handling data skew in Spark SQL (see the salting sketch after this list).
  • Worked extensively on importing metadata into Hive and migrating existing tables and applications to Hive and Spark.
  • Developed Spark and Hive jobs to summarize and transform data.
  • Used Spark for interactive queries, streaming data processing, and integration with popular NoSQL databases for large volumes of data.
  • Handled importing data from different sources into HDFS using Sqoop, performing transformations with Hive and MapReduce, and loading the results back into HDFS.
  • Collected and aggregated large amounts of log data using Flume, staging it in HDFS for further analysis.
  • Designed and maintained Tez workflows to manage the flow of jobs in the cluster.
  • Worked with the testing teams to fix bugs and ensure smooth and error-free code.
  • Involved in preparing documents such as functional specifications and deployment instructions.
  • Fixed defects as needed during the QA phase, supported QA testing, and troubleshot defects to identify their root causes.
  • Involved in Agile methodologies, daily Scrum meetings, and sprint planning.
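
One common way to handle that skew is to salt the join key, sketched below with hypothetical table names, join key, and bucket count:

    from pyspark.sql import SparkSession, functions as F

    # Hypothetical tables and join key, for illustration only.
    spark = SparkSession.builder.appName("skew-salting-sketch").getOrCreate()

    facts = spark.table("analytics.fact_events")    # large table, skewed on customer_id
    dims = spark.table("analytics.dim_customers")   # smaller dimension table

    SALT_BUCKETS = 16

    # Add a random salt to the skewed side so hot keys spread across many partitions.
    salted_facts = facts.withColumn("salt", (F.rand() * SALT_BUCKETS).cast("int"))

    # Replicate the dimension side across all salt values so every salted key still matches.
    salted_dims = dims.crossJoin(
        spark.range(SALT_BUCKETS).withColumnRenamed("id", "salt")
    )

    joined = salted_facts.join(salted_dims, on=["customer_id", "salt"], how="inner").drop("salt")

On Spark 3.x, adaptive query execution (spark.sql.adaptive.skewJoin.enabled) can handle much of this automatically; manual salting remains useful on older clusters.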

Confidential

Data Engineer

Responsibilities:

  • Working as a specialist on the Data Engineering team, responsible for daily production jobs and for fine-tuning them whenever required.
  • Maintaining the Cloudera platform and performing necessary actions whenever required.
  • Working on both batch and real-time streaming jobs (a minimal streaming sketch follows this list).
  • Implemented a PySpark framework using both RDDs and DataFrames.
  • Used Sqoop to transfer data between RDBMSs and the Hadoop Distributed File System (HDFS).
  • Handled importing data from different sources into HDFS using Sqoop/NiFi, performing transformations with Hive and Spark, and loading the results into the final application-layer databases.
  • Responsible for quality and production deployments.
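
A minimal sketch of the real-time side, assuming a Kafka source (the MSK service is listed in the summary); the brokers, topic, schema, and output paths are hypothetical, and the spark-sql-kafka connector must be on the classpath:

    from pyspark.sql import SparkSession, functions as F
    from pyspark.sql.types import StructType, StructField, StringType, DoubleType

    # Hypothetical brokers, topic, schema, and paths, for illustration only.
    spark = SparkSession.builder.appName("streaming-ingest-sketch").getOrCreate()

    schema = StructType([
        StructField("order_id", StringType()),
        StructField("customer_id", StringType()),
        StructField("amount", DoubleType()),
    ])

    # Read a stream of JSON messages from Kafka.
    stream = (spark.readStream
              .format("kafka")
              .option("kafka.bootstrap.servers", "broker1:9092")
              .option("subscribe", "orders")
              .load())

    # Parse the JSON payload and keep only valid records.
    orders = (stream
              .select(F.from_json(F.col("value").cast("string"), schema).alias("o"))
              .select("o.*")
              .where(F.col("amount") > 0))

    # Write micro-batches into the application layer as Parquet, with checkpointing for recovery.
    query = (orders.writeStream
             .format("parquet")
             .option("path", "hdfs:///data/app_layer/orders/")
             .option("checkpointLocation", "hdfs:///checkpoints/orders/")
             .start())

    query.awaitTermination()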
