Senior Data Engineer Resume
San Jose, CA
SUMMARY
- 7+ years of experience in the IT industry with proven expertise in Data Lakes, Data Platforms, Big Data Analytics, and Development.
- Working experience with the AWS Cloud Platform and Cloudera Data Platform using VMware Player in a CentOS 7 Linux environment. Strong experience with the Cloudera and Hortonworks Hadoop distributions.
- Experience in the installation, configuration, management, support, and monitoring of Hadoop clusters using Apache Hadoop, Spark, Cloudera, and AWS services.
- Worked on EC2, EMR, Data Pipeline, MSK, AWS Glue, CloudWatch, Lambda, Athena, and SageMaker.
- Managed a cluster with 150 nodes, installed Ambari with HDP and HDF, and was responsible for upgrading them whenever required.
- Worked with both Scala and Python; created frameworks for processing data pipelines with Spark and Spark SQL.
- Experience with partitioning and bucketing concepts in Hive; designed both managed and external Hive tables to optimize performance (see the sketch after this summary).
- Experience with Git, Git Bash, and Bitbucket.
- Experience working with Build tools like Maven and SBT.
- Experienced in both Waterfall and Agile Development (SCRUM) methodologies.
- Strong problem-solving and analytical skills, with the ability to make balanced and independent decisions.
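As an illustration of the Hive partitioning and bucketing noted above, below is a minimal PySpark sketch of a partitioned, bucketed managed table alongside a matching external table; the analytics database, the events schema, and the /data/events HDFS location are hypothetical.

```python
from pyspark.sql import SparkSession

# Hive support is needed for managed/external table DDL.
spark = (
    SparkSession.builder
    .appName("hive-partitioning-sketch")  # hypothetical app name
    .enableHiveSupport()
    .getOrCreate()
)

spark.sql("CREATE DATABASE IF NOT EXISTS analytics")

# Managed table: partitioned by event_date, bucketed by user_id to
# speed up joins and filters on those columns (hypothetical schema).
spark.sql("""
    CREATE TABLE IF NOT EXISTS analytics.events_managed (
        user_id    BIGINT,
        event_type STRING,
        payload    STRING
    )
    PARTITIONED BY (event_date STRING)
    CLUSTERED BY (user_id) INTO 32 BUCKETS
    STORED AS PARQUET
""")

# External table: the data stays at the HDFS location if the table is dropped.
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS analytics.events_raw (
        user_id    BIGINT,
        event_type STRING,
        payload    STRING
    )
    PARTITIONED BY (event_date STRING)
    STORED AS PARQUET
    LOCATION '/data/events'
""")
```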
PROFESSIONAL EXPERIENCE
Confidential
Senior Data Engineer
Responsibilities:
- Writing PySpark scripts for daily workloads based on business requirements (a minimal sketch follows at the end of this list).
- Scheduling them in Autosys across the DEV, SIT, UAT, and PROD environments.
- Worked on EC2, S3, EMR, Data Pipeline, RDS, and Redshift.
- Worked on creating and scheduling Databricks and Snowflake data pipelines.
- Creating data pipelines using the EMR and Data Pipeline services on AWS.
- Worked on monitoring and alerting using the AWS CloudWatch and CloudTrail services.
- Fine-tuning Spark and Hive jobs whenever required using various performance-optimization techniques.
- Committing code to Git repositories and merging branches whenever required.
- Constant testing and architecture improvements to increase performance and reduce run time.
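Below is a minimal sketch of the kind of daily PySpark workload described above, parameterized by a run date as an Autosys job might pass it; the bucket names, paths, and the revenue-aggregation logic are hypothetical.

```python
import sys
from pyspark.sql import SparkSession, functions as F

# A minimal daily-workload skeleton; bucket names, paths, and the
# run_date argument (passed by the scheduler) are hypothetical.
def main(run_date: str) -> None:
    spark = SparkSession.builder.appName(f"daily-load-{run_date}").getOrCreate()

    # Read the day's raw data from S3.
    raw = spark.read.json(f"s3://example-raw-bucket/events/dt={run_date}/")

    # Apply a simple business transformation: keep completed orders
    # and aggregate revenue per customer (illustrative logic only).
    daily = (
        raw.filter(F.col("status") == "COMPLETED")
           .groupBy("customer_id")
           .agg(F.sum("amount").alias("daily_revenue"))
           .withColumn("run_date", F.lit(run_date))
    )

    # Write the result partitioned by run date for downstream consumers.
    (daily.write
          .mode("overwrite")
          .partitionBy("run_date")
          .parquet("s3://example-curated-bucket/daily_revenue/"))

    spark.stop()

if __name__ == "__main__":
    main(sys.argv[1])  # e.g. spark-submit daily_load.py 2023-01-31
```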
Confidential, San Jose, CA
Hadoop Developer
Responsibilities:
- Worked as a Spark Expert and performance Optimizer.
- Experienced with Spark Context, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
- Well experienced in handling data skew in Spark SQL (see the salting sketch after this list).
- Worked extensively with importing metadata into Hive and migrated existing tables and applications to work on Hive and Spark.
- Developed Spark jobs and Hive Jobs to summarize and transform data.
- Used Spark for interactive queries, processing of streaming data, and integration with popular NoSQL databases for huge volumes of data.
- Handled importing data from different data sources into HDFS using Sqoop, performing transformations using Hive and MapReduce, and then loading the data into HDFS.
- Collected and aggregated large amounts of log data using Flume and staged the data in HDFS for further analysis.
- Designed and maintained Tez workflows to manage the flow of jobs in the cluster.
- Worked with the testing teams to fix bugs and ensure smooth and error-free code.
- Involved in the preparation of documents such as the Functional Specification and Deployment Instruction documents.
- Fixed defects as needed during the QA phase, supported QA testing, and troubleshot defects to identify their source.
- Involved in Agile methodologies, daily Scrum meetings, and Sprint planning.
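The data-skew handling mentioned above can be illustrated with a key-salting sketch in PySpark; the sales_facts and customer_dim tables, the customer_id join key, and the salt factor are hypothetical.

```python
from pyspark.sql import SparkSession, functions as F

# A minimal sketch of the "salting" technique for skewed join keys;
# table names, the skewed key, and the salt factor are hypothetical.
spark = SparkSession.builder.appName("skew-salting-sketch").getOrCreate()

SALT_BUCKETS = 16

facts = spark.table("sales_facts")   # large table, skewed on customer_id
dims = spark.table("customer_dim")   # smaller dimension table

# Add a random salt to the skewed side so one hot key spreads
# across SALT_BUCKETS partitions instead of a single task.
facts_salted = facts.withColumn(
    "salt", (F.rand() * SALT_BUCKETS).cast("int")
)

# Replicate the dimension side once per salt value so every salted
# fact row still finds its match.
dims_salted = dims.withColumn(
    "salt", F.explode(F.array([F.lit(i) for i in range(SALT_BUCKETS)]))
)

joined = facts_salted.join(
    dims_salted, on=["customer_id", "salt"], how="inner"
).drop("salt")

joined.show(5)
```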
Confidential
Data Engineer
Responsibilities:
- Working as a specialist on the Data Engineering team, responsible for daily production jobs and fine-tuning them whenever required.
- Taking care of the Cloudera platform and performing necessary actions whenever required.
- Working on both batch and real-time streaming jobs (a minimal streaming sketch follows this list).
- Implemented a PySpark framework using both RDDs and DataFrames.
- Used Sqoop to transfer data between RDBMS sources and the Hadoop Distributed File System (HDFS).
- Handled importing data from different data sources into HDFS using Sqoop/NiFi, performing transformations using Hive and Spark, and then loading the data into final application-layer databases.
- Responsible for quality and production deployments.
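To illustrate the real-time streaming side mentioned above, here is a minimal Structured Streaming sketch in PySpark; the Kafka broker, topic name, event schema, and HDFS output/checkpoint paths are all hypothetical.

```python
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

# A minimal Structured Streaming sketch for a real-time job; broker,
# topic, schema, and output paths are hypothetical.
spark = SparkSession.builder.appName("streaming-sketch").getOrCreate()

schema = StructType([
    StructField("device_id", StringType()),
    StructField("metric", DoubleType()),
])

# Read a stream of JSON events from Kafka.
events = (
    spark.readStream
         .format("kafka")
         .option("kafka.bootstrap.servers", "broker1:9092")
         .option("subscribe", "telemetry")
         .load()
         .select(F.from_json(F.col("value").cast("string"), schema).alias("e"))
         .select("e.*")
)

# Write micro-batches to HDFS as Parquet, with checkpointing for recovery.
query = (
    events.writeStream
          .format("parquet")
          .option("path", "/data/streaming/telemetry")
          .option("checkpointLocation", "/checkpoints/telemetry")
          .outputMode("append")
          .start()
)

query.awaitTermination()
```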