Senior Data Engineer Resume
San Jose, CA
SUMMARY
- 7+ years of experience in the IT industry with proven expertise in Data Lakes, Data Platforms, Big Data Analytics, and Development.
- Working experience with the AWS Cloud Platform and Cloudera Data Platform using VMware Player in a CentOS 7 Linux environment. Strong experience with the Cloudera and Hortonworks Hadoop distributions.
- Experience in the installation, configuration, management, support, and monitoring of Hadoop clusters using Apache Hadoop, Spark, Cloudera, and AWS services.
- Worked on EC2, EMR, Data Pipeline, MSK, AWS Glue, CloudWatch, Lambda, Athena, and SageMaker.
- Managed a cluster with 150 nodes, installed Ambari with HDP and HDF, and was responsible for upgrading them whenever required.
- Worked with both Scala and Python; created frameworks for processing data pipelines with Spark and Spark SQL.
- Experience with partitioning and bucketing concepts in Hive; designed both managed and external Hive tables to optimize performance (see the sketch after this summary).
- Experience with Git, Git Bash, and Bitbucket.
- Experience working with Build tools like Maven and SBT.
- Experienced in both Waterfall and Agile Development (SCRUM) methodologies.
- Strong problem-solving and analytical skills, with the ability to make balanced and independent decisions.
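As an illustration of the Hive partitioning and bucketing noted above, below is a minimal PySpark sketch of a partitioned, bucketed managed table alongside a matching external table; the analytics database, the events schema, and the /data/events HDFS location are hypothetical.

```python
from pyspark.sql import SparkSession

# Hive support is needed for managed/external table DDL.
spark = (
    SparkSession.builder
    .appName("hive-partitioning-sketch")  # hypothetical app name
    .enableHiveSupport()
    .getOrCreate()
)

spark.sql("CREATE DATABASE IF NOT EXISTS analytics")

# Managed table: partitioned by event_date, bucketed by user_id to
# speed up joins and filters on those columns (hypothetical schema).
spark.sql("""
    CREATE TABLE IF NOT EXISTS analytics.events_managed (
        user_id    BIGINT,
        event_type STRING,
        payload    STRING
    )
    PARTITIONED BY (event_date STRING)
    CLUSTERED BY (user_id) INTO 32 BUCKETS
    STORED AS PARQUET
""")

# External table: the data stays at the HDFS location if the table is dropped.
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS analytics.events_raw (
        user_id    BIGINT,
        event_type STRING,
        payload    STRING
    )
    PARTITIONED BY (event_date STRING)
    STORED AS PARQUET
    LOCATION '/data/events'
""")
```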
PROFESSIONAL EXPERIENCE
Confidential
Senior Data Engineer
Responsibilities:
- Writing PySpark scripts for daily workloads based on business requirements (a minimal sketch follows at the end of this list).
- Scheduling them in Autosys across the DEV, SIT, UAT, and PROD environments.
- Worked on EC2, S3, EMR, Data Pipeline, RDS, and Redshift.
- Worked on creating and scheduling Databricks and Snowflake data pipelines.
- Creating data pipelines using the EMR and Data Pipeline services on AWS.
- Worked on monitoring and alerting using the AWS CloudWatch and CloudTrail services.
- Fine-tuning Spark and Hive jobs whenever required using various performance-optimization techniques.
- Committing code to Git repositories and merging branches whenever required.
- Constant testing and architecture improvements to increase performance and reduce run time.
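Below is a minimal sketch of the kind of daily PySpark workload described above, parameterized by a run date as an Autosys job might pass it; the bucket names, paths, and the revenue-aggregation logic are hypothetical.

```python
import sys
from pyspark.sql import SparkSession, functions as F

# A minimal daily-workload skeleton; bucket names, paths, and the
# run_date argument (passed by the scheduler) are hypothetical.
def main(run_date: str) -> None:
    spark = SparkSession.builder.appName(f"daily-load-{run_date}").getOrCreate()

    # Read the day's raw data from S3.
    raw = spark.read.json(f"s3://example-raw-bucket/events/dt={run_date}/")

    # Apply a simple business transformation: keep completed orders
    # and aggregate revenue per customer (illustrative logic only).
    daily = (
        raw.filter(F.col("status") == "COMPLETED")
           .groupBy("customer_id")
           .agg(F.sum("amount").alias("daily_revenue"))
           .withColumn("run_date", F.lit(run_date))
    )

    # Write the result partitioned by run date for downstream consumers.
    (daily.write
          .mode("overwrite")
          .partitionBy("run_date")
          .parquet("s3://example-curated-bucket/daily_revenue/"))

    spark.stop()

if __name__ == "__main__":
    main(sys.argv[1])  # e.g. spark-submit daily_load.py 2023-01-31
```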
Confidential, San Jose, CA
Hadoop Developer
Responsibilities:
- Worked as a Spark Expert and performance Optimizer.
- Experienced with Spark Context, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
- Well experienced in handling data skew in Spark SQL (see the salting sketch after this list).
- Worked extensively with importing metadata into Hive and migrated existing tables and applications to work on Hive and Spark.
- Developed Spark jobs and Hive Jobs to summarize and transform data.
- Used Spark for interactive queries, processing of streaming data, and integration with popular NoSQL databases for huge volumes of data.
- Handled importing data from different data sources into HDFS using Sqoop, performing transformations using Hive and MapReduce, and then loading the data into HDFS.
- Collected and aggregated large amounts of log data using Flume and staged the data in HDFS for further analysis.
- Designed and maintained Tez workflows to manage the flow of jobs in the cluster.
- Worked with the testing teams to fix bugs and ensure smooth and error-free code.
- Involved in the preparation of documents such as the Functional Specification and Deployment Instruction documents.
- Fixed defects as needed during the QA phase, supported QA testing, and troubleshot defects to identify their source.
- Involved in Agile methodologies, daily Scrum meetings, and Sprint planning.
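The data-skew handling mentioned above can be illustrated with a key-salting sketch in PySpark; the sales_facts and customer_dim tables, the customer_id join key, and the salt factor are hypothetical.

```python
from pyspark.sql import SparkSession, functions as F

# A minimal sketch of the "salting" technique for skewed join keys;
# table names, the skewed key, and the salt factor are hypothetical.
spark = SparkSession.builder.appName("skew-salting-sketch").getOrCreate()

SALT_BUCKETS = 16

facts = spark.table("sales_facts")   # large table, skewed on customer_id
dims = spark.table("customer_dim")   # smaller dimension table

# Add a random salt to the skewed side so one hot key spreads
# across SALT_BUCKETS partitions instead of a single task.
facts_salted = facts.withColumn(
    "salt", (F.rand() * SALT_BUCKETS).cast("int")
)

# Replicate the dimension side once per salt value so every salted
# fact row still finds its match.
dims_salted = dims.withColumn(
    "salt", F.explode(F.array([F.lit(i) for i in range(SALT_BUCKETS)]))
)

joined = facts_salted.join(
    dims_salted, on=["customer_id", "salt"], how="inner"
).drop("salt")

joined.show(5)
```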
Confidential
Data Engineer
Responsibilities:
- Working as a specialist on the Data Engineering team, responsible for daily production jobs and fine-tuning them whenever required.
- Taking care of the Cloudera platform and performing necessary actions whenever required.
- Working on both batch and real-time streaming jobs (a minimal streaming sketch follows this list).
- Implemented a PySpark framework using both RDDs and DataFrames.
- Used Sqoop to transfer data between RDBMS sources and the Hadoop Distributed File System (HDFS).
- Handled importing data from different data sources into HDFS using Sqoop/NiFi, performing transformations using Hive and Spark, and then loading the data into final application-layer databases.
- Responsible for quality and production deployments.
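To illustrate the real-time streaming side mentioned above, here is a minimal Structured Streaming sketch in PySpark; the Kafka broker, topic name, event schema, and HDFS output/checkpoint paths are all hypothetical.

```python
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

# A minimal Structured Streaming sketch for a real-time job; broker,
# topic, schema, and output paths are hypothetical.
spark = SparkSession.builder.appName("streaming-sketch").getOrCreate()

schema = StructType([
    StructField("device_id", StringType()),
    StructField("metric", DoubleType()),
])

# Read a stream of JSON events from Kafka.
events = (
    spark.readStream
         .format("kafka")
         .option("kafka.bootstrap.servers", "broker1:9092")
         .option("subscribe", "telemetry")
         .load()
         .select(F.from_json(F.col("value").cast("string"), schema).alias("e"))
         .select("e.*")
)

# Write micro-batches to HDFS as Parquet, with checkpointing for recovery.
query = (
    events.writeStream
          .format("parquet")
          .option("path", "/data/streaming/telemetry")
          .option("checkpointLocation", "/checkpoints/telemetry")
          .outputMode("append")
          .start()
)

query.awaitTermination()
```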