
Data Engineer Resume


TX

SUMMARY

  • 5 years of experience in developing applications using technologies like Hadoop, Spark, Hive and Sqoop
  • Experience in developing Spark jobs in Python to process large volumes of data as per requirements
  • Worked on PySpark and Spark-shell scripts for processing large datasets
  • Experienced in analyzing large datasets to find patterns and insights within structured, semi-structured and unstructured data
  • Experience in designing and developing applications in Spark using Python to compare the performance of Spark with Hive and SQL
  • Experience with Google Cloud Platform services like Dataproc and Cloud Storage buckets to plan, configure, deploy and operate cloud solutions
  • Performed reads and writes between Spark and MongoDB
  • Good experience working with the Spark ecosystem, using Spark SQL and Scala queries on formats like text, CSV, Avro, SequenceFile and XML
  • Proficient in querying MongoDB; wrote a program to convert MongoDB collections into Hive tables
  • Experienced in cluster planning, designing, deploying, performance tuning and monitoring Hadoop ecosystem
  • Performed advanced analytics on Hadoop using Spark with Hive and SQL
  • Excellent programming skills at a higher level of abstraction using Scala and Python
  • Experience in developing Spark applications using Spark tools like RDD transformations, Spark core, Spark Streaming and Spark SQL
  • Created Hive tables to store structured data into HDFS and processed it using HiveQL
  • Good experience in optimizing MapReduce algorithms using mappers, reducers and partitioners to deliver the best results for large datasets
  • In depth understanding/knowledge of Hadoop Architecture
  • Experienced in software design, development and implementation of client/server web-based applications
  • Proven ability to manage all stages of project development; strong problem-solving and analytical skills with the ability to make balanced and independent decisions
  • Wrote Spark programs to perform data cleansing, transformation, and joins
  • Commitment to development best practices including coding standards, naming conventions, commenting, code modularization and reuse
  • Decisive problem solver able to execute innovative solutions and process improvements to meet defined business goals
  • Good knowledge of the SDLC process

TECHNICAL SKILLS

Big Data Technologies: HDFS, Hive, Sqoop, Spark, ZooKeeper, MapReduce, Spark SQL

Programming Languages: Scala, Python, HiveQL, PL/SQL

Operating Systems: Windows, Linux

RDBMS: MySQL, SQL Server

NOSQL Databases: MongoDB

Database Tools: SQL Developer, Robo3T, PyCharm

PROFESSIONAL EXPERIENCE

Confidential, TX

Data Engineer

Responsibilities:

  • Developed and engineered data from various sources such as SQL Server and MySQL, transforming existing raw metadata to meet reporting requirements using Spark, Kafka, Python, and Hive
  • Involved in the complete Big Data flow of the application, from ingesting data from upstream sources into HDFS to processing and analyzing the data in HDFS
  • Responsible for building scalable distributed data solutions using Spark
  • Involved in creating Hive tables, loading data and writing Hive queries, which run internally as MapReduce jobs
  • Created Hive external and partitioned tables, using Hive indexes and HQL to ease data analytics
  • Implemented partitioning, dynamic partitions and buckets in Hive to optimize queries
  • Wrote Spark programs to perform data cleansing, transformation and joins.
  • Wrote ETL jobs using Spark and worked on tuning the performance of Hive queries.
  • Designed and developed Spark jobs in Python to implement an end-to-end data pipeline for batch processing, and used the Spark UI to monitor job processing
  • Developed Sqoop scripts and Sqoop jobs to ingest data from a client-provided database in batches on an incremental basis
  • Wrote Python programs for Spark transformations of DataFrames and RDDs
  • Worked on storing data in MongoDB collections
  • Analyzed various file types such as ORC, Parquet, JSON, CSV and XML with Python in spark-submit jobs executed on a Dataproc cluster
  • Experienced in full life cycle ETL implementation using SQL Server; helped design the data warehouse by defining facts, dimensions and the relationships between them, and applied corporate naming-convention standards
  • Worked closely with the QA and production support teams by providing components, documentation, validation and knowledge transfer on new projects, and by debugging issues

Environment: Hadoop Ecosystem (HDFS, Hive, Sqoop), Spark, Scala, Python, MongoDB, Linux OS, Windows OS, Google Cloud Platform (Dataproc and Cloud Storage buckets)

Confidential

Data Engineer

Responsibilities:

  • Designed, planned, and developed programs to perform automated extract, transform and load data between data sources when working with large data sets.
  • Responsible for building scalable distributed solutions using Hadoop
  • Developed and built data pipelines for deploying Hadoop applications and assisted the team in managing compliance with the organization's Hadoop ecosystem standards
  • Transformed, Analyzed and validated large datasets and built automated and optimized pipelines.
  • Experienced with partitioning and bucketing concepts in Hive; designed both managed and external Hive tables to optimize performance
  • Worked on Big Data infrastructure for batch processing and real-time processing
  • Responsible for interpreting the requirements of Big Data analytics use cases and scenarios, and driving the design and implementation of specific data models to ultimately support better business decisions
  • Experience in manipulating/analyzing large datasets and finding patterns and insights within structured and unstructured data
  • Experienced in working with the Spark ecosystem, using Spark SQL and Scala queries on formats like text and CSV files
  • Developed Spark scripts using Python shell commands as per requirements
  • Worked with the production support team by providing components, documentation, validation and knowledge transfer on new projects, and by debugging issues

Environment: Hadoop, Hive, Sqoop, HDFS, HBase, MySQL, MongoDB
