
Data Engineer Resume


Pittsburgh, PA

SUMMARY

  • 6+ years of experience in analysis, design, development, testing, implementation, maintenance and enhancement of various IT projects, with Big Data experience implementing end-to-end Hadoop solutions.
  • Experience working in environments using Agile (Scrum), RUP and Test-Driven Development methodologies.
  • Extensive experience installing, configuring and using ecosystem components such as Hadoop MapReduce, HDFS, Sqoop, Pig, Hive, Impala and Spark.
  • Expertise in using J2EE application servers such as IBM WebSphere and JBoss, and web servers such as Apache Tomcat.
  • Experience with different Hadoop distributions, including Cloudera (CDH3 and CDH4) and Hortonworks (HDP).
  • Experienced in configuring and administering Hadoop clusters on major distributions such as Apache Hadoop and Cloudera.

TECHNICAL SKILLS

Programming: Python 3, Scala, PL/SQL, PowerShell scripting

Data Technologies: SQL, Apache Spark, HDFS, Sqoop, Flume, Kafka, MongoDB

Machine learning: Supervised/Unsupervised Learning, Feature Engineering, Text Analysis

Tools & IDEs: PySpark, MySQL, Talend Studio, AWS (Redshift, S3, EMR, EC2)

BI tools: Tableau

PROFESSIONAL EXPERIENCE

Confidential, Pittsburgh, PA

Data Engineer

Responsibilities:

  • Analyzed business requirements and prepared detailed specifications following the project guidelines required for development.
  • Used Sqoop to import data from relational databases such as MySQL and Oracle.
  • Imported structured and unstructured data into HDFS.
  • Fetched real-time data using Kafka and processed it using Spark and Scala.
  • Used Kafka to import real-time weblogs and ingested the data into Spark Streaming.
  • Developed business logic using Kafka Direct Stream in Spark Streaming and implemented business transformations (see the streaming sketch after this list).
  • Built and implemented a real-time streaming ETL pipeline using the Kafka Streams API.
  • Worked on Hive for web interfacing and stored the data in Hive tables.
  • Migrated MapReduce programs to Spark transformations using Spark and Scala.
  • Experienced with SparkContext, Spark SQL and Spark on YARN.
  • Implemented Spark scripts using Scala and Spark SQL to access Hive tables in Spark for faster data processing (see the Hive/Spark SQL sketch after this list).
  • Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
  • Implemented data quality checks using Spark Streaming and flagged records as passable or bad.
  • Implemented Hive partitioning and bucketing on the collected data in HDFS.
  • Performed data querying and summarization using Hive and Pig and created UDFs, UDAFs and UDTFs.
  • Implemented Sqoop jobs for large data exchanges between RDBMS and Hive clusters.
  • Extensively used ZooKeeper as a backup server and for scheduling Spark jobs.
  • Developed traits, case classes and related constructs in Scala.
  • Developed Spark scripts using the Scala shell as per business requirements.
  • Worked on the Cloudera distribution deployed on AWS EC2 instances.
  • Loaded real-time data into NoSQL databases such as Cassandra.
  • Well versed in data manipulation and compactions in Cassandra.
  • Retrieved data from the Cassandra cluster by running CQL (Cassandra Query Language) queries (see the CQL sketch after this list).
  • Connected the Cassandra database to the Amazon EMR file system for storing the data in S3.
  • Used Amazon EMR to process Big Data across a Hadoop cluster of virtual servers on Amazon Elastic Compute Cloud (EC2) and Amazon Simple Storage Service (S3).
  • Deployed the project on Amazon EMR with S3 connectivity for backup storage.
  • Used Elastic Load Balancing and Auto Scaling for EC2 servers.
  • Configured workflows involving Hadoop actions using Oozie.
  • Used Python for pattern matching in build logs to format warnings and errors (see the log-scanning sketch after this list).
  • Coordinated with the Scrum team to deliver agreed user stories on time every sprint.
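
Streaming sketch: a minimal illustration of the Kafka Direct Stream ingestion and quality-flagging described above. It is written in PySpark rather than the Scala used on the project, and it assumes a Spark 2.x environment where the pyspark.streaming.kafka module is still available; the topic name, broker address and log fields are placeholders.

    import json

    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext
    from pyspark.streaming.kafka import KafkaUtils

    sc = SparkContext(appName="weblog-streaming")
    ssc = StreamingContext(sc, batchDuration=10)      # 10-second micro-batches

    # Direct stream: Spark tracks Kafka partition offsets itself, no receiver.
    stream = KafkaUtils.createDirectStream(
        ssc,
        topics=["weblogs"],                                    # hypothetical topic
        kafkaParams={"metadata.broker.list": "broker1:9092"},  # hypothetical broker
    )

    def tag_quality(record):
        """Simple data-quality check: flag records missing required fields."""
        flag = "passable" if record.get("url") and record.get("status") else "bad"
        return {**record, "quality_flag": flag}

    # Parse each weblog message and apply the business transformation.
    parsed = stream.map(lambda kv: json.loads(kv[1])).map(tag_quality)
    parsed.pprint()

    ssc.start()
    ssc.awaitTermination()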
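
Hive/Spark SQL sketch: a short PySpark view of the Hive access, partitioning and bucketing work above (the original scripts were written in Scala); the table and column names are illustrative only.

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("hive-weblogs")
             .enableHiveSupport()          # lets Spark SQL read and write Hive tables
             .getOrCreate())

    # Partition by date and bucket by user id so common queries can prune data.
    spark.sql("""
        CREATE TABLE IF NOT EXISTS weblogs_clean (
            user_id BIGINT,
            url     STRING,
            status  INT
        )
        PARTITIONED BY (event_date STRING)
        CLUSTERED BY (user_id) INTO 32 BUCKETS
        STORED AS ORC
    """)

    # Query and summarize a Hive table directly from Spark.
    raw = spark.table("weblogs_raw")                   # hypothetical source table
    raw.groupBy("event_date").count().show()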
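
CQL sketch: the Cassandra queries mentioned above could look like the following, shown here with the DataStax Python driver; the contact point, keyspace, table and columns are assumptions.

    from cassandra.cluster import Cluster

    cluster = Cluster(["cassandra-node1"])     # hypothetical contact point
    session = cluster.connect("weblogs")       # hypothetical keyspace

    # Parameterized CQL query against the partition key.
    rows = session.execute(
        "SELECT user_id, url, status FROM events WHERE event_date = %s",
        ("2020-01-01",),
    )
    for row in rows:
        print(row.user_id, row.status)

    cluster.shutdown()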
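
Log-scanning sketch: a minimal example of the Python pattern matching over build logs; the log path and message layout are assumptions, not the project's actual format.

    import re
    from collections import Counter

    # Matches lines such as "2020-01-01 12:00:00 WARNING: deprecated flag used".
    PATTERN = re.compile(r"^\S+ \S+ (WARNING|ERROR): (.+)$")

    def summarize_build_log(path):
        """Collect warnings/errors from a build log and format them for a report."""
        counts, formatted = Counter(), []
        with open(path, encoding="utf-8") as handle:
            for line in handle:
                match = PATTERN.match(line.strip())
                if match:
                    level, message = match.groups()
                    counts[level] += 1
                    formatted.append("[{}] {}".format(level, message))
        return counts, formatted

    if __name__ == "__main__":
        totals, lines = summarize_build_log("build.log")   # hypothetical log file
        print(dict(totals))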

Confidential

Data Engineer

Responsibilities:

  • Implemented Spring Boot microservices to process messages into the Kafka cluster (see the producer/consumer sketch after this list).
  • Worked closely with the Kafka admin team to set up Kafka clusters in the QA and production environments.
  • Implemented Kafka producer and consumer applications on the Kafka cluster with the help of ZooKeeper.
  • Used Spring Kafka API calls to process messages smoothly on the Kafka cluster.
  • Explored Spark to improve the performance and optimization of existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, pair RDDs and Spark on YARN.
  • Imported data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop.
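
Producer/consumer sketch: the work above used Spring Kafka inside Spring Boot services; the sketch below is only a rough Python equivalent using the kafka-python client, with the broker address, topic and payload as placeholder assumptions.

    import json

    from kafka import KafkaConsumer, KafkaProducer

    BROKERS = "broker1:9092"        # hypothetical QA/production broker
    TOPIC = "orders"                # hypothetical topic

    # Producer: serialize each message as JSON and publish it to the topic.
    producer = KafkaProducer(
        bootstrap_servers=BROKERS,
        value_serializer=lambda obj: json.dumps(obj).encode("utf-8"),
    )
    producer.send(TOPIC, {"order_id": 1, "status": "NEW"})
    producer.flush()

    # Consumer: read messages from the same topic as part of a consumer group.
    consumer = KafkaConsumer(
        TOPIC,
        bootstrap_servers=BROKERS,
        group_id="order-processors",
        value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
    )
    for message in consumer:        # blocks, handling each record as it arrives
        print(message.value)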

Confidential

Data Engineer

Responsibilities:

  • Processed audit data from an FTP location, pushed it into HDFS using Flume, processed the data with MapReduce and Pig jobs, and extensively wrote Pig transform scripts to turn data from several sources into baseline data.
  • Created and ran Sqoop jobs with incremental load from MySQL to populate Hive external tables, using partitions and buckets in Hive.
  • Created S3 buckets, managed S3 bucket policies, and used S3 for storage and backup on AWS (see the boto3 sketch after this list).
  • Worked on EC2.
  • Involved in development and maintenance of an Oracle database using SQL for the infra-net management system.
  • Worked with Python data frames (NumPy, Pandas, Matplotlib/Seaborn) alongside the data analyst team to describe datasets and produce reports (see the pandas sketch after this list).
  • Built interactive dashboards and data stories for presentation using Tableau.
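
Boto3 sketch: a minimal example of the S3 bucket and policy management described above; the bucket name, region and policy contents are illustrative assumptions.

    import json

    import boto3

    s3 = boto3.client("s3", region_name="us-east-1")
    bucket = "example-backup-bucket"   # hypothetical bucket name

    s3.create_bucket(Bucket=bucket)

    # Example policy: reject unencrypted uploads to the backup bucket.
    policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "DenyUnencryptedUploads",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:PutObject",
            "Resource": "arn:aws:s3:::{}/*".format(bucket),
            "Condition": {"Null": {"s3:x-amz-server-side-encryption": "true"}},
        }],
    }
    s3.put_bucket_policy(Bucket=bucket, Policy=json.dumps(policy))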
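
Pandas sketch: a short example of the dataset profiling and reporting work above; the CSV path and column names are placeholders.

    import matplotlib.pyplot as plt
    import pandas as pd

    df = pd.read_csv("audit_data.csv")          # hypothetical dataset

    # Basic description of the dataset, handed off to the analyst team.
    summary = df.describe(include="all")
    summary.to_csv("audit_summary.csv")

    # A quick distribution plot for the report.
    df["response_time_ms"].plot(kind="hist", bins=50, title="Response time")
    plt.savefig("response_time_hist.png")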
