Data Engineer Resume
Pittsburgh, PA
SUMMARY
- 6+ years of experience in analysis, design, development, testing, implementation, maintenance, and enhancement of various IT projects, with Big Data experience implementing end-to-end Hadoop solutions.
- Experience working in environments using Agile (Scrum), RUP, and Test-Driven Development methodologies.
- Extensive experience installing, configuring, and using Hadoop ecosystem components such as MapReduce, HDFS, Sqoop, Pig, Hive, Impala, and Spark.
- Expertise in using J2EE application servers such as IBM WebSphere and JBoss, and web servers such as Apache Tomcat.
- Experience with different Hadoop distributions, including Cloudera (CDH3 & CDH4) and the Hortonworks Data Platform (HDP).
- Experienced in configuring and administering Hadoop clusters on major distributions such as Apache Hadoop and Cloudera.
TECHNICAL SKILLS
Programming: Python 3, Scala, PL/SQL, PowerShell scripting
Data Technologies: SQL, Apache Spark, HDFS, Sqoop, Flume, Kafka, MongoDB
Machine learning: Supervised/Unsupervised Learning, Feature Engineering, Text Analysis
Tools & IDEs: PySpark, MySQL, Talend Studio, AWS (Redshift, S3, EMR, EC2)
BI tools: Tableau
PROFESSIONAL EXPERIENCE
Confidential, Pittsburgh, PA
Data Engineer
Responsibilities:
- Involved in analyzing business requirements and preparing detailed specifications that follow the project guidelines required for development.
- Used Sqoop to import data from relational databases such as MySQL and Oracle.
- Involved in importing structured and unstructured data into HDFS.
- Responsible for fetching real-time data using Kafka and processing using Spark and Scala.
- Worked on Kafka to ingest real-time weblogs and feed the data to Spark Streaming.
- Developed business logic using Kafka Direct Stream in Spark Streaming and implemented business transformations (see the streaming sketch after this list).
- Worked on Building and implementing real-time streaming ETL pipeline using Kafka Streams API.
- Worked on Hive to implement web interfacing and stored the data in Hive tables.
- Migrated MapReduce programs into Spark transformations using Spark and Scala.
- Experienced with SparkContext, Spark SQL, and Spark on YARN.
- Implemented Spark scripts using Scala and Spark SQL to access Hive tables in Spark for faster data processing.
- Loaded the data into Spark RDDs and performed in-memory data computation to generate the output response.
- Implemented data quality checks using Spark Streaming and flagged records as passable or bad.
- Implemented Hive partitioning and bucketing on the collected data in HDFS (see the Hive DDL sketch after this list).
- Involved in data querying and summarization using Hive and Pig, and created UDFs, UDAFs, and UDTFs.
- Implemented Sqoop jobs for large data exchanges between RDBMS and Hive clusters.
- Extensively used ZooKeeper as a backup server and job scheduler for Spark jobs.
- Developed traits and case classes etc. in Scala.
- Developed Spark scripts using Scala shell commands as per the business requirement.
- Worked on Cloudera distribution and deployed on AWS EC2 Instances.
- Experienced in loading real-time data into NoSQL databases such as Cassandra (see the Cassandra write sketch after this list).
- Well versed in data manipulation and compaction in Cassandra.
- Experience retrieving data from the Cassandra cluster by running CQL (Cassandra Query Language) queries.
- Worked on connecting the Cassandra database to the Amazon EMR file system for storing the database in S3.
- Used Amazon EMR to process Big Data across a Hadoop cluster of virtual servers on Amazon Elastic Compute Cloud (EC2) and Amazon Simple Storage Service (S3).
- Deployed the project on Amazon EMR with S3 connectivity for backup storage.
- Well versed in using Elastic Load Balancer with Auto Scaling for EC2 servers.
- Configured workflows that involve Hadoop actions using Oozie.
- Used Python for pattern matching in build logs to format warnings and errors.
- Coordinated with the Scrum team to deliver agreed user stories on time for every sprint.
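The following is a minimal sketch of the Kafka Direct Stream ingestion described above, using the spark-streaming-kafka-0-10 integration. The broker address, topic name, and consumer group are illustrative placeholders rather than values from the original project.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}

object WeblogStream {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("weblog-direct-stream")
    val ssc  = new StreamingContext(conf, Seconds(10))   // 10-second micro-batches

    // Hypothetical broker/group settings, for illustration only
    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "kafka-broker:9092",
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "weblog-consumers",
      "auto.offset.reset"  -> "latest"
    )

    // Direct stream: executors read Kafka partitions directly, no receivers
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](Seq("weblogs"), kafkaParams)
    )

    stream
      .map(_.value())          // raw weblog line
      .filter(_.nonEmpty)      // simple data-quality gate
      .foreachRDD { rdd =>
        // business transformations / writes to Hive or Cassandra would go here
        rdd.take(5).foreach(println)
      }

    ssc.start()
    ssc.awaitTermination()
  }
}
```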
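A minimal sketch of the Hive partitioning and bucketing mentioned above. The table name, columns, partition column, and bucket count are hypothetical; the querying side is shown through Spark's Hive support.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("hive-partitioning-sketch")
  .enableHiveSupport()
  .getOrCreate()

// Hive DDL for a partitioned, bucketed table (hypothetical schema).
// On older Spark versions, bucketed Hive DDL like this is usually executed in the
// Hive shell/beeline rather than through spark.sql.
val ddl = """
  CREATE TABLE IF NOT EXISTS weblogs_curated (
    user_id STRING,
    url     STRING,
    status  INT
  )
  PARTITIONED BY (event_date STRING)
  CLUSTERED BY (user_id) INTO 32 BUCKETS
  STORED AS ORC
"""

// Querying with a partition filter so only that day's partition directory is scanned
val daily = spark.sql("""
  SELECT status, COUNT(*) AS hits
  FROM weblogs_curated
  WHERE event_date = '2019-01-01'
  GROUP BY status
""")
daily.show()
```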
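A minimal sketch of persisting processed data to Cassandra with the DataStax spark-cassandra-connector, as one way to implement the Cassandra loading described above. The keyspace, table, host, and column names are hypothetical, and the connector package must be on the Spark classpath.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("cassandra-sink-sketch")
  .config("spark.cassandra.connection.host", "cassandra-node")  // hypothetical host
  .getOrCreate()
import spark.implicits._

// Small stand-in for the cleaned weblog data produced upstream
val events = Seq(
  ("u1", "/home", 200, "2019-01-01"),
  ("u2", "/cart", 500, "2019-01-01")
).toDF("user_id", "url", "status", "event_date")

// Append to an existing Cassandra table (keyspace/table are illustrative)
events.write
  .format("org.apache.spark.sql.cassandra")
  .options(Map("keyspace" -> "weblogs", "table" -> "events_by_user"))
  .mode("append")
  .save()

// Read back through the same connector; simple filters are pushed down to CQL
val recent = spark.read
  .format("org.apache.spark.sql.cassandra")
  .options(Map("keyspace" -> "weblogs", "table" -> "events_by_user"))
  .load()
  .filter("event_date = '2019-01-01'")
recent.show(5)
```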
Confidential
Data Engineer
Responsibilities:
- Implemented Spring Boot microservices to process messages into the Kafka cluster.
- Worked closely with the Kafka admin team to set up Kafka clusters in the QA and production environments.
- Implemented Kafka producer and consumer applications on the Kafka cluster with the help of ZooKeeper (see the producer sketch after this list).
- Used the Spring Kafka API to process messages reliably on the Kafka cluster.
- Explored Spark to improve the performance and optimization of existing algorithms in Hadoop, using SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN (see the optimization sketch after this list).
- Handled importing data from various sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop.
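The producer and consumer work above was done with Spring Kafka; as a neutral illustration of the producer side, here is a minimal sketch against the plain Kafka clients API rather than the Spring configuration actually used. The broker address, topic, and payload are hypothetical.

```scala
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}
import org.apache.kafka.common.serialization.StringSerializer

// Hypothetical QA-cluster settings, for illustration only
val props = new Properties()
props.put("bootstrap.servers", "kafka-qa:9092")
props.put("key.serializer", classOf[StringSerializer].getName)
props.put("value.serializer", classOf[StringSerializer].getName)
props.put("acks", "all")   // wait for acknowledgement from all in-sync replicas

val producer = new KafkaProducer[String, String](props)
try {
  // Keying by order id keeps all events for one order on the same partition
  producer.send(new ProducerRecord("orders", "order-123", """{"id":"order-123","status":"CREATED"}"""))
  producer.flush()
} finally {
  producer.close()
}
```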
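A small sketch of the kind of Spark optimization referenced above: replacing a shuffle-heavy groupByKey aggregation on a pair RDD with reduceByKey, and caching a reused dataset. The input path, fields, and key names are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("spark-optimization-sketch").getOrCreate()
val sc = spark.sparkContext

// Hypothetical (key, bytes) pair RDD, e.g. bytes transferred per user
val usage = sc.textFile("hdfs:///data/usage/*.csv")
  .map(_.split(","))
  .map(f => (f(0), f(1).toLong))   // (userId, bytes)
  .cache()                          // reused below, so keep it in memory

// Shuffle-heavy version: ships every value across the network before summing
val slow = usage.groupByKey().mapValues(_.sum)

// Optimized version: combines partial sums on each partition before the shuffle
val fast = usage.reduceByKey(_ + _)

fast.take(10).foreach(println)
```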
Confidential
Data Engineer
Responsibilities:
- Involved in processing audit data from an FTP location and pushing it into HDFS using Flume, processing the data with MapReduce and Pig jobs, and extensively writing Pig data transform scripts to turn data from several sources into baseline data.
- Created and ran Sqoop jobs with incremental load from MySQL to populate Hive external tables, using Hive partitions and buckets (see the external-table sketch after this list).
- Created S3 buckets and managed S3 bucket policies, utilizing S3 for storage and backup on AWS.
- Worked on EC2.
- Involved in the development and maintenance of an Oracle database using SQL for an infra-net management system.
- Worked with Python DataFrames (NumPy, Pandas, Matplotlib/Seaborn) alongside the data analyst team to describe datasets and create reports on them.
- Worked on interactive dashboards in Tableau for building stories and presentations.
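A minimal sketch of the Hive external table that a Sqoop incremental load might populate, issued through Spark's Hive support. The table, columns, HDFS path, and partition layout are hypothetical; in practice the same DDL could be run directly in the Hive CLI or beeline.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("hive-external-table-sketch")
  .enableHiveSupport()
  .getOrCreate()

// External table over the directory that the Sqoop incremental job appends to
spark.sql("""
  CREATE EXTERNAL TABLE IF NOT EXISTS orders_ext (
    order_id    BIGINT,
    customer_id BIGINT,
    amount      DECIMAL(10,2)
  )
  PARTITIONED BY (load_date STRING)
  ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
  LOCATION '/data/raw/orders'
""")

// Register the partition written by the latest incremental load
spark.sql("""
  ALTER TABLE orders_ext ADD IF NOT EXISTS
  PARTITION (load_date = '2018-06-01')
  LOCATION '/data/raw/orders/load_date=2018-06-01'
""")
```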