
Data Engineer Resume


Pittsburgh, PA

SUMMARY

  • 6+ years of experience in analysis, design, development, testing, implementation, maintenance and enhancement of various IT projects, with Big Data experience implementing end-to-end Hadoop solutions.
  • Experience working in environments using Agile (Scrum), RUP and Test-Driven Development methodologies.
  • Extensive experience installing, configuring and using ecosystem components such as Hadoop MapReduce, HDFS, Sqoop, Pig, Hive, Impala and Spark.
  • Expertise in using J2EE application servers such as IBM WebSphere and JBoss, and web servers such as Apache Tomcat.
  • Experience with different Hadoop distributions, including Cloudera (CDH3 and CDH4) and Hortonworks (HDP).
  • Experienced in configuring and administering Hadoop clusters on major distributions such as Apache Hadoop and Cloudera.

TECHNICAL SKILLS

Programming: Python 3, Scala, PL/SQL, PowerShell scripting

Data Technologies: SQL, Apache Spark, HDFS, Sqoop, Flume, Kafka, MongoDB

Machine learning: Supervised/Unsupervised Learning, Feature Engineering, Text Analysis

Tools & IDEs: PySpark, MySQL, Talend Studio, AWS (Redshift, S3, EMR, EC2)

BI tools: Tableau

PROFESSIONAL EXPERIENCE

Confidential, Pittsburgh, PA

Data Engineer

Responsibilities:

  • Analyzed business requirements and prepared detailed specifications following the project guidelines required for development.
  • Used Sqoop to import data from relational databases such as MySQL and Oracle.
  • Imported structured and unstructured data into HDFS.
  • Fetched real-time data using Kafka and processed it using Spark and Scala.
  • Used Kafka to import real-time weblogs and ingested the data into Spark Streaming.
  • Developed business logic using Kafka Direct Stream in Spark Streaming and implemented business transformations (see the streaming sketch after this list).
  • Built and implemented a real-time streaming ETL pipeline using the Kafka Streams API.
  • Worked on Hive for web interfacing and stored the data in Hive tables.
  • Migrated MapReduce programs to Spark transformations using Spark and Scala.
  • Experienced with SparkContext, Spark SQL and Spark on YARN.
  • Implemented Spark scripts using Scala and Spark SQL to access Hive tables in Spark for faster data processing (see the Hive/Spark SQL sketch after this list).
  • Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
  • Implemented data quality checks using Spark Streaming and flagged records as passable or bad.
  • Implemented Hive partitioning and bucketing on the collected data in HDFS.
  • Performed data querying and summarization using Hive and Pig and created UDFs, UDAFs and UDTFs.
  • Implemented Sqoop jobs for large data exchanges between RDBMS and Hive clusters.
  • Extensively used ZooKeeper as a backup server and for scheduling Spark jobs.
  • Developed traits, case classes and related constructs in Scala.
  • Developed Spark scripts using the Scala shell as per business requirements.
  • Worked on the Cloudera distribution deployed on AWS EC2 instances.
  • Loaded real-time data into NoSQL databases such as Cassandra.
  • Well versed in data manipulation and compactions in Cassandra.
  • Retrieved data from the Cassandra cluster by running CQL (Cassandra Query Language) queries (see the CQL sketch after this list).
  • Connected the Cassandra database to the Amazon EMR file system for storing the data in S3.
  • Used Amazon EMR to process Big Data across a Hadoop cluster of virtual servers on Amazon Elastic Compute Cloud (EC2) and Amazon Simple Storage Service (S3).
  • Deployed the project on Amazon EMR with S3 connectivity for backup storage.
  • Used Elastic Load Balancing and Auto Scaling for EC2 servers.
  • Configured workflows involving Hadoop actions using Oozie.
  • Used Python for pattern matching in build logs to format warnings and errors (see the log-scanning sketch after this list).
  • Coordinated with the Scrum team to deliver agreed user stories on time every sprint.
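
Streaming sketch: a minimal illustration of the Kafka Direct Stream ingestion and quality-flagging described above. It is written in PySpark rather than the Scala used on the project, and it assumes a Spark 2.x environment where the pyspark.streaming.kafka module is still available; the topic name, broker address and log fields are placeholders.

    import json

    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext
    from pyspark.streaming.kafka import KafkaUtils

    sc = SparkContext(appName="weblog-streaming")
    ssc = StreamingContext(sc, batchDuration=10)      # 10-second micro-batches

    # Direct stream: Spark tracks Kafka partition offsets itself, no receiver.
    stream = KafkaUtils.createDirectStream(
        ssc,
        topics=["weblogs"],                                    # hypothetical topic
        kafkaParams={"metadata.broker.list": "broker1:9092"},  # hypothetical broker
    )

    def tag_quality(record):
        """Simple data-quality check: flag records missing required fields."""
        flag = "passable" if record.get("url") and record.get("status") else "bad"
        return {**record, "quality_flag": flag}

    # Parse each weblog message and apply the business transformation.
    parsed = stream.map(lambda kv: json.loads(kv[1])).map(tag_quality)
    parsed.pprint()

    ssc.start()
    ssc.awaitTermination()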
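
Hive/Spark SQL sketch: a short PySpark view of the Hive access, partitioning and bucketing work above (the original scripts were written in Scala); the table and column names are illustrative only.

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("hive-weblogs")
             .enableHiveSupport()          # lets Spark SQL read and write Hive tables
             .getOrCreate())

    # Partition by date and bucket by user id so common queries can prune data.
    spark.sql("""
        CREATE TABLE IF NOT EXISTS weblogs_clean (
            user_id BIGINT,
            url     STRING,
            status  INT
        )
        PARTITIONED BY (event_date STRING)
        CLUSTERED BY (user_id) INTO 32 BUCKETS
        STORED AS ORC
    """)

    # Query and summarize a Hive table directly from Spark.
    raw = spark.table("weblogs_raw")                   # hypothetical source table
    raw.groupBy("event_date").count().show()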
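
CQL sketch: the Cassandra queries mentioned above could look like the following, shown here with the DataStax Python driver; the contact point, keyspace, table and columns are assumptions.

    from cassandra.cluster import Cluster

    cluster = Cluster(["cassandra-node1"])     # hypothetical contact point
    session = cluster.connect("weblogs")       # hypothetical keyspace

    # Parameterized CQL query against the partition key.
    rows = session.execute(
        "SELECT user_id, url, status FROM events WHERE event_date = %s",
        ("2020-01-01",),
    )
    for row in rows:
        print(row.user_id, row.status)

    cluster.shutdown()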
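
Log-scanning sketch: a minimal example of the Python pattern matching over build logs; the log path and message layout are assumptions, not the project's actual format.

    import re
    from collections import Counter

    # Matches lines such as "2020-01-01 12:00:00 WARNING: deprecated flag used".
    PATTERN = re.compile(r"^\S+ \S+ (WARNING|ERROR): (.+)$")

    def summarize_build_log(path):
        """Collect warnings/errors from a build log and format them for a report."""
        counts, formatted = Counter(), []
        with open(path, encoding="utf-8") as handle:
            for line in handle:
                match = PATTERN.match(line.strip())
                if match:
                    level, message = match.groups()
                    counts[level] += 1
                    formatted.append("[{}] {}".format(level, message))
        return counts, formatted

    if __name__ == "__main__":
        totals, lines = summarize_build_log("build.log")   # hypothetical log file
        print(dict(totals))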

Confidential

Data Engineer

Responsibilities:

  • Implemented Spring Boot microservices to process messages into the Kafka cluster (see the producer/consumer sketch after this list).
  • Worked closely with the Kafka admin team to set up Kafka clusters in the QA and production environments.
  • Implemented Kafka producer and consumer applications on the Kafka cluster with the help of ZooKeeper.
  • Used Spring Kafka API calls to process messages smoothly on the Kafka cluster.
  • Explored Spark to improve the performance and optimization of existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, pair RDDs and Spark on YARN.
  • Imported data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop.
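
Producer/consumer sketch: the work above used Spring Kafka inside Spring Boot services; the sketch below is only a rough Python equivalent using the kafka-python client, with the broker address, topic and payload as placeholder assumptions.

    import json

    from kafka import KafkaConsumer, KafkaProducer

    BROKERS = "broker1:9092"        # hypothetical QA/production broker
    TOPIC = "orders"                # hypothetical topic

    # Producer: serialize each message as JSON and publish it to the topic.
    producer = KafkaProducer(
        bootstrap_servers=BROKERS,
        value_serializer=lambda obj: json.dumps(obj).encode("utf-8"),
    )
    producer.send(TOPIC, {"order_id": 1, "status": "NEW"})
    producer.flush()

    # Consumer: read messages from the same topic as part of a consumer group.
    consumer = KafkaConsumer(
        TOPIC,
        bootstrap_servers=BROKERS,
        group_id="order-processors",
        value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
    )
    for message in consumer:        # blocks, handling each record as it arrives
        print(message.value)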

Confidential

Data Engineer

Responsibilities:

  • Processed audit data from an FTP location, pushed it into HDFS using Flume, processed the data with MapReduce and Pig jobs, and extensively wrote Pig transform scripts to turn data from several sources into baseline data.
  • Created and ran Sqoop jobs with incremental load from MySQL to populate Hive external tables, using partitions and buckets in Hive.
  • Created S3 buckets, managed S3 bucket policies, and used S3 for storage and backup on AWS (see the boto3 sketch after this list).
  • Worked on EC2.
  • Involved in development and maintenance of an Oracle database using SQL for the infra-net management system.
  • Worked with Python data frames (NumPy, Pandas, Matplotlib/Seaborn) alongside the data analyst team to describe datasets and produce reports (see the pandas sketch after this list).
  • Built interactive dashboards and data stories for presentation using Tableau.
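
Boto3 sketch: a minimal example of the S3 bucket and policy management described above; the bucket name, region and policy contents are illustrative assumptions.

    import json

    import boto3

    s3 = boto3.client("s3", region_name="us-east-1")
    bucket = "example-backup-bucket"   # hypothetical bucket name

    s3.create_bucket(Bucket=bucket)

    # Example policy: reject unencrypted uploads to the backup bucket.
    policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "DenyUnencryptedUploads",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:PutObject",
            "Resource": "arn:aws:s3:::{}/*".format(bucket),
            "Condition": {"Null": {"s3:x-amz-server-side-encryption": "true"}},
        }],
    }
    s3.put_bucket_policy(Bucket=bucket, Policy=json.dumps(policy))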
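
Pandas sketch: a short example of the dataset profiling and reporting work above; the CSV path and column names are placeholders.

    import matplotlib.pyplot as plt
    import pandas as pd

    df = pd.read_csv("audit_data.csv")          # hypothetical dataset

    # Basic description of the dataset, handed off to the analyst team.
    summary = df.describe(include="all")
    summary.to_csv("audit_summary.csv")

    # A quick distribution plot for the report.
    df["response_time_ms"].plot(kind="hist", bins=50, title="Response time")
    plt.savefig("response_time_hist.png")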
