
Data Engineer Resume

Morrisville, NC

SUMMARY:

  • Overall 4+ years of experience in analysis, design, development, testing, implementation, maintenance, and enhancements on various IT projects, including 3+ years of Big Data experience implementing end-to-end Hadoop solutions.
  • Hands-on experience installing, configuring, and using Apache Hadoop ecosystem components such as HDFS, MapReduce, Pig, Hive, HBase, ZooKeeper, Sqoop, Kafka, and Storm.
  • Expertise in writing Hadoop Jobs to analyze data using MapReduce, Hive, Pig and Solr.
  • Experience in importing and exporting data using Sqoop from HDFS to Relational Database Systems (RDBMS) and vice-versa.
  • In depth understanding/knowledge of Hadoop Architecture and various components such as HDFS, Job Tracker, Task Tracker, Name Node, Data Node and MRv1 and MRv2 (YARN).
  • Experience in analyzing data using Hive QL, Pig Latin, and custom MapReduce programs in Java.
  • Experience in job workflow scheduling and monitoring tools like Oozie and Zookeeper.
  • Hands-on experience configuring Hadoop clusters in enterprise environments, on VMware, and on Amazon Web Services (AWS) using EC2 instances.
  • Experience configuring and working with AWS EMR instances.
  • Proficient in using Cloudera Manager, an end-to-end tool to manage Hadoop operations.
  • Used various Talend components for Hadoop, such as Hive, Pig, and Sqoop.
  • Implemented a log producer in Scala that watches application logs, transforms incremental log entries, and sends them to a Kafka- and ZooKeeper-based log collection platform.
  • Good understanding of NoSQL databases like Cassandra and HBase.
  • Experience on developing REST Services.
  • Wrote Spark Streaming applications against the Big Data distribution in an active cluster environment.
  • Expertise in core Java, J2EE, multithreading, JDBC, Spring, and shell scripting; proficient in using Java APIs for application development.
  • Experience in scraping data using Kapow and Blue Prism.
  • Used Teradata SQL Assistant to view HIVE tables.
  • Experience in database design, entity relationships, database analysis, SQL programming, and PL/SQL stored procedures.
  • Experience coding and testing crawlers and standardization, normalization, load, extract, and Avro models to filter and massage data and validate it.
  • Excellent domain knowledge in Insurance and Finance. Excellent interpersonal, analytical, verbal and written communications skills.
  • Excellent working experience in Scrum / Agile framework and Waterfall project execution methodologies.
  • Worked on various operating systems like UNIX/Linux, MAC OS and Windows.

TECHNICAL SKILLS:

Hadoop/Big Data: HDFS, MapReduce, Pig, Hive, Spark SQL, Spark Streaming, HBase, Kafka, Storm, Sqoop, Oozie, ZooKeeper.

Cloud: AWS

IDE Tools: Eclipse, IntelliJ IDEA.

Programming languages: Java, Linux shell scripts, Scala.

Web Frameworks: Spring, Hibernate.

Scraping Tools: Kapow, Blue Prism.

Database: Oracle, MySQL, HBase, Cassandra.

Web Technologies: HTML, XML, JavaScript.

PROFESSIONAL EXPERIENCE:

Confidential, Morrisville, NC

Data Engineer

Responsibilities:

  • Responsible for designing and implementing end-to-end data pipelines using Big Data tools including HDFS, Hive, Sqoop, HBase, Kafka, and Spark.
  • Extracted, parsed, cleaned, and ingested incoming web feed data and server logs into HDFS, handling both structured and unstructured data.
  • Imported millions of structured records from relational databases using Sqoop, processed them with Spark, and stored the data in HDFS in CSV format.
  • Used Sqoop to extract and load incremental and non-incremental data from RDBMS sources into Hadoop.
  • Used scraping tools like Kapow and Blue Prism to scrape data from third-party websites and store the data in S3.
  • Implemented partitioning, dynamic partitioning, and bucketing in Hive.
  • Developed Spark applications for data transformations and loading into HDFS using RDDs, DataFrames, and Datasets (see the sketch after this list).
  • Developed shell scripts for ingesting the data to HDFS and partitioned the data over Hive.
  • Extensively used Hive optimization techniques like partitioning, bucketing, Map Join and parallel execution.
  • Created DataFrames using Spark SQL and loaded the data into a NoSQL database.
  • Worked on Spark SQL and created a data warehouse used by both Spark and Hive.
  • Created views in Impala over the same Hive tables for querying the data.
  • Used Spark SQL to process large volumes of structured data.
  • Developed a Spark Streaming application to pull data from the cloud into Hive tables.
  • Experienced with batch processing of data sources using Apache Spark.
  • Involved in file movements between HDFS and AWS S3 and extensively worked with S3 bucket in AWS.
  • Created data partitions on large data sets in S3 and defined DDL on the partitioned data.
  • Implemented rapid provisioning and lifecycle management for Amazon EC2 using custom Bash scripts.
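A minimal sketch of the batch ingest step referenced above, assuming Spark 2.x with Hive support; the HDFS path, column name, and table name are hypothetical placeholders:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;

public class CsvToPartitionedHive {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("csv-to-hive")
                .enableHiveSupport()            // needed to write managed Hive tables
                .getOrCreate();

        // CSV files previously landed on HDFS by Sqoop (path is hypothetical)
        Dataset<Row> orders = spark.read()
                .option("header", "true")
                .option("inferSchema", "true")
                .csv("hdfs:///data/landing/orders/");

        // Write a Hive table partitioned by load_date (column/table names are hypothetical)
        orders.write()
                .mode(SaveMode.Overwrite)
                .partitionBy("load_date")
                .format("parquet")
                .saveAsTable("analytics.orders");

        spark.stop();
    }
}
```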

Environment: Hadoop (CDH 5.12), HDFS, Spark, Spark SQL, Git, AWS, Kafka, Hive, Java, Scala, HBase, Maven, UNIX Shell Scripting, Kapow, Blue Prism.

Confidential, Memphis, TN

Data Engineer

Responsibilities:

  • Designed solution for Streaming data applications using Apache Storm.
  • Extensively worked on Kafka and Storm integration to score the PMML (Predictive Model Markup Language) Models.
  • Applied transformations to the datasets using Spark.
  • Created HBase tables to store data organized by column families.
  • Wrote extensive Hive queries for data analysis to meet business requirements.
  • Involved in adding and decommissioning DataNodes.
  • Responsible for validating Spark SQL query results against Hive queries.
  • Involved in the requirements and design phases for implementing real-time streaming using Kafka and Storm.
  • Used Maven for deployments and processed structured, semi-structured (e.g., XML), and unstructured data.
  • Involved in file movements between HDFS and AWS S3 and extensively worked with S3 bucket in AWS.
  • Created data partitions on large data sets in S3 and defined DDL on the partitioned data.
  • Implemented rapid provisioning and lifecycle management for Amazon EC2 using custom Bash scripts.
  • Experience developing and consuming web services (REST, SOAP, JSON) and APIs in service-oriented architectures.
  • Used the Spark-Cassandra Connector to load data to and from Cassandra (see the sketch after this list).
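A minimal sketch of loading data to and from Cassandra with the Spark-Cassandra Connector's DataFrame API; the host, keyspace, table, and column names are hypothetical:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class CassandraRoundTrip {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("spark-cassandra")
                .config("spark.cassandra.connection.host", "cassandra-host") // hypothetical host
                .getOrCreate();

        // Read a Cassandra table into a DataFrame (keyspace/table names are hypothetical)
        Dataset<Row> events = spark.read()
                .format("org.apache.spark.sql.cassandra")
                .option("keyspace", "telemetry")
                .option("table", "raw_events")
                .load();

        // Filter and write the result back to another Cassandra table
        events.filter("event_type = 'score'")
              .write()
              .format("org.apache.spark.sql.cassandra")
              .option("keyspace", "telemetry")
              .option("table", "scored_events")
              .mode("append")
              .save();

        spark.stop();
    }
}
```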

Environment: Hadoop (HDP 2.5.3), HDFS, Spark, Spark SQL, Git, Storm, AWS, Kafka, Hive, Java, Scala, HBase, Maven, UNIX Shell Scripting, Ranger, Kerberos, Cassandra, REST API.

Confidential, Boston, MA

Big Data Engineer

Responsibilities:

  • Installed and configured Hadoop MapReduce and HDFS, and developed multiple MapReduce jobs in Java for data cleaning and pre-processing (see the sketch after this list).
  • Developed simple to complex MapReduce jobs using Hive.
  • Used Sqoop extensively to import data from RDBMS sources into HDFS. Performed transformations, cleaning, and filtering on the imported data using Hive and MapReduce, and loaded the final data into HDFS.
  • Developed Pig UDFs to pre-process data for analysis.
  • Implemented test cases for Spark and Ignite functions using Scala.
  • Pre-processed data and created fact tables using Hive.
  • Designed and implemented an Apache Spark Streaming application.
  • Hands-on experience with Apache Spark using Scala; implemented a Spark solution to enable real-time reporting from Cassandra data.
  • Used Sqoop to import data into Cassandra tables from relational databases such as Oracle and MySQL.
  • Automated all jobs, from pulling data from databases to loading data into SQL Server, using shell scripts.
  • Created databases, tables, and views in HiveQL.
  • Developed a RESTful API to provide access to data in HBase.
  • Involved in managing and reviewing Hadoop log files.
  • Involved in running Hadoop streaming jobs to process terabytes of text data.
  • Wrote Pig UDFs.
  • Wrote unit test cases using MRUnit.
  • Fixed code review comments, built the project in Jenkins, and supported code deployment to production. Fixed post-production defects so that the MapReduce code worked as expected.
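A minimal sketch of a data-cleaning mapper of the kind described above; the delimiter and expected field count are hypothetical:

```java
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Drops blank or malformed records and trims whitespace before downstream processing.
public class CleanRecordsMapper
        extends Mapper<LongWritable, Text, Text, NullWritable> {

    private static final int EXPECTED_FIELDS = 8;   // hypothetical schema width
    private final Text cleaned = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString().trim();
        if (line.isEmpty()) {
            return;                                  // skip empty lines
        }
        String[] fields = line.split(",", -1);
        if (fields.length != EXPECTED_FIELDS) {
            context.getCounter("cleaning", "malformed").increment(1);
            return;                                  // skip malformed rows, keep a counter
        }
        cleaned.set(line);
        context.write(cleaned, NullWritable.get());
    }
}
```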

Environment: Hadoop (MAPR 5.0), HDFS, Spark, Map Reduce, Cassandra, Jira, Hive, Sqoop, HBase, Java, UNIX Shell Scripting.

Confidential, Roswell, GA

Hadoop/Java Developer

Responsibilities:

  • Installed and configured Hadoop MapReduce and HDFS, and developed multiple MapReduce jobs in Java for data cleaning and pre-processing.
  • Wrote MapReduce jobs to discover trends in data usage by users.
  • Involved in managing and reviewing Hadoop log files.
  • Involved in running Hadoop streaming jobs to process terabytes of text data.
  • Loaded and transformed large sets of structured, semi-structured, and unstructured data with MapReduce and Pig.
  • Wrote Pig UDFs (see the sketch after this list).
  • Developed Hive queries for the analysts.
  • Automated jobs end to end, from pulling data from sources such as MySQL and pushing result datasets to HDFS, through running MapReduce and Pig/Hive jobs, using Oozie (workflow management).
  • Monitored system health and logs and responded to any warning or failure conditions.
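A minimal sketch of a Pig UDF of the kind described above; the class name and normalization logic are hypothetical:

```java
import java.io.IOException;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

// Normalizes a free-text field (trim + upper-case); registered in a Pig script with
//   DEFINE NORMALIZE com.example.NormalizeText();   -- package/class name is hypothetical
public class NormalizeText extends EvalFunc<String> {
    @Override
    public String exec(Tuple input) throws IOException {
        if (input == null || input.size() == 0 || input.get(0) == null) {
            return null;                  // pass nulls through
        }
        return input.get(0).toString().trim().toUpperCase();
    }
}
```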

Environment: Hadoop (CDH 4), HDFS, Map Reduce, Hive, Java, Pig, Sqoop, Oozie, REST Web Services, HBase, UNIX Shell Scripting.

Confidential

Hadoop Developer

Responsibilities:

  • Analyzed large data sets by running Hive queries and Pig scripts.
  • Involved in creating Hive tables and loading and analyzing data using Hive queries.
  • Developed Simple to complex MapReduce Jobs using Hive and Pig.
  • Involved in running Hadoop jobs for processing millions of records of text data.
  • Worked with application teams to install operating system, Hadoop updates, patches, version upgrades as required.
  • Developed multiple MapReduce jobs in Java for data cleaning and pre-processing.
  • Involved in unit testing of MapReduce jobs using MRUnit.
  • Involved in loading data from the Linux file system into HDFS.
  • Responsible for managing data from multiple sources.
  • Experienced in running Hadoop streaming jobs to process terabytes of XML-format data.
  • Loaded and transformed large sets of structured and semi-structured data.
  • Assisted in exporting analyzed data to relational databases using Sqoop.
  • Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.

Environment: Hadoop, HDFS, Pig, Hive, MapReduce, HBase, Sqoop, LINUX, Java

Confidential

Java Developer

Responsibilities:

  • Involved in the full SDLC (requirements gathering, analysis, design, development, and testing) of an application developed using Agile methodology.
  • Actively participated in Object-Oriented Analysis and Design sessions for the project, which is based on MVC architecture using the Spring Framework.
  • Involved in daily Scrum meetings, sprint planning, and estimation of tasks for user stories; participated in retrospectives and presented demos at the end of each sprint.
  • Experience writing PL/SQL stored procedures, functions, triggers, Oracle reports, and complex SQL queries.
  • Designed and developed the entire application implementing MVC architecture (see the controller sketch after this list).
  • Developed the application frontend using Bootstrap, JavaScript, and the AngularJS (MVC) framework.
  • Used the Spring Framework for IoC, JDBC/ORM, AOP, and Spring Security.
  • Proactively found the issues and resolved them.
  • Established efficient communication between teams to resolve issues.
  • Proposed an innovative logging approach for all interdependent applications.
  • Successfully delivered all product deliverables with zero defects.
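A minimal sketch of a Spring MVC REST controller illustrating the MVC/REST approach described above, assuming Spring 4+; the endpoint path and returned fields are hypothetical placeholders:

```java
import java.util.LinkedHashMap;
import java.util.Map;

import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestMethod;
import org.springframework.web.bind.annotation.RestController;

// Read-only endpoint sketch; in the real application this would delegate to a
// service/DAO layer rather than build the response inline.
@RestController
@RequestMapping("/api/records")
public class RecordController {

    @RequestMapping(value = "/{id}", method = RequestMethod.GET)
    public Map<String, Object> getRecord(@PathVariable("id") long id) {
        Map<String, Object> record = new LinkedHashMap<>();
        record.put("id", id);              // hypothetical fields, for illustration only
        record.put("status", "ACTIVE");
        return record;
    }
}
```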

Environment: Java, JUnit, MySQL, Spring, Struts, Web Services (SOAP, RESTful 4.0), JavaScript.
