Big Data Developer Resume

SUMMARY:

  • Over 5 years of IT experience, currently working in a Big Data capacity with the Hadoop ecosystem across internal and cloud-based platforms.
  • Working knowledge of Hadoop, Hive, Sqoop, Pig, HBase, Kafka, and Oozie in real-time environments; worked on many modules for performance improvement and data pipeline design.
  • Experience across the layers of the Hadoop framework: storage (HDFS), analysis (Pig and Hive), and engineering (jobs and workflows); developed ETL processes that load data from multiple sources into HDFS using Sqoop and Pig, with Oozie for workflow automation.
  • Strong experience creating real-time data streaming solutions using Apache Spark Core, Spark SQL and DataFrames, and Spark Streaming.
  • Experience extracting source data from sequential, XML, and CSV files, then transforming and loading it into the target data warehouse (a minimal Spark sketch of this pattern follows this list).
  • Configured a 20-30 node Hadoop cluster on Amazon EC2 Spot Instances to transfer data between Amazon S3 and HDFS and to direct input and output to the Hadoop MapReduce framework.
  • Worked with Spark SQL, Spark Streaming, and the core Spark API to build data pipelines that combine Spark batch processing with Kafka-Spark streaming.
  • Hands-on experience with the Spark architecture and its components, including Spark SQL and DataFrames.
  • Worked with the Spark machine learning library for recommendations, coupon recommendations, and a rules engine.
  • Good knowledge of converting complex RDBMS (Oracle, MySQL, and Teradata) queries into Hive query language.
  • Excellent understanding of Hadoop Gen-1 and Gen-2 and their components: HDFS, JobTracker, TaskTracker, NameNode, DataNode, and the YARN ResourceManager.
  • Experience administering large-scale Hadoop environments, including design, configuration, installation, performance tuning, and monitoring of clusters using Cloudera Manager.
  • Experience developing and scheduling ETL workflows in Hadoop using Oozie, and deploying and managing Hadoop clusters on Cloudera and Hortonworks.
  • Experienced in installation, configuration, administration, troubleshooting, tuning, security, backup, recovery, and upgrades of CentOS and Red Hat Enterprise Linux (RHEL) operating systems in a large environment.
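
The file-to-warehouse ETL pattern mentioned above can be illustrated with a minimal Spark (Scala) sketch. The paths, column names (ORDER_DT, order_id), and the Parquet target are placeholders assumed for illustration, not details from an actual engagement.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

object CsvToWarehouseLoad {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("CsvToWarehouseLoad")
      .getOrCreate()

    // Read delimited source files; schema is inferred here for brevity.
    // The landing path and column names are hypothetical.
    val orders = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("hdfs:///landing/orders/*.csv")

    // Light transformation: standardize a column name and drop rows missing the key.
    val cleaned = orders
      .withColumnRenamed("ORDER_DT", "order_date")
      .filter(col("order_id").isNotNull)

    // Write to the warehouse zone in a splittable columnar format.
    cleaned.write
      .mode("overwrite")
      .parquet("hdfs:///warehouse/orders/")

    spark.stop()
  }
}
```

XML and sequential-file sources follow the same read-transform-write shape with the appropriate reader.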

TECHNICAL SKILLS:

Big Data: Spark, Hadoop, Sqoop, Oozie, Ambari, Hue, Hive, Pig, HBase, ZooKeeper, and Kafka

Databases: Oracle 11g, MySQL, and Teradata

Scheduling and Ticketing: Autosys, Control-M, JIRA, ServiceNow

Amazon Web Services: S3, EMR, EC2

Distribution: Cloudera, Hortonworks

Operating Systems: RHEL, CentOS, Ubuntu, Windows

PROFESSIONAL EXPERIENCE:

Confidential

Big Data Developer

Responsibilities:

  • Analyzed the existing data flow into the warehouses and took a similar approach to migrate the data into HDFS.
  • Wrote a Spark Core RDD application to read 1 billion auto-generated records and compare the performance of Apache Ignite RDDs against Apache Spark RDDs.
  • Worked on Spark, creating Spark Core RDDs in Scala to process data from local files, HDFS, and RDBMS sources, and optimized the RDDs for performance.
  • Loaded customer data and event logs from Kafka into HBase using its REST API.
  • Integrated Kafka and Spark for real-time data processing, using Kafka producer and consumer components (a minimal consumer-side sketch follows this list).
  • Gathered requirements from the client and estimated timelines for developing complex Hive queries for a logistics application.
  • Used incremental imports to bring RDBMS tables into Hive for transformations and aggregations.
  • Used Cloudera Manager for continuous monitoring and management of the Hadoop cluster, and worked with application teams to install operating system and Hadoop updates, patches, and version upgrades as required.
  • Applied partitioning, bucketing, map-side joins, and parallel execution to optimize Hive queries, reducing execution time from hours to minutes.
  • Built a data pipeline using Pig and Sqoop to ingest cargo data and customer histories into HDFS for analysis.
  • Imported data from MySQL to HDFS and vice versa using Sqoop, and configured the Hive metastore on MySQL to store the metadata for Hive tables.
  • Worked with NoSQL databases such as HBase, creating HBase tables to load large sets of semi-structured data from various sources.
  • Developed workflows in Oozie, scheduled jobs on mainframes, and prepared the data refresh strategy and capacity planning documents required for project development and support.
  • Designed Oozie workflows using a range of actions, including Sqoop, Pig, Hive, and shell actions.
  • Worked extensively with the major Hadoop distributions (Hortonworks and Cloudera) and numerous open-source projects, and prototyped various applications that use modern Big Data tools.
  • Implemented the Fair Scheduler on the JobTracker to share cluster resources among users' MapReduce jobs.
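
A minimal consumer-side sketch of the Kafka-Spark integration described above, using the spark-streaming-kafka-0-10 direct stream; the broker address, topic name, and consumer group are assumptions for illustration.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

object CustomerEventStream {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("CustomerEventStream")
    val ssc  = new StreamingContext(conf, Seconds(10)) // 10-second micro-batches

    // Hypothetical broker, topic, and consumer group.
    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "broker1:9092",
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "customer-event-consumers",
      "auto.offset.reset"  -> "latest"
    )

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      PreferConsistent,
      Subscribe[String, String](Seq("customer-events"), kafkaParams)
    )

    // Pull out the message payloads; a real job would parse them and write to HBase here.
    stream.map(_.value).count().print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```

The HBase write would typically sit inside a foreachRDD block in place of the count-and-print used here.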

Environment: Hadoop, Hive, Sqoop, HBase, Pig, Oozie, MySQL, Jenkins, GitHub, Agile, Unix/Linux, Quality Center, Control-M, etc.

Confidential

Hadoop Developer

Responsibilities:

  • Performed data acquisition, data pre-processing, and data exploration for a telecommunications project in Scala.
  • Designed and developed POCs in Spark using Scala to compare the performance of Spark with Hive and SQL/Oracle.
  • Used the Spark Streaming APIs to perform transformations and actions on the fly, building a common learner data model that receives data from Kafka in near real time and persists it to HBase.
  • Installed and configured Hive, Pig, Sqoop, and Oozie on the Hadoop cluster, and set up and benchmarked Hadoop clusters for internal use.
  • Implemented data acquisition jobs in Python on top of Sqoop, Hive, and Pig, optimized MapReduce jobs to use HDFS efficiently through various compression mechanisms, and orchestrated the jobs with Oozie workflows.
  • In the pre-processing phase of data extraction, used Spark to remove records with missing data and to transform the data into new features (a minimal sketch of this step follows this list).
  • Created data extract scripts and defined naming standards for schemas and tables in the Hadoop data lake.
  • Performed data validation against source system data, analyzing the existing database source files and tables before ingesting them into the Hadoop data lake.
  • Handled importing data from various sources, performed transformations using Hive and MapReduce to load data into HDFS, and extracted data from MySQL into HDFS using Sqoop.
  • Used Cloudera Manager for continuous monitoring and management of the Hadoop cluster, and worked with application teams to install operating system and Hadoop updates, patches, and version upgrades as required.
  • Developed data pipelines using Sqoop, Pig, and Hive to ingest customer member data into HDFS to perform data analytics.
  • Used the Oozie workflow engine to run multiple Hive and Pig scripts, and used Kafka for real-time processing of log file data loaded directly into HDFS.
  • Designed Oozie workflows using a range of actions, including Sqoop, Pig, Hive, shell, and Java actions.
  • Analyzed substantial data sets to determine the optimal way to aggregate and report on them.
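
A minimal sketch of the Spark pre-processing step referenced above, assuming hypothetical input paths and column names (subscriber_id, event_ts, roaming_flag); the actual schema would differ.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, to_date}

object LearnerDataPrep {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("LearnerDataPrep")
      .getOrCreate()

    // Raw extract landed in HDFS by the upstream ingestion jobs (path is hypothetical).
    val raw = spark.read.parquet("hdfs:///datalake/raw/usage_events/")

    // Drop records with missing key fields, then derive new features.
    val prepared = raw
      .na.drop(Seq("subscriber_id", "event_ts"))
      .withColumn("event_date", to_date(col("event_ts")))
      .withColumn("is_roaming", (col("roaming_flag") === "Y").cast("int"))

    // Persist the curated set for downstream modeling and reporting.
    prepared.write.mode("overwrite").parquet("hdfs:///datalake/curated/usage_events/")

    spark.stop()
  }
}
```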

Environment: MapReduce, Hive, Spark, Pig, MySQL, Cloudera Manager, Sqoop, Oozie, Kafka, NoSQL, Eclipse, etc.

Confidential

Hadoop Developer

Responsibilities:

  • Installed and configured Hadoop ecosystem components such as Hive, Oozie, and Sqoop on a Cloudera Hadoop cluster, and helped with performance tuning and monitoring.
  • Understood business requirements and prepared design documents according to client requirements.
  • Analyzed Teradata procedures and imported data from Teradata into a MySQL database, then developed HiveQL queries, including UDFs for cases where Hive lacks the required built-in functions.
  • Converted complex Oracle code into HQL and developed Hive UDFs to replicate keywords Hive lacks, such as PIVOT and UNPIVOT.
  • Implemented dynamic partitioning and bucketing in Hive as part of performance tuning (a minimal sketch follows this list), and automated the tasks with Oozie workflow and coordinator files.
  • Loaded and transformed large sets of structured, semi-structured, and unstructured data using Pig; used Sqoop to regularly load and export data between MySQL, HDFS, and NoSQL databases; and designed and developed Pig scripts to process data in batches for trend analysis.
  • Developed data pipelines using Sqoop, Pig, and Hive to ingest customer member, clinical, biometrics, lab, and claims data into HDFS to perform data analytics.
  • Developed Sqoop scripts to handle change data capture, processing incremental records between newly arrived and existing data in RDBMS tables.
  • Loaded aggregated data from the Hadoop environment into Oracle using Sqoop for dashboard reporting.
  • Created base Hive scripts to analyze requirements and process data, designing the cluster to handle large data volumes and cross-examining data loaded by Hive and MapReduce jobs.
  • Worked closely with the DevOps team to understand, design, and develop end-to-end flow requirements, using Oozie workflows to run Hadoop jobs.
  • Assisted with data capacity planning and node forecasting, and collaborated with the infrastructure, network, database, application, and BI teams to ensure data quality and availability.
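
A minimal sketch of the dynamic-partition load described above, issued through Spark with Hive support; the table and column names (claims_staging, claims_curated, load_month) are assumptions for illustration. Bucketing would be declared separately in the Hive table DDL with a CLUSTERED BY ... INTO n BUCKETS clause.

```scala
import org.apache.spark.sql.SparkSession

object ClaimsPartitionedLoad {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("ClaimsPartitionedLoad")
      .enableHiveSupport()
      .getOrCreate()

    // Let Hive derive partition values from the data itself.
    spark.sql("SET hive.exec.dynamic.partition = true")
    spark.sql("SET hive.exec.dynamic.partition.mode = nonstrict")

    // Hypothetical curated table, partitioned by load month.
    spark.sql("""
      CREATE TABLE IF NOT EXISTS claims_curated (
        claim_id     STRING,
        member_id    STRING,
        claim_amount DOUBLE
      )
      PARTITIONED BY (load_month STRING)
      STORED AS ORC
    """)

    // Dynamic-partition insert: the last selected column feeds the partition key.
    spark.sql("""
      INSERT OVERWRITE TABLE claims_curated PARTITION (load_month)
      SELECT claim_id, member_id, claim_amount, substr(claim_date, 1, 7) AS load_month
      FROM claims_staging
    """)

    spark.stop()
  }
}
```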

Environment: Oozie, Cloudera, MySQL, CentOS, Apache HBase, HDFS, MapReduce, Hue, Hive, Pig, Sqoop, SQL, Windows, Linux