
Big Data Developer Resume

Miami, FL

SUMMARY

  • Over 5 years of IT experience, currently working in a Big Data capacity with the Hadoop ecosystem across internal and cloud-based platforms.
  • Working knowledge of Hadoop, Hive, Sqoop, Pig, HBase, and Oozie in real-time environments; worked on many modules for performance improvement and architecture design.
  • Experience across the layers of the Hadoop framework: storage (HDFS), analysis (Pig and Hive), and engineering (jobs and workflows); developed ETL processes to load data from multiple sources into HDFS using Sqoop and Pig, with Oozie for workflow automation.
  • Performed structural modifications using MapReduce and Hive, and analyzed data using visualization/reporting tools.
  • Hands-on experience with performance-tuning techniques for data processing in Hive, Impala, Spark, Pig, and MapReduce, including dynamic partitioning, bucketing, and file compression (a sketch follows this summary).
  • Experience extracting source data from sequential, XML, and CSV files, then transforming and loading it into the target data warehouse.
  • Good knowledge of converting complex RDBMS (Oracle, MySQL, and Teradata) queries into Hive Query Language (HiveQL).
  • Expertise in analytics, design, data warehouse modeling, development, implementation, maintenance, migration, and production support of large-scale enterprise data warehouses.
  • Strong exposure to Spark SQL, Spark Streaming, and the core Spark API; built data pipelines with Spark batch processing and completed a POC on the Kafka-Spark streaming process.
  • Hands-on experience with the Spark architecture and its components, including Spark SQL and DataFrames.
  • Experience developing and scheduling ETL workflows in Hadoop using Oozie, and deploying and managing Hadoop clusters with Cloudera and Hortonworks.
  • Experienced in installation, configuration, administration, troubleshooting, tuning, security, backup, recovery, and upgrades of CentOS and Red Hat Enterprise Linux (RHEL) operating systems in large environments.
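
The partitioning, bucketing, and compression techniques mentioned above can be illustrated with a short Spark sketch. This is a minimal illustration, not project code: the table and column names (sales_raw, sales_curated, load_date, customer_id) and the bucket count are assumptions.

```scala
// Minimal sketch of partitioning, bucketing, and file compression with the
// Spark DataFrame API. All table and column names here are hypothetical.
import org.apache.spark.sql.SparkSession

object PartitionBucketSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("partition-bucket-sketch")
      .enableHiveSupport()               // persist the table in the Hive metastore
      .getOrCreate()

    val raw = spark.table("sales_raw")   // assumed source table

    raw.write
      .partitionBy("load_date")          // prune whole partitions at query time
      .bucketBy(32, "customer_id")       // co-locate rows on the join key
      .sortBy("customer_id")
      .option("compression", "snappy")   // file-level compression
      .format("parquet")
      .saveAsTable("sales_curated")      // bucketBy requires saveAsTable

    spark.stop()
  }
}
```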

TECHNICAL SKILLS

Big Data: Hadoop, Sqoop, Oozie, Ambari, Hue, Hive, Pig, HBase and Kafka

Databases: Oracle 11g, MySQL and Teradata

AWS: S3, EMR, EC2, CloudWatch

Scheduling Tools: AutoSys, Control-M

Ticketing Tools: Jira, ServiceNow

Distributions: Cloudera, Hortonworks, MapR

Operating Systems: RHEL, CentOS, Ubuntu, Windows

PROFESSIONAL EXPERIENCE

Confidential, Miami, FL

Big Data Developer

Responsibilities:

  • Analyzed the existing data flow to the warehouses and took a similar approach to migrate the data into HDFS.
  • Applied partitioning, bucketing, map-side joins, and parallel execution to optimize Hive queries, cutting execution time from hours to minutes.
  • Gathered requirements from the client and estimated timelines for developing complex Hive queries for a logistics application.
  • Worked with the cloud provisioning team on capacity planning and sizing of the master and slave nodes for an AWS EMR cluster.
  • Used Amazon EMR to process data directly in S3, and copied data from S3 to the Hadoop Distributed File System (HDFS) on the EMR cluster when needed, setting up Spark Core for analysis work.
  • Gained exposure to the Spark architecture and how RDDs work internally by creating RDDs from local files, HDFS, and RDBMS sources and optimizing them for performance.
  • Built a data pipeline using Pig and Sqoop to ingest cargo data and customer histories into HDFS for analysis.
  • Imported data from MySQL to HDFS and vice versa using Sqoop, and configured the Hive metastore on MySQL to store the metadata for Hive tables.
  • Worked with NoSQL databases such as HBase, creating HBase tables to load large sets of semi-structured data from various sources.
  • Loaded customer data and event logs from Kafka into HBase using the REST API.
  • Integrated Kafka and Spark for real-time data processing using Kafka producer and consumer components, and set up Kafka MirrorMaker for data replication across clusters (a streaming sketch follows this list).
  • Created custom Spark UDFs in Scala to cover functionality not available out of the box in the production environment (a UDF sketch follows this list).
  • Developed Oozie workflows and scheduled mainframe jobs, and prepared the data refresh strategy and capacity planning documents required for project development and support.
  • Designed Oozie workflows using different actions, including Sqoop, Pig, Hive, and shell actions.
  • Worked with major Hadoop distributions such as Hortonworks and Cloudera and numerous open-source projects, and prototyped applications that use modern Big Data tools.
  • Implemented the Fair Scheduler on the JobTracker to share cluster resources among users' MapReduce jobs.
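
The Kafka-Spark integration described above can be sketched with Spark Structured Streaming. This is a hedged illustration rather than the project's code: the broker addresses, topic name, and HDFS paths are assumptions, it assumes the spark-sql-kafka connector is on the classpath, and the HBase sink used in the project is replaced with a plain parquet sink on HDFS to keep the example self-contained.

```scala
// Minimal Structured Streaming sketch: consume a Kafka topic and persist it
// to HDFS with checkpointing. Brokers, topic, and paths are hypothetical.
import org.apache.spark.sql.SparkSession

object KafkaSparkStreamingSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("kafka-spark-streaming-sketch")
      .getOrCreate()

    // Read events from Kafka as they arrive.
    val events = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092,broker2:9092")
      .option("subscribe", "customer-events")
      .option("startingOffsets", "latest")
      .load()
      .selectExpr("CAST(key AS STRING) AS event_key",
                  "CAST(value AS STRING) AS event_json",
                  "timestamp")

    // Persist the raw stream; the checkpoint enables recovery after restarts.
    val query = events.writeStream
      .format("parquet")
      .option("path", "hdfs:///data/streams/customer_events")
      .option("checkpointLocation", "hdfs:///checkpoints/customer_events")
      .start()

    query.awaitTermination()
  }
}
```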
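
A custom Spark UDF in Scala, as referenced above, might look like the following. The masking logic, function name, and table/column names are hypothetical; the sketch only shows how a Scala function is registered and then called from Spark SQL.

```scala
// Sketch of registering a Scala UDF for Spark SQL. The mask_account logic
// and the customer_accounts table are illustrative assumptions.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.udf

object SparkUdfSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("spark-udf-sketch").getOrCreate()

    // Example: mask all but the last four characters of an account number.
    val maskAccount = udf { (account: String) =>
      if (account == null || account.length <= 4) account
      else "*" * (account.length - 4) + account.takeRight(4)
    }
    spark.udf.register("mask_account", maskAccount)

    // Once registered, the UDF is callable from Spark SQL like a built-in.
    spark.sql("SELECT mask_account(account_no) AS masked FROM customer_accounts")
      .show(10, truncate = false)

    spark.stop()
  }
}
```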

Environment: Hadoop, Hive, Sqoop, Kafka, HBase, Pig, Oozie, MySQL, S3, EMR, Spark, Jenkins, GitHub, Agile, Unix/Linux, Quality Center, Control-M, etc.

Confidential, Boise, ID

Hadoop Developer

Responsibilities:

  • Involved in data acquisition, pre-processing, and exploration for a telecommunications project in Scala.
  • Installed and configured Hive, Pig, Sqoop, and Oozie on the Hadoop cluster, and set up and benchmarked Hadoop clusters for internal use.
  • Implemented data acquisition jobs in Python using Sqoop, Hive, and Pig, and optimized MapReduce jobs to use HDFS efficiently through various compression mechanisms, orchestrated with Oozie workflows.
  • In the preprocessing phase, used Spark to drop records with missing data and transform the data to create new features (a sketch follows this list).
  • Created data extract scripts and determined naming standards for schemas and tables in the Hadoop data lake.
  • Performed data validation against source system data, analyzing the existing database source files and tables before ingesting data into the Hadoop data lake.
  • Handled imports from various data sources, performed transformations using Hive and MapReduce to load data into HDFS, and extracted data from MySQL into HDFS using Sqoop.
  • Used Cloudera Manager for continuous monitoring and management of the Hadoop cluster, and worked with application teams to install operating system and Hadoop updates, patches, and version upgrades as required.
  • Developed data pipelines using Sqoop, Pig, and Hive to ingest customer member data into HDFS for data analytics.
  • Designed and developed POCs in Spark using Scala to compare the performance of Spark with Hive and SQL/Oracle.
  • Used Spark Streaming APIs to perform on-the-fly transformations and actions, building a common learner data model that receives data from Kafka in near real time and persists it to HBase.
  • Used the Oozie workflow engine to run multiple Hive and Pig scripts, and used Kafka for real-time processing of log file data loaded directly into HDFS.
  • Designed Oozie workflows using different actions, including Sqoop, Pig, Hive, shell, and Java actions.
  • Analyzed large data sets to determine the optimal way to aggregate and report on them.
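
A minimal sketch of the Spark preprocessing step described above: rows with missing key fields are dropped and new features derived. The input path, column names, and derived features are assumptions for illustration.

```scala
// Sketch of cleaning call-detail records and deriving features in Spark.
// Paths and column names (subscriber_id, call_start, etc.) are hypothetical.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, current_date, datediff}

object PreprocessingSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("preprocessing-sketch").getOrCreate()

    val calls = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("hdfs:///data/raw/call_records")

    val cleaned = calls
      .na.drop(Seq("subscriber_id", "call_start", "duration_sec")) // drop rows missing key fields
      .withColumn("duration_min", col("duration_sec") / 60)        // derived feature
      .withColumn("days_since_activation",
        datediff(current_date(), col("activation_date")))          // derived feature

    cleaned.write.mode("overwrite").parquet("hdfs:///data/curated/call_records")
    spark.stop()
  }
}
```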

Environment: MapReduce, Hive, Spark, Pig, MySQL, Cloudera Manager, Sqoop, Oozie, Kafka, NoSQL, Eclipse, etc.

Confidential, Richmond, VA

Hadoop Developer

Responsibilities:

  • Installed and configured Hadoop ecosystem components such as Hive, Oozie, and Sqoop on a Cloudera Hadoop cluster, and helped with performance tuning and monitoring.
  • Understood business requirements and prepared design documents according to client requirements.
  • Analyzed Teradata procedures and imported data from Teradata into a MySQL database, then developed HiveQL queries, including UDFs for functions Hive does not provide by default.
  • Converted complex Oracle code into HiveQL and developed Hive UDFs to replicate keywords Hive lacks, such as PIVOT and UNPIVOT (a pivot sketch follows this list).
  • Implemented dynamic partitioning and bucketing in Hive as part of performance tuning, and wrote Oozie workflow and coordinator files to automate tasks.
  • Loaded and transformed large sets of structured, semi-structured, and unstructured data using Pig; used Sqoop to load and export data between MySQL, HDFS, and NoSQL databases on a regular basis; and designed and developed Pig scripts to process data in batches for trend analysis.
  • Designed and developed POCs in Spark using Scala to compare the performance of Spark with Hive and SQL/Oracle.
  • Developed data pipelines using Sqoop, Pig, and Hive to ingest customer member, clinical, biometrics, lab, and claims data into HDFS for data analytics.
  • Developed Sqoop scripts to handle change data capture, processing incremental records between newly arrived and existing data in RDBMS tables (an incremental-merge sketch follows this list).
  • Loaded aggregated data from the Hadoop environment into Oracle using Sqoop for dashboard reporting.
  • Created base Hive scripts to analyze requirements and process data, and helped design the cluster to handle large volumes of data while cross-examining the data loaded by Hive and MapReduce jobs.
  • Worked closely with the DevOps team to understand, design, and develop end-to-end flow requirements, using Oozie workflows to run Hadoop jobs.
  • Assisted with data capacity planning and node forecasting, and collaborated with the infrastructure, network, database, application, and BI teams to ensure data quality and availability.
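
Hive has no built-in PIVOT/UNPIVOT, which is why the project covered them with UDFs. As an illustration of the same reshaping (not the project's UDF code), the sketch below uses Spark's built-in pivot on a hypothetical monthly_sales table with region, month, and amount columns.

```scala
// Pivot sketch: one row per (region, month) in, one row per region with a
// column per month out. Table and column names are hypothetical.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.sum

object PivotSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("pivot-sketch")
      .enableHiveSupport()
      .getOrCreate()

    spark.table("monthly_sales")
      .groupBy("region")
      .pivot("month", Seq("JAN", "FEB", "MAR")) // explicit values avoid an extra scan
      .agg(sum("amount"))
      .show(truncate = false)

    spark.stop()
  }
}
```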
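
The change-data-capture work described above was driven by Sqoop incremental imports; the sketch below illustrates the equivalent merge of newly arrived rows against the existing snapshot in Spark. The table names, the claim_id key, and the updated_at timestamp column are assumptions for illustration.

```scala
// Incremental-merge sketch: union the existing snapshot with the new delta,
// then keep only the latest version of each record by key.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, row_number}

object IncrementalMergeSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("incremental-merge-sketch")
      .enableHiveSupport()
      .getOrCreate()

    val existing = spark.table("claims_snapshot")    // previously loaded data
    val arrived  = spark.table("claims_incremental") // newly imported delta

    // Rank versions of each claim by recency and keep the newest one.
    val latestFirst = Window.partitionBy("claim_id").orderBy(col("updated_at").desc)
    val merged = existing.unionByName(arrived)
      .withColumn("rn", row_number().over(latestFirst))
      .filter(col("rn") === 1)
      .drop("rn")

    merged.write.mode("overwrite").saveAsTable("claims_snapshot_merged")
    spark.stop()
  }
}
```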

Environment: Oozie, Cloudera, MySQL, CentOS, Apache HBase, HDFS, MapReduce, Hue, Hive, Pig, Sqoop, SQL, Windows, Linux
