Big Data Engineer Resume

SUMMARY:

  • Over 5 years of IT experience, currently working in a Big Data capacity with the Hadoop ecosystem across internal and cloud-based platforms.
  • Excellent working knowledge of Hadoop, Hive, Sqoop, Pig, HBase, and Oozie in real-time environments; worked on many modules for performance improvement and architecture design.
  • Developed ETL processes to load data from multiple data sources into HDFS using Sqoop, Pig, and Oozie.
  • Performed structural modifications using MapReduce and Hive, and analyzed data using visualization and reporting tools.
  • Hands-on experience installing, configuring, monitoring, and using Hadoop ecosystem components such as MapReduce, HDFS, HBase, Hive, Sqoop, Pig, ZooKeeper, Hortonworks, and Flume.
  • Experience extracting source data from sequential, XML, and CSV files, then transforming and loading it into the target data warehouse.
  • Good knowledge of converting complex RDBMS (Oracle, MySQL, and Teradata) queries into Hive Query Language.
  • Expertise in analytics, design, data warehouse modeling, development, implementation, maintenance, migration, and production support of large-scale enterprise data warehouses.
  • Solid exposure to Spark SQL, Spark Streaming, and the core Spark API for building data pipelines (a minimal sketch follows this list).
  • Academic knowledge of R and SAS for analytics.
  • Experience developing and scheduling ETL workflows in Hadoop using Oozie, and deploying and managing Hadoop clusters with Cloudera and Hortonworks.
  • Experienced in installation, configuration, administration, troubleshooting, tuning, security, backup, recovery, and upgrades of CentOS and Red Hat Enterprise Linux (RHEL) operating systems in large environments.
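
A minimal, illustrative Scala sketch of the Spark SQL pipeline pattern referred to above. The Hive table staging.shipments, its columns, and the output path are hypothetical names invented for illustration, not from any actual project.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    object ShipmentPipeline {
      def main(args: Array[String]): Unit = {
        // Hive-enabled session so existing Hive tables are visible to Spark SQL
        val spark = SparkSession.builder()
          .appName("shipment-pipeline")
          .enableHiveSupport()
          .getOrCreate()

        // Read a hypothetical Hive staging table (name is illustrative only)
        val shipments = spark.table("staging.shipments")

        // Light cleanup plus a daily aggregate for reporting
        val daily = shipments
          .filter(col("ship_ts").isNotNull)
          .withColumn("ship_date", to_date(col("ship_ts")))
          .groupBy("ship_date")
          .agg(count(lit(1)).as("shipment_cnt"), sum(col("weight_kg")).as("total_weight_kg"))

        // Persist the curated output to HDFS as Parquet (path is illustrative)
        daily.write.mode("overwrite").parquet("/warehouse/curated/daily_shipments")

        spark.stop()
      }
    }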

TECHNICAL SKILLS:

Big Data:  Hadoop, Sqoop, Oozie, Ambari, Hue, Hive, Pig and HBase

Databases:  Oracle 11g and Teradata

Scheduling Tools:  Autosys, Control-M

Ticketing Tools:  JIRA, ServiceNow

Distributions:  Cloudera, Hortonworks

Operating Systems:  RHEL, CentOS, Ubuntu, Windows

PROFESSIONAL EXPERIENCE:

Confidential

Big Data Engineer

Responsibilities:

  • Gathered requirements from the client and estimated timelines for developing complex Hive queries for a logistics application.
  • Developed data pipelines using Flume, Pig, and Sqoop to ingest cargo data and customer histories into HDFS for analysis.
  • Imported data from MySQL to HDFS and vice versa using Sqoop, and configured the Hive metastore on MySQL to store the metadata for Hive tables (see the ingestion sketch after this list).
  • Wrote Hive and Pig scripts and UDFs as per requirements, and automated the workflows using shell scripts.
  • Implemented MapReduce jobs in Hive by querying the available data, and designed the ETL process in a high-level design document covering logical data flows, source data extraction, database staging, extract creation, source archival, job scheduling, and error handling.
  • Worked with NoSQL databases such as HBase; imported data from MySQL, processed it with Hadoop tools, and exported it to the Cassandra NoSQL database.
  • Exposure to Spark architecture and how RDDs work internally; processed data from local files, HDFS, and RDBMS sources by creating RDDs and optimizing them for performance (see the RDD sketch after this list).
  • Developed workflows in Oozie and scheduled jobs on mainframes; prepared the data refresh strategy and capacity planning documents required for project development and support.
  • Designed Oozie workflows using different actions such as Sqoop, Pig, Hive, and shell actions.
  • Mastered major Hadoop distributions such as Hortonworks and Cloudera along with numerous open-source projects, and prototyped various applications that utilize modern Big Data tools.
  • Implemented the Fair Scheduler on the JobTracker to share cluster resources among the MapReduce jobs submitted by users.
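
Ingestion sketch: a hedged Scala example of exposing Sqoop-landed HDFS files to Hive through Spark SQL. The database, table, columns, delimiter, and HDFS location are all assumptions for illustration.

    import org.apache.spark.sql.SparkSession

    object CargoExternalTable {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("cargo-external-table")
          .enableHiveSupport()
          .getOrCreate()

        spark.sql("CREATE DATABASE IF NOT EXISTS staging")

        // External Hive table over the directory Sqoop wrote to
        // (columns, delimiter, and location are illustrative)
        spark.sql(
          """CREATE EXTERNAL TABLE IF NOT EXISTS staging.cargo (
            |  cargo_id BIGINT,
            |  customer_id BIGINT,
            |  origin STRING,
            |  destination STRING,
            |  weight_kg DOUBLE
            |)
            |ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
            |STORED AS TEXTFILE
            |LOCATION '/data/raw/cargo'""".stripMargin)

        // Quick sanity check on the newly mapped data
        spark.sql("SELECT origin, COUNT(*) AS cnt FROM staging.cargo GROUP BY origin").show()

        spark.stop()
      }
    }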
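
RDD sketch: a short, illustrative example of creating RDDs and DataFrames from a local file, HDFS, and an RDBMS over JDBC, with basic caching and repartitioning. Paths, the JDBC endpoint, and credentials are placeholders, not real systems.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.storage.StorageLevel

    object RddSources {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("rdd-sources").getOrCreate()
        val sc = spark.sparkContext

        // RDD from a local file (path is illustrative)
        val localLines = sc.textFile("file:///tmp/sample/customers.csv")

        // RDD from HDFS, cached because it is reused below
        val hdfsLines = sc.textFile("hdfs:///data/raw/cargo/part-*")
          .persist(StorageLevel.MEMORY_AND_DISK)

        // Simple record counts to force evaluation of both RDDs
        println(s"local records: ${localLines.count()}, hdfs records: ${hdfsLines.count()}")

        // RDBMS source via JDBC as a DataFrame; requires the MySQL JDBC driver
        // on the classpath (placeholder connection details)
        val customers = spark.read
          .format("jdbc")
          .option("url", "jdbc:mysql://db-host:3306/logistics")
          .option("dbtable", "customers")
          .option("user", "etl_user")
          .option("password", sys.env.getOrElse("DB_PASSWORD", ""))
          .load()

        // Repartition before a wide operation to balance work across executors
        val byCountry = customers.repartition(8, customers("country"))
          .groupBy("country").count()
        byCountry.show()

        spark.stop()
      }
    }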

Environment: Big Data tools, Hadoop, Hive, Sqoop, HBase, Pig, Oozie, MySQL, Jenkins, GitHub, Agile.

Confidential

Big Data Engineer

Responsibilities:

  • Involved in data acquisition, data pre-processing, and data exploration for a telecommunications project in Scala.
  • Installed and configured Hive, Pig, Sqoop, and Oozie on the Hadoop cluster; set up and benchmarked Hadoop clusters for internal use.
  • Developed and implemented data acquisition jobs in Scala using Sqoop, Hive, and Pig; optimized MapReduce jobs to use HDFS efficiently through various compression mechanisms, orchestrated with Oozie workflows.
  • In the pre-processing phase, used Spark to remove missing data and transform the data to create new features (see the sketch after this list).
  • In the data exploration stage, used Hive to derive key insights from the processed data in HDFS.
  • Handled importing data from various sources, performed transformations using Hive and MapReduce to load data into HDFS, and extracted data from MySQL into HDFS using Sqoop.
  • Used Hive UDFs to implement business logic in Hadoop, and used Hive to read, write, and query the data stored in HBase.
  • Used Cloudera Manager for continuous monitoring and management of the Hadoop cluster, and worked with application teams to install operating system and Hadoop updates, patches, and version upgrades as required.
  • Developed data pipelines using Sqoop, Pig, and Hive to ingest customer member, clinical, biometrics, lab, and claims data into HDFS for analytics.
  • Designed and developed POCs in Spark using Scala to compare the performance of Spark with Hive and SQL/Oracle.
  • Used the Oozie workflow engine to run multiple Hive and Pig scripts, and used Kafka for real-time processing, loading log-file data directly into HDFS to navigate through data sets.
  • Designed Oozie workflows using different actions such as Sqoop, Pig, Hive, shell, and Java actions.
  • Analyzed substantial data sets to determine the optimal way to aggregate and report on them.
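
A minimal Scala sketch of the pre-processing and exploration steps referenced above: dropping rows with missing values, deriving simple features, and exploring the result with a HiveQL query through Spark. The table telecom.call_records and its columns are hypothetical.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    object CallRecordPrep {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("call-record-prep")
          .enableHiveSupport()
          .getOrCreate()

        // Hypothetical raw table of call detail records
        val raw = spark.table("telecom.call_records")

        // Drop rows with missing values in the columns we depend on,
        // then derive simple features for downstream analysis
        val prepared = raw
          .na.drop(Seq("caller_id", "duration_sec", "call_ts"))
          .withColumn("call_hour", hour(col("call_ts")))
          .withColumn("is_long_call", (col("duration_sec") > 600).cast("int"))

        // Persist the prepared data as a Hive table for exploration
        prepared.write.mode("overwrite").saveAsTable("telecom.call_records_prepared")

        // Exploration query over the prepared data (HiveQL through Spark SQL)
        spark.sql(
          """SELECT call_hour, COUNT(*) AS calls, AVG(duration_sec) AS avg_duration
            |FROM telecom.call_records_prepared
            |GROUP BY call_hour
            |ORDER BY call_hour""".stripMargin).show(24)

        spark.stop()
      }
    }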

Environment: MapReduce, Hive, Pig, MySQL, Cloudera Manager, Sqoop, Oozie, NoSQL, Eclipse

Dominion

Hadoop Developer

Responsibilities:

  • Installed and configured Hadoop ecosystem components such as Hive, Oozie, and Sqoop on a Cloudera Hadoop cluster, and helped with performance tuning and monitoring.
  • Understood business requirements and was involved in preparing design documents according to client requirements.
  • Analyzed Teradata procedures and imported the data from Teradata into a MySQL database; developed HiveQL queries, including UDFs for functions not available in Hive by default.
  • Converted complex Oracle code into HQL and developed Hive UDFs to replicate keywords Hive lacks, such as PIVOT and UNPIVOT.
  • Implemented dynamic partitioning and bucketing in Hive as part of performance tuning, and automated the tasks with workflow and coordinator files in the Oozie framework (see the partitioning sketch after this list).
  • Loaded and transformed large sets of structured, semi-structured, and unstructured data using Pig; imported and exported data between MySQL, HDFS, and NoSQL databases on a regular basis using Sqoop; and designed and developed Pig scripts to process data in batches for trend analysis.
  • Developed Sqoop scripts to handle change data capture, processing incremental records between newly arrived and existing data in RDBMS tables (see the incremental-import sketch after this list).
  • Loaded the aggregated data from the Hadoop environment into Oracle using Sqoop for reporting on the dashboard.
  • Created Hive-based scripts for analyzing requirements and processing data, and designed the cluster to handle large volumes of data when cross-examining data loaded through Hive and MapReduce jobs.
  • Worked closely with the DevOps team to understand, design, and develop end-to-end flow requirements, utilizing Oozie workflows to run Hadoop jobs.
  • Assisted with data capacity planning and node forecasting, and collaborated with the infrastructure, network, database, application, and BI teams to ensure data quality and availability.
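
Partitioning sketch: a hedged Scala example of dynamic partitioning and bucketing, issuing HiveQL through Spark. The databases, tables, columns, and partition key are assumptions for illustration; note that Spark's native bucketBy layout is not byte-for-byte identical to Hive's CLUSTERED BY.

    import org.apache.spark.sql.SparkSession

    object PartitionedLoad {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("partitioned-load")
          .enableHiveSupport()
          .getOrCreate()

        // Settings Hive needs before a dynamic-partition insert
        spark.sql("SET hive.exec.dynamic.partition=true")
        spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

        // Partitioned target table (schema and names are illustrative)
        spark.sql(
          """CREATE TABLE IF NOT EXISTS dw.transactions_part (
            |  txn_id BIGINT,
            |  account_id BIGINT,
            |  amount DOUBLE
            |)
            |PARTITIONED BY (load_date STRING)
            |STORED AS ORC""".stripMargin)

        // Dynamic-partition insert: each row lands in its load_date partition
        spark.sql(
          """INSERT OVERWRITE TABLE dw.transactions_part PARTITION (load_date)
            |SELECT txn_id, account_id, amount, to_date(txn_ts) AS load_date
            |FROM staging.transactions_raw""".stripMargin)

        // Bucketing via Spark's native writer into a separate table
        spark.table("dw.transactions_part")
          .write
          .mode("overwrite")
          .bucketBy(32, "account_id")
          .sortBy("account_id")
          .saveAsTable("dw.transactions_bucketed")

        spark.stop()
      }
    }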
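
Incremental-import sketch: an illustrative version of the change-data-capture pattern above, wrapping a Sqoop incremental import in Scala via sys.process purely to keep the examples in one language; in practice this would typically live in a shell script or an Oozie Sqoop action. The connection string, table, check column, and last value are placeholders.

    import scala.sys.process._

    object IncrementalSqoopImport {
      def main(args: Array[String]): Unit = {
        // Last value already ingested; in a real job this would be read from
        // a metadata table or from Sqoop's saved job state, not hard-coded
        val lastTxnId = "1048576"

        val sqoopCmd = Seq(
          "sqoop", "import",
          "--connect", "jdbc:mysql://db-host:3306/billing",  // placeholder endpoint
          "--username", "etl_user",
          "--password-file", "/user/etl/.mysql.password",    // keep secrets off the command line
          "--table", "transactions",
          "--target-dir", "/data/raw/transactions_delta",
          "--incremental", "append",                         // only rows newer than --last-value
          "--check-column", "txn_id",
          "--last-value", lastTxnId,
          "--num-mappers", "4"
        )

        // Run the import and fail the job if Sqoop returns a non-zero exit code
        val exitCode = sqoopCmd.!
        require(exitCode == 0, s"Sqoop incremental import failed with exit code $exitCode")
      }
    }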

Environment: Oozie, Cloudera Distribution including Apache Hadoop (CDH4), MySQL, CentOS, Apache HBase, HDFS, MapReduce, Hue, Hive, Pig, Sqoop, SQL, Windows, Linux
