Hadoop Developer Resume
SUMMARY:
- Nearly 2 years of professional IT experience, with expertise in Hadoop, HDFS, HBase, Hive, Pig, Sqoop, Flume, Spark, Linux, SQL, and NoSQL.
- Certified Hadoop Developer.
- Passionate about working on Big Data/Hadoop Technologies.
- Implemented innovative solutions for big data analysis using Hadoop ecosystem components such as Sqoop, Flume, Hive, Pig, HBase, Spark, and ZooKeeper.
- Strong understanding and extensive knowledge of HDFS architecture and its components, including NameNode, DataNode, JobTracker, TaskTracker, and MapReduce.
- Hands-on experience with Hive Query Language (HiveQL) and debugging Hive issues.
- Expertise in data loading tools like Sqoop and Flume.
- Experience in analyzing and processing streaming data into HDFS using Kafka with Spark.
- Experience in developing MapReduce programs in Java to process and perform analytics on Big Data.
- Excellent understanding of NoSQL databases, including Cassandra, HBase, and MongoDB.
- Wrote custom UDFs for Hive and Pig in Java to analyze data efficiently.
- Experience in using Sqoop to import and export data between RDBMS and HDFS.
- Experience in Full Software Development Life Cycle including Analysis, Design, Implementation, Deployment and Maintenance of web applications.
- Worked with both the Waterfall model and Agile software development methodologies.
- Excellent interpersonal and problem-solving skills; a hard-working team player with the ability to learn new concepts quickly.
TECHNICAL SKILLS:
Hadoop Ecosystem: HDFS, MapReduce, Hive, Pig, Sqoop, Flume, HBase, Oozie, ZooKeeper, and YARN.
Programming Languages: Java, SQL, Scala, C/C++, Python.
Operating Systems: Linux, Unix, Windows.
Databases: SQL Server, MySQL, Oracle, Cassandra, HBase, MongoDB.
Other Concepts: OOP, Data Structures, Algorithms, ETL, Data Analytics.
PROFESSIONAL EXPERIENCE:
Confidential
Hadoop Developer
Responsibilities:
- Wrote MapReduce code in Java for data cleaning and preprocessing (see the illustrative sketch below).
- Imported and exported data between RDBMS, HDFS, and Hive using Sqoop.
- Loaded data from the UNIX file system into HDFS.
- Installed and configured Hive and wrote custom Hive UDFs (see the UDF sketch below).
- Worked with multiple input formats such as TextInputFormat, KeyValueTextInputFormat, and SequenceFileInputFormat.
- Created HBase tables to store PII data arriving in variable formats from different portfolios.
- Provided cluster coordination services through ZooKeeper.
- Developed scripts and batch jobs to schedule various Hadoop programs.
- Wrote Hive queries for data analysis to meet business requirements.
- Created Hive tables and worked with them using HiveQL.
- Wrote MapReduce jobs using Pig Latin.
- Analyzed large data sets using Pig scripts and Hive queries.
- Developed Spark code in Scala and used Spark SQL for faster testing and processing of data.
- Loaded data into Spark RDDs and performed in-memory computations to generate output with lower memory usage.
- Developed Oozie workflows to automate loading data into HDFS and pre-processing it with Pig.
- Imported and processed structured, semi-structured, and unstructured data using MapReduce, Hive, and Pig.
- Created Hive external and internal tables, loaded data into them, and wrote Hive queries.
- Analyzed and processed streaming data into HDFS using Kafka with Spark.
- Worked on NoSQL databases such as Cassandra, MongoDB, and HBase.
- Used SequenceFile, RCFile, Avro, and HAR file formats.
Environment: Hadoop, HDFS, MapReduce, YARN, Java, Pig, Spark, Hive, Sqoop, Flume, Cassandra, Cloudera, MongoDB, HBase, Linux, SQL, MySQL, Oozie.
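The following is a minimal, illustrative sketch of the kind of Java data-cleaning mapper described in the first responsibility above; the column count, delimiter, and cleaning rule are assumptions for illustration, not the actual project code.

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Hypothetical cleaning mapper: drops malformed records and trims whitespace
// so that downstream Hive/Pig analysis sees only well-formed rows.
public class CleaningMapper extends Mapper<LongWritable, Text, Text, NullWritable> {
    private static final int EXPECTED_FIELDS = 10; // assumed column count

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] fields = value.toString().split("\t", -1);
        if (fields.length != EXPECTED_FIELDS) {
            return; // skip malformed rows
        }
        StringBuilder cleaned = new StringBuilder();
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) {
                cleaned.append('\t');
            }
            cleaned.append(fields[i].trim());
        }
        context.write(new Text(cleaned.toString()), NullWritable.get());
    }
}

A mapper like this would typically run as a map-only job (zero reducers) whose output feeds the Hive and Pig steps listed above.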
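Likewise, a minimal sketch of a custom Hive UDF of the sort mentioned above, using the classic org.apache.hadoop.hive.ql.exec.UDF API; the masking behavior and the class name are hypothetical examples rather than the production UDF.

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Hypothetical UDF that masks all but the last four characters of a value,
// e.g. for PII columns of the kind noted above.
public class MaskLastFourUDF extends UDF {
    public Text evaluate(Text input) {
        if (input == null) {
            return null;
        }
        String s = input.toString();
        if (s.length() <= 4) {
            return new Text(s);
        }
        StringBuilder masked = new StringBuilder();
        for (int i = 0; i < s.length() - 4; i++) {
            masked.append('*');
        }
        masked.append(s.substring(s.length() - 4));
        return new Text(masked.toString());
    }
}

After packaging the class into a JAR, it would be registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION before being called from HiveQL queries.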
Confidential
Data Analyst
Responsibilities:
- Worked with a YouTube data set consisting of more than 25 columns separated by '\t' (tab).
- Loaded the data from the local file system into HDFS.
- Wrote a Pig script to find the top 5 categories with the maximum number of videos uploaded.
- Wrote a Pig script to find the top 10 rated videos on YouTube.
- Wrote a Pig script to find which video has the maximum number of views.
- Wrote a Pig script to find which video has the maximum number of comments.
- Used Flume to collect streaming data from Twitter.
- Configured the source, sink, and channel in the Flume configuration file to collect the streaming data.
- Started the Flume agent, which fetched data from Twitter and sent it into HDFS.
- Stopped the data fetching after some time using Ctrl+C.
- Checked the HDFS sink folder to verify whether the data was stored.
- Because the Twitter data is in JSON format, a Pig JSON loader was required.
- Registered the JARs required for the JSON loader.
- The tweets are in nested JSON format and contain map data types, so I used the Elephant Bird JSON loader, which supports maps, to load them.
- Wrote a complex Pig script to find the time zone and average rating per topic.
- Worked with a Titanic passenger data set consisting of 12+ comma-separated columns.
- Loaded the data into HDFS.
- Loaded the data into the Pig Grunt shell using the LOAD command.
- Wrote a Pig script to find the average age of males and females who died in the Titanic tragedy.
- Wrote a Pig script to find how many people died and how many survived in each class, along with their genders and ages.
- Stored the results into HDFS.
Mobile Contacts: Using C++ and data structures concepts, I wrote a menu-driven program for managing mobile contacts. With this program, contacts can be viewed, added, deleted, updated, and searched.