
Hadoop Developer Resume

Durham, NC


  • 9+ years of experience in analysis, design, development, debugging and deployment of software applications, including 5+ years of experience in the Hadoop ecosystem and Big Data analytics.
  • IBM Certification In Big Data & Hadoop Development.
  • Excellent understanding of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode and the MapReduce programming paradigm.
  • Hands-on experience in installing, configuring and using ecosystem components like Hadoop, MapReduce, HDFS, HBase, Oozie, Sqoop, Flume, Pig and Hive.
  • Experience in analyzing data using Pig Latin, HQL, HBase and custom MapReduce programs in Java.
  • Extended Hive and Pig core functionality by writing custom UDFs.
  • Hands-on experience configuring a Hadoop cluster in a professional environment and on Amazon Web Services (AWS) using an EC2 instance and S3 configurations.
  • Skilled in developing MapReduce programs using the Hadoop Java API and in using Hive and Pig to perform data analysis, data cleansing and data transformation.
  • Loaded log data into HDFS using Flume and Kafka and performed ETL integrations.
  • Experience in importing and exporting data between HDFS and relational database systems using Sqoop.
  • Configuring Kerberos and integrating with Directory services.
  • Experience in working with different data sources like Flat files, XML files, log files and Databases.
  • Excellent understanding of Object Oriented design methodology and Core Java Concepts such as multi-threading, exception handling, generics, annotations, collections and I/O.
  • Good understanding of NoSQL databases and hands-on experience in writing applications on NoSQL databases.
  • Experience in writing shell scripts.
  • Worked with different Python libraries like pandas, NumPy, boto3, web etc.
  • Experience in collecting business requirements, writing functional requirements and test cases, and creating technical design documents with UML (use case, class, sequence and collaboration diagrams).
  • Excellent communication, interpersonal and problem-solving skills; a strong team player with a can-do attitude and the ability to communicate effectively with all levels of the organization, including technical staff, management and customers.


Hadoop Ecosystem: HDFS, MapReduce, YARN, Hive, Pig, Zookeeper, Sqoop, Oozie, Flume, HBase, Avro, Spark

Databases: Oracle 7/8/8i/9i/10g/11g/12c, MySQL 5.6.16/5.6.20, SQL Server 2000/2005/2008/2012/2014, PostgreSQL

NoSQL Databases: HBase, MongoDB, Cassandra

Methodologies: Agile, UML, Design Patterns (Core Java and J2EE)

Programming Languages: Python, Java, SQL, PL/SQL, Scala, Unix shell scripts

Web Technologies: HTML, XML, JDBC, JSP, JavaScript, AJAX

Tools Used: Eclipse, Putty, Superputty, MS Office, Crystal Reports, Microsoft Visio


Confidential, Durham, NC

Hadoop Developer


  • Working on Cloudera Hadoop clusters.
  • Set up keytabs for Kerberos authentication of scheduled applications on Hadoop clusters.
  • Working on a POC to migrate existing applications from the Cloudera Hadoop cluster to EMR.
  • Extracted data from SQL Developer into HDFS using Sqoop and scheduled an incremental load to HDFS.
  • Designed and developed data pipeline to ingest data from different sources to HDFS.
  • Working extensively with Impala to prepare input data for applications and ingest application output into Impala for analytics; created and dropped temporary Impala tables on the fly to load data into Parquet tables.
  • Used Python data scripts to create CSV files for appending to tables in Impala.
  • Developed RDDs using Python and coded Python applications for business requirements.
  • Submitted Spark jobs using the spark-submit command.
  • Created Spark RDDs and DataFrames, applied transformations and actions, and converted RDDs to DataFrames.
  • Imported data from AWS S3 into Spark RDDs and performed transformations and actions on the RDDs.
  • Implemented schema extraction for Parquet and Avro file formats in Hive.
  • Implemented partitioning, dynamic partitions and buckets in Hive.
  • Developed Hive queries to process the data and generate data cubes for visualization.
  • Implemented AWS solutions using EC2, S3, RDS, EBS, Elastic Load Balancer and Auto Scaling groups.
  • Installed and configured a multi-node cluster in the cloud using Amazon Web Services (AWS) EC2.
  • Developed and enhanced Python code to generate genotyping reports from different data inputs.
  • Migrating data from a traditional database to the Hadoop cluster using a custom Python script.
  • Developed Unix shell scripts and Python scripts for scheduling and automating the job flow.
  • Providing support for the production applications.
  • Participated in all the stages of software development lifecycle including design, development, implementation, and testing.

Environment: CDH5.9.1, Hadoop 2.6.0, YARN, HDFS, Spark 1.6.0, Sqoop 1.99.5, Hive 1.1.1, Pig 0.14.0, HBase 1.1.2, Zookeeper 3.4.6, Oracle 11g/10g, Java 1.8, Python 2.7.5, PyCharm, Putty

Confidential, Wilmington, Delaware

Hadoop/Spark consultant


  • Worked on clusters of different sizes on Cloudera and Hortonworks distributions.
  • Created Hive and Tez views and granted permissions to users and AD groups on the Ambari server.
  • Created Hive internal/external tables with proper static and dynamic partitions and queried them using HQL.
  • Wrote Hive queries for data analysis to meet business requirements.
  • Tuned performance through partitioning and bucketing of Hive tables.
  • Created UDFs for Hive and Impala.
  • Created RDDs and DataFrames for the required input data and performed data transformations using Spark with Scala.
  • Imported data from different sources like HDFS/HBase into Spark RDDs.
  • Developed Spark scripts using the Scala shell as per requirements.
  • Used a Spark cluster to manipulate RDDs (Resilient Distributed Datasets) and applied RDD partitioning concepts.
  • Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
  • Assisted in upgrading, configuring and maintaining various Hadoop ecosystem components like Pig, Hive and HBase.
  • Managed Hadoop clusters, including adding and removing cluster nodes for maintenance and capacity needs.
  • Involved in HDFS maintenance and loading data using Sqoop; responsible for managing data coming from different sources.
  • Experience working with Spark SQL queries and DataFrames: importing data from data sources, performing transformations and read/write operations, and saving results to an output directory in HDFS.
  • Experience managing and reviewing the Hadoop log files.
  • Used Pig Latin scripts to define schemas, create new relations, and perform joins, sorting, filtering and grouping on large data sets.
  • Monitored workload, job performance and capacity planning using Cloudera Manager.
  • Used Flume to move large amounts of log data from many different sources to a centralized data store.
  • Developed Sqoop scripts to move data between HDFS and the MySQL database.
  • Provided Hue access as per user requests.
  • Responsible for maintaining Voltage security at the cluster level and the Voltage server level, based on an understanding of the encryption and decryption process.
  • Created a backup script for Voltage servers using bash scripting and set up crontab jobs.
  • Provided L2 support for different applications; managed and reviewed log files for troubleshooting and met SLAs on time.
  • Sent and received handovers to and from offshore teams following the Global Delivery Model.

Environment: CDH5.5, Hadoop 2.7.0, YARN, HDFS, Spark 1.4.1, Sqoop 1.99.5, Hive 1.1.1, Flume 1.6.0, Oozie 4.1.0, HDP 2.0, Pig 0.14.0, Kafka 0.9.0, HBase 1.1.2, Zookeeper 3.4.6, Jenkins 1.6, Oracle 11g/10g, MySQL 5.6.2, Java 1.6, Superputty, Scala IDE, Voltage 6.3

Confidential, Atlanta, GA

Hadoop/Spark consultant


  • Handled importing of data from various data sources, performed data control checks using Spark and loaded data into HDFS.
  • Worked with Spark SQL to analyze data.
  • Used Scala to write code for all Spark use cases.
  • Explored Spark for improving the performance and optimization of existing algorithms in Hadoop using SparkContext, Spark SQL, pair RDDs and YARN.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
  • Worked in Spark SQL on different data formats like JSON and Parquet.
  • Developed Spark scripts using Scala shell commands as per requirements.
  • Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in HDFS.
  • Loaded data into Spark RDDs and performed in-memory data computation to generate the output response.
  • Wrote shell scripts to monitor the health of Hadoop daemon services and respond to any warning or failure conditions.
  • Developed MapReduce (YARN) programs to cleanse data in HDFS obtained from heterogeneous data sources, making it suitable for ingestion into the Hive schema for analysis.
  • Designed and developed MapReduce jobs to process data arriving in different file formats like XML, CSV and JSON.
  • Developed workflows using Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.
  • Used Sqoop to import the data from RDBMS to Hadoop Distributed File System (HDFS) and later analyzed the imported data using Hadoop Components.
  • Involved in loading and transforming large sets of Structured, Semi-Structured and Unstructured data and analyzed them by running Hive queries.

Environment: CDH5.5, Hadoop 2.6.0, YARN, HDFS, Spark 1.4.1, Sqoop 1.99.5, Hive 1.1.1, Flume 1.6.0, Oozie 4.1.0, Pig 0.14.0, Kafka 0.9.0, HBase 1.1.2, Zookeeper 3.4.6, Oracle 11g/10g, MySQL 5.6.2, Java 1.6

Confidential, Atlanta, GA

Java/Hadoop Developer


  • Experienced in loading and transforming large sets of structured, semi-structured and unstructured data.
  • Involved in complete Implementation lifecycle, specialized in writing custom MapReduce, Pig and Hive programs.
  • Developed a data pipeline using Flume, Sqoop, Pig and Java MapReduce to ingest claim data and financial histories into HDFS for analysis.
  • Imported data from HDFS to a MySQL database and vice versa using Sqoop.
  • Extensive experience in writing HDFS and Pig Latin commands.
  • Developed UDFs to provide custom Hive and Pig capabilities and apply business logic to the data.
  • Created Hive internal/external tables with proper static and dynamic partitions.
  • Used Hive to analyze unified historic data in HDFS to identify issues and behavioral patterns.
  • Analyzed the data by running Hive queries and Pig scripts to study customer behavior.
  • Experience with NoSQL databases such as HBase.
  • Monitored workload, job performance and capacity planning using Cloudera Manager.
  • Installed Oozie workflow engine to run multiple MapReduce, Hive, Zookeeper and Pig jobs which run independently with time and data availability.
  • Worked with application teams to install operating system, Hadoop updates, patches, version upgrades as required.
  • Installed and configured Hadoop and responsible for maintaining cluster and managing and reviewing Hadoop log files.
  • Performed File system management and monitoring on Hadoop log files.
  • Implemented partitioning, dynamic partitions and buckets in Hive.
  • Involved in configuring core-site.xml and mapred-site.xml for the multi-node cluster environment.
  • Created an e-mail notification service that, upon job completion, notifies the team that requested the data.

Environment: Hadoop 2.3.0, HDFS, MapReduce, CDH5, Hive 0.12.0, Pig 0.12.0, HBase 0.98.1, Sqoop 1.4.3, Flume 1.4.0, Oozie 4.1.0, Zookeeper 3.4.5, MySQL 5.6.16, Java 1.6

Confidential, Norcross, GA

Software Engineer


  • Actively involved in all phases of the Software Development Life Cycle (SDLC) of the application: requirement gathering, design analysis and code development.
  • Extensively used core Java concepts such as OOP and exception handling.
  • Developed Records using Java, HTML, CSS, JSP, Servlets and MySQL.
  • Worked in the Eclipse IDE to write the code and integrate the application.
  • Developed an information system using J2EE and MySQL.
  • Responsible for developing the Struts configuration file and Action classes for handling HTTP requests from the front-end components, applying OOAD concepts.
  • Used Hibernate as the object relational mapping tool for persisting java objects.
  • Developed the front-end for faculty home pages using Dreamweaver.
  • Worked on documenting the project and analyzing the requirements of the project.
  • Tested the application for various inputs.
  • Wrote client side scripts in JavaScript for User signup, Administrator logon and for updating the profiles of users.
  • Involved in code reviews and cross checking whether coding standards are being followed.

Environment: Java/J2EE, JSP, Struts, Hibernate, Servlets, MySQL, SQL, PL/SQL, Macromedia Dreamweaver, Apache Tomcat, JavaScript, HTML, Maven
