
Hadoop Developer Resume

Phoenix, AZ


  • 7+ years of programming experience as a software developer, including 5+ years of hands-on experience with Big Data technologies and 2+ years with Java/J2EE technologies.
  • Hands-on experience with the Hadoop ecosystem, including HDFS, MapReduce, YARN, Hive, Pig, HCatalog, HBase, Kafka, Sqoop, Flume, ZooKeeper and Oozie.
  • In-depth understanding of Hadoop architecture and its components, such as ResourceManager, NodeManager, ApplicationMaster, Scheduler, ApplicationsManager, NameNode, DataNode and HDFS, as well as the MapReduce programming paradigm.
  • Implemented Hive and Pig UDFs to meet business requirements; strong understanding of Pig and Hive analytical functions.
  • Worked with Spark Core and Spark Streaming to process real-time data from Kafka.
  • End-to-end implementation of data import and export with Sqoop, from relational databases into HDFS and from HDFS back into relational databases.
  • Performed administrative tasks such as installing and maintaining the Hadoop framework and its components (Sqoop, Flume, HBase, Pig and Hive) in a test environment.
  • Extensive experience with NoSQL databases such as HBase.
  • Implemented unit and integration test cases using Mockito and MRUnit.
  • Experience using Apache Spark with Kafka to persist data to HBase.
  • Experience managing and reviewing Hadoop log files.
  • Experience with the Oozie workflow engine, running workflow jobs with actions that run Hadoop MapReduce and Pig jobs.
  • Expertise in Java/J2EE technologies such as Core Java, Spring, JSF, Hibernate, JDBC, JSP and JSTL, along with HTML, JavaScript and jQuery.
  • Extensive experience in object-oriented analysis and design and in web technologies, including HTML, XHTML, DHTML, JavaScript, JSTL, CSS, AJAX, Angular, Bootstrap and Oracle, for developing server-side applications and user interfaces.
  • Experience deploying applications on heterogeneous application servers: Tomcat, WebSphere and JBoss.
  • Experienced with version control tools such as SVN and GitHub.
  • Experienced with tools such as JIRA, Maven, ServiceNow and Log4j.
  • Excellent analytical, problem-solving, communication and interpersonal skills; able to interact with individuals at all levels and to work both as part of a team and independently.
  • A quick learner, organized and highly motivated, with a keen interest in emerging technologies.
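The MapReduce programming paradigm mentioned above can be illustrated with a word count. This is a minimal single-process sketch that does not use the Hadoop API. The class and method names are hypothetical, and the map, shuffle and reduce phases are written as plain Java methods. A real Hadoop job would extend `org.apache.hadoop.mapreduce.Mapper` and `Reducer` and let the framework perform the distributed shuffle and sort.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory illustration of map -> shuffle -> reduce (no Hadoop dependency).
class WordCountSketch {

    // Map phase: emit (word, 1) for every whitespace-separated token in a line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String token : line.toLowerCase().split("\\s+")) {
            if (!token.isEmpty()) { // leading whitespace yields an empty first token
                out.add(new SimpleEntry<>(token, 1));
            }
        }
        return out;
    }

    // Shuffle phase: group emitted values by key; TreeMap keeps keys sorted, as Hadoop does.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            grouped.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return grouped;
    }

    // Reduce phase: sum all values collected for one key.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Runs the three phases over a list of input lines.
    static Map<String, Integer> run(List<String> lines) {
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        for (String line : lines) {
            emitted.addAll(map(line));
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : shuffle(emitted).entrySet()) {
            result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // prints {hadoop=2, hive=1, on=2, pig=1}
        System.out.println(run(Arrays.asList("Hive on Hadoop", "Pig on Hadoop")));
    }
}
```

Separating the phases this way makes the contract visible. The mapper sees one record at a time, and the reducer sees all values for one key at a time. This is what lets Hadoop run many mappers and reducers in parallel.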


Languages: Java, Pig Latin scripting, Hive/HQL and Linux shell scripts

Hadoop / Big Data: HDFS, MapReduce, YARN, HBase, Pig, Hive, HCatalog, Sqoop, Flume, Oozie and Spark

Java: Core Java, Servlets, JSP, JDBC, JNDI, Java Beans

IDEs: Eclipse, JDeveloper Studio, MyEclipse

Operating Systems: Linux, Fedora, CentOS, Ubuntu, Windows XP, Windows 7 and MS-DOS

J2EE Technologies: Spring, JSF, JMS, JNDI, Servlet 2.0, Hibernate

Scripting: JavaScript, jQuery, AJAX, AngularJS, Bootstrap

NoSQL Databases: HBase


Confidential, Phoenix, AZ

Hadoop Developer


  • Working on a live 800+ node Hadoop cluster running Cloudera Enterprise.
  • Worked extensively with 700+ TB of semi-structured and structured data.
  • Moved data to the Big Data platform, the single source of truth for raw and transformed data used in analytics, reporting and decision making.
  • Developed various workflows using custom MapReduce, Pig and Hive jobs and scheduled them with Oozie.
  • Built reusable Hive UDF libraries that enabled business analysts to use these UDFs in Hive queries.
  • Developed MapReduce Java programs with custom Writables to load web server and application logs into Hive using Flume.
  • Processed and analyzed log data stored in HBase and imported it into the Hive warehouse, enabling business analysts to write HiveQL queries.
  • Optimized platform storage by using the Parquet format for Hive tables.
  • Architected the merging of multiple market feeds into a single feed to save storage and avoid maintaining the same schema separately for each market.
  • Very good understanding of partitioning and bucketing in Hive; designed both managed and external Hive tables to optimize performance.
  • Solved performance issues in Hive and Pig scripts through an understanding of joins, grouping and aggregation and how they translate into MapReduce jobs.
  • Used Kafka and Spark Streaming to process real-time data from source systems and stored it in HBase for real-time access.
  • Developed a suite of unit test cases for Mapper, Reducer and Driver classes using the MRUnit testing library.
  • Integrated Tableau with Hive and generated reports consumed by business analysts.
  • Designed and coded application components in an agile environment using a test-driven development approach.

Environment: Hadoop, Pig, Hive, MapReduce, Sqoop, Flume, Linux, Cloudera, Spark, Kafka, Tableau, Oozie, ZooKeeper, ServiceNow, Rally, Shell Script.
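The Hive partitioning and bucketing concepts mentioned above can be modeled in plain Java. This is a simplified sketch that assumes a hypothetical table partitioned by an `event_date` column and bucketed on an integer `customer_id`. In this model:

- Each partition value maps to a `key=value` subdirectory under the table root.
- A filter on the partition column lets a query read only the matching directory, which is partition pruning.
- A row's bucket is a non-negative hash of the bucketing column modulo the bucket count.

Hive's actual hash functions vary by column type and version, so the bucket numbers here only illustrate the idea. The table path and rows are invented for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Simplified, hypothetical model of Hive partitioning and bucketing (no Hive dependency).
class HiveLayoutSketch {

    // A row of a hypothetical table partitioned by event_date and bucketed by customer_id.
    static final class Row {
        final String eventDate;
        final int customerId;

        Row(String eventDate, int customerId) {
            this.eventDate = eventDate;
            this.customerId = customerId;
        }
    }

    // Partition directory in key=value style, e.g. "/warehouse/sales/event_date=2016-01-01".
    static String partitionDir(String tableRoot, Row row) {
        return tableRoot + "/event_date=" + row.eventDate;
    }

    // Bucket index: non-negative hash of the bucketing column modulo the number of buckets.
    static int bucketFor(int customerId, int numBuckets) {
        return (Integer.hashCode(customerId) & Integer.MAX_VALUE) % numBuckets;
    }

    // Groups rows by partition directory, mirroring how a partitioned table is laid out.
    static Map<String, List<Row>> layout(String tableRoot, List<Row> rows) {
        Map<String, List<Row>> dirs = new TreeMap<>();
        for (Row r : rows) {
            dirs.computeIfAbsent(partitionDir(tableRoot, r), k -> new ArrayList<>()).add(r);
        }
        return dirs;
    }

    // Partition pruning: a filter on the partition column touches only one directory.
    static List<Row> scanDate(Map<String, List<Row>> dirs, String tableRoot, String date) {
        return dirs.getOrDefault(tableRoot + "/event_date=" + date, new ArrayList<>());
    }

    public static void main(String[] args) {
        String root = "/warehouse/sales"; // hypothetical table location
        List<Row> rows = Arrays.asList( // hypothetical rows
                new Row("2016-01-01", 101), new Row("2016-01-01", 102), new Row("2016-01-02", 103));
        Map<String, List<Row>> dirs = layout(root, rows);
        // prints [/warehouse/sales/event_date=2016-01-01, /warehouse/sales/event_date=2016-01-02]
        System.out.println(dirs.keySet());
        System.out.println(scanDate(dirs, root, "2016-01-02").size()); // prints 1
        System.out.println(bucketFor(103, 4)); // prints 3
    }
}
```

Partitioning reduces how much data a filtered query reads. Bucketing fixes how rows are spread across a known number of files, which helps with sampling and with joins on the bucketing column.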

Confidential, Charlotte, NC

Hadoop Consultant


  • Loaded customer data from source systems such as Oracle and MySQL into HDFS using Sqoop.
  • Implemented a 26-node CDH4 Hadoop cluster on Red Hat Linux using Cloudera Manager.
  • Optimized MapReduce jobs to use HDFS efficiently by applying various compression codecs.
  • Handled importing data from various sources, performed transformations using simple to complex MapReduce jobs, and loaded the data into HBase.
  • Analyzed data with Hive queries and Pig scripts to study customer behavior.
  • Integrated Tableau with Hive using ODBC drivers and connectors.
  • Continuously monitored and managed the Hadoop cluster using Cloudera Manager.
  • Set up Amazon Web Services (AWS) to evaluate whether Hadoop was a feasible solution.
  • Set up a Hadoop cluster on Amazon EMR (Elastic MapReduce), a managed Hadoop framework that runs on EC2 instances.
  • Used Maven extensively to build MapReduce JAR files and deployed them to Amazon Web Services (AWS) on EC2 virtual servers in the cloud.
  • Used S3 buckets to store the JARs and input datasets, and DynamoDB to store the processed output.
  • Wrote Hive queries to export, import and query data in DynamoDB.
  • Ran Hive queries in interactive and batch modes using the AWS CLI and Console.

Environment: CDH4, Cloudera Manager, MapReduce, HDFS, Hive, Pig, HBase, Flume, Oracle, MySQL, Sqoop, Oozie, AWS, Tableau.
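The Sqoop imports from Oracle and MySQL mentioned above run in parallel by splitting the source table on a key column. Sqoop finds the minimum and maximum of the `--split-by` column and gives each of the `--num-mappers` map tasks one sub-range. The following is a minimal sketch of that range division. It assumes a numeric key with known bounds. The class name and the `customer_id` column are hypothetical, and Sqoop's own splitter computes its boundaries differently in detail.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical model of how a Sqoop import divides a numeric key range among mappers.
class SqoopSplitSketch {

    // Returns WHERE-clause fragments covering [min, max] in numMappers near-equal ranges.
    static List<String> splits(String column, long min, long max, int numMappers) {
        if (numMappers < 1 || max < min) {
            throw new IllegalArgumentException("need numMappers >= 1 and max >= min");
        }
        List<String> clauses = new ArrayList<>();
        long span = max - min + 1;                 // count of key values, both ends inclusive
        int n = (int) Math.min(numMappers, span);  // never create empty ranges
        long lo = min;
        for (int i = 0; i < n; i++) {
            // Spread the remainder so range sizes differ by at most one.
            long size = span / n + (i < span % n ? 1 : 0);
            long hi = lo + size - 1;               // inclusive upper bound of this range
            clauses.add(column + " >= " + lo + " AND " + column + " <= " + hi);
            lo = hi + 1;
        }
        return clauses;
    }

    public static void main(String[] args) {
        // Hypothetical bounds, as a MIN/MAX boundary query on customer_id might return.
        // prints: customer_id >= 1 AND customer_id <= 4
        //         customer_id >= 5 AND customer_id <= 7
        //         customer_id >= 8 AND customer_id <= 10
        for (String clause : splits("customer_id", 1, 10, 3)) {
            System.out.println(clause);
        }
    }
}
```

Each clause would become the filter for one mapper's query, so an evenly distributed key gives each mapper a similar share of rows. A skewed key does not, which is why the choice of split column matters.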
