Hadoop Developer Resume

Atlanta, GA


  • Around 7 years of overall experience in financial, marketing and enterprise application development across diverse industries, including hands-on experience with Big Data ecosystem technologies.
  • Three years of comprehensive experience as Hadoop Developer.
  • Experience in writing Hadoop jobs for analyzing data using Hive and Pig.
  • Experience in installing, configuring, supporting and managing Hadoop clusters using Apache and Cloudera (CDH3, CDH4) distributions and on Amazon Web Services (AWS).
  • Hands-on experience with major components of the Hadoop ecosystem, including Hive, HBase, HBase-Hive integration, Pig, Sqoop, Flume, MapReduce, Spark, Kafka, Storm and Oozie.
  • Set up standards and processes for Hadoop based application design and implementation.
  • Extensive experience in working with NoSQL databases including HBase, Cassandra and MongoDB.
  • Experience in working with Map Reduce programs using Apache Hadoop for working with Big Data.
  • Experience in installation, configuration, supporting and monitoring Hadoop clusters using Apache, Cloudera distributions and AWS.
  • Experience in using Pig, Hive, Sqoop, HBase and Cloudera Manager.
  • Experience in importing and exporting data using Sqoop from HDFS to relational database systems (RDBMS) and vice versa.
  • In-depth understanding of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode and MapReduce concepts.
  • Hands on experience in application development using Java, RDBMS, and Linux shell scripting
  • Extending Hive and Pig core functionality by writing custom UDFs.
  • Experience in analyzing data using HiveQL, PigLatin, and custom Map Reduce programs in Java.
  • In depth and extensive knowledge of Hadoop architecture and its components.
  • Familiarity and experience with Data warehousing and ETL tools.
  • Experienced in NoSQL databases such as HBase, Cassandra and MongoDB.
  • Experienced in job workflow scheduling and monitoring tools like Oozie and Zookeeper.
  • Experienced in analyzing/processing data using HiveQL, Storm, Kafka, Redis, Flume, Sqoop, Pig Latin, and custom MapReduce programs in Java.
  • Knowledge of importing and exporting data using Flume and Kafka.
  • Familiarity with popular frameworks like Struts 1.1, Hibernate 3.0, Spring IoC, Spring AOP and Spring JDBC.
  • Experience with middleware architecture using Sun Java technologies like J2EE, JSP 2.0, Servlets 2.4, JDBC and JUnit, and application servers like WebSphere 7.1 and WebLogic 10.3.
  • Good understanding of XML methodologies (XML, XSL, XSD) including Web Services (JAX-WS Specification) and SOAP
  • Experience in Web Services using XML, HTML and SOAP.
  • Experience in component design using UML Design, Use case, Class, Sequence, Deployment and Component diagrams for the requirements.
  • Experience in Message based systems using JMS, TIBCO & MQ Series.
  • Experience in writing database objects like Stored Procedures, Triggers, SQL, PL/SQL packages and Cursors for Oracle, SQL Server, DB2 and Sybase.
  • Experienced in using CVS, SVN and SharePoint for version management.
  • Proficient in unit testing applications using JUnit and MRUnit, and in application logging using Log4J.
  • Ability to blend technical expertise with strong conceptual, business and analytical skills to provide quality solutions, with result-oriented problem-solving and leadership skills.


Bigdata Technologies: HDFS, Pig, Hive, Hana, AWS, ELK, Elasticsearch, MapReduce, Sqoop, Oozie, Avro, Zookeeper, YARN, Spark, Scala

Scripting Languages: Shell, Python, Perl

Tools: Quality Center v11.0/ALM, TOAD, JIRA, HP QTP, HP UFT, ETL, Informatica, Selenium, TestNG, JUnit

Programming Languages: Java, C++, C, SQL, PL/SQL

QA Methodologies: Waterfall, Agile, DevOps, V-model

Front End Technologies: HTML, XHTML, CSS, XML, JavaScript, AJAX, Servlets, JSP

Java Frameworks: MVC, jQuery, Apache Struts 2.0, Spring and Hibernate

Defect Management: JIRA, Quality Center

Domain Knowledge: GSM, WAP, GPRS, CDMA and UMTS (3G)

Web Services: SOAP (JAX-WS), WSDL, SOA, RESTful (JAX-RS), JMS

Application Servers: Apache Tomcat, WebLogic Server, WebSphere, JBoss

Databases: Oracle 11g, MySQL, MS SQL Server, IBM DB2

NoSQL Databases: HBase, MongoDB, Cassandra, DataStax Enterprise 4.6.1

RDBMS: Oracle 9i, Oracle 10g, MS Access, MS SQL Server, IBM DB2, PL/SQL

Operating Systems: Linux, UNIX, Mac, Windows NT/98/2000/XP/Vista, Windows 7


Hadoop Developer

Confidential, Atlanta, GA


  • Responsible for the design and development of Spark SQL scripts based on functional specifications.
  • Responsible for Spark Streaming configuration based on the type of input source.
  • Developed MapReduce jobs in Java for data cleansing and preprocessing, and implemented complex data analytical algorithms.
  • Developed business logic using Scala.
  • Explored Spark to improve the performance and optimization of existing algorithms in Hadoop.
  • Imported data from different sources like HDFS/HBase into Spark RDDs.
  • Experienced with Spark Context, Spark SQL, DataFrames, pair RDDs and Spark on YARN.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
  • Ingested data using Kafka and processed it using Spark.
  • Developed traits, case classes and similar constructs in Scala.
  • Developed MapReduce programs to join data from different data sources using optimized joins, implementing bucketed joins or map joins depending on the requirement.
  • Imported data from structured data sources into HDFS using Sqoop incremental imports.
  • Implemented Kafka custom partitioners to send data to different categorized topics.
  • Implemented a Storm topology with stream grouping to perform real-time analytical operations.
  • Experience in implementing Kafka spouts for streaming data and different bolts to consume data.
  • Created Hive tables with partitions and implemented incremental imports to perform ad-hoc queries on structured data.
  • Wrote Unix shell scripts in combination with Talend data maps to process the source files and load them into the staging database.
  • Used Talend Studio 6.2 to rewrite the SSIS ETL packages.
  • Experience in implementing Kafka consumers and producers by extending the Kafka high-level API in Java and ingesting data into HDFS or HBase depending on the context.
  • Created workflows to run multiple Hive and Pig jobs, which run independently based on time and data availability.
  • Developed SQL scripts using Spark for handling different data sets and compared their performance against MapReduce jobs.
  • Worked with AWS services such as EMR and EC2 for fast and efficient processing of Big Data.
  • Responsible for maintaining and expanding AWS cloud infrastructure using SNS and SQS.
  • Developed Spark scripts using Python shell commands as required.
  • Experience implementing machine learning techniques in Spark using Spark MLlib.
  • Involved in moving data from Hive tables into Cassandra for real-time analytics.
  • Involved in using Hadoop benchmarks for monitoring and testing the Hadoop cluster.
  • Involved in implementing test cases and testing MapReduce programs using MRUnit and other mocking frameworks.
  • Involved in cluster maintenance, including adding and removing cluster nodes, cluster monitoring and troubleshooting, and reviewing and managing data backups and Hadoop log files.
  • Involved in implementing Maven build scripts for Maven projects and integrating them with Jenkins.

Environment: Spark, Spark SQL, Spark Streaming, Cloudera, MapReduce, Hive, Avro, Kafka, Storm, Linux, Sqoop, Shell Scripting, Oozie, Cassandra, Git, XML, Scala, Java, Maven, Eclipse, Oracle.

Sr. Hadoop Developer

Confidential, Houston, TX


  • Responsible for building scalable distributed data solutions using Hadoop.
  • Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, managing and reviewing data backups, and managing and reviewing Hadoop log files.
  • Experience in installing, configuring, supporting and monitoring Hadoop clusters using Apache and Cloudera distributions and AWS.
  • Involved in performing the Linear Regression using Scala API and Spark.
  • Installed and configured Flume, Hive, Pig, Sqoop and Oozie on the Hadoop cluster.
  • Set up and benchmarked Hadoop/HBase clusters for internal use.
  • Developed simple to complex MapReduce jobs using Hive.
  • Optimized MapReduce jobs to use HDFS efficiently by using various compression mechanisms.
  • Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS and extracted data from MySQL into HDFS using Sqoop.
  • Experience in data warehousing and ETL processes, with strong database, SQL, ETL and data analysis skills.
  • Analyzed data by running Hive queries and Pig scripts to understand user behavior.
  • Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
  • Worked with application teams to install operating system, Hadoop updates, patches, version upgrades as required.
  • Installed Oozie workflow engine to run multiple Hive and Pig jobs.
  • Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Configured, deployed and maintained multi-node Dev and Test Kafka clusters.
  • Developed multiple Kafka producers and consumers from scratch using the low-level and high-level APIs.
  • Experienced with batch processing of data sources using Apache Spark and Elasticsearch.
  • Implemented Spark RDD transformations and actions for business analysis, and migrated HiveQL queries on structured data to Spark SQL to improve performance.
  • Implemented Storm topologies as preprocessing components before moving data from Kafka consumers to HDFS and Cassandra.
  • Configured, deployed and maintained a single-node Storm cluster in the DEV environment.

Environment: Hadoop, MapReduce, Hive, Pig, Sqoop, Avro, Spark, Spark SQL, Spark Streaming, Kafka, Storm, Datameer, Teradata, SQL Server, IBM Mainframes, Java 7.0, Log4J, JUnit, MRUnit, SVN, JIRA.

Hadoop Developer

Confidential, Dallas, TX


  • Involved in loading data from UNIX file system to HDFS.
  • Installed and configured Hive and wrote Hive UDFs.
  • Implemented a CDH3 Hadoop cluster on CentOS.
  • Worked on cluster installation, commissioning and decommissioning of DataNodes, NameNode recovery, capacity planning, and slots configuration.
  • Involved in ingesting data received from various relational database providers into HDFS for analysis and other big data operations.
  • Wrote MapReduce jobs to perform operations like copying data on HDFS and defining job flows on an EC2 server, and to load and transform large sets of structured, semi-structured and unstructured data.
  • Created Hive tables to import large data sets from various relational databases using Sqoop and exported the analyzed data back for visualization and report generation by the BI team.
  • Used Spark Streaming APIs to perform the necessary transformations and actions on the fly to build the common learner data model, which gets data from Kafka in near real time and persists it into Cassandra.
  • Configured, deployed and maintained multi-node Dev and Test Kafka clusters.
  • Developed Spark scripts using Java and Scala shell commands as required.
  • Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
  • Developed Scala scripts and UDFs using both DataFrames/SQL/Datasets and RDD/MapReduce in Spark 1.6 for data aggregation, queries and writing data back into the OLTP system through Sqoop.
  • Experienced in performance tuning of Spark applications, including setting the right batch interval, the correct level of parallelism and memory tuning.
  • Optimized existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames and pair RDDs.
  • Loaded and transformed large sets of structured, semi-structured and unstructured data.
  • Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Installed the Oozie workflow engine to run multiple MapReduce, HiveQL and Pig jobs.
  • Supported setting up the QA environment and updating configurations for implementing scripts with Pig and Sqoop.

Environment: Hadoop, HDFS, Hive, Pig, Sqoop, MapReduce, Cloudera, NoSQL, HBase, Shell Scripting, Linux.

Java Developer



  • Involved in gathering business requirements, analyzing the project and creating UML diagrams such as use cases, class diagrams, sequence diagrams and flowcharts for the Optimization module using Microsoft Visio.
  • Designed and developed Optimization UI screens for Rate Structure, Operating Cost, Temperature and Predicted Loads using JSF MyFaces, JSP, JavaScript and HTML.
  • Configured faces-config.xml for the page navigation rules and created managed and backing beans for the Optimization module.
  • Developed JSP web pages for Rate Structure and Operating Cost using the JSF HTML and JSF Core tag libraries.
  • Designed and developed the framework for the IMAT application implementing all six phases of the JSF life cycle, and wrote Ant build and deployment scripts to package and deploy on the JBoss application server.
  • Designed and developed a simulated annealing algorithm to generate random Optimization schedules and developed neural networks for the CHP system using session beans.
  • Integrated EJB 3.0 with JSF and handled application state management and business process management (BPM) using JBoss Seam.
  • Wrote AngularJS controllers, views, and services for new website features.
  • Developed a cost function to calculate the total cost for each CHP Optimization schedule generated by the simulated annealing algorithm using EJBs.
  • Implemented Spring Web Flow for the Diagnostics module to define page flows with actions and views, created POJOs and used annotations to map them to the SQL Server database using EJB.
  • Wrote DAO classes and EJB 3.0 QL queries for Optimization schedule and CHP data retrieval from the SQL Server database.
  • Used Eclipse as the IDE to develop the application and JIRA for bug and issue tracking.
  • Created combined deployment descriptors using XML for all the session and entity beans.
  • Wrote JSF and JavaScript validations to validate data on the UI for Optimization and Diagnostics, and developed Web Services to access the external system (WCC).
  • Designed and coded application components in an agile environment utilizing a test-driven development approach.
  • Skilled in test driven development and agile development.
  • Created the technical design document for the Diagnostics module and Optimization module covering the cost function and simulated annealing approach.
  • Involved in code reviews and followed version control guidelines.

Environment: Java 1.5, J2EE, Microsoft Visio, EJB 3.0, JSP, JSF, JBoss Seam, JIRA, Web Services, JMS, JavaScript, AngularJS, HTML, Ant, Agile, JUnit, JBoss 4.2.2, MS SQL Server 2005, MyEclipse 6.0.1.

Java Developer



  • Created use case and sequence diagrams, functional specifications and user interface diagrams using StarUML.
  • Involved in the complete requirement analysis, design, coding and testing phases of the project.
  • Participated in JAD meetings to gather the requirements and understand the end users' system.
  • Developed user interfaces using JSP, HTML, XML and JavaScript.
  • Generated XML schemas and used XMLBeans to parse XML files.
  • Created stored procedures and functions. Used JDBC to process database calls for DB2/AS400 and SQL Server databases.
  • Developed code to create XML files and flat files from data retrieved from databases and XML files.
  • Created data sources and helper classes used by all the interfaces to access and manipulate the data.
  • Designed and developed the front end using JavaScript.
  • On the database side, responsibilities included creation of tables, integrity constraints, stored procedures, triggers and views.
  • Designed, developed and deployed on the bundled WebLogic server.
  • Implemented database interactions with Oracle 9i using JDBC API.
  • Developed web application called iHUB (integration hub) to initiate all the interface processes using Struts Framework, JSP and HTML.
  • Developed the interfaces using Eclipse 3.1.1 and JBoss 4.1.
  • Involved in integration testing, bug fixing and production support.

Environment: Java 1.3, Servlets, JSPs, JavaMail API, JavaScript, HTML, Spring Batch XML Processing, MySQL 2.1, Swing, Java Web Server 2.0, JBoss 2.0, RMI, Rational Rose, Red Hat Linux 7.1.