Sr. Hadoop Developer Resume

Birmingham, AL


  • Experience across the core areas of the software development lifecycle, including analysis, design, implementation and deployment of object-oriented, distributed and enterprise applications with Java/J2EE (versions 7 and 8) technologies.
  • Big Data implementation with strong experience in major components of the Hadoop ecosystem such as Hadoop MapReduce, HDFS, Hive, Pig, HBase, Zookeeper, Sqoop, Oozie, Flume, Spark and Storm.
  • Strong experience with Hadoop distributions and platforms such as Hortonworks, Amazon Web Services (AWS) and MapR.
  • Hands-on experience using Sqoop to import data into HDFS from Oracle and vice versa.
  • Experience in analyzing data using Hive, Pig Latin and custom MapReduce programs in Java 7 and 8.
  • Experience in deploying and managing the Hadoop cluster using Cloudera Manager.
  • Good understanding of real-time data processing using Spark.
  • Expertise in implementing Scala applications using higher-order functions for both batch and interactive analysis requirements.
  • Hands-on experience writing applications on NoSQL databases such as Cassandra and HBase.
  • Good understanding of Impala and Kafka for monitoring and managing Hadoop jobs.
  • Good knowledge of job scheduling and monitoring tools such as Oozie and Zookeeper.
  • Experience in Extraction, Transformation and Loading (ETL) of data from multiple sources such as flat files, XML files and databases. Used Informatica for ETL processing based on business requirements.
  • Expertise in developing applications using core Java concepts such as OOP, multithreading and garbage collection.
  • Experience with application and web servers such as WebLogic 12c, Tomcat and Jetty.
  • Strong working experience with the Spring Framework, including IoC/Dependency Injection.
  • Experience in developing REST/SOAP-based web services and API development.
  • Experienced in the web services approach to Service-Oriented Architecture (SOA).
  • Hands-on experience with various database platforms such as Oracle and SQL.
  • Experienced with Agile Scrum methodology; involved in design discussions and work estimations; takes initiative and is proactive in solving problems and providing solutions.


Big Data: HDFS, MapReduce, Hive, Pig, Sqoop, Oozie, Zookeeper, Spark, Kafka.

NoSQL Database: HBase, Cassandra.

Programming Languages: C, Core Java 7 and 8, Scala and Python.

Web technologies: Core Java, JSP, JDBC, Servlets.

Frameworks: Struts, Spring, Hibernate.

Operating systems: Linux, Unix, macOS, Windows 7/8.


Confidential, Birmingham, AL

Sr. Hadoop Developer


  • Hive external tables were used for raw data and managed tables were used for intermediate tables.
  • Developed Hive scripts (HQL) to automate joins across different sources.
  • Migrated ETL processes from MySQL to Hadoop, using Pig scripts and Pig UDFs as a data pipeline for easy data manipulation.
  • Developed MapReduce programs and migrated data from the existing data lake source using Sqoop.
  • Devised schemes to collect and stage large data sets in HDFS, and compressed the data lake using various formats to achieve optimal storage capacity.
  • Used the Spark API over Cloudera Hadoop YARN to perform analytics on the data lake.
  • Experience in Hive partitioning and bucketing; performed joins on Hive tables and used Hive SerDes such as CSV and JSON.
  • Used Kafka to stream the application server logs into HDFS.
  • Analyzed the logs stored on HDFS and imported the cleaned data into the Hive warehouse, enabling business analysts to write Hive queries.
  • Developed a suite of unit test cases for Mapper, Reducer and Driver classes using the MR Testing library.
  • Worked with Apache Spark, a fast, general engine for large-scale data processing, together with the functional programming language Scala.
  • Expertise in integrating Kafka with Spark Streaming for high-speed data lake processing.
  • Wrote subqueries, stored procedures, triggers, cursors and functions on the Oracle database.
  • Used AWS to run tasks and store data using the Hadoop Distributed File System (HDFS).
  • Resolved defects found while testing the new and existing applications.
  • Monitored and managed the Hadoop cluster using Cloudera Manager.
  • Developed shell scripts to add process dates to the source files and to create trigger files.
  • Developed various Big Data workflows using Kafka, Oozie.
  • Analyzed business requirements and identified the mapping documents required for system and functional testing across all test scenarios.

Environment: Hadoop, Hive, MapReduce, HDFS, Sqoop, HBase, Pig, Oozie, AWS, Scala, Kafka, Bash, MySQL, Oracle, Windows and Linux.

Confidential, Dublin, Ohio

Sr. Hadoop Developer


  • Translated ETL jobs into MapReduce jobs using Informatica.
  • Installed Hadoop, MapReduce and HDFS, and developed multiple MapReduce jobs in Pig and Hive for data cleaning and pre-processing.
  • Used Apache Kafka in place of Hive wherever possible while analyzing data to achieve faster results.
  • Designed a data warehouse using Hive, and imported and exported data between SQL databases, HDFS and Hive using Sqoop.
  • Maintained all services in the Hadoop ecosystem using Informatica.
  • Used Spark Streaming with Scala to construct a learner data model from sensor data using MLlib.
  • Used the Spark API over Hortonworks Hadoop YARN to perform analytics on data in Hive.
  • Developed multiple MapReduce jobs in Scala for data cleaning and preprocessing.
  • Worked with business teams and created Hive queries for ad hoc access.
  • Converted Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
  • Used Spark to hold the intermediate results in memory rather than writing them to disk while working on the same dataset multiple times.
  • Implemented Storm topologies to pre-process data from Oracle before moving it into HDFS.
  • Configured Spark Streaming to receive real-time data from Hortonworks and store it in HDFS using Scala.
  • Tuned Spark Streaming performance by setting the right batch interval, the correct level of parallelism, and appropriate serialization and memory settings.
  • Responsible for loading, aggregating and moving large amounts of data using Kafka.
  • Developed a data pipeline using Sqoop and Storm to store data in HDFS.
  • Worked on the backend using Kafka and Spark to perform several aggregations.
  • Responsible for continuous monitoring and management of the Hadoop cluster using Hortonworks Manager.
  • Maintained and developed ETL (Informatica) code written in Scala that pulls data from disparate internal and external sources.
  • Used RESTful and SOAP-based web services to communicate with the service layer from AngularJS.
  • Loaded data from the Unix file system into HDFS and followed Agile methodology for project development.

Environment: Hadoop, HDFS, MapReduce, Hive, Pig, Sqoop, Hortonworks, Flume, Storm, Spark, Scala, RHEL, Unix, AWS, SQL, Kafka, Oracle, ETL (Informatica).

Confidential, Memphis, Tennessee

Hadoop Developer


  • Designed and developed components of big data processing using HDFS, MapReduce, Pig and Hive.
  • Exported data from Oracle using Sqoop and analyzed it using Hadoop components such as Hive and Pig.
  • Imported data from different sources, such as Hortonworks, into Spark RDDs.
  • Wrote MapReduce jobs in Scala to load data from system-generated log files into the Oracle database, and configured a SQL database to store the Hive metadata.
  • Extracted the REST requests sent to, and the REST responses received from, the backend system.
  • Developed Pig scripts in areas where extensive coding needed to be reduced.
  • Optimized Hive analytics SQL queries and tuned job performance.
  • Used the Hive data warehouse and developed Hive queries to analyze migrated data.
  • Developed a data pipeline using Spark and Storm to store data in HDFS.
  • Designed a technical solution for real-time analytics using Spark and Cassandra for faster testing and processing of data.
  • Installed, configured and optimized Hadoop infrastructure on the Hortonworks and CDH5 Hadoop distributions using Puppet.
  • Debugged and tuned the performance of Hive and Pig jobs.
  • Monitored ETL (Informatica) jobs and validated the data loaded into HDFS.
  • Used Spark to analyze point-of-sale data and coupon usage, and worked with the ETL (Informatica) tool to filter data based on end requirements.
  • Used Pig extensively for data cleansing, wrote Spark programs in Scala, and ran Spark jobs on YARN.
  • Used Kafka to coordinate cluster services. Installed the Impala workflow engine to run multiple Hive and Pig jobs.
  • Developed Pig Latin scripts to extract the data from the web server output files to load into HDFS.
  • Working knowledge of the ETL (Informatica) tool for filtering data based on end requirements.
  • Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
  • Wrote Pig scripts to transform raw data from several data sources into baseline data.
  • Participated in the development and implementation of the Kafka Hadoop environment.
  • Loaded data from the Unix file system into HDFS and exported the analyzed data to relational database systems using Sqoop.

Environment: Hadoop, MapReduce, HDFS, Hive, Pig, HBase, Sqoop, Zookeeper, Spark, Impala, Scala, ETL, Oracle, Informatica, Hortonworks, Linux.

Confidential, Columbus, Ohio

Hadoop Developer


  • Involved in installing, configuring and managing Hadoop Ecosystem components like HDFS, Hive, Pig, Sqoop and Flume.
  • Wrote Linux shell scripts for business processes and for loading data from different systems into HDFS.
  • Developed a RESTful web service using the Spring Framework.
  • Used Pig as an ETL tool to perform transformations and some pre-aggregations before storing the data in HDFS.
  • Developed scripts to automate the creation of Sqoop jobs for various workflows.
  • Generated analytics data using MapReduce programs written in Scala.
  • Used the Hive data warehouse tool to analyze data in HDFS and developed Hive queries.
  • Configured and designed Pig Latin scripts to process the data into a universal data model.
  • Created Hive internal and external tables, loaded them with data, and wrote Hive queries requiring multiple join scenarios.
  • Created partitioned and bucketed tables in Hive based on the hierarchy of the dataset.
  • Used Sqoop for log aggregation, collecting physical log files from servers and putting them in HDFS for further processing.
  • Responsible for continuous monitoring and management of the Hadoop cluster using Hortonworks Manager.
  • Configured, deployed and maintained multi-node Dev and Test Hadoop clusters.
  • Used the Hive data warehouse tool and developed Hive queries to analyze data migrated to HDFS.
  • Implemented Spark applications using Scala and Spark SQL for faster testing and processing of data.
  • Designed and developed MapReduce programs for data lineage.
  • Worked extensively with the Hortonworks Hadoop distribution, CDH5.x and CDH4.x.
  • Migrated the existing data to Hadoop from RDBMS (SQL Server and Oracle) using Sqoop for processing the data.
  • Participated in the development and implementation of the Hortonworks Hadoop environment.
  • Developed Oozie workflows to automate loading data into HDFS and pre-processing it with Pig.
  • Created ETL (Informatica) jobs to generate and distribute reports from the Cassandra database.
  • Responsible for troubleshooting MapReduce jobs by reviewing the log files.
  • Loaded data from the Unix file system into HDFS.

Environment: Hadoop, MapReduce, Hive, Pig, Sqoop, Kafka, Spark, Core Java, Hortonworks, Oracle, ETL, Linux, Unix, Shell Scripting.

Confidential, St. Paul, Minnesota

Java/J2EE Developer


  • Analyzed, designed and developed a J2EE-based application using Spring and Hibernate.
  • Built the application using the Spring MVC and Hibernate frameworks.
  • Developed a RESTful web service based on the REST Jersey API and implemented the GET, PUT and POST functionalities.
  • Worked with core Java technologies such as multithreading and synchronization.
  • Developed business logic using the Spring Framework. Used Spring AOP for handling transactions.
  • Used the Spring MVC framework for design and development of the web application.
  • Implemented server-side tasks using Servlets and XML.
  • Designed and developed the screens in HTML with client-side validations in JavaScript.
  • Used Oracle as the database and Oracle SQL Developer to access it.
  • Implemented various SOAP and REST services as part of the application.
  • Wrote JUnit test cases for unit testing.
  • Used Agile (SCRUM) methodologies for Software Development.

Environment: Core Java/J2EE, Multithreading, Spring, Hibernate, SOAP, REST, Oracle, SQL, XML, JUnit.


Java/J2EE Developer


  • Responsible for development of Business logic in Core Java.
  • Worked with core Java technologies such as multithreading and synchronization.
  • Created a RESTful web service for updating customer data sent from external systems.
  • Provided Hibernate mapping files for mapping Java objects to database tables.
  • Used the Spring Framework and XML beans to build the query service.
  • Used JDBC to invoke stored procedures and for database connectivity to MySQL.
  • Developed the database interaction classes using JDBC and core Java, and implemented server-side tasks using Servlets and XML.
  • Implemented Service and DAO layers in between Struts and Hibernate.
  • Designed and developed the screens in HTML with client-side validations in JavaScript.
  • Responsible for coding MySQL statements and stored procedures for back-end communication using JDBC.
  • Developed RESTful web services with JSON formats to support client requests.
  • Wrote Java Servlets and created JSPs to generate pages dynamically.
  • Worked with the Java Collections API to handle data objects between the business layers and the front end.
  • Used JPA with Hibernate to access data from the Oracle database using POJOs for coding simplicity.
  • Implemented various SOAP and REST services as part of the application.
  • Made extensive use of the Struts framework for controller and view components.
  • Used Maven for builds and Jenkins to run periodic builds and tests of the application.

Environment: Java, Apache Tomcat, JSF, J2EE, Eclipse, JDBC, JavaScript, XML, Oracle, SQL/PLSQL, Spring, Hibernate, SOAP, REST, Struts, JSON.
