Hadoop Developer Resume

Long Beach, CA


  • 8+ years of progressive experience in the IT industry, with proven expertise in the analysis, design, development, implementation and testing of software applications using Big Data (Hadoop) and Java-based technologies.
  • 4+ years of hands-on experience with Big Data Hadoop core and ecosystem components, including Spark, Scala, HDFS, MapReduce, Hive, Pig, Storm, Kafka, YARN, HBase, Oozie, ZooKeeper, Flume, Sqoop and Cassandra.
  • Experience working with the Hortonworks and Cloudera Hadoop distributions.
  • Assisted in cluster maintenance, cluster monitoring and troubleshooting, and managing and reviewing data backups and log files.
  • Developed multiple Spark jobs in Scala and Python for data cleaning, pre-processing and aggregation.
  • Expertise in working with the Hive data warehouse tool: creating tables, distributing data through partitioning and bucketing, and writing and optimizing HiveQL queries.
  • Streamed log files with minimal latency using Flume and managed the downstream data flow into the Hadoop ecosystem and its analysis components.
  • Developed multiple MapReduce jobs in Java for data cleaning and pre-processing.
  • Automated all jobs, from extracting data from sources such as MySQL to pushing the result sets to the Hadoop Distributed File System (HDFS).
  • Experience importing data from MySQL into HDFS using Sqoop.
  • Hands-on experience with NoSQL databases such as MongoDB, HBase and Cassandra.
  • Hands-on experience setting up workflows with the Apache Oozie workflow engine to manage and schedule Hadoop jobs.
  • Developed Pig Latin scripts for data cleansing and transformation.
  • Good knowledge of scripting languages such as Linux/Unix shell scripting and Python.
  • Hands-on experience importing unstructured data into HDFS using Flume.
  • Experience working with build tools such as Maven and Ant.
  • Hands-on experience configuring and working with Flume to load data from multiple sources directly into HDFS.
  • Experience with the Kibana data visualization plugin.
  • Experience working with databases such as Oracle, MySQL, IBM DB2 and Teradata.
  • Experience in Core Java and J2EE technologies such as Spring, Struts, Hibernate, JDBC, EJB, Servlets, JSP and JavaScript.
  • Experience in database design using PL/SQL to write stored procedures, functions and triggers, and strong experience writing complex queries for Oracle.
  • Experienced and skilled Agile developer with a strong record of excellent teamwork and successful coding.
  • Strong problem-solving and analytical skills, with the ability to make balanced, independent decisions.


Hadoop, HDFS, MapReduce, Hive, Pig, Spark, Spark Streaming, Scala, Kafka, Storm, ZooKeeper, HBase, YARN, Sqoop, Flume, Mahout.

Programming Languages: C++, Java, Python, Scala

Hadoop Distributions: Apache Hadoop, Cloudera Hadoop Distribution (CDH3, CDH4, CDH5) and Hortonworks Data Platform (HDP)

NoSQL Databases: HBase, Cassandra, MongoDB

Query Languages: HiveQL, SQL, PL/SQL, Pig Latin

Web Technologies: Java, J2EE, Struts, Spring, JSP, Servlet, JDBC, EJB, JavaScript

IDEs: Eclipse, NetBeans

Frameworks: MVC, Struts, Spring, Hibernate

Build Tools: Ant, Maven

Databases: Oracle, MySQL, MS Access, DB2, Teradata

Operating Systems: Windows, Linux (Red Hat, CentOS), Unix

Scripting Languages: Shell scripting

Version Control Systems: SVN, Git, Confidential


Confidential, Long Beach, CA

Hadoop Developer


  • Collected member, provider and claims data from various SQL servers and ingested it into the Hadoop Distributed File System.
  • Used Talend as an ingestion tool.
  • Implemented Spark RDD transformations to express the business analysis logic and applied actions on top of those transformations.
  • Created Hive tables on top of the transformed data.
  • Automated all jobs using Autosys, from pulling data from sources such as MySQL and pushing the result datasets to the Hadoop Distributed File System to running Hive jobs.
  • Implemented multiple UDFs to execute the business logic.
  • Built the rules for the provider use case by working with the organization's Provider team and created extracts according to the requirements.
  • Implemented email notification logic for failures of the automated pipeline refresh job.
  • Monitored the pipeline jobs and analyzed the log files.
  • Categorized the provider data based on the requirements.
  • Analyzed the cluster configuration and set the driver memory, executor memory and number of cores accordingly.
  • Performed joins among large data sets and tuned their performance.
  • Monitored the refresh of HBase tables and validated the data in the front-end application.
  • Created partitioned tables in Hive; mentored the analyst and test teams in writing Hive queries.
  • Participated in agile practices, including daily scrum meetings and sprint planning.

Environment: Hadoop, Spark, HDFS, Hive, Flume, Sqoop, Oozie, HBase, MySQL, Shell scripting, Linux Red Hat, core Java 7, Eclipse, SBT.

Confidential, New York, NY

Sr. Hadoop Developer


  • Migrated complex MapReduce programs to Spark RDD transformations and actions.
  • Implemented Kafka high-level consumers to read data from Kafka partitions and move it into HDFS.
  • Analyzed the Hadoop cluster and various big data analytics tools, including MapReduce, Hive and Spark.
  • Implemented custom Kafka encoders for a custom input format to load data into Kafka partitions.
  • Exported result sets from Hive to MySQL using the Sqoop export tool for further processing.
  • Evaluated the performance of Apache Spark in analyzing genomic data.
  • Implemented complex Hive UDFs to execute business logic within Hive queries.
  • Used Impala for data analysis.
  • Prepared Linux shell scripts to automate the process.
  • Implemented Spark RDD transformations to express the business analysis logic and applied actions on top of those transformations.
  • Automated all jobs using Kettle and Oozie (workflow management), from pulling data from sources such as MySQL and pushing the result datasets to the Hadoop Distributed File System to running MapReduce, Pig and Hive jobs.
  • Developed Pig Latin scripts to extract data from web server output files and load it into HDFS.
  • Loaded and transformed large sets of structured, semi-structured and unstructured data with MapReduce, Hive and Pig.
  • Loaded data from the Linux file system, servers and Java web services using Kafka producers and partitions.
  • Evaluated Oozie for workflow orchestration.
  • Worked with NoSQL databases such as HBase, creating tables to load large sets of semi-structured data from various sources.
  • Created partitioned tables in Hive; mentored the analyst and test teams in writing Hive queries.
  • Involved in cluster setup, monitoring and benchmark testing.
  • Participated in agile practices, including daily scrum meetings and sprint planning.

Environment: Hadoop, Spark, HDFS, Pig, Hive, Flume, Sqoop, Kafka, Oozie, HBase, ZooKeeper, MySQL, Shell scripting, Linux Red Hat, Core Java 7, Eclipse.

Confidential, Indianapolis, IN

Hadoop Developer


  • Configured, managed, supported and monitored a Hadoop cluster using the Cloudera distribution.
  • Worked in an Agile Scrum development model, analyzing the Hadoop cluster and various big data analytics tools, including MapReduce, Pig, Hive, Flume, Oozie and Sqoop.
  • Configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
  • Loaded data into the cluster from dynamically generated files using Flume and from relational database management systems using Sqoop.
  • Developed custom MapReduce programs to analyze data and used Pig Latin to clean out unwanted data.
  • Created Hive tables, loaded them with data and wrote Hive queries, which run internally as MapReduce jobs.
  • Used Pig as an ETL tool to perform transformations, event joins and some pre-aggregations before storing the data in HDFS.
  • Implemented partitioning, dynamic partitions and buckets in Hive to improve performance and organize data logically.
  • Loaded and transformed large data sets in different formats, including structured and semi-structured data.
  • Responsible for managing data coming from different sources.
  • Created Hive tables, loaded data and wrote Hive queries.
  • Scheduled jobs to run automatically using the Oozie workflow engine.
  • Used the HBase NoSQL database to store and process data in different formats.
  • Involved in testing and coordinated with the business on user testing.
  • Performed unit testing and delivered unit test plans and results documents.

Environment: Apache Hadoop, MapReduce, HDFS, Hive, Pig, Sqoop, Oozie, HBase, UNIX shell scripting, ZooKeeper, Java, Eclipse.

Confidential - Rocky Hill, CT

Sr. Java/Hadoop Developer

  • Worked as a Java/Hadoop developer responsible for all aspects of the clusters.
  • Developed Spark scripts using Java and Python shell commands as required.
  • Ingested data received from various relational database providers into HDFS for analysis and other big data operations.
  • Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
  • Worked with Spark SQL and DataFrames for faster execution of Hive queries using the Spark SQLContext.
  • Performed analysis on implementing Spark using Scala.
  • Used DataFrames/Datasets to write SQL-style queries with Spark SQL against data sets stored in HDFS.
  • Extracted files from MongoDB through Sqoop, placed them in HDFS and processed them.
  • Created and imported various collections and documents into MongoDB and performed operations such as query, project, aggregation, sort and limit.
  • Extensive experience deploying, managing and developing MongoDB clusters.
  • Created Hive tables to import large data sets from various relational databases using Sqoop and exported the analyzed data back for visualization and report generation by the BI team.
  • Created shell scripts to simplify the execution of all other scripts (Pig, Hive, Sqoop, Impala and MapReduce) and to move data into and out of HDFS.
  • Implemented some of the big data operations on AWS cloud.
  • Used Hibernate reverse engineering tools to generate domain model classes, perform association mapping and inheritance mapping using annotations and XML.
  • Developed Pig scripts, Pig UDFs, Hive scripts and Hive UDFs to analyze HDFS data.
  • Secured the cluster using Kerberos and kept it up and running at all times.
  • Loaded and transformed large sets of structured, semi-structured and unstructured data, using Sqoop to move data from the Hadoop Distributed File System to relational database systems.
  • Created Hive tables to store the processed results in a tabular format.
  • Used HiveQL to analyze the partitioned and bucketed data and compute various metrics for reporting.
  • Performed data transformations by writing MapReduce jobs as per business requirements.
  • Implemented schema extraction for Parquet and Avro file formats in Hive.
  • Implemented and integrated NoSQL databases such as HBase and Cassandra.
  • Queried and analyzed data from Cassandra using CQL for quick searching, sorting and grouping.
  • Developed a data pipeline using Flume, Sqoop and Pig to extract data from web logs and store it in HDFS.

Environment: Java, Spark, Python, HDFS, YARN, Hive, Scala, SQL, MongoDB, Sqoop, AWS, Pig, MapReduce, Cassandra, NoSQL

Confidential - Philadelphia, PA

Java Developer


  • Developed the application using Spring MVC and RESTful web services.
  • Responsible for designing rich user interface applications using JavaScript, CSS, HTML, XHTML and AJAX.
  • Used Spring AOP to configure logging for the application.
  • Involved in the analysis, design, development and testing phases of the Software Development Life Cycle (SDLC).
  • Developed code in Core Java to implement technical enhancements following Java standards.
  • Worked with Swing and RCP using Oracle ADF to develop a search application as part of a migration project.
  • Implemented Hibernate utility classes, session factory methods and various annotations to work with back-end database tables.
  • Implemented Ajax calls using JSF-Ajax integration and implemented cross-domain calls using jQuery Ajax methods.
  • Implemented object-relational mapping in the persistence layer using the Hibernate framework in conjunction with Spring.
  • Used JPA (Java Persistence API) with Hibernate as the persistence provider for object-relational mapping.
  • Used JDBC and Hibernate for persisting data to different relational databases.
  • Developed and implemented a Swing-, Spring- and J2EE-based MVC (Model-View-Controller) framework for the application.
  • Implemented application-level persistence using Hibernate and Spring.
  • Integrated data warehouse (DW) data from different sources in different formats (PDF, TIFF, JPEG, web crawl data, and RDBMS data from MySQL, Oracle, SQL Server, etc.).
  • Used XML and JSON for transferring and retrieving data between different applications.
  • Wrote complex PL/SQL queries using joins, stored procedures, functions, triggers, cursors and indexes in the data access layer.
  • Implemented a RESTful web services architecture for client-server interaction, along with the corresponding POJOs.
  • Designed and developed SOAP web services using the CXF framework for communication between application services and other applications, and developed web service interceptors.
  • Implemented the project using JAX-WS-based web services with WSDL, UDDI and SOAP to communicate with other systems.
  • Wrote application-level code to interact with APIs and web services using AJAX, JSON and XML.
  • Wrote JUnit test cases for all the classes. Worked with the Quality Assurance team on tracking and fixing bugs.
  • Developed back-end interfaces using embedded SQL and PL/SQL packages, stored procedures, functions, triggers and exception handling in PL/SQL programs.
  • Used Log4j to capture logs, including runtime exceptions and informational messages.
  • Used Ant as the build tool and developed the build file for compiling the code and creating WAR files.
  • Used TortoiseSVN for source control and version management.
  • Responsibilities included designing for future user requirements by interacting with users, as well as new development and maintenance of the existing source code.

Environment: JDK 1.5, Servlets, JSP, XML, JSF, Web Services (JAX-WS: WSDL, SOAP), Spring MVC, JNDI, Hibernate 3.6, JDBC, SQL, PL/SQL, HTML, DHTML, JavaScript, Ajax, Oracle 10g, SVN, Log4j, Ant.
