Big Data/Hadoop Developer Resume

Franklin, TN


  • Over 7 years of experience in Hadoop and Java technologies involving Analysis, Design, Testing, Implementation and Training with HDFS, MapReduce, Apache Pig, Hive, HBase, Sqoop, Oracle, JSP, JDBC and Spring.
  • Very good experience in the complete project life cycle: design, development, testing, and implementation of client-server and web applications.
  • Developed scripts and numerous batch jobs scheduled within the Hadoop ecosystem.
  • Experience in analyzing data using Hive Query Language, Pig Latin, and custom MapReduce programs in Java.
  • Used Sqoop to import data from databases such as Oracle, Teradata, and MySQL into HDFS and Hive, and to export it back.
  • Involved in writing database queries; creating stored procedures, views, indexes, triggers, and functions; and code optimization and performance tuning.
  • Basic knowledge of Apache Spark for fast, large-scale in-memory data processing.
  • Good working experience with Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, and DataNode, and with the MapReduce programming paradigm.
  • Experienced with job workflow scheduling in Oozie and cluster coordination in ZooKeeper.
  • Good working experience with Hadoop cluster architecture and cluster monitoring. In-depth understanding of data structures and algorithms.
  • Expert in using Sqoop to fetch data from different systems into HDFS for analysis and to write the results back to the source system for further processing.
  • Good experience with MapReduce performance optimization techniques for effective utilization of cluster resources.
  • Experience in creating ETL data pipelines using MapReduce, Hive, Pig, Sqoop, and Hive and Pig UDFs written in Java.
  • Experience in database design using PL/SQL to write Stored Procedures, Functions, Triggers and strong experience in writing complex queries for Databases like Oracle, SQL Server, DB2, and MySQL.
  • Proficient in deploying applications on J2EE application servers such as WebSphere, WebLogic, GlassFish, and JBoss, and on the Apache Tomcat web server.
  • Extensive experience in Java/J2EE applications including AngularJS, HTML, JavaScript, XML, JSON, CSS, SQL/HQL queries and data analysis.
  • Worked extensively on Web services and the Service-Oriented Architecture (SOA), Web Services Description Language (WSDL), Simple Object Access Protocol (SOAP), and UDDI.
  • Experience in using Java tools in business, web, and client-server environments, including the Java platform, JSP, Servlets, JavaBeans, JSTL, JSP custom tags, JSF, and JDBC.
  • Experience in designing class diagrams, block diagrams, and sequence diagrams using Microsoft Visio.
  • Used Kafka for message brokering, streaming and log aggregation to put physical logs into centralized locations.
  • Highly motivated self-starter with excellent communication, presentation, and problem-solving skills, committed to learning new technologies.
  • Committed to professionalism and highly organized, with the ability to work under strict deadlines, attention to detail, and excellent written and verbal communication skills.


Hadoop Technologies: HDFS, MapReduce, YARN, Hive, Pig, HBase, Impala, ZooKeeper, Sqoop, Oozie, Apache Cassandra, Flume, Spark, AWS, EC2


Web Technologies: HTML, CSS, JavaScript, AJAX, Servlets, JSP, DOM, XML, XSLT.

Languages: C, Java, SQL, PL/SQL, Scala, Shell Scripts

Operating Systems: Linux, UNIX, Windows

Databases: NoSQL, Oracle, DB2, MySQL, SQL Server, MS Access, HBase

Application Servers: WebLogic, WebSphere, Apache Tomcat, JBoss

IDEs: Eclipse, NetBeans, JDeveloper, IntelliJ IDEA

Version Control: CVS, SVN, Git

Reporting Tools: Jaspersoft, Qlik Sense, Tableau, JUnit


Confidential, Franklin TN

Big Data/Hadoop Developer


  • Evaluated business requirements and prepared detailed specifications that follow project guidelines required to develop written programs.
  • Worked on the Log Analytics platform, which collects usage data from CRM users to determine the popularity of features, track adoption of newly added features, and identify unused features to retire.
  • Installed and configured Hadoop, MapReduce, and HDFS. Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
  • Used Sqoop to load data from MySQL into HDFS on a regular basis. Developed scripts and batch jobs to schedule various Hadoop programs.
  • Installed and configured Pig and wrote Pig Latin scripts that run as MapReduce jobs.
  • Involved in extracting customers' big data from various data sources into HDFS, including data from Excel, ERP systems, and databases, as well as log data from servers.
  • Responsible for creating Hive tables, loading them with data, and writing Hive queries, which run internally as MapReduce jobs.
  • Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Worked with Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs.
  • Involved in creating a plugin that allows Hadoop MapReduce programs, HBase, Pig, and Hive to work unmodified and access files directly.
  • Extensively used Pig for data cleansing.
  • Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, managing and reviewing Hadoop log files, and data backups.
  • Worked closely with AWS EC2 infrastructure teams to troubleshoot complex issues.
  • Loaded data from the UNIX file system into HDFS.
  • Set up and benchmarked Hadoop/HBase clusters for internal use.
  • Involved in designing RESTful services using Java-based APIs such as Jersey.
  • Used Pig and the Java API to perform transformations, event joins, and pre-aggregations before loading JSON files onto HDFS.
  • Resolved performance issues in Pig and Hive by analyzing the MapReduce physical execution plan and using debugging commands to optimize the code.
  • Good understanding of partitioning and bucketing concepts in Hive; designed both managed and external Hive tables to optimize performance.
  • Responsible for Data Ingestion using Flume and Kafka.
  • Responsible for loading and managing unstructured and semi-structured data from different sources into the Hadoop cluster using Flume.
  • Worked with application teams to install operating system and Hadoop updates, patches, and version upgrades as required.
  • Loaded data from the Linux file system into HDFS.
  • Supported, configured, installed, built, and designed Hadoop clusters. Took part in a proof-of-concept (POC) effort to stand up new Hadoop clusters.
  • Used Kafka to move logs from physical repositories to centralized locations.

Environment: Apache Hadoop, Spark, Hive, Pig, HDFS, ZooKeeper, Kafka, Java, UNIX, MySQL, Eclipse, Oozie, Sqoop, Storm, REST/SOAP API, AWS

Confidential, Seattle, WA

Big Data/Hadoop Developer


  • Installed and configured Hadoop MapReduce, HDFS and developed multiple MapReduce jobs in Java for data cleansing and preprocessing.
  • Developed Java MapReduce programs using core concepts such as OOP, multithreading, collections, and I/O.
  • Worked with application teams to install operating system and Hadoop updates, patches, and version upgrades as required.
  • Helped the team increase the cluster size from 35 nodes to 118 nodes. The configuration for the additional data nodes was managed using Puppet.
  • Managed data coming from different sources; involved in HDFS maintenance and in loading structured and unstructured data.
  • Used Sqoop to import data from databases into Cassandra tables, and imported data from various other sources into the Cassandra cluster using Java APIs.
  • Supported HBase Architecture Design with the Hadoop Architect team to develop a Database Design in HDFS.
  • Created Cassandra tables to load large sets of structured, semi-structured and unstructured data coming from Linux, NoSQL and a variety of portfolios.
  • Involved in creating data models for customer data using Cassandra Query Language (CQL).
  • Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
  • Developed Pig Latin scripts to extract data from web server output files and load it into HDFS.
  • Developed Pig UDFs to pre-process the data for analysis.
  • Wrote MapReduce code to convert semi-structured data into structured data and to insert data from HDFS into HBase.
  • Implemented a script to transmit information from web servers to Hadoop using Flume.
  • Used Zookeeper to manage coordination among the clusters.
  • Used Apache Kafka and Apache Storm to gather log data and feed it into HDFS.
  • Created Oozie workflows to automate loading data into Amazon S3 and preprocessing it with Pig; also used Oozie for data scrubbing and processing.
  • Developed scripts and deployed them to pre-process the data before moving to HDFS.
  • Performed extensive analysis on data with Hive and Pig.
  • Upgraded the Hadoop cluster from CDH3 to CDH4, set up a High Availability cluster, and integrated Hive with existing applications.
  • Handled importing of data from various data sources and performed transformations using Hive and MapReduce.
  • Analyzed data by running Hive queries and Pig scripts to understand user behavior.

Environment: Apache Hadoop, Hive, Pig, HDFS, ZooKeeper, Kafka, Java, UNIX, MySQL, Eclipse, Oozie, Sqoop, Storm

Confidential, Los Angeles, CA

Java/Hadoop Developer


  • Performed Hadoop cluster administration, including adding and removing cluster nodes, cluster capacity planning, performance tuning, cluster monitoring, and troubleshooting.
  • Worked on a Hadoop cluster that ranged from 5-10 nodes during the pre-production stage and was sometimes extended to 25 nodes during production.
  • Used the lightweight container of the Spring Framework to provide architectural flexibility through Inversion of Control (IoC).
  • Involved in the configuration of System architecture by implementing Hadoop file system in master and slave systems in Red Hat Linux Environment.
  • Developed Map Reduce programs to cleanse data in HDFS obtained from heterogeneous data sources to make it suitable for ingestion into Hive schema for analysis.
  • Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis.
  • Responsible for building scalable distributed data solutions using Hadoop and migrating legacy retail ETL applications to Hadoop.
  • Wrote SQL queries to process the data using Spark SQL.
  • Extracted data from different databases and copied it into HDFS using Sqoop.
  • Involved in loading and transforming large sets of structured, semi structured and unstructured data from relational databases into HDFS using Sqoop imports.
  • Responsible for analyzing and cleansing raw data by performing Hive queries and running Pig scripts on data.
  • Built SOAP web services (JAX-WS) with Maven, using XML, WSDL, and Apache CXF.
  • Used Spring Integration (SI) to expose some services of our application for other applications in the company to use.
  • Used SOAP UI to test the SOAP Web services.
  • Created, altered, and managed databases, tables, views, indexes, and constraints with business rules using T-SQL.
  • Created complex Stored Procedures, Triggers and User Defined Functions to support the front-end application.
  • Participated in troubleshooting production issues and coordinated with team members to resolve defects under tight timelines.
  • Involved in end-to-end implementation in the production environment, validating the implemented modules.

Environment: Apache Hadoop, Hive, Pig, HDFS, Java, UNIX, MySQL, Eclipse, Sqoop, REST/SOAP API

Confidential, Chicago, IL

Java/Hadoop Developer


  • Developed MapReduce jobs for log analysis and analytics, and to generate reports on the number of activities created on a particular day.
  • Involved in implementing complex Map Reduce programs to perform joins on the Map side using distributed cache in Java.
  • Implemented analytical algorithms using Map Reduce programs to apply on HDFS data.
  • Created Hive tables per requirements as internal or external tables, defined with appropriate static and dynamic partitions for efficiency.
  • Developed the application with help of Struts Framework that uses Model View Controller (MVC) architecture with JSP as the view.
  • Implemented business logic as SOAP services and deployed them on WebLogic Server.
  • Identified bottlenecks and resolved them through performance tuning.
  • Configured JMS MQ in IBM WebSphere for various environments and ensured ESL connectivity between Dealer Direct, Trade Capture, and Settlement via the ESL message queue.
  • Created common Java components shared between the applications to convert data to the appropriate state for each application.
  • Coordinated with the Engineering team to provide engineering design documents for building the environments per Single Security project requirements.
  • Used IBM RAD to debug Java issues during deployment and integration with other Java applications.
  • Developed Pig UDFs to analyze customer behavior and Pig Latin scripts to process the data in Hadoop.
  • Scheduled automated tasks with Oozie for loading data into HDFS through Sqoop and pre-processing the data with Pig and Hive.
  • Wrote Java code for reading and writing files, making extensive use of the ArrayList and HashMap data structures.
  • Wrote test cases which adhere to a Test-Driven Development (TDD) pattern.
  • Built Ant scripts that compile the code, pre-compile the JSPs, build an EAR file, and deploy the application to the application server.
  • Used CVS as a version control system, an important component of Source Configuration Management (SCM).

Environment: Apache Hadoop, Hive, Pig, HDFS, Java, UNIX, MySQL, Eclipse, Oozie, Sqoop, SOAP

Confidential, South Windsor, CT

Java Developer


  • Involved in requirements gathering and analysis from the existing system. Captured requirements using Use Cases and Sequence Diagrams.
  • Used the Spring Framework as the middle-tier application framework, with Spring's Hibernate support as the persistence strategy for database integration.
  • Implemented the MVC, DAO, Session Facade, and Service Locator J2EE design patterns as part of application development.
  • Designed and implemented the user interface using JSP, Servlets, JavaScript, HTML, CSS and AJAX.
  • Used the JSP Standard Tag Library (JSTL) to build the user interface of the application.
  • Implemented the MVC pattern using the Struts framework with Tiles for the presentation layer.
  • Implemented various design patterns: Singleton, Data Access Object (DAO), Command Design Pattern, Factory Method Design Pattern.
  • Used Web Services - WSDL and SOAP for getting credit card information from third party and used SAX and DOM XML parsers for data retrieval.
  • Involved in requirement gathering, HLD and LLD and prepared activity diagrams, sequence diagrams, class diagrams & use case diagrams for various use cases using Rational Rose.
  • Extensively involved in developing backend and data access logic using Oracle DB and JDBC.
  • Developed stored procedures, triggers and functions with PL/SQL for Oracle database.
  • Used Ant for building & worked with Production Control team for implementation & deployment.
  • Used Log4J for logging and for analyzing system performance and flow; involved in code refactoring and bug fixing.
  • Tested the service and data access tiers using JUnit, following a TDD methodology.

Environment: Java (JDK 1.6), J2EE, JSP, Servlet, JavaScript, HTML, CSS, AJAX, Eclipse, Oracle 10g, Log4J, JUnit