Sr. Big Data/hadoop Developer Resume

Merrimack, NH


  • 9+ years of IT experience as a Sr. Big Data Developer using Hadoop, HDFS, Hortonworks, MapReduce and the Hadoop ecosystem (Pig, Hive, Impala, Spark and Scala), Java and J2EE.
  • Hands-on experience in setting up databases using RDS, storage using S3 buckets and configuring instance backups to S3 buckets to ensure fault tolerance and high availability.
  • Experience in importing and exporting data using Sqoop from HDFS to relational database systems and vice versa.
  • Good knowledge in using Hibernate for mapping Java classes with database and using Hibernate Query Language (HQL).
  • Working experience on major components of Hadoop Ecosystem like HDFS, HBase, Hive, Sqoop, Pig, and MapReduce.
  • Experience in analyzing data using HiveQL, and custom MapReduce programs in Java.
  • Hands-on experience in importing and exporting data from different databases such as Oracle and MySQL into HDFS and Hive using Sqoop.
  • Proficient in development methodologies such as Scrum, Agile, and Waterfall.
  • Experience in working with NoSQL databases such as MongoDB, Cassandra and HBase.
  • Experience in loading data into Spark SchemaRDDs and querying them using Spark SQL.
  • Good experience in developing MapReduce jobs in J2EE/Java for data cleansing, transformations, pre-processing and analysis.
  • Good knowledge of Amazon Web Services (AWS) concepts such as EMR and EC2, which provide fast and efficient processing for Teradata Big Data Analytics.
  • Extensive experience in developing Spark Streaming jobs using RDDs (Resilient Distributed Datasets), with Spark SQL as required.
  • Experience on developing Java MapReduce jobs for data cleansing and data manipulation as required for the business.
  • In-depth understanding of Spark architecture including Spark Core, Spark SQL, DataFrames and Spark Streaming.
  • Strong knowledge of the Hadoop ecosystem including HDFS, Hive, Oozie, HBase, Pig, Sqoop and Zookeeper.
  • Extensive experience with advanced J2EE frameworks such as Spring, Struts, JSF and Hibernate.
  • Expertise in JavaScript, JavaScript MVC patterns, object-oriented JavaScript design patterns and AJAX calls.
  • Experience in object-oriented programming with Java and Core Java.
  • Experience in creating web-based applications using JSP and Servlets.
  • Excellent experience in installing and running various Oozie workflows and automating parallel job executions.
  • Experience on Spark and SparkSQL, Spark Streaming, Spark GraphX, Spark MLlib.
  • Extensively worked with object oriented Analysis, Design and development of software using UML methodology.
  • Knowledge on Spark and its in-memory capabilities, mainly in framework exploration for transition from MapReduce to Spark.
  • Capable of processing large sets of structured, semi-structured and unstructured data and supporting systems application architecture.
  • Experience in using various Hadoop Distributions like Cloudera, Hortonworks and Amazon EMR.
  • Experience developing Kafka producers and consumers for streaming millions of events per second.
  • Experience in job workflow scheduling and monitoring tools like Oozie and Zookeeper.
  • Expertise in using J2EE application servers such as IBM WebSphere, JBoss and web servers like Apache Tomcat.
  • Experience in using ANT and Maven for building and deploying the projects in servers and using JUnit and log4j for debugging.


Big Data Ecosystem: MapReduce MRv2, HDFS, Hive 2.3, HBase 1.2, Pig 0.17, Sqoop 1.4.7, Apache Flume 1.8, HDP, Oozie 4.3, Zookeeper 3.4, Spark 2.3, Kafka 2.0, Storm, Hue

Hadoop Distributions: Cloudera (CDH3, CDH4, CDH5), Hortonworks

Cloud Platform: Amazon AWS, EC2, Redshift

Databases: Oracle 12c, MySQL, MS-SQL Server 2017/2016

Version Control: GIT, GitLab, SVN

Java/J2EE Technologies: Servlets, JSP, JDBC, JSTL, EJB, JAXB, JAXP, JMS, JAX-RPC, JAX-WS

NoSQL Databases: HBase and MongoDB

Programming Languages: Java 8, Python, SQL, PL/SQL, AWS, HiveQL, UNIX Shell Scripting, Scala.

Methodologies: Software Development Lifecycle (SDLC), Waterfall Model and Agile, STLC (Software Testing Life cycle) & UML, Design Patterns (Core Java and J2EE)

Web Technologies: JavaScript, CSS, HTML and JSP.

Operating Systems: Windows, UNIX/Linux and Mac OS.

Build Management Tools: Maven, Ant.

IDE & Command line tools: Eclipse, IntelliJ, Toad and NetBeans.


Confidential, Merrimack, NH

Sr. Big Data/Hadoop Developer


  • As a Sr. Big Data Developer, worked on Hadoop ecosystem components including Hive, HBase, Oozie, Pig, Zookeeper, Spark Streaming and MCS (MapR Control System) with the MapR distribution.
  • Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and pre-processing.
  • Involved in requirement gathering phase of the SDLC and helped team by breaking up the complete project into modules with the help of my team lead.
  • Involved in various phases of development; analyzed and developed the system following the Agile Scrum methodology.
  • Worked with Apache Solr, used as the indexing and search engine.
  • Involved in development of Hadoop System and improving multi-node Hadoop Cluster performance.
  • Analyzed the Hadoop stack and different Big Data tools including Pig, Hive, HBase and Sqoop.
  • Primarily involved in the data migration process using Azure, integrating with a GitHub repository and Jenkins.
  • Configured Spark Streaming to receive real-time data from Apache Kafka and store the stream data in HDFS using Scala.
  • Created Managed tables and External tables in Hive and loaded data from HDFS.
  • Exported the data using Sqoop to RDBMS servers and processed that data for ETL operations.
  • Designed the projects using MVC architecture, providing multiple views of the same model for efficient modularity and scalability.
  • Built code for real-time data ingestion using Java, MapR Streams (Kafka) and Storm.
  • Developed a data pipeline using Flume, Sqoop and Pig to extract data from weblogs and store it in HDFS.
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
  • Worked with different data sources like Avro data files, XML files, JSON files, SQL server and Oracle to load data into Hive tables.
  • Used J2EE design patterns like Factory pattern & Singleton Pattern.
  • Used Spark to create the structured data from large amount of unstructured data from various sources.
  • Implemented the use of Amazon EMR for processing Big Data across a Hadoop cluster of virtual servers on Amazon Elastic Compute Cloud (EC2) and Amazon Simple Storage Service (S3).
  • Performed transformations, cleaning and filtering on imported data using Hive, MapReduce, Impala and loaded final data into HDFS.
  • Developed Python scripts to find vulnerabilities with SQL Queries by doing SQL injection.
  • Experienced in designing and developing POC's in Spark using Scala to compare the performance of Spark with Hive and SQL/Oracle.
  • Responsible for coding MapReduce program, Hive queries, testing and debugging the MapReduce programs.
  • Managed and supported enterprise data warehouse operations and advanced predictive Big Data application development using Cloudera and Hortonworks HDP.
  • Imported data using Sqoop to load data from Oracle to HDFS on regular basis or from Oracle server to HBase depending on requirements.
  • Built a web portal using JavaScript that makes a REST API call to Elasticsearch and retrieves the row key.
  • Loaded and transformed large sets of structured, semi structured and unstructured data in various formats like text, zip, XML and JSON.
  • Worked with NoSQL databases like HBase in creating HBase tables to load large sets of semi structured data coming from various sources.
  • Created and maintained various shell and Python scripts for automating processes; optimized MapReduce code and Pig scripts, and carried out performance tuning and analysis.
  • Extracted real-time feeds using Spark Streaming, converted them to RDDs, processed the data into DataFrames and loaded the data into Cassandra.
  • Involved in the process of data acquisition, data pre-processing and data exploration of telecommunication project in Scala.
  • Implemented a distributed messaging queue to integrate with Cassandra using Apache Kafka and Zookeeper.
  • Specified the cluster size, resource pool allocation and Hadoop distribution by writing the specifications in JSON file format.
  • Imported weblogs and unstructured data using Apache Flume and stored the data in a Flume channel.
  • Exported event weblogs to HDFS by creating an HDFS sink that deposits the weblogs directly in HDFS.
  • Used RESTful web services with MVC for parsing and processing XML data.
  • Utilized XML and XSL Transformation for dynamic web-content and database connectivity.
  • Involved in loading data from the UNIX file system to HDFS, designing schemas, writing CQL queries and loading data using Cassandra.
  • Built the automated build and deployment framework using Jenkins, Maven etc.

Environment: Spark 2.3, Hive 2.3, Pig 0.17, SQL, HBase, Sqoop 2.0, Kafka 2.0, Scala 2.12, Apache Flume 1.8, Cassandra 3.11, Zookeeper 3.4, Python, MapReduce MRv2, Hortonworks

Confidential, Eden Prairie, MN

Big Data/Hadoop Developer


  • As a Big Data/Hadoop Developer, worked on Hadoop ecosystem components including Hive, MongoDB, Zookeeper and Spark Streaming with the MapR distribution.
  • Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
  • Involved in various phases of development; analyzed and developed the system following the Agile Scrum methodology.
  • Worked on MongoDB by using CRUD (Create, Read, Update and Delete), Indexing, Replication and Sharding features.
  • Involved in designing the row key in HBase to store Text and JSON as key values in HBase table and designed row key in such a way to get/scan it in a sorted order.
  • Developed MapReduce (YARN) jobs for cleaning, accessing and validating the data.
  • Created and worked Sqoop jobs with incremental load to populate Hive External tables.
  • Implemented the use of Amazon EMR for processing Big Data across a Hadoop cluster of virtual servers on Amazon Elastic Compute Cloud (EC2) and Amazon Simple Storage Service (S3).
  • Worked with Apache NiFi to develop custom processors for processing and distributing data among cloud systems.
  • Involved in development of Hadoop System and improving multi-node Hadoop Cluster performance.
  • Developed Scala UDFs using both DataFrames/SQL and RDD/MapReduce in Spark for data aggregation and queries, and wrote data back into RDBMS through Sqoop.
  • Exported analyzed data to relational databases using Sqoop, deployed data from various sources into HDFS and built reports using Tableau.
  • Exported analyzed data to relational database using Sqoop for visualization to generate reports for the BI team.
  • Used JSON and Avro for serialization and deserialization, packaged with Hive, to parse the contents of streamed log data, and implemented custom Hive UDFs.
  • Configured Spark streaming to receive real time data from Kafka and store the stream data to HDFS using Scala.
  • Involved in developing ETL data pipelines for performing real-time streaming by ingesting data into HDFS and HBase using Kafka and Storm.
  • Involved in moving log files generated from varied sources to HDFS, further processing through Flume.
  • Involved in creating Hive tables using Impala, working on them using HiveQL and performing data analysis using Hive and Pig.
  • Developed workflow in Oozie to manage and schedule jobs on Hadoop cluster to trigger daily, weekly and monthly batch cycles.
  • Assisted in upgrading, configuration and maintenance of various Hadoop infrastructures like Pig, Hive, and HBase.
  • Worked on Apache Flume for collecting and aggregating huge amount of log data and stored it on HDFS for doing further analysis.
  • Loaded the data into Spark RDDs and performed in-memory data computation to generate the output response.
  • Responsible for developing data pipeline with Amazon AWS to extract the data from weblogs and store in MongoDB.
  • Created Hive tables and worked on them using HiveQL.
  • Designed and implemented partitioning (static and dynamic) and bucketing in Hive.
  • Developed multiple POCs using PySpark, deployed them on the YARN cluster and compared the performance of Spark with Hive.
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
  • Used Spark to create the structured data from large amount of unstructured data from various sources.
  • Used Apache Spark on YARN for fast, large-scale data processing and improved performance.
  • Responsible for design & development of Spark SQL Scripts using Scala/Java based on Functional Specifications.
  • Worked on Cluster co-ordination services through Zookeeper.
  • Monitored workload, job performance and capacity planning using Cloudera Manager.
  • Involved in building applications using Maven and integrating with CI servers like Jenkins to build jobs.
  • Developed Python scripts to find vulnerabilities with SQL Queries by doing SQL injection.

Environment: Hadoop 3.0, Kafka 2.0.0, Pig 0.17, Hive 2.3, MVC, Scala 2.12, JDBC, Oracle 12c, POC, Sqoop 2.0, Zookeeper 3.4, Python, Spark 2.3, HDFS, EC2, MySQL, Agile.

Confidential, Las Vegas, NV

Sr. Java/Hadoop Developer


  • Responsible for ingesting large volumes of data into Hadoop Data Lake Pipeline on daily basis.
  • Developed Spark programs to Transform and analyze the data.
  • Developed Java MapReduce programs on log data to transform it into a structured form and determine user location, age group and time spent.
  • Implemented Row Level Updates and Real time analytics using CQL on Cassandra Data.
  • Wrote simple MapReduce programs and executed them using the Eclipse IDE.
  • Used Hibernate Transaction Management, Hibernate Batch Transactions, and cache concepts.
  • Utilized Hibernate for Object/Relational Mapping to provide transparent persistence onto SQL Server.
  • Built and deployed EAR, WAR and JAR files on test and stage systems in WebLogic Application Server.
  • Developed Java and J2EE applications using Rapid Application Development (RAD), Eclipse.
  • Developed Spark applications using Scala and Java, and implemented an Apache Spark data processing project to handle data from various RDBMS and streaming sources.
  • Experienced with SparkContext, Spark SQL, DataFrames, pair RDDs and Spark on YARN.
  • Implemented Spark using Scala and Java, utilizing DataFrames and the Spark SQL API for faster processing of data.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Scala and Python.
  • Used Spark API over Hortonworks Hadoop YARN to perform analytics on data in Hive.
  • Implemented Java J2EE technologies on the server side such as Servlets, JSP and JSTL.
  • Implemented Hibernate by creating hbm.xml files to configure Hibernate for the Oracle database.
  • Involved in writing SQL Queries, Stored Procedures and PL/SQL for the back-end server.
  • Used HTML, JavaScript for creating interactive User Interfaces.
  • Extensively used Custom JSP tags to separate presentation from application layer.
  • Developed JSP Pages and implemented AJAX in them for a responsive User Interface.
  • Involved in developing presentation layer using JSP and Model layer using EJB Session Beans.
  • Implemented Unit test cases by using JUnit and Implemented Log4J for logging and debugging the application.
  • Implemented Maven Build Scripts for building the application.
  • Deployed the application in IBM WebSphere and tested for server-related issues.
  • Used Git as the repository and for version control. Used IntelliJ as the IDE for development.

Environment: Java 7, Pig 0.16, MapReduce MRv1, PL/SQL, SOA, MVC, HDFS, JSP, HTML5, Shell Script, Hibernate 5.0.0, Apache Tomcat.

Confidential, Baton Rouge, LA

Sr. Java/J2EE Developer


  • Responsible for designing, coding and developing the application in J2EE using MVC architecture.
  • Implemented Java and J2EE Design patterns like Business Delegate and Data Transfer Object (DTO), Data Access Object and Service Locator.
  • Actively involved in the application architecture and development tools for web solutions that fulfill the business requirements of the project.
  • Developed application using Spring Framework that leverages classical Model View Controller (MVC) architecture, and Hibernate as the ORM.
  • Used Eclipse as the Java IDE in the development of the application.
  • Developed use case diagrams, object diagrams, class diagrams and sequence diagrams using UML.
  • Implemented client side validations using JavaScript and server side validation in the Action classes.
  • Developed web pages using AngularJS, JavaScript, HTML, AJAX, CSS and XSLT to create the user interface views.
  • Used AngularJS to bind information between elements of the pages and for routing between web pages.
  • Created BI Controllers based Java classes working together with XML transformation layer, to transform data received from the data providers.
  • Designed and developed a web client using Servlets, JavaScript, HTML and XML.
  • Validated whether existing web services could be reused to support new UI functionality, and created Spring Boot services for processing scheduled, one-time or stored payment functionality.
  • Worked on basic authentication in both Java Spring Boot and IIB to secure communication between the front-end UI and back-end SOA services (Java Spring Boot and IIB), using a Base64-encoded authentication string.
  • Used JMS for sending the messages to the Export Queue.
  • Used JUnit to unit test the modules & Log4j for logging error/debug messages.
  • Implemented exception handling in Java Spring Boot for the REST API using the @ExceptionHandler and @ControllerAdvice annotations.
  • Developed Java classes for implementing asynchronous processing using WebLogic.
  • Involved in creation and deployment of the enterprise application in WebLogic.
  • Employed Hibernate as an Object-Relational Mapping (ORM) tool to store persistent data and communicate with the database.
  • Used Web services for sending and getting data from different applications using REST.
  • Developed JSP pages using Custom tags and Tiles framework and Struts framework.
  • Deployed the complete Web applications in WebSphere Application server.
  • Developed UNIX shell scripts and Perl programs for data integrity with the mainframe.
  • Used ANT tool for building and packaging the application.
  • Used Log4j to capture logs, including runtime exceptions and informational messages, to help debug issues.

Environment: MVC, Struts, J2EE, JavaScript, AngularJS, HTML5, AJAX, CSS3, XML, Java 6, Hibernate, JUnit


Java Developer


  • Involved in Full Life Cycle Development in Distributed Environment Using Java and J2EE framework.
  • Involved in designing and developing web services using SOAP and WSDL.
  • Integrated JSF with JSP and used JSF Custom Tag Libraries to display the value of variables defined in configuration files.
  • Designed and developed the user interface using HTML, CSS and JavaScript.
  • Developed POJO objects and used Hibernate as the Object-Relational Mapping (ORM) tool to access the persistent data from SQL Server.
  • Developed the entire application implementing MVC Architecture integrating JSF with Hibernate and Spring frameworks.
  • Involved in development of presentation layer using JSP and Servlets with Development tool Eclipse IDE.
  • Worked on development of Hibernate, including mapping files, configuration file and classes to interact with the database.
  • Extensively worked with JavaScript and jQuery libraries for form validation and other interactive features.
  • Implemented caching in hibernate to improve performance by caching the result of queries.
  • Provided connections using JDBC to the database and developed SQL queries to manipulate the data.
  • Developed the ANT Script for building the application and deploying on JBoss Application Server.
  • Worked on AngularJS to create Controllers, services and used AngularJS filters to filter the functionality in the search box.
  • Involved in analyzing the Maven dependency management in the base code to annotate dependencies on to Spring Boot application for Micro Services.
  • Used JUnit and Mockito framework for unit testing of application and Log4j to capture the log that includes runtime exceptions.
  • Extensively worked with Bootstrap and Responsive design to make the application responsive.
  • Extensively Worked on Restful Web Services to get the JSON Object and to use that JSON object to display the result on the webpage.
  • Used Jenkins as the integration tool and improved scalability of applications on cross-platforms.
  • Used Maven as the build tool and for generating documentation, reporting and adding dependencies.
  • Developed test scripts in Selenium Web Driver using Java Language. Developed testing using JUnit.
  • Used JIRA for resolving bugs and defects in the application by coordinating with team members of the project.
  • Used GIT as a version control system and deployed the application in Production environment.
  • Implemented Java multi-threading and handled threading issues during application development.
  • Logged all debug, error and warning messages at the code level using Log4j.
  • Involved in developing object-oriented JavaScript; experienced with AJAX, jQuery, HTML, AngularJS, Node.js and CSS.
  • Developed a fully automated continuous integration system using Git, Jenkins and custom-designed tools.

Environment: JSF, JSP, POJO, HTML4, CSS, JavaScript, MVC, Eclipse 3.2, Hibernate, JQuery, ANT, AngularJS, Maven, JSON, Java
