
Sr. Big Data/Hadoop Developer Resume

Hartford, CT


  • Overall 9+ years of IT experience in software analysis, design, development and implementation of Big Data, Hadoop and Java/J2EE technologies.
  • In-depth experience and good knowledge in using Hadoop ecosystem tools like MapReduce, HDFS, Pig, Hive, Kafka, YARN, Sqoop, Storm, Spark, Oozie, Elasticsearch and ZooKeeper.
  • Good knowledge of Amazon Web Services (AWS) concepts such as EMR and EC2, which provide fast and efficient processing for Teradata Big Data analytics.
  • Worked with application teams to install operating system, Hadoop updates, patches and version upgrades as required.
  • Experience in unit testing using JUnit, TDD, and BDD.
  • Experience in modeling applications with UML, Rational Rose and Rational Unified Process (RUP).
  • Experience in using CVS and Rational ClearCase for version control.
  • Good working knowledge of Ant and Maven for project build/test/deployment, Log4j for logging and JUnit for unit and integration testing.
  • Expertise in loading data from different data sources such as Teradata and DB2 into HDFS using Sqoop and loading it into partitioned Hive tables.
  • In-depth knowledge of Spark concepts and experience with Spark in data transformation and processing.
  • Hands-on experience working with NoSQL databases including HBase and Cassandra, and their integration with Hadoop clusters.
  • Good experience with the ETL tool Informatica, and with managing and maintaining Hadoop clusters using Apache Ambari.
  • Worked on a migration project from Oracle DB to a Hadoop environment.
  • Installed and configured Hive, HDFS and NiFi, and implemented a CDH cluster. Assisted with performance tuning and monitoring.
  • Expertise in developing web applications using Core Java, Servlets, JSP, EJB, JDBC, XML, XSD, XSLT, RMI, JNDI, JavaMail, XML parsers (DOM and SAX), JAXP, JAXB, JavaBeans, etc.
  • Good understanding of RDBMS through database design and writing queries using databases like Oracle, SQL Server, DB2 and MySQL.
  • Good experience in developing user interfaces using JSP, HTML, DHTML, CSS, JavaScript, AJAX, jQuery and AngularJS.
  • Experience implementing database-driven applications in Java using JDBC, XML APIs and the Hibernate framework.
  • Expertise in using J2EE application servers such as WebLogic 10.3 and WebSphere 8.2, and web servers such as Tomcat 6.x/7.
  • Strong knowledge of Spark Core, Spark SQL, MLlib, GraphX and Spark Streaming.
  • Extensive experience in developing Pig Latin Scripts for transformations and using Hive Query Language for data analytics.
  • Expertise in writing applications with the Apache Spark Streaming API on Big Data distributions in active cluster environments.
  • Experience in working with MapReduce programs, Pig scripts and Hive commands to deliver the best results.
  • Experience in installation, configuration, management and deployment of Big Data solutions and the underlying infrastructure of Hadoop Cluster.
  • Experienced with IBM WebSphere Application Server, Oracle WebLogic application servers and Apache Tomcat Application Server.


Hadoop Ecosystem: Hadoop 3.0, MapReduce, Sqoop, Hive 2.3, Oozie, Pig 0.17, HDFS 1.2.4, Zookeeper, Flume 1.8, Impala 2.1, Spark 2.2, Storm, Hadoop distributions (Cloudera, Hortonworks and Pivotal).

NoSQL Databases: HBase 1.2, MongoDB 3.6 & Cassandra 3.11

Java/J2EE Technologies: Servlets, JSP, JDBC, JSTL, EJB, JAXB, JAXP, JMS, JAX-RPC, JAX-WS

Programming Languages: Java, Python, SQL, PL/SQL, HiveQL, Unix Shell Scripting, Scala 2.12

Cloud Platform: AWS EC2, AWS Configured and S3, Microsoft Azure.

Methodologies: Agile, RAD, JAD, RUP, Waterfall & Scrum

Database: Oracle 12c/11g, MySQL, SQL Server 2016/2014

Web/Application Servers: WebLogic, Tomcat, JBoss

Web Technologies: HTML5, CSS3, XML, JavaScript, jQuery, AJAX, WSDL, SOAP

Tools and IDE: Eclipse, NetBeans, Maven, DbVisualizer, SQL Server Management Studio

Version Control Tools: SVN, Git, GitHub, TFS, CVS and IBM Rational ClearCase


Confidential - Hartford, CT

Sr. Big Data/Hadoop Developer


  • Extensively involved in the design phase and delivered design documents for the Hadoop ecosystem with HDFS, Hive, Pig, Sqoop and Spark with Scala.
  • Collected logs from the physical machines and the OpenStack controller and integrated them into HDFS using Kafka.
  • Worked with clients to better understand their reporting and dashboarding needs and presented solutions using a structured Agile project methodology.
  • Implemented partitioning, dynamic partitions and buckets in Hive to improve performance and organize data logically.
  • Installed Hadoop, MapReduce and HDFS, and developed multiple MapReduce jobs in Pig and Hive for data cleaning and pre-processing.
  • Worked with Apache NiFi flows to convert raw data into ORC.
  • Developed RDDs/DataFrames in Spark using Scala and Python and applied several transformations to load data from the Hadoop Data Lake to Cassandra DB.
  • Exported the analyzed data to the NoSQL database HBase for visualization and to generate reports for the Business Intelligence team using SAS.
  • Used various HBase commands, generated different datasets as per requirements and provided access to the data when required using grant and revoke.
  • Created Hive tables as internal or external tables as required, designed for efficiency.
  • Developed MapReduce programs for the files generated by Hive query processing to generate key-value pairs and upload the data to the NoSQL database HBase.
  • Installed and configured a multi-node cluster on the cloud using Amazon Web Services (AWS) EC2.
  • Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
  • Created tables in HBase to store variable data formats of PII data coming from different portfolios.
  • Worked on Sequence files, RC files, map-side joins, bucketing and partitioning for Hive performance enhancement and storage improvement.
  • Pulled data from Amazon S3 buckets to the Data Lake, built Hive tables on top of it and created DataFrames in Spark to perform further analysis.
  • Used cloud computing on the multi-node cluster, deployed Hadoop applications on cloud S3 and used Elastic MapReduce (EMR) to run MapReduce jobs.
  • Explored MLlib algorithms in Spark to understand the Machine Learning functionality that could be used for the use case.
  • In the preprocessing phase of data extraction, used Spark to remove missing data and transform the data to create new features.
  • Developed data pipelines using Flume, Sqoop, Pig and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
  • Involved in loading data from the UNIX file system to HDFS using Flume and the HDFS API.
  • Configured Spark Streaming to receive real-time data from Kafka and store the stream data in HDFS.
  • Developed various data connections from data sources to SSIS and Tableau Server for report and dashboard development.
  • Involved in unit testing, interface testing, system testing and user acceptance testing of the workflow tool.
  • Used JIRA for bug tracking and Git for version control.
  • Involved in the high-level design of the Hadoop architecture for the existing data structure and business process.
  • Extensively worked on creating end-to-end data pipeline orchestration using NiFi.
  • Developed scalable data pipelines to process data from multiple sources in real time using Kafka, NiFi and Spark Streaming.
  • Took part in configuring and deploying a Hadoop cluster in the AWS cloud.
  • Worked on analyzing the Hadoop cluster and different Big Data components including Pig, Hive, Spark, HBase, Kafka, Elasticsearch, databases and Sqoop.
  • Involved in loading disparate datasets into the Hadoop Data Lake, making them available to the data science team for predictive analysis.
  • Tuned Hive and Pig scripts to improve performance and resolved performance issues in both.
  • Worked with Elastic MapReduce (EMR) and set up environments on AWS EC2 instances.

Environment: Apache Hadoop 3.0, AWS, MLlib, MySQL, Kafka, HDFS 1.2, Hive 2.3, Pig 0.17, MapReduce, Flume 1.8, Cloudera, Oozie, UNIX, Oracle 12c, Tableau 7, Git.

Confidential - Chicago, IL

Sr. Big data/Hadoop Engineer


  • Responsible for managing data coming from different sources, with storage and processing in Hue covering all Hadoop ecosystem components.
  • Involved in the requirement-gathering phase of the SDLC and, together with the team lead, broke the complete project up into modules.
  • Involved in story-driven agile development methodology and actively participated in daily scrum meetings.
  • Involved in developing stored procedures for fetching data from Greenplum and created workflows using Apache NiFi.
  • Used Java MapReduce to compute various metrics that define user experience, revenue, etc.
  • Imported and exported data between HDFS and an Oracle database using Sqoop, and configured the Hive metastore with MySQL, which stores the metadata for Hive tables.
  • Wrote extensive MapReduce jobs in Java to train the cluster and developed Java MapReduce programs to analyze sample log files stored in the cluster.
  • Participated in Rapid Application Development and Agile processes to deliver new cloud platform services.
  • Developed Pig scripts for data analysis and extended their functionality by developing custom UDFs written in Java or Python.
  • Involved in creating a Data Vault by extracting customer Big Data from various data sources into Hadoop HDFS, including data from Excel, flat files, Oracle, SQL Server, MongoDB, Cassandra, HBase, Teradata and Netezza, as well as log data from servers.
  • Designed a data flow to pull data from a REST API using Apache NiFi with SSL context configuration enabled.
  • Involved in integrating HBase with Spark to import data into HBase and performed CRUD operations on HBase.
  • Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
  • Ran MapReduce programs on the Amazon Elastic MapReduce framework using Amazon S3 for input and output.
  • Used the Teradata FastLoad/MultiLoad utilities to load data into tables.
  • Involved in creating Hive tables, loading data into them and writing Hive queries to analyze the data.
  • Involved in gathering requirements from the client and estimating timelines for developing complex queries using Hive and Impala for a logistics application.
  • Developed Shell and Python scripts to automate and provide control flow to Pig scripts.
  • Worked on designing NoSQL schemas on HBase.
  • Exported data from the HDFS environment into RDBMS using Sqoop for report generation and visualization purposes.
  • Used the DataFrame API in Scala to convert distributed collections of data into DataFrames organized into named columns.
  • Used HCatalog to access Hive table metadata from MapReduce or Pig code.
  • Automated workflows using shell scripts and Control-M jobs to pull data from various databases into the Hadoop Data Lake.
  • Worked with Zookeeper, Oozie, and Data Pipeline Operational Services for coordinating the cluster and scheduling workflows.
  • Performed data validation against source system data, analyzing the existing database source files and tables, to ingest data into the Hadoop Data Vault.
  • Used AWS to produce a comprehensive architecture strategy for environment mapping.
  • Implemented Spark RDD transformations and actions to migrate MapReduce algorithms.
  • Developed Kafka producers and consumers, HBase clients, and Spark and Hadoop MapReduce jobs along with components on HDFS and Hive.
  • Involved in data ingestion into HDFS using Sqoop and Flume from a variety of sources.

Environment: Hadoop 3.0, AWS, HDFS, Pig, Hive 2.3, MapReduce, AWS S3, Scala 2.1, Sqoop, Spark SQL, Spark Streaming, Spark, Linux, Teradata 14, Oracle 11g, Java, Python.

Confidential - Merrimack, NH

Sr. Java/Hadoop Developer


  • Analyzed Hadoop cluster and different Big Data analytic tools including Pig, Hive, HBase and Sqoop.
  • Created Hive tables, loaded the data and performed data manipulations using Hive queries in MapReduce execution mode.
  • Involved in the analysis, design, development and testing phases of the Software Development Life Cycle (SDLC) using Agile software development methodology.
  • Used Rational Rose for developing Use case diagrams, Activity flow diagrams, Class diagrams and Object diagrams in the design phase.
  • Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, managing and reviewing data backups, and managing and reviewing Hadoop log files.
  • Extracted files from Cassandra through Sqoop and placed them in HDFS and processed them.
  • Performed data modeling to connect data stored in Cassandra DB to the data processing layers and wrote queries in CQL.
  • Implemented Model View Controller (MVC) architecture using Spring Framework.
  • Worked on JavaBeans and other business components for the application and implemented new functionalities for the ERIC application.
  • Developed various SQL queries and PL/SQL procedures in Oracle DB for the application.
  • Installed and configured Hadoop MapReduce and HDFS, and developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
  • Involved in implementation of the presentation layer (GUI) for the application using JSF, HTML4, CSS2/3 and JavaScript.
  • Used Log4j to log the messages in the database.
  • Performed unit testing using the JUnit framework.
  • Created complex SQL queries, PL/SQL stored procedures and functions for the back end.
  • Used Hibernate to access the database, mapped different POJO classes to the database tables and persisted the data into the database.
  • Used Spring Dependency Injection to set up dependencies between the objects.
  • Developed Spring-Hibernate and Struts integration modules.
  • Developed Pig scripts, Pig UDFs, Hive scripts and Hive UDFs to load data files.
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Involved in loading data from edge node to HDFS using shell scripting
  • Implemented scripts for loading data from UNIX file system to HDFS.
  • Integrated Struts application with Spring Framework by configuring Deployment descriptor file and application context file in Spring Framework.
  • Developed multiple scripts for analyzing data using Hive and Pig and integrating with HBase.
  • Used Sqoop to import data into HDFS and Hive from other data systems.
  • Created reports for the BI team using Sqoop to export data into HDFS and Hive.
  • Automated all jobs, from pulling data from databases to loading data into SQL Server, using shell scripts.
  • Developed integration services using SOA, Mule ESB, Web Services, SOAP, and WSDL.
  • Designed UI screens using JSP 2.0 and HTML, using JavaScript for client-side validation.
  • Actively involved in designing and implementing the Singleton, MVC, Front Controller and DAO design patterns.

Environment: Hadoop, Hive, HDFS, Sqoop, Spark, Java, Hibernate 4.0, Oracle 10g, HTML3, CSS2/3, SQL Server 2012, Spring 3.1 framework, Spring Model View Controller (MVC), Servlets 3.0, JDBC 4.0, AJAX, Web services, RESTful, JSON, jQuery, JavaScript

Confidential - Buffalo, NY

Sr. Java/J2EE Developer


  • Analyzed and understood business requirements and implemented the process using Agile (Scrum) methodology.
  • Followed test-driven development within the Agile methodology to produce high-quality software.
  • Used Hibernate as the ORM tool to persist data into the MySQL database.
  • Developed the application using Spring MVC, JSTL (tag libraries) and AJAX on the presentation layer; the business layer is built using Spring and the persistence layer uses Hibernate.
  • Developed Web services for consuming Stock details and Transaction rates using JAX-WS and Web services Template.
  • Developed PL/SQL stored procedures and extensively used HQL.
  • Used Spring to develop lightweight business components and the core Spring framework for dependency injection.
  • Developed the project using Waterfall methodologies and Test Driven Development.
  • Conducted code reviews with clients using the SmartBear tool.
  • Developed the presentation layer and GUI framework based on the Spring framework involving JSP, HTML, JavaScript, AJAX and CSS.
  • Designed and developed a batch process for VAT.
  • Followed Test Driven Development (TDD) and Scrum concepts of the Agile methodology to produce high-quality software.
  • Actively participated in developing user interfaces and deploying them on the WebLogic application server.
  • Developed the J2EE application based on the Service Oriented Architecture.
  • Developed Java and J2EE applications using Rapid Application Development (RAD), Eclipse.
  • Used Hibernate to access Oracle database for accessing customer information in this application.
  • Used Maven scripts to create WAR and EAR files and worked on defects/bug fixes as per weekly sprint planning.
  • Worked on developing the REST web services and integrating with them from the front-end.
  • Designed and developed the communication tier to exchange data through JMS & XML over HTTP.
  • Used object-oriented design techniques such as UML for designing Use case, Sequence, Activity, Class and Object diagrams.
  • Configured the different layers (presentation layer, server layer, persistence layer) of the application using Spring IOC and maintained the Spring Application Framework's IOC container.
  • Implemented Java classes to read data from XLS and CSV Files and to store the data in backend tables using Web Frame APIS.
  • Configured faces-config.xml and navigation.xml to set all page navigations and created EJB Message Driven Beans to use asynchronous service to perform profile additions.
  • Used various Core Java concepts such as Exception Handling and Collection APIs to implement various features and enhancements.
  • Involved in coding, maintaining, and administering Servlets and JSP components to be deployed on a WebLogic Application server.
  • Used DOJO toolkit to construct Ajax requests and build dynamic web pages using JSP, DHTML and JavaScript.
  • Used CVS as version control system for the source code and project documents.

Environment: Core Java 1.5, JSP 2.1, jQuery, JavaScript, AJAX, HTML, CSS, XML, WSDL 2.0, SOAP, JAX-WS, Struts 2.0, Spring Framework, Struts Tiles, Spring 2.5, Hibernate 3.5, SOA, EJB 2.0, MDB, JMS, RAD, WSAD 6.1, DB2, Ivy, UML, Rational Rose, UNIX, Log4j, JUnit, Ant, JSF.


Java Developer


  • Designed and developed User Interface of application modules using HTML, JSP, CSS, JavaScript (client side validations), JQuery and AJAX.
  • Worked on technologies such as HTML, CSS, JavaScript, Core Java, JDBC and JSP.
  • Worked in Eclipse with Apache Tomcat for development.
  • Used SOAP (Simple Object Access Protocol) for web services by exchanging XML data between the applications.
  • Implemented the Singleton, Factory and DAO design patterns based on the application requirements.
  • Designed and developed the communication tier to exchange data to Xpress Services through JMS & XML over HTTP.
  • Developed unit test cases using JUnit and mock objects.
  • Modified and migrated existing applications for fine-tuning and performance improvements.
  • Developed the web interface using MVC design pattern with Struts framework.
  • Involved in developing the presentation layer using JSF along with JSP, JavaScript, Ajax, CSS, and HTML.
  • Implemented MVC architecture to develop web application using Struts framework.
  • Implemented Ajax and JavaScript for front-end data validation.
  • Involved in designing the database schema and writing complex SQL queries.
  • Participated in the design and development of database schema and Entity-Relationship diagrams of the backend Oracle database tables for the application.
  • Involved in the development of back-end logic and data access logic using Oracle DB and JDBC.
  • Designed and developed the front end using HTML, CSS, JavaScript, Ajax and tag libraries.
  • Extensively developed stored procedures, triggers, functions and packages in Oracle SQL and PL/SQL.
  • Analyzed and fine-tuned RDBMS/SQL queries to improve the performance of the application with the database.
  • Created XML-based configuration and property files for the application and developed parsers using JAXP, SAX, and DOM technologies.
  • Involved in writing JSP, JavaScript and Servlets to generate dynamic web pages and web content.
  • Developed EJBs to process business logic and provide data persistence in the application.
  • Responsible for developing Use Case, Class diagrams and Sequence diagrams for the modules using UML and Rational Rose.

Environment: Java, J2EE, JSP, Servlets, Spring, Web Services, XML, Hibernate, JMS, ExtJS, IBM WebSphere, RAD 5.6, Oracle 8i/9i, HTML, CSS, Maven, JUnit, Log4j, JavaScript, XML/XSL.
