
Sr. Big Data/ Hadoop Developer Resume

Philadelphia, PA


  • Over 9 years of IT experience in software analysis, design, development, testing and implementation of Big Data, Hadoop, NoSQL and Java/J2EE technologies.
  • Experience using various Hadoop distributions (Cloudera, Hortonworks, MapR, etc.) to fully implement and leverage new Hadoop features.
  • Installed Kafka on a Hadoop cluster and wrote producer and consumer code in Java to stream tweets with popular hashtags from Twitter into HDFS.
  • Experience working with Data Frames, RDD, Spark SQL, Spark Streaming, APIs, System Architecture, and Infrastructure Planning.
  • Experience with Core Java components including Collections, Generics, Inheritance, Exception Handling and Multi-threading.
  • Very good understanding of NoSQL databases like MongoDB and HBase.
  • Experience with major components of the Hadoop ecosystem including Hive, Sqoop and Flume, and knowledge of the MapReduce/HDFS framework.
  • Hands-on programming experience in various technologies like Java, J2EE, HTML and XML.
  • Strong experience in developing and deploying applications using WebLogic, Apache Tomcat, and JBoss.
  • Experience in working with developer toolkits like Force.com IDE, Force.com Ant Migration Tool, Eclipse IDE and Maven.
  • Experience in Object Oriented Analysis, Design (OOAD) and development of software using UML Methodology, good knowledge of J2EE design patterns and Core Java design patterns.
  • Hands on experience in application development using Java, RDBMS, and Linux shell scripting.
  • Experience in Apache Flume for collecting, aggregating and moving large volumes of data from various sources such as web servers and telnet sources.
  • Experience in installation, configuration and deployment of Big Data solutions.
  • Experience in setting up standards and processes for Hadoop-based application design and implementation.
  • Solid understanding of Hadoop MRv1 and MRv2 (YARN) architecture.
  • Hands on experience in configuring and administering the Hadoop Cluster using major Hadoop Distributions like Apache Hadoop and Cloudera.
  • Expertise in using XML related technologies such as XML, DTD, XSD, XPATH, XSLT, DOM, SAX, JAXP, JSON and JAXB.
  • Excellent knowledge of Hadoop architecture, including HDFS, JobTracker, TaskTracker, NameNode, DataNode and the MapReduce programming paradigm.
  • Experience in implementing Spark solutions to enable real-time reports from Cassandra data.
  • Hands-on expertise in row key and schema design with NoSQL databases like MongoDB.
  • Experience in extracting files from MongoDB through Sqoop, placing them in HDFS and processing them.
  • Hands on experience with Spark Core, Spark SQL and Data Frames/Data Sets/RDD API.
  • Experience in using Kafka and Kafka brokers with a Spark context to process live streaming data using RDDs.
  • Developed Java applications using various IDEs like Spring Tool Suite and Eclipse.
  • Good knowledge of using Hibernate to map Java classes to database tables and of Hibernate Query Language (HQL).
  • Worked on Java/J2EE systems with various databases, including Oracle, MySQL and DB2.
  • Knowledge of implementing Big Data solutions on Amazon Elastic MapReduce (Amazon EMR) for processing and managing the Hadoop framework on dynamically scalable Amazon EC2 instances.
  • Built secure AWS solutions by creating VPCs with private and public subnets.
  • Extensive experience with application servers like WebLogic, WebSphere, JBoss and Glassfish, and web servers like Apache Tomcat.


Hadoop/Big Data Technologies: Hadoop 3.0, HDFS, MapReduce, HBase 1.4, Apache Pig 0.17, Hive 2.3, Sqoop 1.4, Apache Impala 3.0, Oozie 4.3, Yarn, Apache Flume 1.8, Kafka 1.1, Zookeeper 3.4

Hadoop Distributions: Cloudera, Hortonworks, MapR

Cloud: AWS, Azure, Azure SQL Database, Azure SQL Data Warehouse, Azure Analysis Services, HDInsight, Azure Data Lake and Data Factory.

Programming Language: Java, Scala 2.12, Python 3.6, SQL, PL/SQL, Shell Scripting, Storm 1.0, JSP, Servlets

Frameworks: Spring 5.0.5, Hibernate 5.2, Struts 1.3, JSF, EJB, JMS

Web Technologies: HTML5, CSS, JavaScript, JQuery 3.3, Bootstrap 4.1, XML, JSON, AJAX

Databases: Oracle 12c/11g, SQL

Database Tools: TOAD, SQL PLUS, SQL

Operating Systems: Linux, Unix, Windows 10/8/7

IDE and Tools: Eclipse 4.7, NetBeans 8.2

NoSQL Databases: HBase 1.4, Cassandra 3.11, MongoDB

Web/Application Server: Apache Tomcat 9.0.7, JBoss, WebLogic, WebSphere

SDLC Methodologies: Agile, Waterfall

Version Control: GIT, SVN, CVS, Maven


Confidential - Philadelphia, PA

Sr. Big Data/ Hadoop Developer


  • Worked as a Sr. Big Data/Hadoop Developer with Hadoop ecosystem components.
  • Developed Big Data solutions focused on pattern matching and predictive modeling.
  • Worked on analyzing the Hadoop cluster and various big data analytic tools, including Pig, HBase and Sqoop.
  • Involved in Agile methodologies, daily scrum meetings and sprint planning.
  • Primarily involved in Data Migration process using Azure by integrating with GitHub repository and Jenkins.
  • Used Kibana, an open-source, browser-based analytics and search dashboard for Elasticsearch.
  • Used the Java Persistence API (JPA) for object-relational mapping based on POJO classes.
  • Upgraded the Hadoop cluster from CDH3 to CDH4, set up a high-availability cluster and integrated Hive with existing applications.
  • Designed and developed a flattened view (merged and flattened dataset) by de-normalizing several datasets in Hive/HDFS.
  • Supported enterprise NoSQL production systems and loaded data into HBase using Impala and Sqoop.
  • Ran multiple MapReduce jobs through Pig and Hive for data cleaning and pre-processing.
  • Built Hadoop solutions for big data problems using MR1 and MR2 (YARN).
  • Handled importing of data from various data sources, performed transformations using Hive, Pig, and loaded data into HDFS.
  • Involved in identifying job dependencies to design workflow for Oozie & Yarn resource management.
  • Designed solution for various system components using Microsoft Azure.
  • Explored Spark to improve the performance of existing Hadoop algorithms using Spark context, Spark SQL, DataFrames and pair RDDs.
  • Created Hive Tables, loaded claims data from Oracle using Sqoop and loaded the processed data into target database.
  • Exported data from HDFS to RDBMS via Sqoop for Business Intelligence, visualization and user report generation.
  • Developed NiFi flows handling various data formats such as XML, JSON and Avro.
  • Developed and designed data integration and migration solutions in Azure.
  • Worked on a proof of concept using Spark with Scala and Kafka.
  • Worked on visualizing the aggregated datasets in Tableau.
  • Transferred data between HDFS and MySQL in both directions using Sqoop.
  • Implemented MapReduce jobs in Hive by querying the available data.
  • Configured the Hive Metastore with MySQL, which stores the metadata for Hive tables.
  • Performed data analytics in Hive and then exported those metrics back to Oracle Database using Sqoop.
  • Tuned the performance of Hive queries and MapReduce programs for different applications.
  • Proactively involved in ongoing maintenance, support and improvements in Hadoop cluster.
  • Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
  • Used Cloudera Manager for installation and management of Hadoop Cluster.
  • Developed data pipeline using Flume, Sqoop, Pig and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
  • Worked with the MongoDB and HBase NoSQL databases, which differ from classic relational databases.
  • Involved in converting HiveQL into Spark transformations using Spark RDDs in Scala.
  • Integrated Kafka with Spark Streaming for high throughput and reliability.
  • Used Apache Flume to collect and aggregate large amounts of log data and stored it on HDFS for further analysis.
  • Tuned Hive and Pig scripts to improve performance and resolved performance issues in both.

Environment: Hadoop 3.0, Agile, Pig 0.17, HBase 1.4.3, Jenkins 2.12, NoSQL, Sqoop 1.4, Impala 3.0.0, Hive 2.3, MapReduce, YARN, Oozie, Microsoft Azure, NiFi, Avro, MySQL, Kafka, Scala 2.12, Spark, Apache Flume 1.8

Confidential - Greensboro, NC

Hadoop/ Spark Developer


  • Actively involved in designing Hadoop ecosystem pipeline.
  • Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
  • Performed SQL Joins among Hive tables to get input for Spark batch process.
  • Worked with the data science team to build statistical models with Spark MLlib and PySpark.
  • Imported data from various sources into the Cassandra cluster using Sqoop.
  • Implemented the ELK (Elasticsearch, Logstash, Kibana) stack to collect and analyze the logs produced by the Spark cluster.
  • Developed Oozie workflow for scheduling & orchestrating the ETL process.
  • Created data pipelines as per the business requirements and scheduled them using Oozie coordinators.
  • Worked extensively on Apache NiFi, building flows for the existing Oozie jobs to handle incremental loads, full loads and semi-structured data, pull data from REST APIs into Hadoop, and automate all NiFi flows to run incrementally.
  • Created NiFi flows to trigger Spark jobs and used PutEmail processors to send notifications on failures.
  • Developed shell scripts to periodically perform incremental imports of data from a third-party API to AWS.
  • Worked extensively on importing metadata into Hive using Scala, and migrated existing tables and applications to Hive and the AWS cloud.
  • Created data models for Cassandra from the existing Oracle data model.
  • Designed column families in Cassandra, ingested data from RDBMS, performed data transformations, and exported the transformed data to Cassandra as per the business requirements.
  • Used Sqoop's import functionality to load historical data from RDBMS into HDFS.
  • Designed workflows and coordinators in Oozie to automate and parallelize Hive jobs in a Hortonworks (HDP 2.2) Apache Hadoop environment.
  • Configured Hive bolts and wrote data to Hive on Hortonworks as part of a POC.
  • Developed the batch scripts to fetch the data from AWS S3 storage and do required transformations in Scala using Spark framework.
  • Responsible for ingesting real-time data from sources into Kafka clusters.
  • Applied Spark performance-tuning techniques such as refreshing tables, managing parallelism and modifying Spark defaults.
  • Implemented Spark RDD transformations to map business analysis logic and applied actions on top of the transformations.
  • Involved in migrating MapReduce jobs to Spark jobs and used the Spark SQL and DataFrame APIs to load structured data into Spark clusters.
  • Used the Spark API over Hadoop YARN as the execution engine for data analytics with Hive, and delivered the data to the BI team for report generation after processing and analysis in Spark SQL.
  • Used version control tools like GitHub to share code snippets among team members.
  • Involved in daily Scrum meetings to discuss the development/progress and was active in making scrum meetings more productive.

Environment: Hadoop 3.0, Scala 2.12, Spark, SQL, Hive 2.3, PySpark, Cassandra 3.11, Apache NiFi, AWS, Oracle 12c, RDBMS, HDFS, Oozie 4.3, Hortonworks

Confidential - West Point, PA

Spark Developer


  • Responsible for building scalable distributed data solutions using Hadoop.
  • Used Spark Streaming APIs to perform transformations and actions on the fly to build the common learner data model, which receives data from Kafka in near real time and persists it into Cassandra.
  • Used DataStax Spark-Cassandra connector to load data into Cassandra and used CQL to analyze data from Cassandra tables for quick searching, sorting and grouping.
  • Loaded data into Spark RDDs and performed advanced procedures such as text analytics using Spark's in-memory computation capabilities to generate the output response.
  • Developed the statistics graph using JSP, custom tag libraries, Applets and Swing in a multi-threaded architecture.
  • Executed many performance tests using the Cassandra-stress tool to measure and improve the read and write performance of the cluster.
  • Handled large datasets during ingestion using partitioning, broadcast variables, efficient joins and other transformations in Spark.
  • Configured Spark Streaming to consume data from Kafka streams and store it in HDFS.
  • Migrated an existing on-premises application to AWS, using services such as EC2 and S3 for processing and storing small data sets; maintained the Hadoop cluster on AWS EMR.
  • Performed the migration of Hive and MapReduce Jobs from on-premise MapR to AWS cloud using EMR.
  • Partitioned data streams using Kafka and used the Kafka producer APIs to produce messages.
  • Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
  • Tuned Spark applications by setting the right batch interval, level of parallelism and memory configuration.
  • Ingested data from RDBMS into Hive to perform data transformations, and then exported the transformed data to Cassandra for data access and analysis.
  • Experienced in Core Java, Collection Framework, JSP, Dependency Injection, Spring MVC, RESTful Web services.
  • Implemented Spark scripts using Scala and Spark SQL to load Hive tables into Spark for faster data processing.
  • Used Spark for interactive queries, processing of streaming data and integration with popular NoSQL databases for huge volumes of data.
  • Extracted the data from Teradata into HDFS/Dashboards using Spark Streaming.
  • Implemented Informatica Procedures and Standards while developing and testing the Informatica objects.

Environment: Hadoop 3.0, Spark 2.1, Cassandra 1.1, Kafka 0.9, JSP, HDFS, AWS, EC2, Hive 1.9, MapReduce, MapR, Java, MVC, Scala, NoSQL

Confidential - SFO, CA

Sr. Java/J2EE Developer


  • Worked as a Java/J2EE Developer to manage data and to develop web applications.
  • Implemented MVC architecture by separating the business logic from the presentation layer using Spring.
  • Involved in documentation and use case design using UML modeling, including development of class diagrams, sequence diagrams and use case diagrams.
  • Worked extensively on n-tier application development using Java, JDBC, Servlets, JSP, Web Services, WSDL, SOAP, Spring, Hibernate, XML, SAX and DOM.
  • Extensively used Eclipse IDE for developing, debugging, integrating and deploying the application.
  • Developed UI using HTML, CSS, Bootstrap, JQuery, and JSP for interactive cross browser functionality and complex user interface.
  • Developed the application using Servlets and JSP for the presentation layer along with JavaScript for the client side validations.
  • Wrote Hibernate classes and DAOs to retrieve and store data, and configured Hibernate files.
  • Developed Service layer interfaces by applying business rules to interact with DAO layer for transactions.
  • Developed various UML diagrams like use cases, class diagrams, interaction diagrams (sequence and collaboration) and activity diagrams.
  • Involved in requirements gathering and performed object oriented analysis, design and implementation.
  • Used various Core Java concepts such as multi-threading, Exception Handling, Collection APIs to implement various features and enhancements.
  • Wrote and debugged the Maven Scripts for building the entire web application.
  • Designed and developed Ajax calls to populate parts of screens on demand.
  • Developed user interface using JSP, JSP Tag libraries and Struts Tag Libraries to simplify the complexities of the application.
  • Implemented business logic using POJOs and used WebSphere to deploy the applications.
  • Wrote complex SQL and HQL queries to retrieve data from the Oracle database.
  • Wrote extensive unit and integration test cases using mock objects and JUnit.
  • Used XML to transfer the application data between client and server.
  • Used the JDBC for data retrieval from the database for various inquiries.
  • Used JavaScript for client-side validations.
  • Used Spring Framework for MVC for writing Controller, Validations and View.
  • Provided utility classes for the application using Core Java and extensively used Collection package.

Environment: Java, J2EE, MVC, Spring 3.0, Hibernate 3.6, Eclipse, HTML, CSS, Bootstrap, JQuery, Maven, Ajax, WebSphere, Oracle 11g, XML, JavaScript


Java Developer


  • Worked as a Java Developer on both back-end and front-end development teams.
  • Involved in the Software Development Life Cycle (SDLC), including analysis, design and implementation.
  • Responsible for use case diagrams, class diagrams and sequence diagrams using Rational Rose in the Design phase.
  • Developed ANT scripts that check out code from the SVN repository and build EAR files.
  • Used XML Web Services using SOAP to transfer information to the supply chain and domain expertise Monitoring Systems.
  • Used Eclipse and the Tomcat web server for developing and deploying the applications.
  • Developed REST Web Services clients to consume those Web Services as well as other enterprise-wide Web Services.
  • Used JavaScript and AJAX technologies for front end user input validations and Spring validation framework for backend validation for the User Interface.
  • Used both annotation-based and XML-based configuration.
  • Developed application service components and configured beans using Spring IoC (applicationContext.xml).
  • Implemented the persistence mechanism using Hibernate (ORM).
  • Developed the DAO layer for the application using Spring's HibernateTemplate support.
  • Used WebLogic workshop, Eclipse IDE to develop the application.
  • Performed the code build and deployment using Maven.
  • Implemented Spring RESTful web services that produce JSON.
  • Responsible for maintaining the code quality, coding and implementation standards by code reviews.
  • Developed the front end of the application using HTML, CSS, JSP and JavaScript.
  • Created RESTful APIs using Spring MVC.
  • Used SVN version control to maintain code versions.
  • Worked on web applications using open source MVC frameworks.
  • Developed Web interface using JSP, Standard Tag Libraries (JSTL), and Spring Framework.
  • Implemented logger for debugging and testing purposes using Log4j.

Environment: JSON, HTML 4, CSS, XML, Hibernate 3.6, Eclipse, Maven, JUnit, JDBC, ANT, SOAP, Log4j
