
Hadoop/Spark Developer Resume

Johns Creek, Georgia


  • 7+ years of professional experience in project development, implementation, deployment and maintenance using Java/J2EE, Big Data and Spark-related technologies.
  • Hadoop Developer with 5+ years of experience designing and implementing complete end-to-end Hadoop-based data analytics solutions using HDFS, MapReduce, Spark, YARN, Kafka, Pig, Hive, Sqoop, Storm, Flume, Oozie, Impala, HBase, etc.
  • Hands-on development and implementation experience in a Big Data Management Platform (BMP) using HDFS, MapReduce, Hive, Pig and other Hadoop ecosystem components as data storage and retrieval systems.
  • Good experience creating data ingestion pipelines, data transformations, data management, data governance and real-time streaming at an enterprise level.
  • Extensive experience creating real-time data streaming solutions using Apache Spark/Spark Streaming and Kafka.
  • Good knowledge of the Spark ecosystem and Spark architecture.
  • Good knowledge of Spark Streaming.
  • Good knowledge of machine learning.
  • Experience developing Pig Latin and HiveQL scripts for data analysis and ETL purposes, and extending the default functionality by writing User-Defined Functions (UDFs) and User-Defined Aggregate Functions (UDAFs) for custom, data-specific processing.
  • Hands-on experience with full life-cycle implementations using the CDH (Cloudera) and HDP (Hortonworks Data Platform) distributions.
  • In-depth understanding of Hadoop architecture and its components, such as ResourceManager, ApplicationMaster, NameNode and DataNode, as well as HBase design principles.
  • Strong knowledge of distributed systems architecture and parallel processing; in-depth understanding of the MapReduce programming paradigm and the Spark execution framework.
  • Thorough understanding of partitioning and bucketing in Hive; designed both managed and external Hive tables to optimize performance.
  • Experience handling messaging services using Apache Kafka.
  • Experience migrating data between RDBMS and HDFS using Sqoop.
  • Experience with job workflow scheduling, monitoring and coordination tools such as Oozie and ZooKeeper.
  • Worked on NoSQL databases including HBase, Cassandra and MongoDB.
  • Strong experience collecting and storing streaming data, such as log data, in HDFS using Apache Flume.
  • Experience using the Java HBase API to ingest processed data into HBase tables.
  • Experience with the Oozie workflow engine to automate and parallelize Hadoop MapReduce, Hive and Pig jobs.
  • Experienced in using Spark to improve the performance of and optimize existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, pair RDDs and Spark on YARN.
  • Extensive knowledge in programming with Resilient Distributed Datasets (RDDs).
  • Extensive experience in ETL processes, including data sourcing, transformation, mapping, conversion and loading.
  • Solid understanding of Greenplum; proficient in creating scalable databases.
  • Proficient in using Cloudera Manager, an end-to-end tool for managing Hadoop operations in a Cloudera cluster.
  • Assisted in cluster maintenance, cluster monitoring, and managing and reviewing data backups and log files.
  • Good experience working with cloud environments such as Amazon Web Services (AWS) EC2 and S3.
  • Experienced in Java application development, client/server applications and Internet/intranet-based applications using Core Java, J2EE patterns, Spring, Hibernate, Struts, JMS, Web Services (SOAP/REST), Oracle, SQL Server and other relational databases.
  • In-depth knowledge of core Java concepts such as exceptions, collections, data structures, I/O, multithreading, serialization and deserialization.
  • Experience writing shell scripts on Linux and integrating them with other solutions.
  • Expert at creating UML diagrams (use case, activity, class and sequence diagrams) using Microsoft Visio and IBM Rational Rose.
  • Strong experience working with databases such as Oracle …, DB2, SQL Server 2008 and MySQL, and proficient in writing complex SQL queries.
  • Good experience developing software applications using Core Java, JDBC, Servlets, JSPs, Spring and RESTful web services.
  • Experience using PL/SQL to write stored procedures, functions and triggers.
  • Excellent technical and analytical skills, with a clear understanding of the design goals of ER modeling for OLTP and dimensional modeling for OLAP.


Hadoop Ecosystem: HDFS, MapReduce, Sqoop, Flume, Hive, Pig, Cassandra, Spark, Scala, ZooKeeper, Oozie, Cloudera Manager

Hadoop Distributions: Apache Hadoop, Cloudera, YARN, Hortonworks

Languages: J2SE, J2EE, C/C++, UNIX Shell Scripting

ORM Tools: Hibernate, iBATIS

Web Technologies: Servlets, JSPs, AJAX

J2EE Technologies: JDBC, EJB 3.0, JPA, JMS, Web Services, JAX-WS, JAX-RS

Frameworks: Struts 1.3, Spring 2.5/3.0, Amazon AWS (EMR)

Scripting Languages: HTML, CSS, JavaScript, DHTML, XML, jQuery

Servers: WebLogic 8.1/9.1/10.3, WebSphere 7.0/8.0, JBoss 4.0/5.0, Apache Tomcat 6.0/7.0, Jetty Server

IDEs: Eclipse 3.x

Tools: PL/SQL Developer, Poseidon, JAD, etc.

Databases: MySQL 5.0, Oracle 10g (PL/SQL)

Operating Systems: Windows, Unix/Linux

Bug tracking tools: WPBN, Jira


Confidential, Johns Creek, Georgia

Hadoop/Spark Developer


  • Involved in end-to-end data processing, including ingestion, processing, quality checks and splitting.
  • Streamed data in real time using Spark Streaming with Kafka.
  • Developed Spark scripts in Scala according to requirements.
  • Loaded data into Spark RDDs and performed in-memory data computation to generate the output response.
  • Performed different types of transformations and actions on RDDs to meet business requirements.
  • Developed a data pipeline using Kafka, Spark and Hive to ingest, transform and analyze data.
  • Also worked on analyzing the Hadoop cluster and different big data analytic tools, including Pig, HBase and Sqoop.
  • Involved in loading data from the UNIX file system to HDFS.
  • Created HBase tables to store PII data in varying formats coming from different portfolios.
  • Implemented best offer logic using Pig scripts and Pig UDFs.
  • Responsible for managing data coming from various sources.
  • Installed and configured Hive and wrote Hive UDFs.
  • Loaded and transformed large sets of structured, semi-structured and unstructured data.
  • Provided cluster coordination services through ZooKeeper.
  • Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Analyzed large data sets to determine the optimal way to aggregate and report on them.
  • Responsible for setting up the QA environment and updating configurations for implementing scripts with Pig and Sqoop.
  • Installed and configured Hadoop MapReduce and HDFS.
  • Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
  • Installed and configured Pig.
  • Involved in managing and reviewing Hadoop log files.
  • Used Sqoop to import data from MySQL into HDFS on a regular basis.
  • Developed scripts and batch jobs to schedule various Hadoop programs.
  • Responsible for writing Hive queries for data analysis to meet business requirements.
  • Responsible for creating Hive tables and working on them using HiveQL.
  • Responsible for importing and exporting data into HDFS and Hive using Sqoop.
  • Involved in creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs.
  • Designed and implemented a MapReduce-based, large-scale parallel relation-learning system.
  • Used the Oozie workflow engine to schedule and run multiple Hive jobs.

Environment: Hadoop, MapReduce, Hive, Pig, Sqoop, Java 1.8, Oozie, HBase, Kafka, Jenkins, Spark 1.6.0, Scala 2.10.5, Greenplum 4.3 (PostgreSQL), CDH 5.8.2, Eclipse, Linux, Oracle, Teradata.

Confidential, Indianapolis, IN

Hadoop Developer


  • Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
  • Worked on moving all log files generated from various sources to HDFS for further processing.
  • Developed workflows using custom MapReduce, Pig, Hive and Sqoop.
  • Tuned the cluster for optimal performance to process these large data sets.
  • Worked hands-on with the ETL process: handled importing data from various data sources and performed transformations.
  • Devised and led the implementation of the next-generation architecture for more efficient data ingestion and processing.
  • Developed the Apache Storm, Kafka and HDFS integration project for real-time data analysis.
  • Designed and developed Apache Storm topologies for inbound and outbound data for real-time ETL to find the latest trends and keywords.
  • Wrote a Hive UDF to sort struct fields and return a complex data type.
  • Responsible for loading data from the UNIX file system to HDFS.
  • Developed a suite of unit test cases for Mapper, Reducer and Driver classes using the MR testing library.
  • Designed and developed a distributed processing system to process binary files in parallel and load the computed analysis metrics into a data warehousing platform for reporting.
  • Developed workflows in Control-M to automate the tasks of loading data into HDFS and preprocessing it with Pig.
  • Provided cluster coordination services through ZooKeeper.
  • Used Maven extensively to build JAR files of MapReduce programs and deployed them to the cluster.
  • Modeled Hive partitions extensively for data separation and faster data processing, and followed Pig and Hive best practices for tuning.
  • Implemented high availability (HA) through NameNode replication to avoid a single point of failure.
  • Involved in troubleshooting issues in the Hadoop ecosystem, with an understanding of system capacity, bottlenecks and the basics of memory, CPU, OS, storage and networks.
  • Involved in the setup, configuration and management of security for Hadoop clusters using Kerberos, with integration with LDAP/AD at an enterprise level.
  • Monitored the operating system and Hadoop cluster using tools such as Nagios and Ganglia.
  • Responsible for scheduling jobs in Hadoop using the FIFO, Fair and Capacity schedulers.
  • Good Linux and Hadoop system administration and networking skills, and familiarity with open-source configuration management and deployment tools such as Salt and Ansible.

Environment: HiveQL, MySQL, HBase, HDFS, Hive, Eclipse (Kepler), Hadoop, Oracle 11g, PL/SQL, SQL*Plus, Toad 9.6, Flume, Pig, Sqoop, UNIX, Cosmos.

Confidential, Atlanta, GA

Hadoop/Spark Developer


  • Handled importing data from various data sources, performed data control checks using Spark and loaded data into HDFS.
  • Used Spark for interactive queries, processing of streaming data and integration with a popular NoSQL database for large volumes of data.
  • Created end-to-end Spark-Solr applications using Scala to perform various data cleansing, validation, transformation and summarization activities according to requirements.
  • Developed Scala scripts using both DataFrames/SQL/Datasets and RDDs/MapReduce in Spark for data aggregation and queries, and wrote data back into the OLTP system through Sqoop.
  • Handled large datasets during the ingestion process itself using partitions, Spark in-memory capabilities, broadcast variables, effective and efficient joins, transformations and other techniques.
  • Designed, developed and maintained data integration programs in Hadoop and RDBMS environments with both traditional and non-traditional source systems, as well as RDBMS and NoSQL data stores, for data access and analysis.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
  • Developed Spark scripts using Scala shell commands according to requirements.
  • Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in HDFS.
  • Loaded data into Spark RDDs and performed in-memory data computation to generate the output response.
  • Developed UNIX shell scripts to load a large number of files into HDFS from the Linux file system.
  • Involved in creating Hive tables, and loading and analyzing data using Hive queries.
  • Developed Hive queries to process the data and generate data cubes for visualization.
  • Implemented schema extraction for Parquet and Avro file formats in Hive.
  • Designed and improved an internal search engine using big data technologies and Solr/Fusion.
  • Migrated data from various data sources to Solr in stages according to requirements.
  • Used Akka as a framework to create reactive, distributed, parallel and resilient concurrent applications in Scala.
  • Developed custom fields, custom Jira plugins and validations to implement complex workflows.
  • Worked extensively with Jenkins for continuous integration and end-to-end automation of all builds and deployments.
  • Involved in preparing JIL scripts for AutoSys jobs.
  • Used Sqoop to import data from RDBMS into the Hadoop Distributed File System (HDFS) and later analyzed the imported data using Hadoop components.
  • Extracted data from an Oracle database, transformed it and loaded it into a Greenplum database according to business specifications.
  • Created mappings to move data from Oracle and SQL Server to a new data warehouse in Greenplum.
  • Involved in loading and transforming large sets of structured, semi-structured and unstructured data and analyzed them by running Hive queries.
  • Gained experience with continuous integration of the application using Jenkins.
  • Migrated an existing on-premises application to AWS, using AWS services such as EC2 and S3 for small data sets.
  • Responsible for developing a data pipeline with Amazon AWS to extract data from web logs and store it in HDFS.
  • Used CloudWatch Logs to move application logs to S3 and created alarms based on exceptions raised by applications.
  • Used Oozie to automate the end-to-end data pipelines and Oozie coordinators to schedule the workflows.
  • Implemented a daily workflow for extraction, processing and analysis of data with Oozie.

Environment: Java, Scala, Apache Spark, Apache Zeppelin, Greenplum 4.3 (PostgreSQL), Spring, Maven, Hive, HDFS, YARN, MapReduce, Sqoop, Flume, Solr, JIRA, UNIX Shell Scripting, Python, AWS, Kafka, Jenkins, Akka.

Confidential - Hartford, CT

Hadoop Developer


  • All the data was loaded from our relational databases into Hive using Sqoop. We received four flat files from different vendors, each in a different format (e.g., text, EDI and XML).
  • Involved in migrating data from existing RDBMS (Oracle and SQL Server) to Hadoop using Sqoop for processing.
  • Wrote Hive join queries to fetch information from multiple tables and wrote multiple MapReduce jobs to collect output from Hive.
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting on the dashboard.
  • Involved in developing the MapReduce framework, writing queries and scheduling MapReduce jobs.
  • Installed and configured Hadoop; responsible for maintaining the cluster and managing and reviewing Hadoop log files.
  • Developed Shell, Perl and Python scripts to automate and provide control flow to Pig scripts.
  • Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
  • Performed file system management and monitoring of Hadoop log files.
  • Used Oozie workflows to run Pig and Hive jobs. Extracted files from MongoDB through Sqoop, placed them in HDFS and processed them.
  • Installed and configured a multi-node cluster in the cloud using Amazon Web Services (AWS) EC2.
  • Used Flume to collect, aggregate and store web log data from different sources, such as web servers, mobile and network devices, and pushed it to HDFS.
  • Implemented partitioning, dynamic partitions and buckets in Hive.
  • Analyzed large data sets to determine the optimal way to aggregate and report on them.
  • Configured core-site.xml and mapred-site.xml for the multi-node cluster environment.
  • Used Apache Maven 3.x to build and deploy the application to various environments.
  • Wrote shell scripts to check the health of Hadoop daemon services and respond to any warning or failure conditions.

Environment: Apache Hadoop, HDFS, Hive, MapReduce, Cloudera, Pig, Sqoop, Kafka, Apache Cassandra, Oozie, Impala, Flume, ZooKeeper, Java, MySQL, Eclipse, PL/SQL and Python.


Java Developer


  • Involved in design and development phases of Software Development Life Cycle (SDLC).
  • Involved in designing UML Use case diagrams, Class diagrams, and Sequence diagrams using Rational Rose.
  • Followed Agile methodology and participated in Scrum meetings to track, optimize and tailor features to customer needs.
  • Developed the user interface using JSP, JSP tag libraries and JavaScript to simplify the complexities of the application.
  • Developed a Dojo-based front end, including forms and controls, and programmed event handling.
  • Created Action classes that route submissions to the appropriate EJB components and render the retrieved information.
  • Applied core Java and object-oriented concepts.
  • Used JDBC to connect to the backend Oracle and SQL Server 2005 databases.
  • Wrote SQL queries and stored procedures for multiple databases, including Oracle and SQL Server 2005.
  • Wrote stored procedures using PL/SQL, and performed query optimization to achieve faster indexing and make the system more scalable.
  • Deployed the application on Windows using IBM WebSphere Application Server.
  • Used Java Message Service (JMS) for reliable, asynchronous exchange of important information such as payment status reports.
  • Used web services (WSDL and REST) to retrieve credit card information from a third party.
  • Used Ant scripts to build the application and deployed it on WebSphere Application Server.

Environment: Core Java, J2EE, Oracle, SQL Server, JSP, JDK, JavaScript, HTML, CSS, Web Services, Windows.


Java Developer


  • Prepared user requirements document and functional requirements document for different modules.
  • Designed the application architecture along the lines of the Struts framework, based on the MVC II pattern.
  • The architecture used JSP as the view, Action classes as the controller and a combination of EJBs and Java classes as the model.
  • Used Struts, JSTL and Struts-EL tag libraries.
  • Responsible for designing and writing code in Action classes, validators and ActionForms, and for developing the system flow for the module using the Struts framework.
  • Involved in coding session beans and entity beans to implement the business logic.
  • Designed and developed the presentation layer using JSP and HTML, with client-side form validation in JavaScript and Struts built-in form validation.
  • Used AJAX for asynchronous data transfer (HTTP requests) between the browser and the web server.
  • Used SAX and DOM to parse XML documents retrieved from different data sources.
  • Prepared SQL scripts for database creation and for migrating existing data to the newer version of the application.
  • Installed and configured the software required for application development (Eclipse IDE, Oracle database, WebSphere, Tomcat, Eclipse plugins and the required framework JARs).
  • Developed various Java beans and helper classes to support server-side programs.
  • Wrote unit test cases using the JUnit testing framework.
  • Involved in developing backend code for email notifications to admin users with multiple Excel sheets generated using XML.
  • Assisted with cleaning the Dojo on a daily basis.
  • Used Dojo for different purposes according to requirements.
  • Modified the existing backend code for different levels of enhancements.
  • Used Axis to implement web services for integrating different systems.
  • Designed the error handling and error logging flows.
  • Developed build files for the project using the Ant build tool.

Environment: Java 1.5, J2EE, JSP, Servlets, Struts 1.3, Dojo, TagLibs, RAD, XML, EJB 3.0, Ant, SQL, CVS, PVCS, Web Services, SOAP, WSDL, MVC, JavaScript, CSS, AJAX, Oracle 10g, WebSphere, Toad, UNIX.
