Hadoop/Spark Developer Resume

Valley Forge, PA


  • IT professional with around 8 years of experience and extensive knowledge of the Software Development Lifecycle: analysis, design, development, debugging, and deployment of various software applications.
  • More than 5 years of hands-on experience in Big Data and the Hadoop ecosystem, covering ingestion, storage, querying, processing, and analysis using HDFS, MapReduce, Pig, Hive, Spark, Flume, Kafka, Oozie, etc.
  • 3 years of work experience using Java/J2EE technologies.
  • Good experience in developing and deploying enterprise-based applications using major components of the Hadoop ecosystem such as Hadoop 2.x, YARN, Hive, Pig, MapReduce, HBase, Flume, Sqoop, Spark, Storm, Kafka, Oozie, and Zookeeper.
  • Experience in installing, configuring, supporting, and managing Hadoop clusters using Hortonworks and Cloudera (CDH3, CDH4) distributions on Amazon Web Services (AWS).
  • Excellent programming skills at a higher level of abstraction using Scala and Spark.
  • Good understanding of real-time data processing using Spark.
  • Experience in importing data from different databases like MySQL, Oracle, and Teradata into HDFS, and exporting it back, using Sqoop.
  • Strong experience working with real-time streaming applications and batch-style, large-scale distributed computing applications using tools like Spark Streaming, Kafka, Flume, MapReduce, and Hive.
  • Managing and scheduling batch jobs on a Hadoop cluster using Oozie.
  • Experience in managing and reviewing Hadoop log files.
  • Experience in setting up Zookeeper to provide coordination services to the cluster.
  • Experienced using Sqoop to import data into HDFS from RDBMS and vice-versa.
  • Experience with and understanding of Spark and Storm.
  • Hands-on experience extracting data from log files and copying it into HDFS using Flume.
  • Experience in analyzing data using Hive, Pig Latin, and custom MR programs in Java.
  • Hands-on experience in the analysis, design, coding, and testing phases of the Software Development Life Cycle (SDLC).
  • Experience with multiple databases and tools, SQL analytical functions, Oracle PL/SQL server and DB2.
  • Experience in creating ETL/Talend jobs, both design and code, to process data into target databases.
  • Experience in working with Amazon Web Services EC2 instances and S3 buckets.
  • Worked with different file formats such as Avro, Parquet, RCFile, and JSON.
  • Involved in writing Python scripts to build a disaster recovery process that moves currently processing data into the data center, given the current static location.
  • Hands-on experience working with NoSQL databases like MongoDB, HBase, and Cassandra and their integration with Hadoop clusters.
  • Experience in ingesting data into Cassandra and consuming the ingested data from Cassandra to HDFS.
  • Used Apache NiFi for loading PDF documents from Microsoft SharePoint to HDFS.
  • Used Avro serialization to serialize data and handle schema evolution.
  • Experience in designing and coding web applications using Core Java and web technologies (JSP, Servlets, and JDBC); full understanding of the J2EE technology stack, including Java-related frameworks like Spring and ORM frameworks (Hibernate).
  • Experience in designing user interfaces using HTML, CSS, JavaScript, and JSP.
  • Developed web applications in the open-source Java framework Spring, utilizing the Spring MVC framework.
  • Experienced in front-end development using Ext JS, jQuery, JavaScript, HTML, Ajax, and CSS.
  • Good interpersonal and communication skills, strong problem-solving skills, the ability to explore and adapt to new technologies with ease, and a good team member.


Programming Languages: Java, C, Python, Shell Scripting

Big Data Technologies: HDFS, MapReduce, Hive, Pig, Hue, Impala, Sqoop, Apache Spark, Apache Kafka, Apache Ignite, Apache NiFi, Oozie, Flume, Zookeeper, YARN

NoSQL Databases: MongoDB, HBase, Cassandra

Hadoop Distribution: Hortonworks, Cloudera, MapR

Databases: Oracle 10g, MySQL, MSSQL

IDE/Tools: Eclipse, NetBeans, Maven

Version Control: Git, SVN, ClearCase

Platforms: Windows, Unix, Linux

BI Tools: Tableau, MS Excel

Web/Server Application: Apache Tomcat, WebLogic, WebSphere, MSSQL Server, Oracle Server

Web Technologies: HTML, CSS, JavaScript, jQuery, JSP, Servlets, Ajax


Confidential - Valley Forge, PA

Hadoop/Spark Developer


  • Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
  • Used Hive to analyse the partitioned and bucketed data and compute various metrics for reporting on the dashboard.
  • Worked on data serialization formats for converting complex objects into sequences of bits, using Avro, Parquet, and CSV formats.
  • Leveraged Hive queries to create ORC tables.
  • Created views from Hive tables on top of data residing in the data lake.
  • Worked on setting up Kafka for streaming data and on monitoring the Kafka cluster.
  • Involved in the design and creation of data ingest pipelines using technologies such as Apache Storm and Kafka.
  • Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs, such as Java MapReduce, Hive, and Sqoop, as well as system-specific jobs.
  • Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
  • Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
  • Explored Spark to improve the performance and optimization of existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
  • Experienced with batch processing of data sources using Apache Spark and Elasticsearch.
  • Developed Kafka producers and consumers, and Spark and Hadoop MapReduce jobs.
  • Imported data from different sources like HDFS/HBase into Spark RDDs.
  • Involved in converting MapReduce programs into Spark transformations using Spark RDDs in Scala.
  • Wrote complex SQL to pull data from the Teradata EDW and create ad hoc reports for key business personnel within the organization.
  • Used the Git version control system to access repositories and to coordinate with CI tools.
  • Integrated Maven with Git to manage and deploy project-related tags.
  • Experience with AWS S3 services: creating buckets and configuring them with permissions, logging, versioning, and tagging.
  • Implemented a Continuous Integration and Continuous Deployment framework using Jenkins and Maven in a Linux environment.
  • Experience with the Oozie workflow scheduler to manage Hadoop jobs with control flows.
  • Exported the aggregated data to Oracle using Sqoop for reporting on the Tableau dashboard.

Environment: Hadoop, HDFS, Spark, MapReduce, Hive, Sqoop, Kafka, HBase, Oozie, Flume, Scala, Git, Jenkins, AWS (S3), Python, Java, SQL Scripting, Linux Shell Scripting, Hortonworks.

Confidential - Denver, CO



  • Worked with the systems engineering team to plan and deploy new Hadoop environments and expand existing Hadoop clusters using agile methodology.
  • Monitored multiple Hadoop cluster environments, including workload, job performance, and capacity planning, using Cloudera Manager.
  • Loaded and transformed large sets of structured, semi-structured, and unstructured data.
  • Used Flume to collect, aggregate, and store web log data from different sources like web servers, mobile, and network devices, and pushed it to HDFS.
  • Worked extensively with Sqoop for importing and exporting data between HDFS and relational database systems/mainframe.
  • Developed Oozie workflows for scheduling Pig and Hive scripts.
  • Configured the Hadoop Ecosystem components like YARN, Hive, Pig, HBase and Impala.
  • Analyzed web log data using HiveQL to extract the number of unique visitors per day and visit duration.
  • Involved in setting up the QA environment by implementing Pig and Sqoop scripts.
  • Involved in creating the workflow to run multiple Hive and Pig jobs, which run independently based on time and data availability.
  • Developed Pig Latin scripts to sort, join, and filter source data.
  • Ran MapReduce programs on log data to transform it into a structured form and find customer name, age group, etc.
  • Proactively monitored systems and services; worked on architecture design and implementation of Hadoop deployment, configuration management, backup, and disaster recovery systems and procedures.
  • Executed test cases in an automation tool; performed system, regression, and integration testing; reviewed results; and logged defects.
  • Participated in functional reviews, test specifications and documentation review.
  • Documented system processes and procedures for future reference; responsible for managing data coming from different sources.

Environment: Cloudera Hadoop Distribution, HDFS, Talend, MapReduce (Java), Impala, Pig, Sqoop, Flume, Hive, Oozie, HBase, Shell Scripting, Agile Methodologies.

Confidential - Richardson, TX

Hadoop Developer


  • Worked on analyzing the Hadoop cluster and different Big Data analytic tools, including Pig, Hive, and the HBase database.
  • Extracted data from Oracle into HDFS using Sqoop to store it and generate reports for visualization purposes.
  • Collected and aggregated large amounts of web log data from different sources, such as web servers and mobile, using Apache Flume, and stored the data in HDFS for analysis.
  • Built a data flow pipeline using Flume, Java (MapReduce), and Pig.
  • Developed Hive scripts to analyze data, categorizing mobile numbers into different segments so that promotions could be offered to customers based on segment.
  • Extensive experience in writing Pig scripts to transform raw data into baseline data.
  • Created Hive tables and partitions, and loaded data for analysis using HiveQL queries.
  • Worked on Oozie workflow engine for job scheduling.
  • Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
  • Worked on analyzing and writing Hadoop MapReduce jobs using the Java API, Pig, and Hive.
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Leveraged Solr API to search user interaction data for relevant matches.
  • Designed the Solr schema and used the Solr client API for storing, indexing, and querying the schema fields.
  • Loaded data into HBase using bulk load and the HBase API.
  • Validated application functionality, usability, and compatibility during the functional, exploratory, regression, system integration, content, and UAT testing phases.

Environment: Hortonworks Hadoop Distribution, MapReduce, HBase, Hive, Pig, Sqoop, Oozie, Flume, Solr, Shell script.


Java/J2EE Developer


  • Analyzed project requirements for this product and was involved in design using UML.
  • Interacted with system analysts and business users for design and requirement clarification.
  • Made extensive use of HTML5 with AngularJS, JSTL, JSP, jQuery, and Bootstrap for the presentation layer, along with JavaScript for client-side validation.
  • Handled the Java multithreading part of the back-end components.
  • Developed HTML reports for various modules as per the requirement.
  • Developed web services using SOAP, SOA, WSDL, and Spring MVC, and developed DTDs and XSD schemas for XML (parsing, processing, and design) to communicate with the Active Directory application using a RESTful API.
  • Created multiple RESTful web services using the Jersey 2 framework.
  • Used AquaLogic BPM (Business Process Management) for workflow management.
  • Developed the application using the NoSQL database MongoDB for storing data on the server.
  • Developed the complete business tier with stateful session Java beans and CMP Java entity beans with EJB 2.0.
  • Developed integration services using SOA, Web Services, SOAP and WSDL.
  • Designed, developed, and maintained the data layer using the Hibernate ORM framework.
  • Used the Spring framework's JMS support for writing to a JMS queue and Hibernate DAO support for interfacing with the database, and integrated Spring with JSF.
  • Responsible for managing the Sprint production test data with the help of tools like Telegance, CRM, etc., for tweaking the test data during IAT/UAT testing.
  • Involved in writing Unit test cases using JUnit and involved in integration testing.

Environment: Java, J2EE, HTML, CSS, JSP, JavaScript, Bootstrap, AngularJS, Servlets, JDBC, EJB, JavaBeans, Hibernate, Spring MVC, RESTful, JMS, MQ Series, AJAX, WebSphere Application Server, SOAP, XML, MongoDB, JUnit, Rational Suite, CVS Repository.
