
Hadoop/Spark Developer Resume

Washington, DC


  • 7+ years of professional experience in Information Technology, including 5+ years in Big Data and Hadoop ecosystem technologies.
  • Experience working with the BI team to translate big data requirements into Hadoop-centric technologies.
  • Excellent understanding and knowledge of Hadoop architecture and its components, such as the Hadoop Distributed File System (HDFS), JobTracker, TaskTracker, NameNode, DataNode, and the MapReduce programming paradigm.
  • Experience in analyzing data using HQL, Pig Latin and custom MapReduce programs in Java.
  • Extensive experience in both MapReduce MRv1 and MapReduce MRv2 (YARN).
  • Expertise in Hadoop, MapReduce, Spark (Scala), YARN, Spark Streaming, Hive, Pig, HBase, Kafka, Cassandra, and Oracle.
  • Hands-on experience with major components of the Hadoop ecosystem, including Flume, Avro, Oozie, ZooKeeper, MapReduce, and Cassandra.
  • Expertise in designing and developing a distributed processing system that feeds a data warehousing platform for reporting.
  • Created and worked with indexes using Solr on HDP.
  • Imported data into, and exported data from, HDFS and Hive using Sqoop.
  • Experience in loading streaming log data from various web servers into HDFS using Flume.
  • Performed data analytics using Pig and Hive for Data Architects and Data Scientists within the team.
  • Expertise in writing MapReduce programs and UDFs for both Hive and Pig in Java.
  • Expertise in job workflow scheduling and monitoring tools like Oozie and Crontab.
  • Worked with NoSQL databases such as HBase and Cassandra.
  • Experience in installing, configuring, supporting, and managing the Cloudera Hadoop platform, including CDH3 and CDH4 clusters.
  • Experience in writing numerous test cases using JUnit framework.
  • Strong knowledge of the full software development life cycle: analysis, design, architecture, development, and maintenance.
  • Worked comfortably with different methodologies, including Agile, Waterfall, Scrum, and Test-Driven Development (TDD).
  • Excellent experience in designing and developing enterprise applications for the J2EE platform using JSP, Struts, Spring, Hibernate, and web services.
  • Experience in Object-Oriented Analysis and Design (OOAD) and software development using UML; good knowledge of J2EE and core Java design patterns.
  • Experience with web-based UI development using JQuery, UI, CSS, HTML, HTML5, XHTML and JavaScript.
  • Experienced in both development and deployment of source code to application servers using Apache Maven, Jenkins, Nexus, GitHub, SVN, and Puppet.
  • Ability to adapt to evolving technology, strong sense of responsibility and accomplishment.
  • Excellent analytical, problem-solving, and communication skills, with the ability to work both as part of a team and independently.


Big Data Technologies: HDFS, Hive, MapReduce, Spark, Pig, Sqoop, Oozie, ZooKeeper, YARN, Avro, and Kafka.

Programming Languages: Java, SQL, HQL, Scala, Pig Script.

Front End Technologies: HTML, XHTML, CSS, XML, JavaScript.

Java Frameworks: MVC, Apache Struts 2.0, Spring, Hibernate, AngularJS.

Build and Deployment (CI/CD): Apache Maven, Jenkins, GitHub, SVN, Nexus, Puppet.

Application Servers: Apache Tomcat, WebLogic Server.

Databases: Oracle 11g, MySQL, MS SQL Server.

NOSQL Databases: HBase, Cassandra.

IDE: Eclipse, Netbeans, JBuilder.

RDBMS: MS Access, MS SQL Server, IBM DB2, PL/SQL.

Operating Systems: Linux, Windows, Mac



Confidential, Washington DC

Hadoop/Spark Developer


  • Responsible for building scalable distributed data solutions using Apache Hadoop and Spark.
  • Worked in the BI team on Big Data Hadoop cluster implementation and data integration for large-scale system software.
  • Used Spark Streaming APIs to perform transformations and actions on the fly to build the common learner data model, which receives data from Kafka in near real time and persists it to Cassandra.
  • Configured, deployed, and maintained multi-node dev and test Kafka clusters.
  • Developed Spark scripts using Scala shell commands as required.
  • Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
  • Created Hive queries that helped market analysts spot emerging trends by comparing fresh data with EDW reference tables and historical metrics.
  • Developed Hive queries for the analysts.
  • Worked extensively with Sqoop to move data between HDFS and relational database systems/mainframes in both directions.
  • Managed and reviewed Hadoop log files.
  • Shared responsibility for administration of Apache Spark, Hive and Pig.
  • Built and maintained scalable data pipelines using the Hadoop ecosystem and other open source components like Hive, and HBase.
  • Performed in-memory processing using Spark and ran real-time streaming analytics on it.
  • Developed MapReduce programs to parse the raw data, populate staging tables and store the refined data in the tables in EDW.
  • Enabled speedy reviews and first-mover advantages by using Oozie to automate data loading into the Hadoop Distributed File System.
  • Captured data from existing databases that provide SQL interfaces using Sqoop.
  • Developed and maintained complex outbound notification applications that run on custom architectures, using diverse technologies including Core Java, J2EE, SOAP, XML and Web Services.
  • Tested raw data and executed performance scripts.
  • Developed workflow in Oozie to automate the tasks of loading the data into HDFS.
  • Used the scripting languages JavaScript and Python for client-side validations.
  • Wrote MapReduce programs and used the Apache Hadoop API to analyze data.
  • Provided design recommendations and thought leadership to sponsors/stakeholders that improved review processes and resolved technical problems.
  • Configured Hive bolts and wrote data to Hive in the Hortonworks Sandbox as part of a POC.
  • Assessed existing and available data warehousing technologies and methods to ensure the data warehouse/BI architecture meets the needs of the business unit and enterprise and allows for business growth.

Environment: Hadoop, HDFS, Hive, Oozie, Hortonworks Sandbox, Java, Eclipse Luna, ZooKeeper, JSON file format, Scala, Apache Spark, Kafka.

Confidential, SFO, CA

Hadoop Developer


  • Responsible for coding MapReduce programs and Hive queries, and for testing and debugging the MapReduce programs.
  • Responsible for installing, configuring, and managing a Hadoop cluster spanning multiple racks.
  • Developed Pig Latin scripts to analyze large data sets in areas where extensive coding needed to be reduced.
  • Used Sqoop tool to extract data from a relational database into Hadoop.
  • Involved in performance enhancements of the code and optimization by writing custom comparators and combiner logic.
  • Worked closely with data warehouse architect and business intelligence analyst to develop solutions.
  • Good understanding of job schedulers such as the Fair Scheduler, which assigns resources so that all jobs get, on average, an equal share of resources over time; familiar with the Capacity Scheduler.
  • Responsible for performing peer code reviews, troubleshooting issues, and maintaining status reports.
  • Involved in creating Hive tables, loading them with data, and writing Hive queries that invoke and run MapReduce jobs in the backend.
  • Involved in loading data from UNIX file system to HDFS.
  • Experience in developing shell scripts to perform the incremental loads.
  • Developed Pig Latin scripts to extract data from the web server output files to load into HDFS.
  • Involved in identifying possible ways to improve the efficiency of the system.
  • Involved in requirement analysis, design, development, and unit testing using MRUnit and JUnit.
  • Prepared daily and weekly project status reports and shared them with the client.
  • Supported setting up the QA environment and updating configurations for implementing scripts with Pig, Hive, and Sqoop.

Environment: Apache Hadoop, Java (JDK 1.7), Oracle, MySQL, Hive, Pig, Sqoop, Linux, CentOS, JUnit, MRUnit.

Confidential, Boston, MA

Hadoop Developer


  • Developed a data pipeline using Flume, Sqoop, Pig, and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
  • Involved in writing MapReduce jobs.
  • Involved in ingesting data using Sqoop and the HDFS put or copyFromLocal commands.
  • Used Pig to do transformations, event joins, bot-traffic filtering, and some pre-aggregations before storing the data in HDFS.
  • Involved in developing Pig UDFs for functionality that is not available out of the box in Apache Pig.
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
  • Involved in developing Hive DDLs to create, alter and drop Hive tables.
  • Involved in developing Hive UDFs for functionality that is not available out of the box in Apache Hive.
  • Enabled speedy reviews and first mover advantages by using Oozie to automate data loading into the Hadoop Distributed File System and Pig to pre-process the data.
  • Used Spark API over Hortonworks Hadoop YARN to perform analytics on data in Hive.
  • Involved in using HCatalog to access Hive table metadata from MapReduce or Pig code.
  • Used Java MapReduce to compute metrics that define user experience, revenue, etc.
  • Responsible for developing a data pipeline using Flume, Sqoop, and Pig to extract data from weblogs and store it in HDFS.
  • Designed and implemented various metrics that can statistically signify the success of the experiment.
  • Used Eclipse and Ant to build the application.
  • Involved in using SQOOP for importing and exporting data into HDFS and Hive.
  • Involved in processing ingested raw data using MapReduce, Apache Pig and Hive.
  • Involved in developing Pig Scripts for change data capture and delta record processing between newly arrived data and already existing data in HDFS.
  • Involved in pivoting HDFS data from rows to columns and columns to rows.
  • Involved in emitting processed data from Hadoop to relational databases or external file systems using Sqoop and the HDFS get or copyToLocal commands.
  • Involved in developing Shell scripts to orchestrate execution of all other scripts (Pig, Hive, and MapReduce) and move the data files within and outside of HDFS.
  • Used Autosys to schedule batch jobs and scripts.

Environment: Hadoop, MapReduce, YARN, Hive, Pig, HBase, Oozie, Sqoop, Flume, Oracle 11g, Core Java, Cloudera HDFS, Eclipse.

Confidential, Rochester, MN

Hadoop Developer


  • Coordinated with business customers to gather business requirements.
  • Imported and exported data between HDFS and databases using Sqoop.
  • Responsible for managing data coming from different sources.
  • Worked on analyzing the Hadoop cluster and various big data analytics tools, including Pig, HBase, and Sqoop.
  • Loaded and transformed large sets of structured and semi-structured data.
  • Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis.
  • Analyzed data using Hadoop components Hive and Pig.
  • Involved in running Hadoop streaming jobs to process terabytes of data.
  • Gained experience in managing and reviewing Hadoop log files.
  • Involved in writing Hive queries for data analysis to meet the business requirements.
  • Worked on exporting the analyzed data to the existing relational databases using Sqoop, making it available for visualization and report generation by the BI team.
  • Involved in creating the workflow to run multiple Hive and Pig jobs, which run independently based on time and data availability.
  • Developed Pig Latin scripts for the analysis of semi-structured data.
  • Imported data from MySQL to HDFS on a regular basis using Sqoop.

Environment: Hadoop, Hive, MapReduce, Pig, Sqoop, MySQL, HBase, Flume, Spark, Scala, Hortonworks Sandbox.


Java Developer and Operational Engineer


  • Involved in Analysis, Design, Implementation, and Testing of the project.
  • Implemented the presentation layer with HTML, XHTML, JavaScript, and CSS.
  • Developed web components using JSP, Servlets, and JDBC.
  • Implemented database using MySQL.
  • Involved in fixing defects and unit testing with test cases using JUnit.
  • Developed user and technical documentation.
  • Made extensive use of the Java Naming and Directory Interface (JNDI) for looking up enterprise beans.
  • Developed presentation layer using HTML, CSS, and JavaScript.
  • Developed stored procedures and triggers in PL/SQL.
  • Deployed source code to application server using Jenkins, GitHub, Maven, and Puppet.
  • Designed the database; wrote stored procedures and triggers, session and entity beans, a JMS client, and message-driven beans to receive and process JMS messages; and built JSPs and Servlets using MVC architecture.
  • Deployed the application into WebLogic server.
  • Using Jenkins and Maven, built pom.xml files that were used to deploy files to WebLogic Server.
  • Used GitHub, Nexus and Puppet while deploying files to Application server.
  • Responsible for parsing XML data using an XML parser, and for testing, bug fixing, and code modifications.
  • Involved in writing JUnit test cases and suites using the Eclipse IDE.

Environment: Java, JSP, Servlets, JDBC, JavaScript, CSS, MySQL, JUnit, Eclipse, Apache Tomcat, Jenkins, Puppet, Maven.
