
Hadoop Developer Resume

Blue Bell, PA


  • 7+ years of overall IT experience in a variety of industries, including hands-on experience in Big Data technologies.
  • 4+ years of comprehensive experience in Big Data processing using Apache Hadoop and its ecosystem (MapReduce, Pig, Hive, Sqoop, Flume, HBase, Spark, NoSQL, Oozie, Kafka, and ZooKeeper).
  • In-depth understanding and knowledge of Hadoop architecture and its components, such as HDFS, MapReduce, JobTracker, TaskTracker, NameNode, DataNode, ResourceManager, and NodeManager.
  • Knowledge of testing with Big Data technologies such as Hadoop, MapReduce, Hive, Pig, HBase, Kafka, and Spark.
  • Hands-on experience in installing, configuring, and testing ecosystem components such as Hadoop MapReduce, HDFS, HBase, ZooKeeper, Oozie, Hive, HDP, Cassandra, Sqoop, Pig, and Flume.
  • Experience in analyzing data using HiveQL, Pig Latin, HBase, and custom MapReduce programs in Java.
  • Experienced in writing complex MapReduce programs that work with different file formats like Text, Sequence, XML, JSON, and Avro.
  • Good knowledge of data warehousing, ETL development, distributed computing, and large-scale data processing.
  • Expertise in writing MapReduce jobs in Java to process large structured, semi-structured, and unstructured data sets and store them in HDFS.
  • Good experience importing and exporting data between HDFS and relational database management systems using Sqoop.
  • Worked on NoSQL databases such as HBase, Cassandra, and MongoDB and their integration with Hadoop clusters.
  • Experienced in Hive join optimizations such as map joins and sort-merge bucket map joins, as well as MERGE, UPDATE, and DELETE operations, using HUE.
  • Good experience testing Hadoop-based applications throughout design and implementation.
  • Worked extensively on dimensional modeling, data migration, data cleansing, data profiling, and ETL processes for data warehouses.
  • Worked with real-time data processing and streaming techniques using Spark Streaming, Storm, and Kafka.
  • Good knowledge of programming Spark in Scala; experienced with Spark SQL, Spark Streaming, and complex analytics using Spark on Cloudera Hadoop YARN.
  • Experienced in handling different file formats such as text files, SequenceFiles, and JSON files.
  • Preprocessed and cleansed big data for better analysis.
  • Ability to work effectively with associates at all levels within the organization.
  • Strong background in mathematics, with very good analytical and problem-solving skills.


Big Data Technologies: HDFS, MapReduce, Hive, Pig, Sqoop, Flume, Oozie, Hue, Ambari, ZooKeeper, Kafka, Apache Spark (Spark SQL, Spark Streaming)

Hadoop Distributions: Cloudera, Hortonworks, AWS EMR

Languages: Java, PL/SQL, Python, Pig Latin, HiveQL, Scala

IDE Tools: Eclipse, NetBeans, IntelliJ.

Web Technologies: HTML, CSS, JavaScript, XML, RESTful.

Operating Systems: Windows, UNIX, Linux, Ubuntu

Reporting Tools/ETL Tools: Tableau, Power View for Microsoft Excel

Databases: Oracle, SQL Server, MySQL, MS Access, NoSQL Database (HBase, Cassandra, MongoDB)


Hadoop Developer

Confidential, Blue Bell, PA


  • Provided a solution using Hive and Sqoop (to export/import data) that sped up data loads by replacing the traditional ETL process with HDFS-based loading into target tables.
  • Maintained and monitored the Hive data warehouse: created tables, distributed data by implementing partitioning and bucketing, and wrote and optimized HiveQL queries.
  • Designed Pig Latin scripts to sort, group, join and filter the data as part of data transformation as per the business requirements.
  • Merged data files and loaded them into HDFS using Java code, maintaining the merge history in HBase.
  • Collaborated with the Data Warehouse team to design and develop the required ETL processes and performance-tuned ETL programs/scripts.
  • Created Hive tables and worked with them using HiveQL.
  • Wrote Apache Pig scripts to process HDFS data.
  • Created Java UDFs in Pig and Hive.
  • Analyzed client specifications and actively participated in SRS documentation.
  • Handled Hive queries through Spark SQL within the Spark environment.
  • Developed scripts and scheduled Autosys jobs to filter data.
  • Implemented near real-time data pipeline using a framework based on Kafka, Spark, and MemSQL.
  • Monitored Autosys file watcher jobs, tested the data for each transaction, and verified that each run completed properly.
  • Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
  • Implemented object-relational mapping in the persistence layer using the Hibernate framework in conjunction with the Spring Framework.
  • Involved in the planning process of iterations under the Agile Scrum methodology.
  • Wrote PL/SQL and SQL queries.
  • Involved in testing the Business Logic layer and Data Access layer using JUnit.
  • Used Scala to test DataFrame transformations and debug data issues.
  • Wrote SQL scripts and PL/SQL procedures and functions against Oracle DB.
  • Wrote JUnit test cases to test the functionality of each method in the DAO layer. Configured and deployed the WebSphere Application Server.
  • Prepared technical reports and documentation manuals for efficient program development.

Environment: Java, HDP 2.2 YARN cluster, HDFS, MapReduce, Apache Hive, Apache Pig, HBase, Sqoop, XML, Oracle 8i, UNIX, ETL, Spark, Scala.

Hadoop Developer

Confidential, Hartford, CT


  • Analyzed the data using Spark, Hive and produced summary results to downstream systems.
  • Created and modified shell scripts for scheduling data cleansing scripts and the ETL loading process.
  • Developed Spark applications to perform all data transformations on user behavioral data coming from multiple sources.
  • Configured Spark Streaming to receive real-time data from Kafka and store the stream data in HDFS using Scala (prototype).
  • Used Spark for interactive queries, streaming data processing, and integration with a popular NoSQL database for high data volumes.
  • Implemented Spark jobs in Java for faster testing and processing of data.
  • Implemented near real-time data pipeline using a framework based on Kafka, Spark, and MemSQL.
  • Imported data from different sources into HDFS using Sqoop, performed transformations using Hive and MapReduce, and loaded the results into HDFS.
  • Used Scala to test DataFrame transformations and debug data issues.
  • Exported the analyzed data to the relational databases using Sqoop, to further visualize and generate reports for the BI team.
  • Analyzed the data by running Hive queries (HiveQL) and Pig scripts (Pig Latin) to study customer behavior.
  • Created Hive UDFs to supply functionality missing from Hive for analytics.
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
  • Worked on various performance optimizations, such as using the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins.
  • Developed Hive scripts in HiveQL to denormalize and aggregate the data.
  • Handled Hive queries through Spark SQL within the Spark environment.
  • Created HBase tables and column families to store the user event data.
  • Worked with and gained substantial experience in AWS cloud services such as EC2, S3, EBS, and EMR.
  • Migrated an existing on-premises application to AWS, using EC2 and S3 for small-data-set processing and storage; maintained the Hadoop cluster on AWS EMR.
  • Imported data from AWS S3 into Spark RDDs and performed transformations and actions on them.
  • Implemented Elasticsearch on the Hive data warehouse platform.
  • Imported application data from AWS Lambda to store in S3.
  • Worked with AWS Kinesis to analyze real-time streaming data, making it easy to generate reports to customer requirements.
  • Continuously monitored and managed the Hadoop cluster using Cloudera Manager.

Environment: Hadoop, MapReduce, HDFS, HBase, Spark, Hive, Pig, Python, Java, SQL, Sqoop, Flume, Oozie, Talend, Unix, JavaScript, Maven, MRUnit, SVN, Eclipse, Scala.

Hadoop Developer

Confidential, Austin, TX


  • Implemented the EP Data Lake, which provides a platform to manage data in a central location so that anyone in the firm can rapidly query, analyze, or refine the data in a standard way.
  • Involved in moving legacy data from Sybase data warehouse to Hadoop Data Lake and migrating the data processing to the lake.
  • Responsible for creating Datastore, Datasets and Virtual Warehouse in the lake and then creating Spark and Hive refiners to implement the existing SQL Stored Procedures.
  • Created Java based Spark refiners to replace existing SQL Stored Procedures.
  • Created Hive refiners for simple UNIONS and JOINS.
  • Executed Hive queries from Java using Spark SQL within the Spark environment.
  • Implemented near real-time data pipeline using a framework based on Kafka, Spark, and MemSQL.
  • Used REST services in Java and Spring to expose data in the lake.
  • Automated the triggering of Data Lake REST API calls using Unix shell scripting and Perl.
  • Created reconciliation jobs for validating data between source and lake.
  • Used Scala to test DataFrame transformations and debug data issues.
  • Redesigned and implemented a Scala REPL (read-eval-print loop) to integrate tightly with other IDE features in Eclipse.
  • Developed multiple POCs in Scala, deployed them on the YARN cluster, and compared the performance of Spark with Hive and SQL/Teradata.
  • Used Avro format for staging data and ORC for final repository.
  • Worked on the data modeling service, an in-house tool (PURE MODEL): used data from the data lake virtual warehouse and exposed the data model's output through Java web services accessed by end users.
  • Implemented daily cron jobs that automate parallel tasks of loading data into HDFS and pre-processing it with Pig, using Oozie coordinator jobs.
  • Used Sqoop import and export functionalities to handle large data set transfer between Sybase database and HDFS.
  • Experience in tuning Hive Queries and Pig scripts to improve performance.
  • Involved in creating Oozie workflow and coordinator jobs to kick off jobs based on time and data availability.
  • Handled Hive queries through Spark SQL within the Spark environment.
  • Used Eclipse and Ant to build the application.
  • Performed unit testing and integration testing using JUnit framework.
  • Configured build scripts for multi-module projects with Maven and Jenkins.
  • Responsible for managing data coming from different sources; involved in HDFS maintenance and loading of structured and unstructured data.
  • Designed the technical architecture and developed various Big Data workflows using custom MapReduce, Pig, Hive, and Sqoop.
  • Involved in moving legacy data from Sybase ASE data warehouse to Hadoop Data Lake and migrating the data processing to the lake.
  • Built reusable Hive UDF libraries for business requirements which enabled various business analysts to use these UDFs in Hive querying.
  • Used Maven extensively to build JAR files of MapReduce programs and deployed them to the cluster.
  • Resolved defects found while testing the new application and existing applications.

Environment: Hadoop, HDFS, Pig, Hive, Spark, Scala, Oozie, Sqoop, HBase, Sybase, Java, Kafka, UNIX, Maven, JUnit, SVN, MapR

J2EE Developer

Confidential, Winston Salem, NC


  • Responsible for gathering requirements from business users, preparing technical requirements documents, and maintaining document versions according to department standards.
  • Involved in the design and development of presentation and web layers based on the MVC (Model-View-Controller) architecture, using the Struts and Spring frameworks.
  • Involved in the development and deployment of the Application using JSP, Struts1.3, JavaScript, JDeveloper 10.1.3, Eclipse 3.0, Tomcat 6.0 and JBoss 4.0.
  • Responsible for design and development of Web Service using AXIS 1.1 Web Services, which communicates with the FoxPro database to retrieve data.
  • Involved in developing interfaces to communicate with other departments using XML, XSLT and DOM Parsers.
  • Responsible for setting up the workspace and initial project framework for project development.
  • Developed UI and Mockup Screens as per the technical requirements and presented to the users for review in the weekly meetings.
  • Introduced Jasper Reports in the application for report generation in both PDF and Excel.
  • Responsible for writing efficient stored procedures in SQL and PL/SQL on Oracle.
  • Developed database connections and complicated queries to communicate with the Oracle database.
  • Responsible for writing design documents and JUnit unit test case documents.
  • Responsible for maintaining code versions in Harvest.

Environment: Java 1.4, JavaScript, JSP, CSS, HTML, Struts 1.3, SOA, WSDL, Axis Web Services, Jasper Reports 3.0, Oracle 10g, TOAD 8.6.1, JDeveloper 10.1.3, Eclipse 3.0, Apache Tomcat 6.0, JBoss 4.0, and Harvest.

J2EE Developer



  • Responsible for gathering requirements and preparing functional and technical specifications according to the standard templates.
  • Responsible for coordinating with the offshore team, reviewing code upon delivery, and deploying it on WebSphere 5.1.
  • Involved in the design and development of the presentation and web modules using Java, JSP, Servlets, Ajax, JavaScript, Eclipse 3.2, and WSAD 5.1 according to MVC standards.
  • Involved in the design and development of different modules using JSF and Spring Framework 1.2.
  • Responsible for maintaining code versions using Rational ClearCase.
  • Involved in unit testing, integration testing, and system testing.

Environment: Java, JavaScript, J2EE, Servlets, JSF, Spring Framework, JSP, Ajax, HTML, JDBC, XML, Oracle, WebSphere 5.1, Eclipse 3.0, Rational ClearCase.
