
Hadoop Developer Resume

Richmond, VA


  • Around 8 years of overall experience across a variety of industries, including 4+ years in Big Data technologies (Apache Hadoop stack and Apache Spark) and 4+ years in Java technologies.
  • Hands-on experience with Cloudera and Hortonworks.
  • Hands-on experience with Hadoop ecosystem components such as Hive, Pig, Sqoop, Flume, Impala, Oozie, ZooKeeper, and HBase.
  • Strong knowledge of Hadoop architecture and daemons (NameNode, DataNode, JobTracker, TaskTracker), as well as HDFS and MapReduce concepts.
  • Hands-on experience writing MapReduce programs in Java to process different data sets using Map and Reduce tasks.
  • Hands-on experience with Apache Hadoop ecosystem components such as Hadoop, Spark, HDFS, MapReduce, YARN, Tez, HBase, Pig, Hive, Sqoop, Flume, Oozie, and Kafka.
  • Hands-on experience writing MapReduce jobs in Java, Pig, and Python.
  • Experience working with SQL on Hadoop using Apache Hive.
  • Hands-on experience writing Spark SQL and Spark Streaming applications in Scala and Python.
  • Developed multiple MapReduce jobs to perform data cleaning and preprocessing.
  • Involved in designing the Hive data model for migrating the ETL process into Hadoop, and wrote Pig scripts to load data into the Hadoop environment.
  • Designed Hive queries and Pig scripts to perform data analysis, data transfer, and table design.
  • Expertise in writing Hive UDFs and Generic UDFs to incorporate complex business logic into Hive queries.
  • Experienced in optimizing Hive queries by tuning configuration parameters.
  • Used Sqoop for large dataset transfers between Hadoop and RDBMS.
  • Extensively used Apache Flume to collect logs and error messages across the cluster.
  • Experience implementing real-time streaming and analytics using Spark Streaming and Kafka.
  • Experience in data ingestion using Sqoop from RDBMS to HDFS and Hive, and vice versa.
  • Proficient in Java/J2EE technologies: Core Java, JSP, JavaBeans, Java Servlets, Ajax, JDBC, ODBC, Web Services, Swing, Hibernate, Spring, Struts, XML, and XSLT.
  • Performed data analysis using MySQL, SQL Server Management Studio, and Oracle.
  • Experience with ETL tools: Informatica, Talend, and SSIS.
  • Experience working with Cloudera (CDH3, CDH4, and CDH5) and Hortonworks Hadoop distributions.
  • Hands-on experience with AWS infrastructure services: Amazon Simple Storage Service (Amazon S3) and Amazon Elastic Compute Cloud (Amazon EC2).
  • Worked with Oozie and ZooKeeper to manage job flow and coordination in the cluster.
  • Experience in performance tuning and monitoring of Hadoop clusters, gathering and analyzing data on the existing infrastructure using Cloudera Manager.
  • Experience configuring Hadoop ecosystem components: MapReduce, Hive, HBase, Pig, Sqoop, Oozie, ZooKeeper, Flume, Storm, Spark, YARN, and Tez.
  • Experience with RESTful services and Amazon Web Services.
  • Hands-on experience with Amazon EC2, EMR, and S3.
  • Conversant with web/application servers: Tomcat, WebSphere, WebLogic, and IIS.
  • Experience writing Maven and SBT scripts to build and deploy Java and Scala applications.
  • Around 2 years of experience with Spark and Scala.
  • Implemented unit testing with JUnit and MRUnit.
  • Expertise in web application development with JSP, HTML, CSS, JavaScript, ASP.NET, C#.NET, and jQuery.
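The MapReduce data-cleaning and preprocessing work listed above follows a map, shuffle, and reduce pattern. The sketch below is a hypothetical, dependency-free illustration in plain Java and does not use the Hadoop API; the `MapReduceSketch` class and the `customerId,amount` record layout are assumptions for illustration only. Malformed lines are dropped in the map phase, and values are summed per key in the reduce phase.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical in-memory simulation of the MapReduce pattern (no Hadoop APIs).
// Input lines are assumed to be "customerId,amount" records.
public class MapReduceSketch {

    // Map phase: parse one line into a (key, value) pair, or nothing if malformed.
    static Optional<Map.Entry<String, Double>> mapLine(String line) {
        String[] parts = line.split(",");
        if (parts.length != 2 || parts[0].isBlank()) {
            return Optional.empty(); // data cleaning: drop records without a key
        }
        try {
            Double amount = Double.parseDouble(parts[1].trim());
            return Optional.of(Map.entry(parts[0].trim(), amount));
        } catch (NumberFormatException e) {
            return Optional.empty(); // data cleaning: drop non-numeric amounts
        }
    }

    // Shuffle groups the pairs by key; reduce sums the values for each key.
    static Map<String, Double> run(List<String> lines) {
        return lines.stream()
                .map(MapReduceSketch::mapLine)
                .flatMap(Optional::stream)
                .collect(Collectors.groupingBy(
                        Map.Entry::getKey,
                        TreeMap::new,
                        Collectors.summingDouble(Map.Entry::getValue)));
    }

    public static void main(String[] args) {
        List<String> input = List.of("c1,10.5", "c2,3", "bad-line", "c1,4.5", ",7", "c2,abc");
        System.out.println(run(input)); // prints {c1=15.0, c2=3.0}
    }
}
```

In an actual Hadoop job, the parsing step would typically sit in a `Mapper` subclass and the summation in a `Reducer`, with the framework performing the shuffle and sort between them.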


Big Data Technologies: Hadoop, MapReduce, HDFS, Hive, Pig, ZooKeeper, Sqoop, Oozie, Flume, Impala, HBase, Kafka, Storm

Big Data Frameworks: HDFS, YARN, Spark

Hadoop Distributions: Cloudera (CDH3, CDH4, CDH5), Hortonworks, Amazon EMR

Programming Languages: Java, C, C++, Shell Scripting, Scala

Databases: MySQL, Oracle (PL/SQL), Microsoft SQL Server, Teradata, DB2, Cassandra, MongoDB

IDE and Tools: Eclipse, NetBeans, Tableau

Operating Systems: Windows XP/Vista/7, Linux/Unix

Frameworks: Spring, Hibernate, JSF, EJB, JMS

Scripting Languages: JSP & Servlets, JavaScript, XML, HTML, Python

Application Servers: Apache Tomcat, WebSphere, WebLogic, JBoss

Methodologies: Agile, SDLC, Waterfall

Web Services: RESTful, SOAP

ETL Tools: Talend, Informatica

Others: Solr, Elasticsearch


Confidential, Richmond, VA

Hadoop Developer


  • Imported retail and commercial data from various vendors into HDFS using the EDE process and Sqoop.
  • Designed the Cascading flow setup from the edge node to HDFS (data lake).
  • Created Cascading code to perform several types of data transformations as required by the DA.
  • Used Hue to create external Hive tables over both the imported data and the transformed data.
  • Developed code to remove or replace erroneous values in data fields using Cascading.
  • Created custom functions for several datatype conversions and for handling errors in vendor-provided data.
  • Monitored the Cascading flows using Driven to ensure the desired results were obtained.
  • Optimized a Confidential tool (Docs) for importing data and converting it into Parquet file format after validation.
  • Involved in testing Spark for exporting data from HDFS to an external database as part of a POC.
  • Developed shell scripts to automate Cascading jobs for Control-M scheduling.
  • Involved in testing AWS Redshift connectivity with a SQL database for testing and storing data as part of a POC.
  • Developed Hive queries to analyze data by customer rating ID for several projects.
  • Converted raw files (CSV, TSV) into other file formats such as Parquet and Avro, with datatype conversion, using Cascading.
  • Involved in writing test cases for the Cascading jobs using the Plunger framework.
  • Set up the Cascading environment and troubleshot environment-related issues.
  • Assisted in creating and maintaining technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
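The custom datatype conversions and error-field handling described above can be sketched as small, null-safe parsing helpers. This is a hypothetical plain-Java illustration, not the Cascading `Function` API; the `FieldConverters` class name, the fallback values, and the `yyyyMMdd` vendor date format are all assumptions.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

// Hypothetical helpers for converting vendor-supplied string fields to typed values.
// Plain Java only; this is not the Cascading Function API.
public class FieldConverters {

    // Assumed vendor date format: yyyyMMdd (e.g. 20170315).
    private static final DateTimeFormatter VENDOR_DATE = DateTimeFormatter.BASIC_ISO_DATE;

    // Parse an integer field; null, blank, or non-numeric values become the fallback.
    static int toIntOrDefault(String raw, int fallback) {
        if (raw == null || raw.isBlank()) {
            return fallback;
        }
        try {
            return Integer.parseInt(raw.trim());
        } catch (NumberFormatException e) {
            return fallback;
        }
    }

    // Parse a date field; null, blank, or invalid dates become null.
    static LocalDate toDateOrNull(String raw) {
        if (raw == null || raw.isBlank()) {
            return null;
        }
        try {
            return LocalDate.parse(raw.trim(), VENDOR_DATE);
        } catch (DateTimeParseException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(toIntOrDefault(" 42 ", -1));   // 42
        System.out.println(toIntOrDefault("N/A", -1));    // -1
        System.out.println(toDateOrNull("20170315"));     // 2017-03-15
        System.out.println(toDateOrNull("2017-13-40"));   // null
    }
}
```

Returning a fallback or null instead of throwing keeps a single bad vendor field from failing a whole batch; a stricter pipeline might instead route such records to a separate error output for review.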

Environment: MapReduce, HDFS, Sqoop, Cascading, Linux, Shell, Hadoop, Spark, Hive, AWS Redshift, Hadoop Cluster

Confidential, New York, NY

Sr. Hadoop Developer


  • Analyzed data on the Hadoop cluster using big data analytic tools including Pig, Hive, and MapReduce.
  • Collected and aggregated large amounts of log data using Apache Flume, staging it in HDFS for further analysis.
  • Used Pig as an ETL tool to perform transformations, event joins, filtering, and pre-aggregations before storing the data in HDFS.
  • Hands-on experience writing and executing Pig scripts.
  • Hands-on experience writing Pig UDFs.
  • Configured Oozie workflows to automate data flow, preprocessing, and cleaning tasks using Hadoop actions.
  • Monitored cluster status and health daily, including the DataNode, JobTracker, TaskTracker, and NameNode.
  • Experience configuring Hadoop ecosystem components: MapReduce, Hive, HBase, Pig, Sqoop, Oozie, ZooKeeper, Flume, Storm, Spark, YARN, and Tez.
  • Experience with the CDH distribution and Cloudera Manager to manage and monitor Hadoop clusters.
  • Knowledge of rendering and delivering reports in the desired formats using reporting tools such as Tableau.
  • Worked on debugging and performance tuning of Hive and Pig jobs.
  • Worked on tuning the performance of Pig queries.
  • Gained experience managing and reviewing Hadoop log files.
  • Created HBase tables to store data in various formats coming from different applications.
  • Developed ETL scripts for data acquisition and transformation using Talend.
  • Extensive experience with Talend source and connection configuration, credentials management, and context management.
  • Implemented and assisted with Talend installations and Talend server setup, including the MDM server.
  • Implemented a proof of concept to analyze streaming data using Apache Spark with Scala and Python; used Maven and SBT to build and deploy the Spark programs.
  • Involved in scheduling the Oozie workflow engine to run multiple Hive and Pig jobs.
  • Developed simple to complex MapReduce jobs using Java, Pig, and Hive.
  • Developed the application using Eclipse, with Maven as the build and deployment tool.
  • Exported the analyzed data to relational databases using Sqoop for visualization.

Environment: Hadoop, MapReduce, HDFS, Pig, Hive, Sqoop, Flume, Java, Oracle 10g, MySQL, SQL Server, Ubuntu, Agile, YARN, Spark, Hortonworks, Teradata, Talend, UNIX Shell Scripting, Oozie, Maven, Eclipse

Confidential, NY

Hadoop Developer


  • Used Sqoop to extract data from Oracle and MySQL databases into HDFS.
  • Developed Oozie workflows to extract data using Sqoop according to business requirements.
  • Developed MapReduce (YARN) jobs for cleaning, accessing, and validating the data.
  • Used Hive and Impala to query data in HBase.
  • Wrote multiple MapReduce programs in Java for data extraction, transformation, and aggregation across multiple file formats, including XML, JSON, CSV, and other compressed formats.
  • Wrote HiveQL scripts to denormalize and aggregate the data.
  • Used Solr for querying and searching HBase data.
  • Optimized the existing Hive and Pig scripts.
  • Created external Hive tables to perform analysis on data in HDFS.
  • Involved in loading data from the UNIX file system into HDFS.
  • Designed workflows scheduling Hive processes for log file data streamed into HDFS using Flume.
  • Involved in implementing a Neo4j query to search for clients by their respective fields.
  • Developed schemas to handle reporting requirements in Tableau.
  • Worked with a team on a proof of concept (POC) for storing documents in the NoSQL database MongoDB using GridFS.
  • Deep and thorough understanding of ETL tools and how they can be applied in a Big Data environment.
  • Worked with application teams to install operating systems, Hadoop updates, patches, and version upgrades as required.

Environment: Hadoop, MapReduce, HiveQL, Hive, HBase, Sqoop, Solr, Flume, Tableau, Impala, Oozie, MySQL, Oracle SQL, Java, Unix Shell, YARN, Pig Latin

Confidential, Great Neck, NY

Hadoop and Java Developer


  • Worked as a senior developer on the project.
  • Used Enterprise JavaBeans as middleware in developing a three-tier distributed application.
  • Developed session beans and entity beans for business and data processing.
  • Implemented RESTful web services.
  • Developed the user interface using HTML, CSS, JSP, and AJAX.
  • Performed client-side validation using JavaScript and jQuery, and applied server-side validation to the web pages as well.
  • Used JIRA for bug tracking of the web application.
  • Wrote Spring Core and Spring MVC configuration files to associate the DAO layer with the business layer.
  • Worked with HTML, DHTML, CSS, and JavaScript in UI pages.
  • Wrote SOAP web services for sending and receiving data from the external interface.
  • Extensively used the JUnit framework to write test cases for unit testing the application.
  • Implemented JDBC modules in JavaBeans to access the database.
  • Designed the tables for the back-end Oracle database.
  • The application was hosted on WebLogic and developed using the Eclipse IDE.
  • Used XSL/XSLT for transforming and displaying reports, and developed schemas for XML.
  • Wrote Ant scripts to build and deploy the application.
  • Developed a web-based reporting module for the monitoring system with HTML and Tiles using the Struts framework.
  • Implemented field-level validations with AngularJS, JavaScript, and jQuery.
  • Prepared unit test scenarios and unit test cases.
  • Branded the site with CSS.
  • Performed code reviews and unit testing.
  • Involved in unit testing using JUnit.
  • Implemented Log4j to trace logs and track information.
  • Participated in project discussions with clients, analyzed complex project requirements, and prepared design documents.

Environment: Hive, Pig, HBase, ZooKeeper, Sqoop, Cloudera, Java, JDBC, JNDI, Struts, Maven, Trac, Subversion, JUnit, SQL, Spring, Hibernate, Oracle, XML, Altova XMLSpy, PuTTY, and Eclipse.


Java Developer


  • Access to this site is provided for authorized users.
  • Coded using Java, JSP, and HTML.
  • Developed front-end validations using JavaScript, and designed the layouts of JSPs and custom tag libraries for all JSPs.
  • Participated in planning and developing UML diagrams (use case, object, class, and sequence diagrams) to represent the detailed design phase.
  • Implemented several test cases using JUnit.
  • Integrated the Apache Log4j logging component into the application.
  • Made builds and deployed them to the common development test environment, a WebSphere Application Server environment, to verify functional requirements.

Environment: Java, J2EE, Tomcat, JSP and Struts Framework, Eclipse, SQL and Oracle.
