
Hadoop Developer Resume



  • 7+ years of experience with an emphasis on Big Data technologies and the design and development of Java-based enterprise applications.
  • Excellent understanding of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, and DataNode, and of the MapReduce programming paradigm.
  • Experience installing, configuring, supporting, and managing Hadoop clusters using the Apache, Hortonworks, and Cloudera (CDH4, CDH5) distributions, and on Amazon Web Services (AWS).
  • Hands-on experience with major components of the Hadoop ecosystem, including Hive, HBase, HBase-Hive integration, Pig, Sqoop, and Flume, and knowledge of the MapReduce/HDFS framework.
  • Set up standards and processes for Hadoop-based application design and implementation.
  • Worked with NoSQL databases including HBase, Cassandra, and MongoDB.
  • Good experience in data analysis using Pig and Hive, and understanding of Sqoop and Puppet.
  • Extensive experience in data analysis using tools such as Syncsort and HZ, along with shell scripting and UNIX.
  • Experience in ETL development using Kafka, Flume, and Sqoop, and expertise in database performance tuning and data modeling.
  • Developed automated UNIX shell scripts to perform RUNSTATS, REORG, REBIND, COPY, LOAD, BACKUP, IMPORT, EXPORT, and other database-related activities.
  • Experienced in developing MapReduce programs using Apache Hadoop to process big data.
  • Good understanding of XML technologies (XML, XSL, XSD), including web services and SOAP.
  • Expertise in working with databases such as Oracle, MS SQL Server, PostgreSQL, and MS Access 2000, along with exposure to Hibernate for mapping an object-oriented domain model to a traditional relational database.
  • Familiarity and experience with data warehousing and ETL tools.
  • Good knowledge of Apache Spark and Scala.
  • Good understanding of Scrum methodologies, Test Driven Development and continuous integration.
  • Experience in production and application support, including bug fixing.
  • Major strengths: familiarity with multiple software systems; ability to learn new technologies quickly and adapt to new environments; self-motivated, focused team player with excellent interpersonal, technical, and communication skills.
  • Good knowledge of Teradata Manager, TDWM, PMON, DBQL, SQL Assistant, and BTEQ.
  • Expertise in creating databases, users, tables, triggers, macros, views, stored procedures, functions, packages, and join and hash indexes in the Teradata database.
  • Worked extensively with Teradata utilities such as BTEQ, FastExport, FastLoad, and MultiLoad to export and load data to and from different source systems, including flat files.
  • Hands-on experience using query tools such as TOAD, SQL Developer, PL/SQL Developer, Teradata SQL Assistant, and Queryman.
  • Expertise in writing large, complex SQL queries.
  • Proficient in performance analysis, monitoring, and SQL query tuning using EXPLAIN PLAN, COLLECT STATISTICS, hints, and SQL Trace in both Teradata and Oracle.


Big Data Technologies: HDFS, Hive, MapReduce, Cassandra, Pig, Storm, Spark, Kafka, HBase, Sqoop, Flume, ZooKeeper, Oozie, Avro

Apache Hadoop Distributions: Cloudera (CDH3/CDH4/CDH5), Hortonworks

Operating Systems: Windows, CentOS, Ubuntu, Red Hat Linux, Linux, UNIX

Programming/Scripting Languages: Java, SQL, UNIX shell scripting, PL/SQL, Python

Databases/Technologies: Oracle 8i/9i/10g, MySQL, Teradata, Microsoft SQL Server, NoSQL

Java Technologies: JDBC, Servlets, JSP, Spring and Hibernate

IDE Tools: Eclipse, NetBeans

Application Servers: Tomcat, WebLogic, WebSphere

ETL/BI Tools: Informatica, Pentaho, SSRS, SSIS, Cognos

Frameworks: MVC, Struts, Hibernate, Spring

Development Methodologies: Agile, Lean Agile, Scrum, Waterfall, and Test-Driven Development


Hadoop Developer

Confidential, MN


  • Installed Hadoop, MapReduce, and HDFS on AWS, and developed multiple MapReduce jobs in Pig and Hive for data cleaning and preprocessing.
  • Developed MapReduce programs to parse raw data, populate staging tables, and store the refined data in partitioned tables in the EDW.
  • Created Hive queries that helped market analysts spot emerging trends by comparing fresh data with EDW tables and historical metrics.
  • Worked with message broker systems such as Kafka.
  • Worked on a Storm topology integrated with Kafka to achieve near-real-time processing of the various tests involved in truck manufacturing.
  • Worked on Amazon Redshift, the data warehouse service that is part of AWS.
  • Created HBase tables to load large sets of structured, semi-structured, and unstructured data coming from UNIX, NoSQL, and a variety of portfolios.
  • Enabled speedy reviews and first-mover advantages by using Oozie to automate data loading into HDFS and Pig to preprocess the data.
  • Extracted data from HBase using Spark.
  • Shared responsibility for administering Hadoop, Hive, and Pig, and built wrapper shell scripts around the Oozie workflow.
  • Tested connecting AWS Redshift to a SQL database for testing and storing data in a POC.
  • Designed and developed a data management system using MySQL.
  • Involved in collecting and aggregating large amounts of log data using Apache Flume and staging data in HDFS for further analysis.
  • Involved in creating Hadoop streaming jobs using Python.
  • Provided ad hoc queries and data metrics to business users using Hive and Pig.
  • Familiar with NoSQL databases, including HBase and MongoDB.
  • Developed Pig Latin scripts to extract data from web server output files and load it into HDFS.
  • Used Pig as an ETL tool to perform transformations, event joins, and some pre-aggregations before storing the data in HDFS.
  • Worked on MapReduce joins for querying multiple semi-structured datasets according to analytic needs.
  • Worked on performance optimization of Spotfire applications and Spotfire server configuration.
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
  • Created many Java UDFs and UDAFs in Hive for functions not available in Hive, such as rank and Csum.
  • Created Hive tables, loaded data into them, and wrote Hive UDFs.
  • Developed a POC for Apache Kafka.
  • Gained knowledge of building Apache Spark applications using Scala.
  • Stored and loaded data from HDFS to Amazon S3 and backed up the namespace data to NFS filers.
  • Enabled concurrent access to Hive tables with shared and exclusive locking, using the ZooKeeper implementation in the cluster.
  • Wrote shell scripts to monitor the health of Hadoop daemon services and respond to any warning or failure conditions.
  • Wrote shell scripts to automate rolling day-to-day processes.
  • Used the Teradata utilities FastLoad, MultiLoad, and TPump to load data.
  • Wrote Teradata macros and used various Teradata analytic functions.
  • Involved in projects migrating data from Oracle/DB2 data warehouses to Teradata.

Environment: Hadoop, MapReduce, Hive, HDFS, Pig, Sqoop, Oozie, Cloudera, Flume, HBase, Spotfire, ZooKeeper, CDH3, MongoDB, AWS Redshift, Cassandra, Oracle, NoSQL, UNIX/Linux, Spark, Kafka, Amazon Web Services.

Hadoop Consultant

Confidential, Morris Plains, NJ


  • Installed and configured various components of Cloudera Hadoop ecosystem and maintained their integrity
  • Implemented the Fair Scheduler on the JobTracker to allocate a fair share of resources to small jobs.
  • Used the Spark API over Hortonworks Hadoop YARN to perform analytics on data in Hive.
  • Worked on implementing custom Hive and Pig UDFs to transform large amounts of data.
  • Installed and configured Cloudera Manager and was involved in building a multi-node Hadoop cluster.
  • Used Apache Flume to ingest log data from multiple sources directly into Accumulo, file roll sinks, and HDFS.
  • Worked in an AWS environment to develop and deploy custom Hadoop applications.
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting on the dashboard
  • Extracted data from mainframes, fed it to Kafka, and ingested it into HBase to perform analytics.
  • Installed and configured a multi-node cluster in the cloud using Amazon Web Services (AWS) EC2.
  • Worked with NoSQL databases including HBase and Elasticsearch.
  • Developed multiple POCs using Scala, deployed them on the YARN cluster, and compared the performance of Spark with Cassandra and SQL.
  • Extracted customers' big data from various sources into HDFS, including data from mainframes, databases, and server logs.
  • Worked on commissioning and decommissioning DataNodes and TaskTrackers.
  • Installed and configured a Cassandra cluster and CQL on the cluster.
  • Developed multiple MapReduce jobs in Java for data cleaning and processing.
  • Analyzed data using Hive queries, Pig scripts, and MapReduce programs.
  • Implemented the Oozie workflow engine to manage interdependent Hadoop jobs and to automate Hadoop jobs such as Hive, Sqoop, and system-specific jobs.
  • Exported business-required information to an RDBMS using Sqoop so that the BI team could generate reports from the data.
  • Created keyspaces and loaded data into the Cassandra cluster.
  • Used the nodetool utility to monitor the nodes, the streaming between nodes during startup of new nodes, and the cleanup of keys that are no longer used.

Environment: Linux, Java, MapReduce, AWS Redshift, HDFS, Oracle, SQL Server, Accumulo, Hive, Pig, Sqoop, Cloudera Manager, Cassandra

Java/Hadoop Developer

Confidential, Fremont, CA


  • Imported data from DB2 into HDFS using Sqoop and developed MapReduce jobs using the Java API.
  • Installed and configured Pig and wrote Pig Latin scripts.
  • Created and maintained technical documentation for launching Cloudera Hadoop clusters and for executing Hive queries and Pig scripts.
  • Wrote MapReduce jobs using Pig Latin and worked on cluster coordination services through ZooKeeper.
  • Designed and developed a fully functional, generic n-tier J2EE application platform in an Oracle technology-driven environment. The entire infrastructure application was developed using Oracle JDeveloper in conjunction with Oracle ADF-BC and Oracle ADF Rich Faces.
  • Programmed ETL functions between Oracle and Amazon Redshift.
  • Developed workflows using Oozie for running MapReduce jobs and Hive queries.
  • Optimized MapReduce jobs to use HDFS efficiently by using various compression mechanisms.
  • Optimized the configuration of Amazon Redshift clusters, data distribution, and data processing.
  • Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
  • Implemented various Java/J2EE design patterns such as Singleton, Session Façade, Business Delegate, Value Object, Data Access Object
  • Implemented various advanced join operations using Pig Latin.
  • Loaded data from the Linux file system into HDFS.
  • Imported data into and exported data from HDFS using Sqoop.
  • Ran Hadoop streaming jobs to process terabytes of XML data.
  • Assisted in exporting analyzed data to relational databases using Sqoop.
  • Used SoapUI to test web service responses.
  • Defined, designed, and developed Java applications, especially Hadoop MapReduce applications, leveraging frameworks such as Cascading and Hive.
  • Documented designs and procedures for building and managing Hadoop clusters.
  • Troubleshot operating system issues, cluster issues, and Java-related bugs.
  • Successfully loaded files into Hive and HDFS from MongoDB and Solr.
  • Automated deployment, management, and self-serve troubleshooting of applications.
  • Defined and evolved the existing architecture to scale with growth in data volume, users, and usage.
  • Designed and developed a Java API (Commerce API) that provides functionality to connect to Cassandra through Java services.
  • Installed and configured Hive and wrote Hive UDFs.
  • Managed development time, bug tracking, project releases, development speed, release forecasting, and scheduling.

Environment: Hadoop, HDFS, Hive, Flume, Sqoop, HBase, Oracle, Pig, Eclipse, MySQL, Ubuntu, ZooKeeper, Redshift, Java (JDK 1.6)

Java Developer



  • Gathered user requirements followed by analysis and design. Evaluated various technologies for the client.
  • Designed and developed HTML-based web pages for displaying reports, and developed Java classes and JSP files.
  • Implemented the presentation layer with HTML, XHTML and JavaScript.
  • Developed, tested, and debugged the Java, JSP, and EJB components using Eclipse.
  • Prepared the installation guide, customer guide, and configuration document, which were delivered to the customer along with the product.
  • Extensively used JSF framework.
  • Extensively used XML documents with XSLT and CSS to translate the content into HTML for presentation in the GUI.
  • Developed JavaScript code for client-side validation.
  • Involved in implementing business logic in the Struts framework, with Hibernate in the back end.
  • Used J2EE to develop the application based on the MVC architecture.
  • Wrote complex SQL queries and stored procedures. Used JavaMail for automatic emailing and JNDI to interact with the knowledge server.
  • Used the Struts framework to implement J2EE design patterns (MVC).
  • Developed Enterprise JavaBeans such as entity beans, session beans (both stateless and stateful), and message-driven beans.

Environment: Java, J2EE, EJB 2.1, JSP 2.0, Servlets 2.4, JNDI 1.2, JavaMail 1.2, JDBC 3.0, Struts, HTML, XML, CORBA, XSLT, JavaScript, Eclipse 3.2, Oracle 10g, WebLogic 8.1, Windows 2003.

Java Developer



  • Involved in the analysis, design, implementation, and testing of the project
  • Created the database, user, environment, activity, and class diagrams (UML) for the project.
  • Implemented the database using the Oracle database engine.
  • Designed and developed a fully functional, generic n-tier J2EE application platform in an Oracle technology-driven environment.
  • The entire infrastructure application was developed using Oracle JDeveloper in conjunction with Oracle ADF-BC and Oracle ADF RichFaces.
  • Created entity objects encapsulating business rules and policies, validation logic, default value logic, and security.
  • Worked in Agile development following the Scrum process, with sprints and daily stand-up meetings.
  • Created view objects, view links, association objects, and application modules with data validation rules (exposing linked views in an application module), LOVs, drop-downs, value defaulting, and transaction management features.
  • Developed web applications using J2EE: JSP, Servlets, JDBC, JavaBeans, Struts, Ajax, JSF, JSTL, custom tags, EJB, JNDI, Hibernate, Ant, JUnit, Apache Log4j, web services, and Message Queue (MQ).
  • Designed GUI prototypes using ADF 11g GUI components before finalizing them for development.
  • Used version control systems such as CVS, PVCS, and Rational ClearCase.

Environment: Core Java, Servlets, JSF, ADF Rich Client UI framework, ADF-BC (BC4J), web services using Oracle SOA (BPEL), Oracle WebLogic.
