
Hadoop Developer Resume

New York, NY


  • Over 8 years of overall IT experience, including hands-on experience with Big Data technologies.
  • 3 years of experience installing, configuring, and using Hadoop ecosystem components such as MapReduce, HDFS, Hive, Sqoop, Pig, ZooKeeper, Oozie, and Flume.
  • Good exposure to MapReduce programming in Java, Pig Latin scripting, distributed applications, and HDFS.
  • Experience developing custom UDFs in Java to extend Hive and Pig Latin functionality.
  • Good understanding of HDFS design, daemons, and HDFS high availability (HA).
  • Expertise in data transformation and analysis using Pig, Hive, and Sqoop.
  • Experienced in implementing Spark applications using Scala and Python.
  • Configured ZooKeeper, Cassandra, and Flume on an existing Hadoop cluster.
  • Strong understanding of Cassandra database.
  • Experience importing and exporting data with Sqoop between HDFS/Hive and relational database systems.
  • Worked on NoSQL databases including HBase and MongoDB.
  • Experienced in writing SQL and PL/SQL procedures, functions, triggers, and packages on relational databases (RDBMS) such as Oracle.
  • Experience implementing open-source frameworks such as Struts, Spring, and Hibernate, as well as web services.
  • Extensive experience working with Oracle and MySQL databases.
  • Familiar with data architecture including data ingestion pipeline design, Hadoop architecture, data modeling and data mining, machine learning and advanced data processing. Experience optimizing ETL workflows.
  • Good knowledge of Spark, Storm, and HBase for real-time streaming.
  • In-depth knowledge of statistics, machine learning, and data mining.
  • Experienced with big data machine learning in Mahout and Spark MLlib.
  • Experienced in writing MapReduce and Spark jobs for data cleansing.
  • Experienced with statistical tools including MATLAB, R, and SAS.
  • Experienced with supervised learning techniques such as multiple linear regression, nonlinear regression, logistic regression, artificial neural networks, support vector machines, decision trees, and random forests.
  • Able to perform at a high level, meet deadlines, and adapt to ever-changing priorities.
  • Excellent communication, interpersonal, and problem-solving skills; a strong team player with a can-do attitude and the ability to communicate effectively with all levels of the organization, including technical staff, management, and customers.


Big Data: Hadoop, HDFS, MapReduce, Pig, Hive, HBase, Cassandra, Spark, Sqoop, Oozie, ZooKeeper, Impala, Flume

IDE Tools: Eclipse, NetBeans

Java Technologies: Core Java, Servlets, JSP, JDBC, Collections

Web Technologies: XML, HTML, JSON, JavaScript, AJAX, Web services

Programming Languages: C, C++, Python, Core Java, JavaScript, Shell Script, Scala

Databases: Oracle, MySQL, DB2, PostgreSQL, MongoDB

Operating Systems: Windows XP/Vista/7, Mac OS X, Linux, Unix

Logging tools: Log4j

Version Control Systems: SVN, CVS, Git

Other Tools: PuTTY, WinSCP


Confidential, New York, NY

Hadoop Developer


  • Built a suite of Linux scripts as a framework for easily streaming data feeds from various sources onto HDFS.
  • Wrote Interface specifications to ingest structured data into appropriate schemas and tables to support the rules and analytics.
  • Extracted data from Teradata into HDFS using Sqoop and exported the analyzed patterns back to Teradata.
  • Processed HDFS data and created external tables using Hive and developed scripts to ingest and repair tables that can be reused across the project.
  • Wrote Pig scripts to generate MapReduce jobs and performed ETL procedures on the data in HDFS.
  • Developed Pig scripts, Pig UDFs, Hive scripts, and Hive UDFs to load data files.
  • Used Hive schemas to create relations in Pig via HCatalog.
  • Developed Java MapReduce and Pig cleansers for data cleansing.
  • Analyzed and visualized data in Teradata using Datameer.
  • Implemented a POC to migrate MapReduce jobs to Spark RDD transformations using Scala.
  • Implemented machine learning models such as K-means clustering using PySpark.
  • Used Spark to create reports for analysis of the data coming from various sources like transaction logs.
  • Involved in the migration from the Hadoop system to the Spark system.
  • Refactored existing Hive queries into Spark SQL.
  • Moved log files generated by various sources into HDFS through Flume for further processing.
  • Used Oozie Operational Services for batch processing and for scheduling workflows dynamically.
  • Used Maven extensively to build JAR files of MapReduce programs and deployed them to the cluster.
  • Managed Agile Software Practice using Rally by creating Product Backlog, Iterations and Sprints in collaboration with the Product Team.

Environment: Hadoop, HDFS, Hive, Pig, MapReduce, YARN, Datameer, Flume, Oozie, Linux, Teradata, HCatalog, Java, Eclipse IDE, Git.

Confidential, New York, NY

Hadoop Developer


  • Wrote Linux scripts to stream data from multiple sources, such as Oracle and Teradata, onto the data lake.
  • Built the infrastructure that secures all data in transit on the data lake, helping ensure the utmost security of customer data.
  • Extended the Hive framework with custom UDFs to meet requirements.
  • Worked with the Hadoop administration team to debug slow-running MapReduce jobs and apply the necessary optimizations.
  • Took the initiative to learn eCDW, a custom AT&T scheduler, to schedule periodic runs of scripts for initial and delta loads of various datasets.
  • Played a key role in mentoring the team on developing MapReduce jobs and custom UDFs.
  • Played an instrumental role in working with the team to leverage Sqoop for extracting data from Teradata.
  • Imported data from relational data sources such as Oracle and Teradata into HDFS using Sqoop.
  • Developed job flows in TWS to automate the workflow for extracting data from Teradata and Oracle.
  • Actively involved in building a generic Hadoop framework that enables various teams to reuse best practices.
  • Optimized MapReduce jobs to use HDFS efficiently by using various compression mechanisms.
  • Helped the team in optimizing Hive queries.


  • A massive cluster with high availability of NameNodes.
  • Cluster sized to support 400 TB of data, taking growth rate into consideration.
  • Secured cluster with Kerberos.
  • Used LDAP for role-based access.

Environment: Apache Hadoop, MapReduce, HDFS, Hive, Sqoop, Linux, JSON, Oracle 11g, PL/SQL, Eclipse, SVN, Teradata Client, TWS (Tivoli Workload Scheduler).

Confidential, Louisville, KY

Hadoop Developer


  • Responsible for building scalable distributed data solutions using Hadoop.
  • Involved in the end-to-end process of Hadoop cluster installation, configuration, and monitoring.
  • Installed and configured Hive, Pig, Sqoop, Flume and Oozie on the Hadoop cluster.
  • Set up and benchmarked Hadoop and HBase clusters for internal use.
  • Developed simple to complex MapReduce jobs using Hive and Pig.
  • Optimized MapReduce jobs to use HDFS efficiently by using various compression mechanisms.
  • Imported data from various sources, performed transformations using Hive and MapReduce, loaded the data into HDFS, and extracted data from MySQL into HDFS using Sqoop.
  • Analyzed the data by performing Hive queries and running Pig scripts to study customer behavior.
  • Used UDFs to implement business logic in Hadoop.
  • Used Hive for managing customer and franchise information in the tables.
  • Imported data from an existing DB2 database into Hive and HBase using Sqoop.
  • Used Ganglia to monitor the distributed cluster network.
  • Used Jira for bug tracking.
  • Used Git to check in and check out code changes.

Environment: Java, Hadoop, HDFS, MapReduce, Pig, Sqoop, Hive, HBase, DB2, Eclipse.

Confidential, Jacksonville, FL

Java Developer


  • Involved in requirements gathering and creating functional specifications by interacting with business users.
  • Responsible for analysis, design, development and unit testing.
  • Implemented the application using Spring MVC. Used Spring Batch for handling the parallel processing of batch jobs.
  • Followed the MVC and DAO design patterns in developing the application. All database calls are handled through DAOs.
  • Improved application performance using Hibernate for retrieving, manipulating, and storing millions of records.
  • Created web pages using HTML, JSP, JavaScript, and Ajax.
  • Performed unit and integration testing before checking in code for QA builds.
  • Involved in Production Support.
  • Used JavaScript for client-side validation.
  • Used Log4j and commons-logging frameworks for logging the application flow.
  • Involved in developing build scripts using Ant.
  • Used CVS for version control.
  • Developed JavaScript functions for the front-end validations.

Environment: Apache Tomcat, JSP, Servlets, Ajax, Eclipse, PL/SQL, Oracle, HTML, JavaScript, UML, Windows XP.

Confidential, Dublin, OH

Jr. J2EE Developer


  • Developed user interface screens using JSP and HTML. Used JavaScript for client-side validation.
  • Worked with onsite and offshore teams to coordinate knowledge transfer and work.
  • Developed EJB session beans for business logic at the middle tier.
  • Wrote SQL and stored procedures to extract the data model from Oracle enterprise data.
  • Wrote various Java classes for user registration.
  • Used JDBC API to access database.
  • Involved in the generation of reports.
  • Planned and coordinated build releases for QA testing and actual deployment.
  • Participated in unit testing (using JUnit) and integration testing.

Environment: Java, JDBC, XML, Log4j, Ant, Oracle 9i, TOAD, Solaris, AIX, Windows.
