
Hadoop Developer Resume


Dearborn, MI


  • Seven years of impeccable work experience in the IT industry, including over three years of professional work experience in Big Data/Hadoop (Cloudera distributions CDH3 and CDH4) on clusters of 300 nodes
  • Extensive experience in MapReduce (MRv1)
  • Extensive experience in testing, debugging and deploying MapReduce applications on Hadoop platforms
  • Extensive experience in working with HDFS, Pig, Hive, Sqoop, Flume, Oozie, ZooKeeper and Cassandra
  • Experience with Cloudera CDH3 and CDH4 distributions
  • Extensive experience with ETL and Big Data query tools such as Pig Latin and HiveQL
  • Expertise in installing, designing, sizing, configuring, provisioning and upgrading Hadoop environments
  • Experience in tuning and troubleshooting performance issues in Hadoop clusters holding over 70 TB of data
  • Experience in monitoring, performance tuning, SLA management, scaling and security in Big Data systems
  • Hands-on NoSQL database experience with HBase and Cassandra
  • Good working knowledge of the Eclipse IDE for developing and debugging Java applications
  • Expertise in creating UIs using JSP, HTML, XML and JavaScript.
  • Good experience with SQL Server databases, including stored procedures, constraints and triggers.
  • Created data maps from databases to dimension and fact tables.
  • Carried out QA deployments and worked on process flow diagrams.
  • Created dimension and fact jobs and scheduled job runs.
  • Experienced in using remote access and file transfer tools such as PuTTY and WinSCP
  • Extensive experience in documenting requirements, functional specifications and technical specifications
  • Highly motivated, adaptable and quick learner
  • Exhibited excellent communication and leadership capabilities
  • Excellent analytical, problem-solving and technical skills
  • Strong ability to handle multiple priorities and workloads, and to quickly understand and adapt to new technologies and environments


Big Data/Hadoop Framework: HDFS, MapReduce, Pig, Hive, Sqoop, Oozie, ZooKeeper, Flume and HBase.

Databases: Microsoft SQL Server, MySQL, Oracle, Cassandra

Languages: C, C++, Java, SQL, PL/SQL, Pig Latin, HiveQL

Web Technologies: JSP, JavaBeans, JDBC, XML

Operating Systems: Windows, Unix and Linux

Front-End: HTML, CSS3, JavaScript/jQuery

Development Tools: Microsoft SQL Server Management Studio, Eclipse, NetBeans, MySQL Workbench.

Office Tools: Microsoft Office Suite

Development Methodologies: Agile/Scrum, Waterfall


Confidential, Dearborn, MI

Hadoop Developer


  • Worked on a live Big Data Hadoop production environment with 400 nodes
  • Worked with 40 TB of highly unstructured and semi-structured data
  • Designed and developed Pig ETL scripts to process data in a nightly batch
  • Created Pig macros to improve code reusability and modularity
  • Developed Hive scripts for ad-hoc analysis based on end-user and analyst requirements
  • Very good understanding of partitioning and bucketing in Hive; designed both managed and external Hive tables for optimized performance
  • Solved performance issues in Hive and Pig scripts through an understanding of joins, grouping and aggregation, and how they translate into MapReduce jobs
  • Tuned Hive and Pig scripts to improve performance
  • Good experience in troubleshooting performance issues and tuning the Hadoop cluster
  • Good working knowledge of using Sqoop to perform incremental imports from Oracle to HDFS.
  • Good experience in working with compressed files and related formats.
  • Developed Oozie workflows for scheduling and orchestrating the ETL process
  • Hands on experience with Cassandra and its architecture
  • Performed data analysis on HBase data using Hive external tables mapped to HBase
  • Very good understanding of the single points of failure (SPOFs) of Hadoop daemons and their recovery procedures
  • Worked with the infrastructure and admin team in designing, modeling, sizing and configuring a Hadoop cluster of 60 nodes
  • Currently planning a migration from CDH3 to CDH4
  • Good understanding of Impala

Environment: Hadoop, MapReduce, HDFS, Pig, Hive, HBase, Cassandra, Java, Oracle 10g, MySQL, Ubuntu.

Confidential, Danville, IL

Hadoop Developer


  • Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing
  • Wrote MapReduce code to parse data from various sources and store the parsed data in HBase and Hive using HBase-Hive integration
  • Worked on moving all log files generated from various sources to HDFS for further processing
  • Developed workflows using custom MapReduce, Pig, Hive and Sqoop
  • Developed a predictive analytics product using SQL/HiveQL, JavaScript and Highcharts.
  • Developed data pipeline programs with the Spark Scala APIs, performed data aggregations with Hive, and formatted data as JSON for visualization, generating Highcharts charts for outliers, data distribution and correlation/comparison
  • Created various views for HBase tables and leveraged Hive on top of HBase for query performance.
  • Developed an Apache Storm and HDFS integration project for real-time data analysis.
  • Designed and developed Apache Storm topologies for inbound and outbound data for real-time ETL to find the latest trends and keywords.
  • Developed MapReduce programs for parsing information and loading it into HDFS
  • Built reusable Hive UDF libraries for business requirements, enabling users to use these UDFs in Hive queries.
  • Wrote a Hive UDF to sort struct fields and return a complex data type
  • Responsible for loading data from UNIX file system to HDFS
  • Developed a suite of unit test cases for Mapper, Reducer and Driver classes using an MR testing library
  • Designed and developed a distributed processing system to process binary files in parallel and load the resulting analysis metrics into a data warehousing platform for reporting.
  • Developed workflows in Control-M to automate loading data into HDFS and preprocessing it with Pig.
  • Worked with cluster coordination services through ZooKeeper
  • Used Maven extensively to build JAR files of MapReduce programs and deployed them to the cluster
  • Modeled Hive partitions extensively for data separation and faster data processing, and followed Pig and Hive best practices for tuning

Environment: HiveQL, MySQL, HBase, HDFS, Hive, Eclipse, Hadoop, Oracle 11g, PL/SQL, SQL*Plus, Flume, Pig, Sqoop, Spark, UNIX.

Confidential, Austin, TX

Hadoop Developer


  • Evaluated business requirements and prepared detailed specifications, following project guidelines, for the programs to be developed.
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Analyzed large data sets to determine the optimal way to aggregate and report on them.
  • Developed simple to complex MapReduce jobs using Hive and Pig
  • Optimized MapReduce jobs to use HDFS efficiently by applying various compression mechanisms
  • Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop
  • Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team
  • Extensively used Pig for data cleansing.
  • Created partitioned tables in Hive.
  • Managed and reviewed Hadoop log files.
  • Involved in creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs.
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
  • Installed and configured Pig and wrote Pig Latin scripts.
  • Developed Pig Latin scripts to extract data from web server output files and load it into HDFS.
  • Loaded and transformed large sets of structured, semi-structured and unstructured data
  • Responsible for managing data coming from different sources
  • Worked with application teams to install operating systems, Hadoop updates, patches and version upgrades as required
  • Involved in writing, testing, and running MapReduce pipelines using Apache Crunch

Environment: Hadoop, MapReduce, HDFS, Hive, Pig, Java (JDK 1.6), SQL, Sqoop, Eclipse

Confidential, Omaha, NE

ETL QA Tester


  • Tested ETL jobs according to business rules using the ETL design document
  • Promoted UNIX/DataStage application releases from development to QA and UAT environments
  • Assisted in implementing fact and dimension tables in a star schema model based on requirements.
  • Expert in writing complex SQL and PL/SQL scripts for querying Teradata and Oracle.
  • Defined data requirements and elements used in XML transactions.
  • Tested the database schema with the help of data architects using ERwin
  • Involved in testing the data mart using PowerCenter
  • Identified and documented additional data cleansing needs and consistent error patterns that could be averted by modifying ETL code.
  • Extensively used the Teradata load utilities FastLoad, MultiLoad and FastExport to extract, transform and load the Teradata data warehouse
  • Responsible for various data mapping activities from source systems to Teradata.
  • Queried Teradata Database and validated the data using SQL Assistant.
  • Tested messages published by DataStage and data loaded into various databases during the extraction, transformation and loading process
  • Used the application's import and export facilities to download/upload XMLs of failed test cases in order to re-verify them.
  • Wrote UNIX scripts to perform certain tasks and assisted developers with problems and SQL optimization.
  • Configured QuickTest Pro with Quality Center and maintained the project information in Quality Center.
  • Extensively used Autosys to automate job scheduling on a daily, bi-weekly, weekly and monthly basis with proper dependencies.
  • Wrote complex SQL queries using joins, subqueries and correlated subqueries
  • Performed Unit testing and System Integration testing by developing and documenting test cases in Quality Center.
  • Designed and developed UNIX shell scripts as part of the ETL process to automate loading and pulling data.
  • Tested several complex reports generated by MicroStrategy, including dashboards, summary reports, master-detail reports, drill-down reports and scorecards
  • Involved in testing the MicroStrategy reports by writing complex SQL queries
  • Involved in extensive data validation using SQL queries and back-end testing
  • Tested complex objects added to the universe to enhance report functionality.
  • Responsible for migrating the code changes from development environment to SIT, UAT and Production environments.
  • Validated cube and query data from the reporting system back to the source system.

Environment: DataStage, Flat files, Perl, ERwin 4.0, DTS, MS SQL Server 2008, Oracle 10g, SQL, PL/SQL, IBM DB2 8.0, Agile, Teradata V2R6, Teradata SQL Assistant, MicroStrategy, COBOL, HP QTP 9.0, HP Quality Center 10, Autosys, Toad, UNIX Shell Scripting, Windows XP/2000


Java/J2EE Developer


  • Involved in analysis and design of the application.
  • Involved in preparing the detailed design document for the project.
  • Developed the application using J2EE architecture.
  • Involved in developing JSP forms.
  • Designed and developed web pages using HTML and JSP.
  • Designed various applets using JBuilder.
  • Designed and developed Servlets to communicate between presentation and business layer.
  • Used EJB as a middleware in developing a three-tier distributed application.
  • Developed session beans and entity beans for business and data processing.
  • Used JMS in the project for sending and receiving the messages on the queue.
  • Developed the Servlets for processing the data on the server.
  • Transferred the processed data to the database through entity beans.
  • Used JDBC for database connectivity with MySQL Server.
  • Used CVS for version control.
  • Involved in unit testing using JUnit.

Environment: Core Java, J2EE, JSP, Servlets, XML, XSLT, EJB, JDBC, JBuilder 8.0, JBoss, Swing, JavaScript, JMS, HTML, CSS, MySQL Server, CVS, Windows 2000
