
Hadoop Developer Resume

Beaverton, OR


  • 7+ years of professional IT experience, including 4+ years in Big Data ecosystem technologies. Expertise in Big Data technologies as a consultant, with proven capability in project-based teamwork and as an individual developer, and good communication skills.
  • Excellent understanding of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and the MapReduce programming paradigm.
  • Experience working with Hadoop clusters using AWS EMR, Cloudera (CDH5), MapR, and Hortonworks distributions.
  • Hands-on experience in installing, configuring, and using Hadoop ecosystem components such as Hadoop MapReduce (MR), HDFS, HBase, Oozie, Hive, Sqoop, Pig, and Flume.
  • Hands-on development and implementation experience in Big Data Management Platform (BMP) using HDFS, MapReduce, Hive, Pig, Oozie, Apache Kite, and other Hadoop-related ecosystem tools as data storage and retrieval systems.
  • Imported and exported data into HDFS and Hive using Sqoop.
  • Experience in managing and reviewing Hadoop log files.
  • Experience in analyzing data using HiveQL, Pig Latin, HBase and custom Map Reduce programs in Java.
  • Extending Hive and Pig core functionality by writing UDFs.
  • Good experience installing, configuring, testing Hadoop ecosystem components.
  • Highly knowledgeable in the WritableComparable and Writable interfaces, the Mapper and Reducer classes, and Hadoop data types such as IntWritable, ByteWritable, and Text.
  • Well experienced with Mapper, Reducer, Combiner, Partitioner, and the shuffle and sort process, along with custom partitioning for efficient bucketing.
  • Good experience writing Pig and Hive UDFs to serve as utility classes.
  • Experience in designing both time-driven and data-driven automated workflows using Oozie.
  • Experience in installing, configuring, supporting, and managing Cloudera's Hadoop platform, including CDH4 and CDH5 clusters.
  • Hands-on experience in Agile and Scrum methodologies.
  • Extensive experience working with customers to gather the information needed to analyze technical problems, provide data or code fixes, and deliver technical solution documents for users.
  • Hands-on experience in application development using Java, RDBMS, and Linux shell scripting.
  • Worked on multiple stages of the Software Development Life Cycle, including Development, Component Integration, Performance Testing, Deployment, and Support and Maintenance.
  • Quick to adapt to new software applications and products; self-starter with excellent communication skills and a good understanding of business workflow.
  • Expertise in object-oriented analysis and design (OOAD), including UML and the use of various design patterns.
  • Working knowledge of SQL, PL/SQL, Stored Procedures, Functions, Packages, DB Triggers, and Indexes.
  • Good experience designing jobs and transformations and loading data sequentially and in parallel for initial and incremental loads.
  • Good experience using various PDI / Kettle (Pentaho Data Integration) steps to cleanse and load data as per business needs.


Languages: C, C++, Java, SQL and PL/SQL

Big Data Framework and Eco Systems: Hadoop, MapReduce, Hive, Pig, HDFS, Zookeeper, Sqoop, Apache Crunch, Oozie and Flume

NoSQL: Cassandra, HBase and Membase

Web Technologies: JavaScript, CSS, HTML, XHTML, AJAX, XML, XSLT

Databases: Oracle 8i/9i/10g/11g, MySQL, PostgreSQL and MS Access

Operating Systems: Windows XP/2000/NT, Linux, UNIX

Tools: Ant, Maven, TOAD, ArgoUML, WinSCP, PuTTY, Lucene

IDE Tools: Eclipse 4.x, Eclipse RCP, NetBeans 6, EditPlus

Version Control Tools: CVS, SVN

ETL Tools: PDI / Kettle (Pentaho Data Integration)



Hadoop Developer

  • Involved in the design and development phases of the Software Development Life Cycle (SDLC) using Scrum methodology.
  • Developed a data pipeline using Flume, Sqoop, Pig, and MapReduce to ingest customer behavioural data and purchase histories into HDFS for analysis.
  • Developed job flows in Oozie to automate the workflow for extraction of data from warehouses and weblogs.
  • Used Pig as an ETL tool to perform transformations, event joins, bot-traffic filtering, and some pre-aggregations before storing the data in HDFS.
  • Wrote Hive queries to parse the logs and structure them in tabular format to facilitate effective querying of the log data.
  • Created Hive tables, loaded them with data, and wrote Hive queries that run internally as MapReduce jobs.
  • Experienced in managing and reviewing the Hadoop log files.
  • Worked with the Apache Crunch library to write, test, and run Hadoop MapReduce pipeline jobs.
  • Performed joins and data aggregation using Apache Crunch.
  • Worked on Oozie workflow engine for job scheduling.
  • Developed custom implementations of Partitioner, Input/Output Formats, Record Readers, and Record Writers.
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting on the dashboard.
  • Loaded the aggregated data into DB2 for reporting on the dashboard.
  • Monitored and debugged Hadoop jobs and applications running in production.
  • Provided user support and application support on Hadoop infrastructure.
  • Reviewed ETL application use cases before onboarding to Hadoop.
  • Evaluated and compared different tools for test data management with Hadoop.
  • Helped and directed the testing team to get up to speed on Hadoop application testing.
  • Installed a 20-node UAT Hadoop cluster.
  • Created ETL jobs to generate and distribute reports from MySQL database using Pentaho Data Integration.
  • Created ETL jobs using Pentaho Data Integration to handle the maintenance and processing of data.
  • Created ETL jobs using Pentaho Data Integration to handle the generation and distribution of reports.

Environment: JDK1.6, RedHat Linux, HDFS, Mahout, Map-Reduce, Apache Crunch, Hive, Pig, Sqoop, Flume, Zookeeper, Oozie, DB2, HBase and Pentaho.


Hadoop Developer

  • Developed MapReduce jobs in Java for data cleansing and pre-processing.
  • Moved data between Oracle and HDFS using Sqoop.
  • Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis.
  • Worked with different file formats and compression techniques to determine standards.
  • Developed Hive queries and UDFs to analyze and transform the data in HDFS.
  • Developed Hive scripts for implementing control tables logic in HDFS.
  • Designed and implemented partitioning (static and dynamic) and buckets in Hive.
  • Developed Pig scripts and UDFs as per the business logic.
  • Imported log files into HDFS using Flume and loaded them into Hive tables for querying.
  • Developed Pig scripts to convert the data from Avro to text file format.
  • Developed Sqoop commands to pull the data from Teradata.
  • End-to-end implementation with Avro and Snappy.
  • Analyzed and transformed data with Hive and Pig.
  • Developed Oozie workflows scheduled to run on a monthly basis.
  • Designed and developed read lock capability in HDFS.
  • Implemented a Hive UDF for a Float equivalent to the Oracle Decimal type.
  • Involved in end-to-end implementation of ETL logic.
  • Coordinated effectively with the offshore team and managed project deliverables on time.
  • Worked on QA support activities, test data creation and Unit testing activities.

Environment: JDK1.6, CDH4.x, Hive, Pig, MapReduce, Oozie, Oracle, Sqoop, Flume, Avro.

Confidential, NC

Hadoop Developer

  • Developed multiple MapReduce jobs in Java for data cleaning and pre-processing.
  • Designed and developed Oozie workflows for sequential job execution.
  • Worked mainly on Big Data analytics and on Hadoop and MapReduce infrastructure.
  • Gained good experience with NoSQL databases.
  • Ran MapReduce programs on the cluster.
  • Installed and configured Hive and wrote Hive UDFs.
  • Implemented a CDH3 Hadoop cluster.
  • Handled cluster installation, monitoring and administration of cluster recovery, capacity planning, and slots configuration.
  • Created HBase tables to store variable data formats of PII data coming from different portfolios.
  • Implemented best income logic using Pig scripts.
  • Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the Business Intelligence (BI) team.
  • Supported setting up the QA environment and updating configurations for implementing scripts with Pig and Sqoop.
  • Wrote Hadoop MR programs to collect logs and feed them into Cassandra for analytics.
  • Built, packaged, and deployed the code to the Hadoop servers.
  • Wrote Unix scripts to manage Hadoop operations.
  • Wrote Puppet code for installing and configuring Cloudera Hadoop CDH3u1.

Environment: JDK 1.6, Hadoop, MapReduce, HDFS, Hive, Java, SQL, Datameter, Pig, Sqoop, CentOS, Cloudera.


Java/J2EE developer

  • Designed and developed a Struts-like MVC 2 web framework using the front-controller design pattern, which is used successfully in a number of production systems.
  • Normalized the Oracle database, conforming to design concepts and best practices.
  • Maintained records in Excel spreadsheets and exported data into a SQL Server database using SQL Server Integration Services (SSIS).
  • Provided logging and error handling for SSIS packages using event handlers and custom logging.
  • Resolved product complications at customer sites and funneled the insights to the development and deployment teams to adopt a long-term product development strategy with minimal roadblocks.
  • Persuaded business users and analysts to adopt alternative solutions that are more robust and simpler to implement from a technical perspective while satisfying the functional requirements from the business perspective.
  • Applied design patterns and OO design concepts to improve the existing Java/JEE code base.
  • Identified and fixed transactional issues due to incorrect exception handling and concurrency issues due to unsynchronized blocks of code.

Environment: Java 1.2/1.3, Swing, Applet, Servlets, JSP, custom tags, JNDI, JDBC, XML, XSL, DTD, HTML, CSS, JavaScript, Oracle, DB2, PL/SQL, WebLogic, JUnit, Log4J and CVS.
