
Senior Hadoop Developer Resume

Bellevue, WA


  • 5 years of professional IT experience in the Big Data space, with hands-on expertise in development on the Hadoop platform and in Java.
  • Extensive working experience with Hadoop ecosystem components such as MapReduce (MRv1, YARN), Hive, Pig, Sqoop, and Oozie.
  • Proficient in writing MapReduce programs and using the Apache Hadoop Java API to analyze structured and unstructured data.
  • Good understanding of Hadoop architecture and hands-on experience with Hadoop components such as NameNode, DataNode, MapReduce concepts, and the HDFS framework.
  • Experience working with cloud infrastructure such as Amazon Web Services (AWS).
  • Experience in launching EMR clusters, Redshift clusters, EC2 instances, S3 buckets, AWS Data Pipeline, and Simple Workflow Service (SWF) instances.
  • Experience in ingesting streaming data into Hadoop using Spark, the Storm framework, and Scala.
  • Expert in working with the Hive data warehouse tool: creating tables, distributing data through partitioning and bucketing, and writing and optimizing HiveQL queries.
  • Experience in writing Pig Latin scripts to sort, group, join, and filter data.
  • Experience in writing UDFs in Java for Hive and Pig.
  • Worked on UNIX shell scripts implementing business logic as part of the ETL process, and scheduled jobs using the CA7 and Oozie schedulers.
  • Experience in writing customized input formats with MapReduce and working with various file formats such as Avro, XML, JSON, and log data.
  • Worked with different Hive file formats, including RCFile, SequenceFile, ORC, and Parquet.
  • Experience in using Apache Sqoop to import and export data to and from HDFS and Hive.
  • Hands on experience in setting up workflow using Apache Oozie workflow engine for managing and scheduling Hadoop jobs.
  • Good knowledge of NoSQL databases: HBase, Cassandra, and MongoDB.
  • Working experience with Pentaho Report Designer and Tableau visualization.
  • Experience in developing applications using Core Java, JSP, HTML, and CSS.
  • Worked on customizing log4j.properties to redirect Hive/HBase logs to databases.
  • Good experience working with AWS, Cloudera and Pivotal HD Distribution.
  • Knowledge of Kafka, Mahout machine learning, and R.
  • Comprehensive knowledge of the Software Development Life Cycle and Agile methodology, coupled with excellent communication skills.
  • Experience working in both team and individual settings; always eager to learn new technologies and apply them in challenging environments.
  • Strong analytical and problem-solving skills.
  • Team player with good interpersonal, communication, and presentation skills, and an exceptional ability to learn and master new technologies and deliver results on short deadlines.
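As an illustration of the Hive partitioning and bucketing approach described above, a minimal HiveQL sketch (all table and column names here are hypothetical, not taken from any actual project):

```sql
-- Hypothetical example: a date-partitioned, bucketed Hive table in ORC format
CREATE TABLE page_views (
    user_id   BIGINT,
    url       STRING,
    view_time TIMESTAMP
)
PARTITIONED BY (view_date STRING)       -- one HDFS directory per date
CLUSTERED BY (user_id) INTO 32 BUCKETS  -- hash-distributes rows across files
STORED AS ORC;

-- Load via dynamic partitioning from a raw staging table
SET hive.exec.dynamic.partition.mode=nonstrict;
INSERT OVERWRITE TABLE page_views PARTITION (view_date)
SELECT user_id, url, view_time, to_date(view_time) AS view_date
FROM raw_page_views;
```

Partition pruning then lets queries that filter on view_date scan only the matching HDFS directories instead of the whole table.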


Hadoop Technologies and Distributions: Apache Hadoop, HDP, Cloudera Hadoop Distribution (CDH3, CDH4, CDH5), AWS, Pivotal HD (2.0)

Hadoop Ecosystem: HDFS, MapReduce, Hive, Pig, Sqoop, Oozie, Flume, Kafka, ZooKeeper, HCatalog, Spark, Storm

NoSQL Databases: Cassandra, MongoDB, HBase

Programming: C, Core Java 7/8, Advanced Java, PL/SQL, Shell Scripting

AWS Hadoop Services: S3, EMR, Simple Workflow (SWF), Data Pipeline, Redshift


Operating Systems: Linux (RedHat, CentOS), Windows XP/7/8

Web Servers: Apache Tomcat

ETL: Pentaho Report Designer

BI Tools: Tableau.


Confidential, Bellevue, WA

Senior Hadoop Developer


  • Involved in ingesting data into IDW staging directly from BEAM (an in-built component for ingesting real-time data into Hadoop), using Apache Storm to push data into HDFS.
  • Used Oozie Operational Services for batch processing and for dynamically scheduling workflows that run multiple Hive, shell-script, and Pig jobs independently, based on time and data availability.
  • Part of the design team for generic components such as SCD and Data Validation.
  • Developed solutions for several data ingestion channels and patterns; also involved in resolving production issues.
  • Extensively worked on creating end-to-end data pipeline orchestration using Oozie.
  • Used shell scripting to automate jobs.
  • Worked on QA support, test data creation, and unit testing.
  • Used HBase in conjunction with Hive/Pig as required.
  • Worked on Pig joins and join optimization, processing incremental data with Hadoop.
  • Created Oozie jobs that use Sqoop to export data from Hadoop to the Teradata development environment.
  • Involved in developing the Data Movement Framework (DMF), a custom in-house tool for ingesting data from external and internal sources into Hadoop using Sqoop and shell scripts.
  • Proposed an automated, shell-script-based system for Sqoop imports.
  • Worked in an Agile development approach and managed Hadoop teams across various sprints.

Environment: Hortonworks Data Platform (HDP), HDFS, HBase, Hive, Java, Sqoop, Oracle, MySQL, Storm.
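The Oozie orchestration described above typically takes the form of a workflow.xml; a minimal hypothetical fragment (workflow, action, and script names are illustrative only):

```xml
<!-- Hypothetical Oozie workflow fragment: a Hive action with error handling -->
<workflow-app name="ingest-wf" xmlns="uri:oozie:workflow:0.4">
    <start to="load-staging"/>
    <action name="load-staging">
        <hive xmlns="uri:oozie:hive-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <script>load_staging.hql</script>
        </hive>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Load failed: ${wf:errorMessage(wf:lastErrorNode())}</message>
    </kill>
    <end name="end"/>
</workflow-app>
```

A coordinator definition on top of such a workflow supplies the time- and data-availability triggers mentioned above.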

Confidential, Bentonville, AR

Senior Hadoop Developer


  • Developed Big Data Solutions that enabled the business and technology teams to make data-driven decisions on the best ways to acquire customers and provide them business solutions.
  • Worked on automation of delta feeds from Teradata using Sqoop, and from FTP servers to Hive.
  • Involved in exporting data from Hadoop to Greenplum using the GPload utility.
  • Developed MapReduce programs to cleanse data in HDFS obtained from heterogeneous data sources, making it suitable for ingestion into the Hive schema for analysis.
  • Used Sqoop to import data from RDBMS into the Hadoop Distributed File System (HDFS), and later analyzed the imported data using Hadoop components.
  • Built custom MapReduce programs to analyze data, and used Pig Latin to clean out unwanted data.
  • Performed various performance optimizations, such as using the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins.
  • Involved in loading and transforming large sets of structured, semi-structured, and unstructured data, and analyzed them by running Hive queries and Pig scripts.
  • Participated in requirement gathering with subject-matter experts and business partners, converting requirements into technical specifications.
  • Implemented daily workflows for extraction, processing, and analysis of data with Oozie.
  • Involved in loading data from the Linux file system into HDFS.

Environment: Hadoop, Pig, Hive, Sqoop, Flume, MapReduce, HDFS, LINUX, Oozie.
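The Sqoop-based delta feeds described above can be sketched as an incremental import command; the connection string, credentials, table, and column names below are all hypothetical:

```
# Hypothetical incremental Sqoop import of a Teradata delta feed into Hive
sqoop import \
  --connect jdbc:teradata://td-host/DATABASE=sales \
  --username etl_user -P \
  --table DAILY_ORDERS \
  --incremental lastmodified \
  --check-column UPDATE_TS \
  --last-value "2014-01-01 00:00:00" \
  --hive-import --hive-table staging.daily_orders \
  --num-mappers 4
```

On each run, Sqoop records the new high-water mark for --last-value, so only rows updated since the previous run are pulled.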


Hadoop Developer


  • Worked on analyzing the Hadoop cluster using different big data analytic tools, including Pig, Hive, and MapReduce.
  • Launched and set up a Hadoop cluster on AWS, including configuring the different Hadoop components.
  • Managed the Hive database, which involved ingesting and indexing data.
  • Launched the EMR and Redshift clusters.
  • Implemented an Amazon EMR (Elastic MapReduce) job to process data arriving in zip format and convert it to gzip format.
  • Involved in customizing the input format for zip files (ZipInputFormat).
  • Cleansed and processed the zip file data in MapReduce.
  • Created jar files and uploaded them to an S3 bucket.
  • Adjusted delimiters in the data using EMR.
  • Created Data Pipeline jobs to automate the process.
  • Scheduled the data-load process into the Redshift database.
  • Monitored the EMR jobs.
  • Implemented and ran queries on the Redshift cluster.
  • Implemented autoscaling for the Redshift database.
  • Worked on debugging and performance tuning of Hive and Pig jobs.
  • Worked on tuning the performance of Pig queries.
  • Experience in processing unstructured data using Pig and Hive.
  • Worked on computing complex business metrics in Pig and MapReduce.

Environment: Amazon EMR, Data Pipeline, MapReduce (Java), S3, Redshift, Hive, Pig, SWF Java API
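The zip-to-gzip conversion step above can be sketched in plain Java. This standalone version (class and path names are illustrative) shows only the java.util.zip stream handling, not the surrounding EMR/MapReduce wiring:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.zip.GZIPOutputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical sketch of the zip -> gzip re-compression step.
public class ZipToGzip {

    /** Re-compresses every file entry of a zip archive into its own .gz file. */
    public static void convert(Path zipFile, Path outDir) throws IOException {
        Files.createDirectories(outDir);
        byte[] buf = new byte[8192];
        try (ZipInputStream zin = new ZipInputStream(Files.newInputStream(zipFile))) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                if (entry.isDirectory()) continue;
                // Flatten any directory prefix inside the archive; keep the base name.
                Path out = outDir.resolve(
                        Paths.get(entry.getName()).getFileName() + ".gz");
                try (GZIPOutputStream gz =
                        new GZIPOutputStream(Files.newOutputStream(out))) {
                    int n;
                    while ((n = zin.read(buf)) > 0) {
                        gz.write(buf, 0, n);  // stream entry bytes straight into gzip
                    }
                }
            }
        }
    }
}
```

In the actual job, the read side would sit behind a custom ZipInputFormat so that each archive entry becomes a map input rather than a local file.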


Hadoop Developer


  • Responsible for designing and implementing the ETL process to load data from different sources, perform data mining, and analyze the data using visualization/reporting tools to improve system performance.
  • Collected logs from physical machines and integrated them into HDFS using Flume.
  • Developed custom MapReduce programs to extract the required data from the logs.
  • Performed performance tuning and troubleshooting of MapReduce jobs by analyzing and reviewing Hadoop log files.
  • Imported data frequently from Teradata into HDFS using Sqoop.
  • Used Tableau for visualization and report generation.
  • Managed and scheduled jobs using Oozie on a Hadoop cluster.
  • Experience with the Hadoop stack and cluster architecture, and in monitoring the cluster.
  • Involved in defining job flows, managing and reviewing log files.
  • Installed Oozie workflow engine to run multiple Map Reduce, Hive and Pig jobs.
  • Responsible for loading and transforming large sets of structured, semi structured and unstructured data.
  • Extracted files from different sources such as Teradata and DB2, placed them into HDFS using Sqoop, and preprocessed the data for analysis.
  • Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.

Environment: JDK 1.5, Hadoop, HDFS, Pig, Hive, MapReduce, HBase, Sqoop, Oozie and Flume, Tableau.
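The Flume-based log collection described above is driven by an agent properties file; a minimal hypothetical sketch (agent, host, and path names are illustrative only):

```
# Hypothetical Flume agent: tail an application log and deliver it to HDFS
agent.sources  = logsrc
agent.channels = memch
agent.sinks    = hdfssink

agent.sources.logsrc.type = exec
agent.sources.logsrc.command = tail -F /var/log/app/app.log
agent.sources.logsrc.channels = memch

agent.channels.memch.type = memory
agent.channels.memch.capacity = 10000

agent.sinks.hdfssink.type = hdfs
agent.sinks.hdfssink.hdfs.path = hdfs://namenode:8020/data/logs/%Y-%m-%d
agent.sinks.hdfssink.hdfs.fileType = DataStream
agent.sinks.hdfssink.channel = memch
```

The memory channel buffers events between the tailing source and the HDFS sink, which rolls files into date-stamped directories.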
