
Hadoop Developer Resume

Atlanta, GA


  • 7+ years of extensive IT experience with multinational clients, including 3+ years of Hadoop architecture experience developing Big Data / Hadoop applications.
  • Hands-on experience with the Hadoop stack (MapReduce, HDFS, Sqoop, Pig, Hive, HBase, Flume, Oozie and ZooKeeper).
  • Well versed in configuring and administering the Hadoop Cluster using major Hadoop Distributions like Apache Hadoop and Cloudera.
  • Proven expertise in performing analytics on Big Data using MapReduce, Hive and Pig.
  • Integrated data from multiple databases into data warehouses designed for query and analysis, containing historical data derived from transactional data and other sources.
  • Well versed with DWH tools like Amazon Redshift, Informatica, Oracle and Teradata.
  • Well versed with data warehousing concepts, including capturing historical changes with Slowly Changing Dimensions (SCDs).
  • Experienced with performing real-time analytics on NoSQL databases like HBase and Cassandra.
  • Worked with the Oozie workflow engine to schedule time-based jobs that perform multiple actions.
  • Hands on experience with importing and exporting data from Relational databases to HDFS, Hive and HBase using Sqoop.
  • Strong understanding of Software Development Lifecycle (SDLC) and various methodologies (Waterfall, Agile).
  • Good experience in developing projects using Talend Studio for Big Data.
  • Developed Hive/MapReduce/Spark Python modules for ML & predictive analytics in Hadoop/Hive/Hue on AWS.
  • Developed server-side and front-end validation using the Struts Validation framework and JavaScript.
  • Worked extensively with Amazon Web Services (AWS) Cloud services such as EC2, S3, EBS, RDS and VPC.
  • Analysed large amounts of data sets writing Pig scripts and Hive queries.
  • Used Flume to channel data from different sources to HDFS.
  • Experience with configuration of Hadoop ecosystem components: Hive, Impala, HBase, Pig, Sqoop, Mahout, ZooKeeper, and Flume.
  • Supported MapReduce programs running on the cluster and wrote custom MapReduce scripts for data processing in Java.
  • Experience with testing MapReduce programs using MRUnit, JUnit and EasyMock.
  • Developed Spark scripts by using Scala shell commands as per the requirement.
  • Developed Spark code using Scala and Spark SQL for faster testing and data processing; performed predictive analytics using the Apache Spark Scala APIs.
  • Executed Hive queries on Parquet tables stored in Hive to perform data analysis to meet the business requirements.
  • Experienced with implementing web-based, enterprise-level applications using J2EE frameworks like Spring, Hibernate, EJB, JMS, JSF and Java.
  • Experienced in writing functions, stored procedures, and triggers using PL/SQL.
  • Experienced in working on RDBMS, OLAP, OLTP concepts.
  • Experienced with build tools Ant and Maven, and continuous integration tools like Jenkins.
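
The Slowly Changing Dimension (Type 2) handling noted above can be illustrated with a minimal pure-Python sketch; the record layout (`key`, `attrs`, validity columns) is illustrative, not taken from any actual project:

```python
from datetime import date

def apply_scd2(dimension, incoming, today=None):
    """Type 2 SCD update: when an attribute changes, expire the
    current row and append a new versioned row instead of
    overwriting, preserving full history."""
    today = today or date.today().isoformat()
    current = {r["key"]: r for r in dimension if r["is_current"]}
    for row in incoming:
        cur = current.get(row["key"])
        if cur is None:
            # brand-new business key: insert first version
            dimension.append({**row, "valid_from": today,
                              "valid_to": None, "is_current": True})
        elif cur["attrs"] != row["attrs"]:
            cur["valid_to"] = today        # expire the old version
            cur["is_current"] = False
            dimension.append({**row, "valid_from": today,
                              "valid_to": None, "is_current": True})
    return dimension

dim = [{"key": 1, "attrs": {"city": "Atlanta"},
        "valid_from": "2015-01-01", "valid_to": None, "is_current": True}]
apply_scd2(dim, [{"key": 1, "attrs": {"city": "Tampa"}}], today="2016-06-01")
```

In a warehouse this same merge is typically expressed in SQL or an ETL tool (Informatica, Talend); the dictionary-based version above only captures the expire-and-append logic.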


Big Data Technologies: HDFS, MapReduce, Pig, Hive, HBase, Sqoop, Flume, Oozie, Hadoop Streaming, ZooKeeper, Kafka, Impala, Apache Spark, Apache Storm, Scala, YARN, Mahout, Akka, MongoDB, Cassandra

Hadoop Distributions: Cloudera (CDH4/CDH5), Hortonworks

Languages: SQL, PL/SQL, PIG-Latin, HQL

IDE Tools: Eclipse, NetBeans, RAD

Frameworks: Hibernate, Spring, Struts, JUnit

Web Technologies: HTML5, CSS3, jQuery, AJAX, Servlets, JSP, JSON, XML, XHTML, JSF, AngularJS

Web Services: SOAP, REST, WSDL, JAXB, JAXP, AWS

Operating Systems: Windows (XP,7,8), UNIX, LINUX, Ubuntu

Application Servers: JBoss, Tomcat, WebLogic, WebSphere, GlassFish

Databases: Oracle, MySQL, DB2, PostgreSQL, NoSQL databases (HBase, Cassandra)


Confidential, Atlanta, GA

Hadoop Developer


  • Involved in the process of data acquisition, data pre-processing and data exploration for a telecommunications project.
  • In the pre-processing phase, used Spark to remove missing data and performed data transformations to create new features.
  • Wrote Spark Streaming code initializing a StreamingContext, applying transformations and output operations to DStreams, and starting and stopping data receipt and processing via StreamingContext commands.
  • Well versed with Spark Streaming's micro-batching framework, which uses timed intervals and DStreams to structure computation as small sets of short, stateless, deterministic tasks; built DStreams from data sources such as Kafka, Flume, and HDFS, offering many of the same operations available for RDDs plus time-based operations such as sliding windows.
  • Created multiple input DStreams, and hence multiple receivers, to ingest several streams of data in parallel within streaming applications.
  • In the data exploration stage, used Hive and Impala to gain insights into customer data.
  • Used Flume, Sqoop, Hadoop, Spark and Oozie to build data pipelines.
  • Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and processing.
  • Used Spark and Spark SQL through the Python API to read Parquet data and create tables in Hive.
  • Leveraged AWS's variety of computing and networking services to meet application needs.
  • Involved in Various Stages of Software Development Life Cycle (SDLC) deliverables of the project using the Agile Software development methodology.
  • Developed automated workflows for monitoring the landing zone for the files and ingestion into HDFS in Bedrock Tool and Talend.
  • Importing and exporting data into HDFS and Hive using Sqoop
  • Experienced in defining job flows
  • Hands on experience in AWS Cloud in various AWS services such as Redshift cluster, Route 53 domain configuration.
  • Used Agile Methodology and Scrum Methodology as the development process for the project implementation
  • Used Sqoop tool to load data from RDBMS into HDFS.
  • Processed real-time streaming data received by Kafka using Spark, and stored the results into the HDFS cluster using Python.
  • Experienced in managing and reviewing Hadoop log files.
  • Experienced in running Hadoop streaming jobs to process terabytes of XML-format data.
  • Load and transform large sets of structured, semi structured and unstructured data
  • Responsible to manage data coming from different sources
  • Developed front-end screens using JSP, HTML, CSS and JavaScript.
  • Supported MapReduce programs running on the cluster.
  • Provided cluster coordination services through ZooKeeper.
  • Involved in loading data from UNIX file system to HDFS.
  • Installed and configured Hive and written Hive UDFs.
  • Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
  • Automated all the jobs, for pulling data from FTP server to load data into Hive tables, using Oozie workflows.

Environment: HDFS, MapReduce, Sqoop, Oozie, Pig, Hive, HBase, Flume, LINUX, Java, Eclipse, Cassandra, Cloudera Hadoop Distribution, PL/SQL, and UNIX Shell Scripting.
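
The micro-batching and sliding-window model described in the streaming bullets above can be sketched in pure Python (this simulates DStream semantics rather than calling the actual Spark Streaming API; event data and interval sizes are illustrative):

```python
from collections import deque, Counter

def micro_batches(events, batch_interval):
    """Group (timestamp, value) events into timed micro-batches,
    mimicking how a DStream discretizes a stream into RDDs."""
    batches, start, current = [], events[0][0], []
    for ts, value in events:
        while ts >= start + batch_interval:   # close finished intervals
            batches.append(current)
            current, start = [], start + batch_interval
        current.append(value)
    batches.append(current)
    return batches

def sliding_window_counts(batches, window_len):
    """Per-window value counts over the last `window_len` batches,
    analogous to a windowed count over a DStream."""
    window, out = deque(maxlen=window_len), []
    for batch in batches:
        window.append(batch)
        out.append(Counter(v for b in window for v in b))
    return out

events = [(0, "a"), (1, "b"), (2, "a"), (4, "c")]
batches = micro_batches(events, batch_interval=2)
counts = sliding_window_counts(batches, window_len=2)
```

In real Spark Streaming the batch interval is fixed when the StreamingContext is created, and windowed operations like `countByValueAndWindow` perform the equivalent of `sliding_window_counts` across the cluster.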

Confidential,San Francisco, CA

Hadoop Developer


  • Primary responsibilities included building scalable distributed data solutions using the Hadoop ecosystem.
  • Loaded datasets from two different sources, Oracle and MySQL, into HDFS and Hive respectively on a daily basis.
  • Installed and configured Hive on the Hadoop cluster.
  • Worked on HBase Java API to populate operational HBase table with Key value.
  • Developed multiple MapReduce jobs in java for data cleaning and pre-processing.
  • Developing and running Map-Reduce jobs on YARN and Hadoop clusters to produce daily and monthly reports as per user's need.
  • Scheduling and managing jobs on a Hadoop cluster using Oozie work flow.
  • Experience in developing multiple MapReduce programs in java for data extraction, transformation and aggregation from multiple file formats including XML, JSON, CSV and other file formats.
  • Hands-on experience with AWS (Amazon Web Services), using Elastic MapReduce (EMR), creating buckets in S3 and storing data in them.
  • Imported data using Sqoop to load data from MySQL to HDFS on regular basis.
  • Integrated Apache Storm with Kafka to perform web analytics. Uploaded click stream data from Kafka to HDFS, HBase and Hive by integrating with Storm.
  • Designed and developed PIG Latin Scripts to process data in a batch to perform trend analysis.
  • Developed HIVE scripts for analyst requirements for analysis.
  • Developed java code to generate, compare & merge AVRO schema files.
  • Developed complex MapReduce streaming jobs using Java language that are implemented Using Hive and Pig.
  • Optimized MapReduce Jobs to use HDFS efficiently by using various compression mechanisms.
  • Handled importing of data from various data sources, performed transformations using Hive, MapReduce, loaded data into HDFS and Extracted the data from MySQL into HDFS using Sqoop
  • Automating and scheduling the Sqoop jobs in a timely manner using Unix Shell Scripts.
  • Analysed the data by performing Hive queries (HiveQL) and running Pig Latin scripts to study customer behaviour.
  • Developed Data Cleaning techniques / UDFs using Pig scripts / Hive QL, Map/Reduce.
  • Worked on NoSQL including MongoDB, Cassandra and HBase.
  • Continuously monitored and managed the Hadoop Cluster using Cloudera Manager.

Environment: Hadoop, HDFS, Pig, Pig Latin, Eclipse, Hive, MapReduce, Java, Avro, HBase, Sqoop, Storm, LINUX, Cloudera, Big Data, MySQL, NoSQL, MongoDB, Cassandra, JSON, XML, CSV.
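
The data extraction and aggregation jobs described above follow the classic map/shuffle/reduce pattern; a minimal pure-Python sketch of that pattern over CSV input (the column layout and summing reducer are illustrative, not from an actual job):

```python
from itertools import groupby
from operator import itemgetter

def map_phase(lines):
    """Map: parse each CSV record and emit a (key, value) pair --
    here (country, amount)."""
    for line in lines:
        user_id, country, amount = line.split(",")
        yield country, float(amount)

def reduce_phase(pairs):
    """Shuffle/sort by key, then reduce each group -- here a sum,
    like a summing Reducer in Hadoop MapReduce."""
    for key, group in groupby(sorted(pairs, key=itemgetter(0)),
                              key=itemgetter(0)):
        yield key, sum(v for _, v in group)

csv_lines = ["u1,US,10.0", "u2,IN,5.5", "u3,US,2.5"]
totals = dict(reduce_phase(map_phase(csv_lines)))
```

In Hadoop the same pair flow is split across Mapper and Reducer classes and the framework performs the sort/shuffle between them; the sketch keeps everything in one process to show the data movement.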

Confidential, Tampa, FL

Hadoop Developer


  • Developing and running Map-Reduce jobs on YARN and Hadoop clusters to produce daily and monthly reports as per user's need.
  • Debugged and troubleshot issues in Hive UDFs.
  • Scheduling and managing jobs on a Hadoop cluster using Oozie work flow.
  • Experience in developing multiple MapReduce programs in java for data extraction, transformation and aggregation from multiple file formats including XML, JSON, CSV and other file formats.
  • Experienced on loading and transforming of large sets of structured, semi structured and unstructured data.
  • Developed client-side validations using JavaScript.
  • Transforming unstructured data into structured data using PIG.
  • Imported data using Sqoop to load data from MySQL to HDFS on regular basis.
  • Designed and developed PIG Latin Scripts to process data in a batch to perform trend analysis.
  • Good experience on Hadoop tools like MapReduce, Hive and HBase.
  • Worked on both External and Managed HIVE tables for optimized performance.
  • Developed HIVE scripts for analyst requirements for analysis.
  • Hands-on experience in using Hive partitioning, bucketing and execute different types of joins on Hive tables and implementing Hive SerDes like JSON and Avro.
  • Worked on Developing custom MapReduce programs and User Defined Functions (UDFs) in Hive to transform the large volumes of data with respect to business requirement.
  • Maintenance of data importing scripts using Hive and Map reduce jobs.
  • Data design and analysis in order to handle huge amount of data.
  • Cross examining data loaded in Hive table with the source data in oracle.
  • Worked closely with QA and Operations teams to understand, design, and develop end-to-end data flow requirements.
  • Utilised Oozie to schedule workflows.
  • Developed structured, efficient and error-free code for Big Data requirements using knowledge of Hadoop and its ecosystem.
  • Stored, processed and analysed huge data sets to extract valuable insights.

Environment: Hadoop, HDFS, Pig, Hive, HBase, MapReduce, Sqoop, Oozie, LINUX, Cloudera, Big Data, Java, SQL
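
The "transforming unstructured data into structured data" bullet above can be sketched in pure Python with a log-parsing example (the access-log layout and field names are assumptions for illustration; in the project this was done with Pig):

```python
import re

# Common web access-log layout: ip, identity, user, [timestamp],
# "METHOD path protocol", status, bytes
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)'
)

def parse_access_log(lines):
    """Turn raw access-log lines into structured records,
    silently skipping lines that do not match the expected layout."""
    for line in lines:
        m = LOG_PATTERN.match(line)
        if m:
            rec = m.groupdict()
            rec["status"] = int(rec["status"])
            rec["bytes"] = 0 if rec["bytes"] == "-" else int(rec["bytes"])
            yield rec

raw = [
    '10.0.0.1 - - [01/Jan/2016:00:00:01 +0000] "GET /index.html HTTP/1.1" 200 512',
    'garbage line that does not match',
]
records = list(parse_access_log(raw))
```

A Pig script would express the same transform with a `LOAD ... USING` a regex-based loader followed by `FOREACH ... GENERATE`; the structured records can then be loaded into a partitioned Hive table for analysis.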
