
Hadoop/Spark Developer Resume

San Francisco, CA


  • 6 years of professional IT experience, including 3 years in the Big Data ecosystem and related technologies and 3 years as a Core Java developer.
  • Experience designing and implementing solutions for Big Data applications across the Hadoop ecosystem.
  • Good understanding of Hadoop architecture and hands-on experience with components such as JobTracker, TaskTracker, NameNode, and DataNode, along with MapReduce concepts and the HDFS framework.
  • Good understanding of MapReduce v2 on the YARN framework, including the ResourceManager, NodeManager, and ApplicationMaster.
  • Experience importing data from RDBMS into HDFS and exporting it back using Sqoop, in a variety of file formats.
  • Ingested real-time and near-real-time (NRT) streaming data into HDFS using Flume.
  • Experience implementing Mappers, Reducers, Combiners, and Partitioners to deliver the best results on large datasets.
  • Experience analyzing data using HiveQL.
  • Experience writing Pig Latin scripts and working with the Grunt shell.
  • Experience working with Spark RDDs, Spark SQL, and DataFrames using Scala.
  • Good understanding of file formats such as Avro, Parquet, JSON, ORC, CSV, and SequenceFile.
  • Extensively worked with Cloudera Distribution Hadoop 5.x.
  • Strong hands-on experience with core Java concepts.
  • Strong working experience with Eclipse, MS SQL Server 2008, and SQL queries.
  • Well-developed skills in testing, debugging and troubleshooting different types of technical issues.


Big Data: Hadoop HDFS, MapReduce2, YARN, Hive, Pig, Flume, Scala, Apache Spark Core, Spark SQL, Sqoop, Impala, Oozie.

File Formats: Text, Sequence, JSON, ORC, AVRO, and Parquet

Database: MySQL, SQL Server 2008 R2

Tools: SSMS, Maven

Hadoop Distribution: CDH 5.x

IDE: Eclipse

Programming Language: Java, Scala

Operating Systems: Windows 7/8, CentOS


Confidential, San Francisco, CA

Hadoop/Spark Developer

Technologies: MapReduce, HDFS, Sqoop, Flume, Linux, Pig, Hive, Spark Core, Spark SQL, Oozie, Impala


  • Created Sqoop jobs to import data from the transaction data mart (Oracle) into HDFS and Hive in text file format for further processing.
  • Collected and aggregated large volumes of log data into HDFS using Flume.
  • Created Sqoop jobs to export the analyzed data into relational databases for visualization and for generating reports for the BI team.
  • Wrote and optimized MapReduce programs in core Java over data imported into HDFS.
  • Designed Hive internal and external tables in ORC format per business requirements; implemented Hive partitioning and bucketing to improve query performance.
  • Applied techniques for efficient Hive query execution, such as map joins, compressed map/reduce output, and parallel query execution.
  • Developed Pig scripts for cleansing transaction and web-log data, and used HiveQL for web-log analysis.
  • Developed custom UDFs in Java to extend Hive and Pig functionality.
  • Used Scala for Spark programming.
  • Converted some existing Spark applications from RDD-based code to Spark DataFrames.
  • Converted Hive queries into Spark transformations using Spark SQL.
  • Experienced with RDD architecture; implemented Spark operations on RDDs and optimized Spark transformations, actions, and overall job performance.
  • Used Oozie workflows and coordinators to schedule Sqoop, MapReduce, Pig, and Hive actions.
  • Used Cloudera Manager to monitor jobs running on the CDH cluster.
  • Worked extensively with Cloudera Distribution Hadoop (CDH) 5.x.
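The Hive-to-Spark-SQL and RDD-to-DataFrame migrations above follow a common pattern. A minimal sketch, assuming a spark-shell session (Spark 2.x style) where `spark` is a Hive-enabled SparkSession; the `transactions` table and its columns are hypothetical, and this will not run outside such a session:

```scala
import org.apache.spark.sql.functions.{col, sum}

// The original HiveQL aggregation, run unchanged through Spark's SQL interface:
val bySql = spark.sql(
  """SELECT customer_id, SUM(amount) AS total
    |FROM transactions
    |WHERE status = 'SETTLED'
    |GROUP BY customer_id""".stripMargin)

// The equivalent DataFrame transformation; Catalyst compiles both
// forms to the same physical plan:
val byApi = spark.table("transactions")
  .filter(col("status") === "SETTLED")
  .groupBy("customer_id")
  .agg(sum("amount").as("total"))
```

Expressing the query through the DataFrame API rather than raw HiveQL keeps it type-checked at the call-site level and composable with other Spark transformations.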


Core Java/Hadoop Developer

Technologies: Core Java, SQL Server 2008, Postilion Real-time Framework, Python, JUnit, XML, Perforce, MySQL, HDFS, MapReduce, Hive, Sqoop


  • Involved in the analysis, design, and development of system components.
  • As a core Java developer, worked in Eclipse and SSMS to migrate Java and SQL code, respectively.
  • Used the JUnit framework and a test harness for unit testing Java components.
  • Developed components using the Postilion Real-time Framework.
  • Ingested data received from various relational database providers into HDFS for analysis and other big data operations.
  • Wrote MapReduce jobs to transform large sets of structured, semi-structured, and unstructured data.
  • Imported and exported data between MySQL and HDFS using Sqoop.
  • Designed internal and external Hive tables and loaded data between them.
  • Served as a code reviewer, checking code for design quality, vulnerabilities, and scalability.
  • Worked on automation to reduce manual effort.
  • Documented initial analysis documents, functional specifications, and the WBS.
  • Documented a Requirements Traceability Matrix (RTM) mapping functions/code to test cases.
  • Mentored team members through regular knowledge-transfer sessions and trained new team members on core Java.
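The MapReduce jobs above follow the classic map → shuffle → reduce shape. A runnable sketch of that shape using plain Scala collections (a real Hadoop job would implement these steps as Mapper and Reducer classes over HDFS splits; all names here are illustrative):

```scala
object WordCountSketch {
  // "Map" phase: emit a (word, 1) pair for every word in every input line.
  def mapPhase(lines: Seq[String]): Seq[(String, Int)] =
    lines
      .flatMap(_.toLowerCase.split("\\s+"))
      .filter(_.nonEmpty)
      .map(word => (word, 1))

  // "Shuffle + reduce" phase: group pairs by key and sum the counts,
  // mirroring what the framework does between Mapper and Reducer.
  def reducePhase(pairs: Seq[(String, Int)]): Map[String, Int] =
    pairs.groupBy(_._1).map { case (word, ps) => word -> ps.map(_._2).sum }

  def main(args: Array[String]): Unit = {
    val counts = reducePhase(mapPhase(Seq("to be or not to be")))
    println(counts) // "to" and "be" each appear twice
  }
}
```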


Core Java Developer

Technologies: Core Java, SQL Server 2008


  • Involved in different phases of the project life cycle, from requirements gathering to testing.
  • Developed data transformation and manipulation classes using core Java.
  • Created JUnit test cases and wrote setup manuals and user guides.
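A minimal sketch of the kind of transformation class plus unit check described above, written in Scala for illustration (the project itself used core Java and JUnit; the class name, method, and data format are hypothetical):

```scala
object RecordTransformer {
  // Normalize a raw "LAST, FIRST" name field to "First Last".
  def normalizeName(raw: String): String = {
    val parts = raw.split(",").map(_.trim)
    val (last, first) = (parts(0), parts(1))
    s"${first.toLowerCase.capitalize} ${last.toLowerCase.capitalize}"
  }

  def main(args: Array[String]): Unit = {
    // A JUnit-style test case expressed as a bare assertion.
    assert(normalizeName("DOE, JANE") == "Jane Doe")
    println("ok")
  }
}
```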
