
Sr. Hadoop Developer/Spark Developer Resume

Indianapolis, IN


  • 7+ years of experience in all phases of software application development, including requirements analysis, design, development, and maintenance of Hadoop/Big Data applications and web applications using Java/J2EE technologies.
  • Hadoop
  • Good understanding of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, and DataNode, and of the MapReduce programming paradigm.
  • Experience in writing Hive Queries for processing and analyzing large volumes of data.
  • Experience in importing and exporting data between relational database systems and HDFS using Sqoop.
  • Developed Oozie workflows that integrate all tasks related to a project and scheduled the jobs as per requirements.
  • Automated all jobs that pull data from upstream servers and load it into Hive tables using Oozie workflows.
  • Implemented several optimization mechanisms, such as Combiners, Distributed Cache, data compression, and custom Partitioners, to speed up MapReduce jobs.
  • Used HBase alongside Hive, as required, for real-time, low-latency queries.
  • Spark
  • Hands-on experience with Spark Core, Spark SQL, and Spark Streaming using Scala. Used Spark SQL to perform transformations and actions on data residing in Hive. Used Kafka and Spark Streaming for real-time processing.


Languages: Java, Python, Scala, SQL, PL/SQL

Big Data Technologies: HDFS, Sqoop, Flume, Hive, MapReduce, Pig, Spark, Kafka

NoSQL Databases: HBase

Business Intelligence Tools: Tableau

Web Servers: Apache Tomcat, WebLogic

Databases: Oracle, MySQL

Operating Systems: UNIX, Windows, Linux

Build & Development Tools: Ant, Maven, Eclipse

Version Control: SVN, Git


Sr. Hadoop Developer/Spark Developer

Confidential, Indianapolis, IN


  • Responsible for building scalable distributed data solutions using Hadoop.
  • Used Spark Streaming APIs to perform the necessary transformations and actions on the fly to build the common learner data model, which receives data from Kafka in near real time and persists it into HBase.
  • Developed Spark applications in Scala and built them using Maven.
  • Used the Spark API on Cloudera Hadoop YARN to perform analytics on data in Hive.
  • Developed Scala scripts and UDAFs with DataFrames in Spark 1.6 for data aggregation queries.
  • Optimized existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, and pair RDDs.
  • Handled large datasets using partitions, Spark's in-memory capabilities, broadcast variables, and effective, efficient joins and transformations.
  • Managed jobs using the Fair Scheduler and developed job-processing scripts using Oozie workflows.
  • Worked on a POC comparing the processing time of Impala and Apache Hive for batch applications, with the goal of adopting Impala in the project.
  • Worked on a 400-node cluster.
  • Worked extensively with Sqoop for importing data from Oracle.
  • Involved in creating Hive tables and loading and analyzing data using Hive queries.
  • Implemented schema extraction for Parquet and Avro file formats in Hive.
  • Good experience with continuous integration of applications using Jenkins.
  • Used reporting tools such as Tableau to connect to Hive and generate daily reports.
  • Collaborated with the infrastructure, network, database, application and BA teams to ensure data quality and availability.

Environment: Hadoop YARN, Spark Streaming, Spark SQL, Scala, Kafka, Hive, Sqoop, Impala, HBase, Tableau, Oozie, Jenkins, Cloudera, Oracle 12c, Linux

Hadoop Developer

Confidential, Hammond, IN


  • Worked on analyzing data using different big data analytic tools including Pig, Hive and MapReduce.
  • Created Pig Latin scripts to sort, group, join, and filter enterprise-wide data.
  • Implemented partitioning, dynamic partitions, and buckets in Hive on Avro-format data to meet business requirements.
  • Extended Hive and Pig core functionality with custom User Defined Functions (UDFs), User Defined Table-Generating Functions (UDTFs), and User Defined Aggregate Functions (UDAFs) written in Python.
  • Pioneered the design, development, and implementation of a process that moved data transformations from Python to HQL scripts to improve application performance.
  • Implemented a Kafka event log producer that publishes logs to a Kafka topic, which the ELK (Elasticsearch, Logstash, Kibana) stack uses to analyze the logs produced by the Hadoop cluster.
  • Implemented Kafka consumers to store data in HDFS and queried it by creating Hive tables on top of it.
  • Implemented data integrity and data quality checks in Hadoop using Hive and Linux shell scripts.
  • Used Snappy compression to compress files before loading them into Hive.
  • Created HBase tables to store Avro data from different portfolios and built applications on top of them.
  • Involved in scheduling the Oozie workflow engine to run multiple Hive and Pig jobs.
  • Worked with MySQL to store metadata and perform lookup requests.
  • Automated processes to improve application efficiency.
  • Responsible for continuous build/integration with Jenkins and deploying applications into production using XL Deploy.
  • Actively involved in code reviews and bug fixing to improve performance.

Environment: Hadoop YARN, Hive, Pig, Python, Apache Kafka, HBase, Shell Scripting, Java, MySQL, ELK stack, MapR, PyCharm, XL Deploy, Git, Jenkins.

Hadoop Java Developer

Confidential, Wheatfield, IN


  • Developed MapReduce jobs in Cascading for data cleansing and processing of flat files.
  • Designed, developed, and implemented the main flow component for the end-to-end data flow process within the platform.
  • Responsible for creating SOAP clients to consume web services.
  • Involved in the analysis and design of the edge node setup as per client requirements.
  • Created Pig Latin scripts to sort, group, join, and filter enterprise-wide data.
  • Expertise in writing Hive scripts for comparing large data sets.
  • Expertise in performance optimization and memory tuning of MapReduce applications.
  • Used MySQL for auditing purposes on the cluster.
  • Responsible for the MapR upgrade in both production and non-production environments.
  • Wrote shell scripts for data extraction and data cleansing to perform member-specific analytics.
  • Coordinated with the administrator team to analyze MapReduce job performance and resolve cluster-related issues.
  • Expertise in platform-related Hadoop production support tasks, including analysis of job logs.
  • Coordinated with different teams to determine the root causes of issues and take steps to resolve them.
  • Responsible for continuous integration with Jenkins and deploying applications into production using XL Deploy.
  • Managed and reviewed Hadoop log files to identify issues when jobs failed and find the root cause.
  • Used ServiceNow to provide application support for existing clients.

Environment: Hadoop YARN, Cascading, Hive, Pig, Java, Shell Scripting, SOAP web service, MySQL, MapR, Mule ESB, Agile, Git, Jenkins, ServiceNow.
