
Senior Hadoop Developer Resume

Columbus, OH


  • Around 8 years of experience in Information Technology, with a major concentration on Big Data tools and technologies, various relational and NoSQL databases, the Java programming language, and J2EE technologies, following industry-recommended software practices.
  • Hands-on experience in developing applications using the Hadoop ecosystem, including Spark, Hadoop MapReduce, HDFS, YARN, Pig, Hive, Sqoop, Oozie, Avro, HBase, ZooKeeper, Flume, Hue, Kafka and Storm.
  • Extensive experience in developing applications using Scala, Python, Java and Android.
  • Experience with Hadoop distributions such as Cloudera (CDH 4 and 5) and knowledge of the Hortonworks Data Platform (HDP).
  • Experience with Cloudera Manager administration, and with monitoring Hadoop clusters using Cloudera Manager and Apache Ambari.
  • Expertise in installing, designing, sizing, configuring, provisioning and upgrading Hadoop environments.
  • Excellent understanding of Hadoop architecture and core components such as the NameNode (master), Secondary NameNode and DataNodes.
  • Good experience in both MapReduce MRv1 and MapReduce MRv2 (YARN).
  • Extensive experience in testing, debugging and deploying MapReduce Hadoop platforms.
  • Worked with the Spark engine to process large-scale data; experienced in creating Spark RDDs.
  • Expert in creating Hive tables and writing Hive queries to analyze HDFS data.
  • Good experience in creating Pig and Hive UDFs in Java to analyze data efficiently.
  • Experience in importing and exporting data between HDFS and relational database systems (RDBMS) using Sqoop.
  • Successfully loaded files into Hive and HDFS from Oracle and SQL Server using Sqoop.
  • Hands-on NoSQL database experience with MongoDB, HBase and Cassandra.
  • Excellent relational database (RDBMS) experience with Oracle, MySQL and SQL Server.
  • Extensive experience in SQL (Structured Query Language) and PL/SQL - stored procedures, triggers, sequences and indexes.
  • Experience in writing MapReduce code in Java per business requirements.
  • Extensive experience in developing Java applications using Spring MVC, Spring RESTful web services, Struts 2, JSP (JavaServer Pages), Servlets, ORM (Object-Relational Mapping) with Hibernate, Core Java and Swing.
  • Strong experience in IO, Beans, Strings, JDBC, JSTL, HTML, AngularJS, multithreading, JavaScript, Ajax, CSS, jQuery, Collections, JSON, XML and the build-automation tool Jenkins.
  • Excellent experience in developing web-based and desktop reports using the Jasper Reports tool.
  • Extensively worked on Amazon Web Services (AWS), using services such as EC2, S3, Relational Database Service (RDS), DynamoDB, Elastic Load Balancing (ELB), Auto Scaling, Elastic Block Store (EBS) and Elastic MapReduce (EMR).
  • Good working knowledge on Eclipse IDE for developing and debugging Java applications.
  • Experience in using version control tools like Subversion (SVN) and Git.
  • Experience in working with software methodologies like Agile and Waterfall.
  • Thorough knowledge of Software Development Life Cycle (SDLC) with deep understanding of various phases like Requirements gathering, Analysis, Design, Development and Testing.
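The MapReduce processing referenced above can be sketched with plain Scala collections (no Hadoop dependency) to show the map/shuffle/reduce shape of a classic word count; the object and data here are illustrative, not from any specific project:

```scala
object WordCount {
  // Map phase: tokenize lines into words; shuffle: group by key;
  // reduce: sum occurrences per word.
  def wordCount(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.toLowerCase.split("\\s+")) // map: emit one token per word
      .filter(_.nonEmpty)
      .groupBy(identity)                    // shuffle: group identical words
      .map { case (word, occurrences) => word -> occurrences.size } // reduce

  def main(args: Array[String]): Unit = {
    val counts = wordCount(Seq("hadoop spark", "spark streaming spark"))
    println(counts("spark")) // prints 3
  }
}
```

A real MapReduce job would express the same three stages through `Mapper` and `Reducer` classes, with the framework performing the shuffle.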


Hadoop/Big Data Framework: Apache Spark, HDFS, MapReduce, YARN, Hive, Pig, HBase, Sqoop, Oozie, ZooKeeper, Flume, Kafka and Storm

Programming Languages: Scala, Java (JDK 1.5/1.6/1.7), J2EE, Python, Pig Latin, HiveQL, Android, HTML, C, C++, JavaScript, jQuery, CSS, Ajax, Shell script

Databases: MySQL 5.6/5.5/5.1, MongoDB, Oracle 10g, SQL Server, MS Access.

Java Framework and Tools: Spring 4/3, Struts 2, Hibernate 3/4, AngularJS 1.0

IDE Tools: Eclipse 4.5/4.3/3.1/3.0, NetBeans 4.1/4.0

Database GUI Tools: Robomongo, SQL Developer, SQLyog 5.26/11.11, MySQL Workbench, Toad, SQL Server Management Studio

Reporting Tool: Jasper Report

Operating Systems: Linux (Fedora 10/18, Ubuntu 13/16), Windows XP/7/10

Other skills: AWS, Internet of Things, Git, SVN, ClearCase, JFrog Artifactory, Control-M, QuickBuild

Development Methodologies: Agile/Scrum, Waterfall


Senior Hadoop Developer

Confidential, Columbus, OH


  • Solid enterprise working knowledge of Scala fundamentals and best practices.
  • Developed and maintained Scala applications executed on the Cloudera platform.
  • Expert in implementing advanced procedures like text analytics and processing using the in-memory computing capabilities of Apache Spark, written in Scala.
  • Configured Spark Streaming to receive real-time data from Apache Kafka and store the streamed data to HDFS using Scala.
  • Experienced in developing scripts for performing transformations using Scala.
  • Used Kafka as a publish-subscribe messaging system and distributed commit log; experienced with its speed, scalability and durability.
  • Used Spark SQL to read data from external sources and processed the data with Spark's Scala API.
  • Successfully loaded files into Hive and HDFS from Oracle and SQL Server using Sqoop.
  • Good understanding of Kafka architecture and of designing consumer and producer applications.
  • Developed Spark projects in Scala and executed them using spark-submit.
  • Implemented Spark applications in Scala, using DataFrames and the Spark SQL API for faster testing and processing of data, and managed data from different sources.
  • Strong in core Scala, including experience with Collections, type variance, implicit parameters and implicit conversions.
  • Functional programming experience in Scala, including higher-order functions, currying, partial functions, partial application and nested functions.
  • Used Control-M for scheduling and monitoring jobs in all environments.
  • Good understanding of the DevOps QuickBuild setup for continuous integration.
  • Developed customized UDFs and UDAFs in Scala to extend Pig and Hive core functionality.
  • Experience in using Avro, Parquet and JSON file formats; developed UDFs in Hive.
  • Working knowledge of Amazon's Elastic Compute Cloud (EC2) infrastructure for computational tasks and Simple Storage Service (S3) as a storage mechanism.
  • Experienced in using build tools like Gradle and Maven, along with the Log4j logging framework, to build and deploy applications to the server.
  • Performed troubleshooting of MapReduce jobs by analyzing and reviewing Hadoop log files.
  • Worked with Network, database, application and BI teams to ensure data quality and availability.
  • Expertise in creating tasks and working with dependencies using the Gradle build tool.
  • Used the Scala collections framework to store and process complex consumer information; based on the offers set up for each client, requests were post-processed and offers were issued.
  • Worked with XML, CSV, MARC, MARCXML, EDIFACT and ONIX file formats.
  • Used DataFrames and Datasets to consume the data and store the output in Parquet files.
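The functional Scala features listed above (higher-order functions, currying, partial application, partial functions) can be illustrated with a minimal, self-contained sketch; the pricing-rule names and values are hypothetical:

```scala
object ScalaFunctional {
  // Curried function: fixing the rate yields a reusable one-argument function.
  def discount(rate: Double)(price: Double): Double = price * (1 - rate)

  // Partial application: the rate is fixed at 0.10, leaving a Double => Double.
  val tenPercentOff: Double => Double = discount(0.10)

  // Partial function: defined only for non-negative prices.
  val validPrice: PartialFunction[Double, Double] = {
    case p if p >= 0 => p
  }

  // Higher-order function: applies a pricing rule across a collection,
  // dropping inputs the partial function is not defined for.
  def applyRule(prices: Seq[Double], rule: Double => Double): Seq[Double] =
    prices.collect(validPrice).map(rule)

  def main(args: Array[String]): Unit = {
    println(applyRule(Seq(100.0, -5.0, 50.0), tenPercentOff)) // List(90.0, 45.0)
  }
}
```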

Environment: Hadoop, Hive, MapReduce, Sqoop, Kafka, Spark, Scala, Spark SQL, YARN, Shell Scripting, Gradle, Java, JUnit, Agile methodologies, Control-M, Ubiquity, MySQL, AWS, EC2, S3, Hortonworks, Power BI, Solr

Senior Hadoop/Spark Developer

Confidential, Berkeley Heights, NJ


  • Hands-on experience in Spark and Spark Streaming: creating RDDs and applying operations (transformations and actions).
  • Developed Spark applications using Scala for easy Hadoop transitions.
  • Used Spark and Spark SQL to read Parquet data and create tables in Hive using the Scala API.
  • Performed advanced procedures like text analytics and processing, using the in-memory computing capabilities of Spark using Scala.
  • Developed Spark code using Scala and Spark-SQL for faster processing and testing.
  • Implemented sample Spark programs in Python using PySpark.
  • Analyzed the SQL scripts and designed a solution implemented with PySpark.
  • Used Spark Streaming APIs to perform transformations and actions on the fly, building a common learner data model that receives data from Kafka in near real time.
  • Responsible for loading data pipelines from web servers and Teradata using Sqoop with Kafka and the Spark Streaming API.
  • Developed Kafka producers and consumers, Cassandra clients, and Spark components along with HDFS and Hive.
  • Populated HDFS and HBase with huge amounts of data using Apache Kafka.
  • Used Kafka to ingest data into Spark engine.
  • Configured, deployed and maintained multi-node Dev and Test Kafka clusters.
  • Managed and scheduled Spark jobs on a Hadoop cluster using Oozie.
  • Experienced with different scripting languages like Python and shell scripting.
  • Developed various Python scripts to find vulnerabilities with SQL Queries by doing SQL injection, permission checks and performance analysis.
  • Designed and implemented incremental imports into Hive tables and wrote Hive queries to run on Tez.
  • Built data pipelines using Kafka and Akka to handle terabytes of data.
  • Wrote shell scripts that run multiple Hive jobs to incrementally update Hive tables, which are used to generate reports in Tableau for the business.
  • Experienced in Apache Spark for implementing advanced procedures like text analytics and processing, using its in-memory computing capabilities written in Scala.
  • Developed Solr web apps to query and visualize Solr-indexed data from HDFS.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDD, Scala and Python.
  • Worked on Spark SQL, created Data frames by loading data from Hive tables and created prep data and stored in AWS S3.
  • Used Spark Streaming APIs to perform transformations and actions on the fly, building a common learner data model that receives data from Kafka in near real time and persists it into Cassandra.
  • Involved in creating custom UDFs for Pig and Hive to incorporate Python methods and functionality into Pig Latin and HQL (HiveQL).
  • Extensively worked on Text, ORC, Avro and Parquet file formats and compression techniques like Snappy, Gzip and Zlib.
  • Implemented Hortonworks NiFi (HDP 2.4) and recommended solutions for ingesting data from multiple data sources into HDFS and Hive using NiFi.
  • Hands-on work administering applications and helping with DevOps tasks.
  • Developed various data loading strategies and performed various transformations for analyzing the datasets by using Hortonworks Distribution for Hadoop ecosystem.
  • Ingested data from RDBMS, performed data transformations, exported the transformed data to Cassandra per business requirements, and accessed Cassandra through Java services.
  • Experience in NoSQL Column-Oriented Databases like Cassandra and its Integration with Hadoop cluster.
  • Wrote ETL jobs to read from web APIs using REST and HTTP calls and load into HDFS, using Java and Talend.
  • Along with the infrastructure team, involved in designing and developing a Kafka- and Storm-based data pipeline.
  • Created partitions and buckets based on state for further processing using bucket-based Hive joins.
  • Worked on Sequence files, RC files, map-side joins, bucketing and partitioning for Hive performance enhancement and storage improvement, utilizing Hive SerDes such as RegEx, JSON and Avro.
  • Experienced working in a DevOps model, with a passion for automation.
  • Used Oozie operational services for batch processing and scheduling workflows dynamically.
  • Worked entirely in Agile methodology and developed Spark scripts using the Scala shell.
  • Involved in loading and transforming large datasets from relational databases into HDFS and vice versa using Sqoop imports and exports.
  • Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and moved data between MySQL and HDFS using Sqoop.
  • Used Hibernate ORM framework with Spring framework for data persistence and transaction management.
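A minimal sketch of how a Hive/SQL aggregation maps onto filter/group/aggregate transformations, shown here with plain Scala collections rather than an actual Spark RDD so it stands alone; the Sale record and values are hypothetical:

```scala
object SqlToTransformations {
  // Hypothetical record standing in for a row of a Hive table.
  final case class Sale(region: String, amount: Double)

  // SQL: SELECT region, SUM(amount) FROM sales WHERE amount > 0 GROUP BY region
  // expressed as filter / group / aggregate steps - the same shape the
  // equivalent Spark pipeline (filter, map, reduceByKey) would take.
  def totalsByRegion(sales: Seq[Sale]): Map[String, Double] =
    sales
      .filter(_.amount > 0)  // WHERE amount > 0
      .groupBy(_.region)     // GROUP BY region
      .map { case (region, rows) => region -> rows.map(_.amount).sum } // SUM(amount)

  def main(args: Array[String]): Unit = {
    val sales = Seq(Sale("east", 10.0), Sale("east", 5.0),
                    Sale("west", -1.0), Sale("west", 7.0))
    println(totalsByRegion(sales)("east")) // prints 15.0
  }
}
```

In Spark the same query would run on an RDD or DataFrame and the grouping would shuffle data across the cluster; the collection version only demonstrates the transformation logic.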

Environment: Hadoop, Hive, MapReduce, Sqoop, Kafka, Spark, YARN, Pig, Cassandra, Oozie, Shell Scripting, Scala, Maven, Java, JUnit, Agile methodologies, NiFi, MySQL, Tableau, AWS, EC2, S3, Hortonworks, Power BI, Solr.

Senior Hadoop/Spark Developer



  • Worked on the Hadoop cluster and data-querying tools such as Hive to store and retrieve data.
  • Involved in the complete Software Development Life Cycle (SDLC) while developing applications.
  • Reviewed and managed Hadoop log files by consolidating logs from multiple machines using Flume.
  • Developed Oozie workflow for scheduling ETL process and Hive Scripts.
  • Started using Apache NiFi to copy data from the local file system to HDFS.
  • Worked with teams to analyze anomaly detection and data ratings.
  • Implemented a custom input format and record reader to read XML input efficiently using a SAX parser.
  • Analyzed the database and compared it with other open-source NoSQL databases to find which best suits the current requirements.
  • Integrated Cassandra as a distributed persistent metadata store to provide metadata resolution for network entities on the network.
  • Implemented Spark using Scala and Spark SQL for faster testing and processing of data.
  • Experienced with the RDD architecture, implementing Spark operations on RDDs and optimizing transformations and actions in Spark.
  • Involved in working with Impala for data retrieval process.
  • Designed multiple Python packages that were used within a large ETL process used to load 2TB of data from an existing Oracle database into a new PostgreSQL cluster.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs.
  • Loaded data from the Linux file system to HDFS and vice versa.
  • Developed UDFs using both DataFrames/SQL and RDDs in Spark for data-aggregation queries, writing results back into OLTP systems through Sqoop.
  • Experience with DevOps and automation frameworks, including Chef, Docker, Puppet and Jenkins.
  • Built a POC for enabling member and suspect search using Solr.
  • Worked on ETL methods for data extraction, transformation and loading in corporate-wide ETL Solutions and Data warehouse tools for reporting and data analysis.
  • Used CSVExcelStorage to parse input with different delimiters in Pig.
  • Installed and monitored Hadoop ecosystem tools on multiple operating systems like Ubuntu and CentOS.
  • Performed advanced procedures like text analytics and processing, using the in-memory computing capabilities of Spark using Scala.
  • Modified reports and Talend ETL jobs based on feedback from QA testers and users in development and staging environments; involved in setting up the QA environment by implementing Pig and Sqoop scripts.
  • Worked on Apache NiFi: executing Spark and Sqoop scripts through NiFi, creating scatter-and-gather patterns, ingesting data from Postgres into HDFS, fetching Hive metadata and storing it in HDFS, and creating a custom NiFi processor for filtering text from FlowFiles.
  • Responsible for designing and implementing the ETL process using Talend to load data. Worked extensively with Sqoop for importing and exporting data between HDFS and relational database systems/mainframes.
  • Developed Pig Latin scripts to do operations of sorting, joining and filtering enterprise data.
  • Implemented test scripts to support test driven development and integration.
  • Developed multiple MapReduce jobs in Java to clean datasets.
  • Involved in loading data from Linux file systems, servers and Java web services using Kafka producers and consumers.
  • Involved in developing code to write canonical model JSON records from numerous input sources to Kafka Queues.
  • Performed streaming of data into Apache Ignite by setting up a cache for efficient data analysis.
  • Collected log data from web servers and integrated it into HDFS using Flume.
  • Developed UNIX shell scripts for creating the reports from Hive data.
  • Manipulated, serialized and modeled data in multiple formats like JSON and XML. Involved in setting up MapReduce 1 and MapReduce 2.
  • Prepared Avro schema files for generating Hive tables, created Hive tables, loaded data into them and queried the data using HQL.
  • Installed and Configured Hadoop cluster using Amazon Web Services (AWS) for POC purposes.
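The custom SAX-based XML reading mentioned above can be sketched as follows, using only the JDK's streaming SAX API from Scala; the `<record>` element name and sample data are illustrative:

```scala
import java.io.StringReader
import javax.xml.parsers.SAXParserFactory
import org.xml.sax.{Attributes, InputSource}
import org.xml.sax.helpers.DefaultHandler
import scala.collection.mutable.ListBuffer

// Streaming SAX handler that collects the text content of every <record>
// element without building a full DOM tree in memory.
class RecordHandler extends DefaultHandler {
  val records = ListBuffer.empty[String]
  private val current = new StringBuilder
  private var inRecord = false

  override def startElement(uri: String, local: String, qName: String,
                            attrs: Attributes): Unit =
    if (qName == "record") { inRecord = true; current.clear() }

  override def characters(ch: Array[Char], start: Int, length: Int): Unit =
    if (inRecord) current.appendAll(ch, start, length)

  override def endElement(uri: String, local: String, qName: String): Unit =
    if (qName == "record") { inRecord = false; records += current.toString.trim }
}

object SaxDemo {
  def parseRecords(xml: String): List[String] = {
    val handler = new RecordHandler
    SAXParserFactory.newInstance().newSAXParser()
      .parse(new InputSource(new StringReader(xml)), handler)
    handler.records.toList
  }

  def main(args: Array[String]): Unit = {
    val xml = "<feed><record>alpha</record><record>beta</record></feed>"
    println(parseRecords(xml)) // List(alpha, beta)
  }
}
```

A Hadoop custom record reader would wrap the same handler logic so each `<record>` becomes one input record for the map phase.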

Environment: Hadoop MapReduce 2 (YARN), NiFi, HDFS, Pig, Hive, Flume, Cassandra, Eclipse, Ignite, Core Java, Sqoop, Spark, Splunk, Maven, Spark SQL, Cloudera, Solr, Talend, Linux shell scripting.
