
Hadoop/Spark Developer Resume

Cleveland, OH


  • 8 years of IT experience across all phases of the software development life cycle, including 5 years with Hadoop and the Big Data ecosystem.
  • Strong experience with and knowledge of Hadoop architecture and its components, such as HDFS, YARN, JobTracker, TaskTracker, NameNode, DataNode and MapReduce.
  • Good experience with Hadoop ecosystem tools such as MapReduce, HDFS, NiFi, Oozie, Hive, Sqoop, Pig, ZooKeeper, Flume, Spark Streaming, Spark SQL, HBase and Cassandra.
  • Expertise in Hadoop 2.0 and YARN architecture.
  • Experience working with Hadoop clusters on Cloudera CDH and Hortonworks HDP.
  • Expertise in writing Hadoop jobs for analyzing data using MapReduce, Hive and Pig.
  • Experience importing and exporting data between HDFS and relational database systems (RDBMS) using Sqoop.
  • Developed and implemented Apache NiFi across various environments and wrote QA scripts in Python for tracking files.
  • Expertise in writing custom UDFs and UDAFs to extend Hive and Pig core functionality.
  • Experience implementing various Hadoop file formats and compression techniques such as SequenceFile, Parquet, ORC, Avro, Gzip and text files.
  • Experienced in using NoSQL databases such as HBase, Cassandra and MongoDB.
  • Experience working with relational databases such as Oracle, MySQL and MS SQL Server.
  • Experience writing UNIX shell and Bash scripts.
  • Good experience implementing advanced procedures such as text analytics and in-memory processing with Apache Impala and Scala.
  • Experience creating RDDs and DataFrames for the required data and performing transformations using Spark RDDs and Spark SQL.
  • Used Spark Structured Streaming to perform necessary transformations.
  • Experience writing producers and consumers and building messaging-centric applications with Apache Kafka.
  • Hands-on experience with Amazon Web Services (AWS) provisioning tools such as EC2, Simple Storage Service (S3) and Elastic MapReduce (EMR).
  • Extensive Java development experience using J2SE and J2EE technologies such as Servlets, Spring, Hibernate, JSP and JDBC.
  • Experienced with core Java components such as the Collections Framework, exception handling, multithreading and the I/O system.
  • Experience in SOA using SOAP and RESTful web services.
  • Experience working with Waterfall and Agile development methodologies.
  • Proficient in developing secure enterprise Java applications using technologies such as X-Servlets, Maven, Hibernate, XML, HTML, CSS and version control systems.
  • Ability to learn and adapt quickly to new tools and environments, with strong communication and analytical skills.


Big Data Ecosystem: Hadoop (HDFS & MapReduce), Pig, Hive, HBase, ZooKeeper, Sqoop, Flume, Kafka, Apache Spark, Impala, Oozie.

Databases: Oracle, SQL Server, MySQL.

NoSQL Databases: HBase, Cassandra, MongoDB.

Hadoop Distributions: Cloudera, Hortonworks.

Cloud: AWS, Azure.

Languages: Java (Java SE, J2EE), Scala, Python, C.

Web Technologies: JavaScript, jQuery, Bootstrap, AJAX, XML, CSS, HTML, AngularJS.

Web Services: REST, SOAP, JAX-WS, JAX-RPC, JAX-RS, WSDL, Axis2, Apache HTTP, CVS, SVN.

IDE: Eclipse, NetBeans, IntelliJ.

Operating Systems: MacOS, Linux, Windows.



Confidential, Cleveland, OH


  • Responsible for building scalable distributed data solutions using Hadoop.
  • Performed ETL: data cleansing, transformation and preparation of data for reporting tools.
  • Developed Spark and Hive jobs to apply rules and logic and to transform data.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark DataFrames and Scala.
  • Implemented Spark applications in Scala using higher-order functions for both batch and interactive analysis requirements.
  • Used Spark Structured Streaming to perform transformations in the data lake, reading data from Kafka and writing it to HDFS.
  • Created a Spark Streaming task to import live data from Kafka sources and implemented analysis models.
  • Responsible for handling large datasets using repartition, coalesce, broadcast variables and Spark's in-memory capabilities.
  • Converted row-oriented Hive external tables into columnar, Snappy-compressed Parquet tables with key-value pairs. Also worked with other file formats such as CSV and text.
  • Used UUIDs and MD5 hashes for checksums and for identifying deltas.
  • Applied transformations to data ingested by the Informatica team as per business requirements.
  • Used JDBC connectors to access reference and lookup tables in Oracle.
  • Wrote ad hoc Hive queries for orchestration and unit testing.
  • Created and scheduled Control-M jobs to run multiple Hive and Spark jobs, each triggered independently based on time and data availability.
  • Implemented workflows using the Apache Oozie framework to automate tasks.
  • Built on-premises end-to-end data pipelines.
  • Assisted in setting up an Amazon EMR cluster and adding roles in AWS IAM for the Disaster Recovery (DR) cluster.
  • Created business-ready views on top of the master table and replicated data into Amazon S3.
  • Created reports in Tableau to visualize the resulting data sets and tested native Drill, Impala and Spark connectors.
  • Used JIRA for task and defect tracking and SVN for version control.

Environment: Hadoop, Cloudera, HDFS, Hive, Oozie, SparkSQL, Sqoop, Control-M, Scala, Informatica, Tableau, Shell Scripting, Python, Oracle, AWS.
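
As an illustration of the MD5-based delta identification mentioned above, here is a minimal Python sketch. It is not the production implementation: the function names (row_checksum, find_delta), the record layout and the sample data are all hypothetical, and the real work was presumably done in Spark rather than plain Python.

```python
import hashlib


def row_checksum(row, columns):
    """Return an MD5 hex digest over the chosen columns of one record."""
    # Join values with a separator unlikely to occur in the data so that
    # ("ab", "c") and ("a", "bc") do not hash to the same value.
    joined = "\x1f".join("" if row.get(c) is None else str(row[c]) for c in columns)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


def find_delta(previous, current, key, columns):
    """Compare two snapshots and return (inserted, updated, deleted) keys."""
    old = {r[key]: row_checksum(r, columns) for r in previous}
    new = {r[key]: row_checksum(r, columns) for r in current}
    inserted = sorted(k for k in new if k not in old)
    updated = sorted(k for k in new if k in old and new[k] != old[k])
    deleted = sorted(k for k in old if k not in new)
    return inserted, updated, deleted


# Hypothetical sample snapshots of the same table on two consecutive loads.
previous = [
    {"id": 1, "name": "a", "amt": 10},
    {"id": 2, "name": "b", "amt": 20},
]
current = [
    {"id": 1, "name": "a", "amt": 10},
    {"id": 2, "name": "b", "amt": 25},
    {"id": 3, "name": "c", "amt": 5},
]

print(find_delta(previous, current, "id", ["name", "amt"]))
```

Comparing one checksum per row avoids a column-by-column comparison between loads. MD5 is used here only as a change-detection fingerprint, not for security.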


Confidential, Minneapolis, MN


  • Worked with business users to identify process metrics, key dimensions and measures; involved in the complete life cycle of the project.
  • Responsible for writing Hive queries to analyze data in the Hive warehouse using Hive Query Language (HQL).
  • Developed MapReduce jobs in Java for data cleaning and preprocessing.
  • Good knowledge of using Apache NiFi to automate data movement.
  • Used MapReduce to ingest customer behavioral data and financial histories into HDFS.
  • Used Pig as an ETL tool for transformations and pre-aggregations before storing data in HDFS.
  • Responsible for defining the data flow within the Hadoop ecosystem and directing the team in implementing it; exported result sets from Hive to MySQL using shell scripts.
  • Imported data from various data sources and performed transformations.
  • Involved in creating tables and in partitioning and bucketing them.
  • Configured Flume agents on different data sources to capture the streaming log data.
  • Used Amazon EMR to process big data across a Hadoop cluster running on EC2 virtual servers, with data stored in S3.
  • Worked with data formats such as Avro, Parquet and ORC and compression codecs such as Snappy and Gzip.
  • Implemented a POC for persisting clickstream data with Apache Kafka.
  • Optimized existing algorithms in Hadoop using Spark SQL.
  • Troubleshot and resolved migration and production issues.

Environment: Hadoop, HDFS, Hive, Sqoop, Java, Spark, AWS, Hortonworks, Kafka, Cassandra, UNIX, Tableau.


Confidential, New York City, NY


  • Imported data into HDFS and exported it back out using Sqoop.
  • Worked on loading and transforming large sets of structured, semi-structured and unstructured data in the Hadoop system.
  • Responsible for managing data coming from different data sources.
  • Developed simple and complex MapReduce programs in Java for data analysis.
  • Loaded data from various data sources into HDFS using Flume.
  • Implemented partitioning, dynamic partitions and buckets in Hive.
  • Developed Java MapReduce programs to analyze sample log files stored in the cluster.
  • Created Hive tables, loaded data and wrote Hive UDFs.
  • Responsible for spooling data from DB2 sources to HDFS using Sqoop.
  • Created Hive tables and provided analytical queries for business user analysis.
  • Extensive knowledge of Pig scripts using bags and tuples.
  • Created partitioned and bucketed tables in Hive for granularity and HiveQL query optimization.
  • Involved in identifying job dependencies to design Oozie workflows and YARN resource management.
  • Captured data from existing databases that provide SQL interfaces using Sqoop.
  • Involved in loading data from the UNIX file system to HDFS.
  • Installed and configured Pig and Hive and wrote Pig and Hive UDFs.
  • Involved in creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs.

Environment: Cloudera, HBase, Java, Hive, Pig, Sqoop, Oozie, Oracle, SVN, Kafka, GitHub, JIRA, Talend.




  • Extensively involved in different stages of the Agile development cycle, including detailed analysis, design, development and testing.
  • Implemented back-end business logic using core Java technologies, including Collections, Generics, exception handling, Java Reflection and Java I/O.
  • Wrote Spring annotation configuration to define beans and view resolvers and to configure Spring beans, their dependencies and the services they need.
  • Used Spring IoC to implement dependency injection and Spring AOP to implement cross-cutting concerns such as transaction management.
  • Wrote mapping configuration files to implement ORM mappings in the persistence layer.
  • Extended the DAO implementation using Hibernate DAO support.
  • Wrote Hibernate configuration files to connect to the Oracle database and fetch data.
  • Implemented the Hibernate query cache using EhCache to improve performance.
  • Implemented RESTful web services using JAX-RS APIs.
  • Integrated the JavaMail API to send registration confirmations and monthly statements to users.
  • Manipulated database data with SQL queries, including setting up stored procedures and triggers.
  • Implemented front-end features such as web page design, data binding and single-page applications using HTML/CSS, JavaScript, jQuery and AJAX.
  • Used jQuery libraries to simplify front-end programming. Performed user input validation using JavaScript and jQuery.
  • Utilized Node.js and MongoDB to generate trend charts of payment history for the application.
  • Wrote and ran JUnit test cases to test the service layers of the application.
  • Used JIRA to track projects and Git for version control.

Environment: Java, Spring, JavaMail, JavaScript, HTML, CSS, AJAX, jQuery, JUnit, JIRA, Oracle DB, MongoDB, Git.
