
Hadoop/Spark Developer Resume

Chicago, IL


  • 8+ years of extensive IT experience with multinational clients, including 4+ years of recent experience in the Big Data/Hadoop ecosystem.
  • Hands-on experience with Apache Hadoop ecosystem components such as MapReduce, Hive, Pig, Sqoop, Spark, Flume, HBase, Kafka, Oozie, and Zookeeper.
  • Excellent knowledge of Hadoop components such as HDFS and the MapReduce and YARN programming paradigms.
  • Experience installing, configuring, supporting, and managing Hadoop clusters and their underlying Big Data infrastructure.
  • Experience analyzing data using HiveQL and Pig Latin, and extending Hive and Pig core functionality with custom UDFs.
  • Proficient in Relational Database Management Systems (RDBMS).
  • Extensive working knowledge of partitioned tables, UDFs, performance tuning, and compression-related properties in Hive.
  • Good understanding of NoSQL databases and hands on experience in writing applications on NoSQL databases like HBase.
  • Hands on experience in using Amazon Web Services like EC2, EMR, RedShift, DynamoDB and S3.
  • Hands-on experience using Apache Kafka to track data ingestion into the Hadoop cluster, and implementing custom Kafka encoders to load data in custom input formats into Kafka partitions.
  • Good knowledge of using Apache NiFi to automate data movement between different Hadoop systems.
  • Experience in Spark Streaming to ingest data from multiple data sources into HDFS.
  • Hands-on experience with stream processing, including Storm and Spark Streaming.
  • Knowledge of job workflow scheduling and monitoring tools such as Oozie.
  • Experience in analyzing data using HBase and custom MapReduce programs in Java.
  • Proficient in importing and exporting data between HDFS and relational database systems using Sqoop.
  • Excellent knowledge of data transformations using MapReduce, Hive, and Pig scripts for different file formats.
  • Experience with various scripting languages, including Linux/Unix shell scripting and Python.
  • Imported streaming data into HDFS using Flume and analyzed it using Pig and Hive.
  • Experience in using Flume for aggregating log data from web servers and dumping into HDFS.
  • Experience in scheduling and monitoring Oozie workflows for parallel execution of jobs.
  • Proficient in Core Java, Servlets, Hibernate, JDBC and Web Services.
  • Experience in all Phases of Software Development Life Cycle (Analysis, Design, Development, Testing and Maintenance) using Waterfall and Agile methodologies.
  • Experience with SequenceFile, Avro, and Parquet file formats; managing and reviewing Hadoop log files.
  • Experience in Developing and maintaining applications on the AWS platform.
  • Hands on experience in working with RESTful web services using JAX-RS and SOAP web services using JAX-WS.


Hadoop Ecosystem: Pig, Hive, Sqoop, Flume, HBase, Kafka-Storm, Spark with Scala, Oozie, Zookeeper, Impala, Hadoop Distributions (Cloudera, Hortonworks)

Web Technologies: Ajax, jQuery, HTML, CSS, XML

Programming Languages: Java, Scala, C/C++, Python

Databases: MySQL, MS-SQL Server, SQL, Oracle 11g, NoSQL (HBase, Cassandra)

Web Services: REST, SOAP, AWS, UDDI, Microservices

Tools: Ant, Maven, JUnit, Apache NiFi, Talend, Airflow

Servers: Apache Tomcat, WebSphere, JBoss

IDE's: MyEclipse, Eclipse, IntelliJ IDEA, NetBeans

AWS: EC2, EMR, S3, RedShift, DynamoDB

ETL/BI Tools: Talend, Tableau, Pig


Confidential, Chicago, IL

Hadoop/Spark Developer


  • Handled importing of data from various data sources and performed transformations using Hive and MapReduce.
  • Involved in setting up Hadoop along with MapReduce, Hive, and Pig.
  • Loaded data into HDFS and extracted the data from MySQL into HDFS using Sqoop.
  • Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Wrote MapReduce programs for refined queries on big data.
  • Managed and scheduled jobs on the Hadoop cluster.
  • Analyzed the data by performing Hive queries and running Pig scripts to know user behavior.
  • Optimized Map/Reduce Jobs to use HDFS efficiently by using various compression mechanisms.
  • Developed simple to complex MapReduce jobs using Hive.
  • Implemented Partitioning and bucketing in Hive.
  • Mentored the analyst and test teams in writing Hive queries.
  • Worked with HiveQL on big data logs to perform trend analysis of user behavior across various online modules.
  • Wrote data ingestion systems to pull data from traditional RDBMS platforms such as Oracle and Teradata and store it in NoSQL databases such as MongoDB, Couchbase, Cassandra.
  • Worked with Elastic MapReduce and setup Hadoop environment in AWS EC2 Instances.
  • Implemented usage of Amazon EMR for processing Big Data across a Hadoop Cluster of virtual servers on Amazon Elastic Compute Cloud (EC2) and Amazon Simple Storage Service (S3).
  • Experience in managing and reviewing Hadoop log files.
  • Extensively used Pig for data cleansing.
  • Developed the Pig UDF'S to pre-process the data for analysis.
  • Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.
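The Sqoop transfers described above (MySQL into HDFS, and analyzed results exported back out for the BI team) can be sketched as CLI invocations. This is an illustrative sketch only: the connection strings, table names, and HDFS paths are hypothetical placeholders, not values from the actual project, and the commands require a configured Hadoop/Sqoop installation.

```shell
# Import a MySQL table into HDFS (splits work across mappers by primary key).
# All host, database, table, and path names below are hypothetical.
sqoop import \
  --connect jdbc:mysql://dbhost:3306/sales \
  --username etl_user -P \
  --table customer_events \
  --target-dir /data/raw/customer_events \
  --num-mappers 4

# Export analyzed results from HDFS back to a relational table for BI reporting.
sqoop export \
  --connect jdbc:mysql://dbhost:3306/reporting \
  --username etl_user -P \
  --table daily_user_trends \
  --export-dir /data/curated/daily_user_trends
```

In practice such commands are typically wrapped in an Oozie workflow action, as the last bullet describes, so the import and downstream Pig pre-processing run on a schedule.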

Environment: Hadoop, MapReduce, HDFS, Hive, Java, Scala 2.12.8, Spark 2.1.0, Kafka, SQL, Pig, Sqoop, HBase, Zookeeper, MySQL, DB2, Teradata, AWS, Git, Agile.

Confidential, San Jose, CA

Hadoop Developer


  • Worked on analyzing the Hadoop stack and different big data analytic tools, including Pig, Hive, the HBase database, and Sqoop.
  • In depth understanding of Classic MapReduce and YARN architectures.
  • Developed MapReduce programs for refined queries on big data.
  • Created Azure HDInsight clusters and deployed a Hadoop cluster on the cloud platform.
  • Used Hive queries to import data into the Microsoft Azure cloud and analyzed the data using Hive scripts.
  • Used Ambari on the Azure HDInsight cluster to record and manage the logs of the NameNode and DataNodes.
  • Built pipelines to move hashed and un-hashed data from Azure Blob Storage to Azure Data Lake with Azure Data Factory.
  • Created Hive tables and worked on them for data analysis to meet the requirements.
  • Developed a framework to load and transform large sets of unstructured data from UNIX systems into Hive tables.
  • Worked with business team in creating Hive queries for ad hoc access.
  • Implemented Hive Generic UDF's to implement business logic.
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
  • Configured deployed and maintained multi-node Dev and Test Kafka Clusters.
  • Developed Pig UDF's to pre-process the data for analysis.
  • Developed Spark Ingestion Framework to load the data from Hive External Tables to internal tables at one shot.
  • Created pipelines to move data from on-premise servers to Azure Data Lake.
  • Developed Spark code for the consumption layer, incorporating Informatica logic, and loaded the data into Hive fact and dimension tables.
  • Deployed a Hadoop cluster on Azure for big data analytics.
  • Deployed the data to the Hadoop cluster on Azure for the data lake.
  • Used Apache NiFi to copy data from the local file system to HDFS.
  • Used sbt to compile and package the Scala code into a JAR, and deployed it to the cluster using spark-submit.
  • Analyzed the data by performing Hive queries and running Pig scripts, Spark SQL, and Spark Streaming.
  • Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
  • Understanding of machine learning and statistical analysis with Spark.
  • Developed a Spark Streaming script that consumes topics from Kafka, a distributed messaging source, and periodically pushes batches of data to Spark for real-time processing.
  • Involved in creating generic Sqoop import script for loading data into Hive tables from RDBMS.
  • Involved in continuous monitoring of operations using Storm.
  • Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.
  • Implemented indexing of logs from Oozie to Elasticsearch.
  • Experienced in implementing MapReduce logic on the Hortonworks distribution (HDP 2.1, 2.2, and 2.3).
  • Designed, developed, unit tested, and supported ETL mappings and scripts for data marts using Talend.
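The sbt-and-spark-submit deployment flow mentioned in the bullets above might look like the following sketch. The jar name, main class, and resource settings are assumptions made up for illustration (the actual project's names are not given), and the commands presuppose an sbt project and a YARN cluster:

```shell
# Compile and package the Scala Spark code into a jar (run from the project root).
sbt package

# Submit the packaged job to the cluster via YARN.
# Class name, Scala version in the path, and resource sizes are hypothetical.
spark-submit \
  --class com.example.ingestion.HiveIngestionJob \
  --master yarn \
  --deploy-mode cluster \
  --executor-memory 4g \
  --num-executors 10 \
  target/scala-2.10/ingestion-framework_2.10-1.0.jar
```

Running in cluster deploy mode lets the driver itself run on a YARN container rather than on the submitting edge node.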

Environment: Hortonworks, Hadoop, MapReduce, HDFS, Hive, Pig, Sqoop, Apache Kafka, Azure, Apache Storm, Oozie, SQL, Flume, Spark 1.6.1, HBase, GitHub.

Confidential, Omaha, Nebraska

Hadoop Developer


  • Developed simple to complex MapReduce jobs using Java language for processing and validating the data.
  • Developed a data pipeline using Sqoop, Spark, MapReduce, and Hive to ingest, transform, and analyze customer behavioral data.
  • As a Developer, worked directly with business partners discussing the requirements for new projects and enhancements to the existing applications.
  • Wrote extensive shell scripts to run appropriate programs.
  • Wrote multiple queries to pull data from HBase.
  • Reporting on the project based on Agile-Scrum Method. Conducted daily Scrum meetings and updated JIRA with new details.
  • Developed a custom file system plug-in for Hadoop so it can access files on the Data Platform; the plug-in allows Hadoop MapReduce programs, HBase, Pig, and Hive to work unmodified and access files directly.
  • Designed and implemented a MapReduce-based large-scale parallel relation-learning system.
  • Involved in review of functional and non-functional requirements.
  • Installed and configured Hadoop MapReduce and HDFS, and developed multiple MapReduce jobs in Java for data cleaning and pre-processing.
  • Imported and exported data into HDFS and Hive using Sqoop.
  • Wrote Pig Scripts to perform ETL procedures on the data in HDFS.
  • Analyzed the data by performing Hive queries and running Pig scripts and Python scripts.
  • Used Hive to partition and bucket data.
  • Load and transform large sets of structured, semi structured and unstructured data.
  • Gained solid experience with NoSQL databases.
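The Hive partitioning and bucketing used in this role can be sketched as DDL run through the Hive CLI. The table, columns, partition key, and bucket count below are hypothetical examples, not the project's actual schema, and the command requires a working Hive installation:

```shell
# Hypothetical partitioned, bucketed Hive table for behavioral data.
hive -e "
CREATE TABLE user_events (
  user_id    BIGINT,
  event_type STRING,
  event_ts   TIMESTAMP
)
PARTITIONED BY (event_date STRING)      -- partition pruning limits scans to relevant dates
CLUSTERED BY (user_id) INTO 32 BUCKETS  -- bucketing co-locates a user's rows for sampling and joins
STORED AS PARQUET;
"
```

Partitioning by a date column keeps daily loads cheap, while bucketing on the join key enables bucketed map-side joins and efficient table sampling.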

Environment: Java 1.6, Hadoop 2.2.0 (YARN), MapReduce, Hive, Pig, Sqoop, HBase 0.94, Storm 0.9.1, Linux CentOS 6.4, Agile, Maven, Jira, Hortonworks Data Platform (HDP).


Java Developer


  • Developed JSP, JSF and Servlets to dynamically generate HTML and display the data to the client side.
  • Used the Hibernate framework for persistence to the Oracle database.
  • Wrote and debugged the Ant scripts for building the entire web application.
  • Developed web services in Java; experienced with SOAP and WSDL, and used WSDL to publish the services to other applications.
  • Implemented the Java Message Service (JMS) using the JMS API.
  • Coded Servlets, SOAP clients, and Apache CXF REST APIs to deliver data from the application to external and internal consumers.
  • Created a SOAP web service using JAX-WS to enable clients to consume it.
  • Experienced in designing and developing multi-tier scalable applications using Java and J2EE Design Patterns.

Environment: Java, HTML, JavaScript, SQL Server, PL/SQL, JSP, Web Services, SOAP, SOA, JSF, JMS, Oracle, Eclipse, XML, Apache Tomcat.


Java Developer


  • Involved in the coding of JSP pages for the presentation of data on the View layer in MVC architecture.
  • Used J2EE design patterns like Factory Methods, MVC, and Singleton Pattern that made modules and code more organized, flexible and readable for future upgrades.
  • Worked with JavaScript to perform client-side form validations.
  • Used Struts tag libraries as well as Struts tile framework.
  • Used JDBC to access the database with the Oracle thin driver (a Type 4 driver) for application optimization and efficiency.
  • Actively involved in tuning SQL queries for better performance.
  • Worked with XML to store and read exception messages through DOM.
  • Wrote generic functions to call Oracle stored procedures, triggers, functions.

Environment: Core Java, Maven, Oracle, AJAX, JDK, JSP, Eclipse, JavaScript.
