Hadoop/Spark Developer Resume

Englewood, CO


  • Over 8 years of professional IT experience in analysis, development, integration and maintenance of web-based and client/server applications using Java and Big Data technologies.
  • 4 years of relevant experience in Hadoop Ecosystem and architecture (HDFS, MapReduce, YARN, Pig, Hive, HBase, Sqoop, Flume, Oozie).
  • Experience in real-time analytics with Apache Spark (RDDs, DataFrames and the Streaming API).
  • Used the Spark DataFrames API on the Cloudera platform to perform analytics on Hive data.
  • Experience integrating Hadoop with Apache Storm and Kafka. Expertise in loading clickstream data from Kafka into HDFS, HBase and Hive through Storm.
  • Developed Kafka producers that compress and combine many small files into larger Avro and SequenceFiles before writing to HDFS, making the best use of the Hadoop block size.
  • Experience importing and exporting data with Sqoop between HDFS and relational database systems.
  • Configured Flume to extract data from web server output files and load it into HDFS.
  • Extensive hands-on experience writing MapReduce jobs in Java.
  • Performed data analysis using Hive and Pig. Experience analyzing large datasets using HiveQL and Pig Latin.
  • Experience using partitioning and bucketing in Hive; designed both managed and external Hive tables for optimized performance.
  • Experience developing custom UDFs for Pig and Hive to incorporate Python/Java methods and functionality into Pig Latin and HiveQL, and used UDFs from the Piggybank repository.
  • Good understanding and knowledge of NoSQL databases like MongoDB, Cassandra and HBase.
  • Experience monitoring and managing Hadoop clusters using Cloudera Manager.
  • Experience in job workflow scheduling and monitoring using Oozie.
  • Worked extensively on different Hadoop distributions such as Cloudera's CDH and Hortonworks HDP.
  • Good working knowledge of cloud integration with Amazon Web Services (AWS) components such as Redshift, DynamoDB, EMR, S3 and EC2 instances.
  • Worked with Apache NiFi to develop Custom Processors for processing and distributing data among cloud systems.
  • Good knowledge of Scala programming concepts.
  • Expertise in distributed and web environments focused on Core Java technologies such as Collections, Multithreading, I/O, Exception Handling and Memory Management.
  • Expertise in developing web applications using J2EE technologies such as Servlets, JSP, Web Services, Spring, Hibernate, HTML5, JavaScript, jQuery and AJAX.
  • Knowledge of standard development, build and version-control tools such as Eclipse, Scala IDE, Maven, SBT and Subversion.
  • Extensive knowledge of the Software Development Life Cycle (SDLC) using Waterfall and Agile methodologies.
  • Facilitated sprint planning, daily scrums, retrospectives, stakeholder meetings and software demonstrations.
  • Excellent communication skills, with the ability to explain complex issues to technical and non-technical audiences, including peers, partners, and senior IT and business management.


Languages: Java, XML, SQL, PL/SQL, Pig Latin, HiveQL, Python, Scala

Web Technologies: JEE (JDBC, JSP, Servlet, JSF, JSTL), AJAX, JavaScript

Big Data Systems: Hadoop, HDFS, MapReduce, YARN, Pig, Hive, Sqoop, Flume, Oozie, Impala, Spark, Kafka, Storm

RDBMS: Oracle, MySQL, SQL Server, PostgreSQL, Teradata

NoSQL Databases: HBase, MongoDB, Cassandra

App/Web Servers: Apache Tomcat, WebLogic

SOA: Web services, SOAP, REST

Frameworks: Struts 2, Hibernate, Spring 3.x

Version Control Systems: Git, CVS, SVN

IDEs: Eclipse, Scala IDE, NetBeans, IntelliJ IDEA, PyCharm

Operating Systems: UNIX, Linux, Windows


Confidential - Englewood, CO

Hadoop/Spark Developer


  • Created Sqoop jobs with incremental loads to populate Hive external tables.
  • Designed and developed Hive tables to store staging and historical data.
  • Created internal and external Hive tables as required, defined with appropriate static and dynamic partitions for efficiency.
  • Used the ORC file format with Snappy compression for optimized storage of Hive tables.
  • Resolved performance issues in Hive and Pig scripts through an understanding of joins, grouping and aggregation, and ran the queries on the Impala processing engine.
  • Developed Spark scripts using Scala shell commands as required.
  • Created Oozie workflows for Sqoop to migrate data from source systems to HDFS and then to target tables.
  • Developed Oozie workflow for scheduling and orchestrating the ETL process.
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Managed jobs using the Fair Scheduler and developed job-processing scripts using Oozie workflows.
  • Involved in migrating MapReduce jobs into Spark jobs and used Spark SQL and DataFrames API to load structured and semi-structured data into Spark clusters.
  • Used Spark Streaming to divide streaming data into batches as input to the Spark engine for batch processing.
  • Worked extensively with Sqoop to import metadata from Oracle.
  • Developed Oozie workflow jobs to execute Hive, Pig, Sqoop and MapReduce actions.
  • Configured Flume to transport web server logs into HDFS.
  • Experience with Amazon Web Services (AWS) such as Elastic Compute Cloud (EC2), Simple Storage Service (S3), Elastic MapReduce (EMR), Amazon SimpleDB and Amazon CloudWatch.
  • Implemented Spark jobs using Scala and Spark SQL for faster testing and processing of data.
  • Used Apache Kafka to import real-time network log data into HDFS.
  • Worked on numerous POCs to determine whether Big Data was the right fit for a business case.
  • Processed data by collecting, aggregating and moving it from various sources using Apache Flume and Kafka.
  • Created, monitored and controlled data flows through the web-based user interface of Apache NiFi.

Environment: Apache Hadoop, CDH 4.7, HDFS, MapReduce, Sqoop, Flume, Pig, Hive, HBase, Oozie, Scala, Spark, Spark Streaming, Kafka, Linux

Confidential - Madison, WI

Hadoop Developer


  • Extracted the data from Teradata into HDFS using Sqoop.
  • Used Flume to collect, aggregate and store the web log data from different sources like web servers, mobile and network devices and pushed into HDFS.
  • Implemented MapReduce programs to transform log data into a structured form and extract user information.
  • Wrote extensive Pig scripts to transform raw data from several data sources into baseline data.
  • Analyzed the web log data using HiveQL to extract the number of unique visitors per day, page views and visit duration.
  • Used Flume to filter the JSON input read from the web servers, retaining only the data required for analytics.
  • Developed Hive UDFs and wrote complex Hive queries for data analysis.
  • Developed a well-structured and efficient ad-hoc environment for functional users.
  • Exported the analyzed data to relational databases using Sqoop for visualizations and to generate reports for the BI team.
  • Loaded cache data into HBase using Sqoop.
  • Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.
  • Wrote ETLs using Hive and processed the data as per business logic.
  • Hands-on experience with Amazon EC2 Spot and Amazon S3 integration.
  • Optimized EMRFS so that Hadoop could read from and write to Amazon S3 directly and in parallel with good performance.
  • Extensive work on ETL processes covering data transformation, data sourcing, mapping, conversion and loading using Informatica.
  • Extensively used ETL processes to load data from flat files into the target database, applying business logic in transformation mappings to insert and update records during loading.
  • Created Talend ETL jobs to read data from an Oracle database and import it into HDFS.
  • Worked with data serialization formats, converting complex objects into sequences of bits using the Avro, RCFile and ORC file formats.

Environment: Apache Hadoop, Hortonworks HDP 2.0, HDFS, MapReduce, Sqoop, Flume, Pig, Hive, HBase, Oozie, Teradata, Talend, Avro, Java, Linux

Confidential - Omaha, NE

Hadoop Developer


  • Worked on a live 8-node Hadoop cluster running CDH 4.
  • Used Sqoop to import the data from RDBMS to Hadoop Distributed File System (HDFS).
  • Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports by Business Intelligence tools.
  • Involved in collecting and aggregating large amounts of log data using Apache Flume and staging data in HDFS for further analysis.
  • Developed several MapReduce programs to analyze and transform the data to uncover insights into the customer usage patterns.
  • Used Pig as ETL tool to do transformations, event joins and some pre-aggregations before storing the data into HDFS.
  • Created Hive external tables, loaded data into them and queried the data using HiveQL.
  • Used the Hive data warehouse tool to analyze the unified historic data in HDFS to identify issues and behavioral patterns.
  • Enabled concurrent access to Hive tables with shared and exclusive locking, supported by the ZooKeeper implementation in the cluster.
  • Integrated Oozie with the rest of Hadoop stack supporting several types of Hadoop jobs as well as the system specific jobs (such as Java programs and shell scripts).
  • Created HBase tables to store various data formats coming from different portfolios, worked on NoSQL databases including HBase, Cassandra and MongoDB.
  • Used Jenkins for build and continuous integration for software development.
  • Worked with application teams to install Operating systems, Hadoop updates, patches and version upgrades as required.

Environment: Apache Hadoop, CDH 4, Sqoop, Flume, MapReduce, Pig, Hive, HBase, Cassandra, MongoDB, Oozie, ZooKeeper, Jenkins

Confidential, Waukesha, WI

Java Developer


  • Involved in the full life cycle, including user interaction, requirements gathering, design, development, testing and implementation; prepared the Business Requirement Document.
  • Analyzed and coded Java components for the Refills and Appointments modules.
  • Used Spring MVC framework dependency injection to integrate various Java components.
  • Worked with Spring MVC controllers and the Spring HibernateTemplate.
  • Hands-on experience with data persistence using Hibernate and the Spring Framework.
  • Developed Spring controllers, service components, DAOs, web services and UI integration for processing member requests for the two modules.
  • Helped the UI team integrate JavaBeans data using JSTL and Spring tags.
  • Developed the DAO layer using Hibernate and used Hibernate caching for real-time performance.
  • Developed and consumed enterprise web services and generated clients using the Jersey and Axis frameworks in RAD IDE.
  • Configured the Spring, Hibernate and Log4j configuration files.
  • Used Ant and Maven scripts to build and deploy applications, and helped set up deployment for continuous integration using Jenkins and Maven.
  • Wrote SQL queries and stored procedures for interacting with the Oracle database.
  • Was part of the production support team resolving production incidents.
  • Documented common problems before go-live and while in a production support role.

Environment: Java J2EE, Struts MVC, Tiles, JSP, XML, JavaScript, Spring IOC, WebSphere Application Server, PostgreSQL

Confidential, SFO, CA

Java Developer


  • Provided production support for various applications, actively working on incidents and issues raised by users, including on-call support during off hours.
  • Developed service-layer logic for core modules using JSPs and servlets, and took part in integration with the presentation layer.
  • Implemented design patterns across the project, such as Business Delegate, Data Transfer Object, Service Locator, Data Access Object and Singleton.
  • Developed XML configuration and data mappings for Hibernate, and used the Hibernate transaction manager to maintain transaction persistence.
  • Developed the user interface using JSP and DHTML.
  • Involved in the full lifecycle of the project, from gathering business requirements to creating the architecture and building applications on Java/J2EE with the Spring MVC framework.
  • Involved in fixing bugs and minor enhancements for the front-end module.

Environment: Java, Servlets, JSP, Spring, Hibernate, XML, XPath, jQuery, JavaScript, WebSphere Application Server