
Sr. Hadoop/Spark Developer Resume

Sacramento, CA


  • 7 years of IT experience in architecture, analysis, design, development, implementation, maintenance and support, including developing strategies for deploying Big Data technologies to solve Big Data processing requirements efficiently.
  • 4 years of experience in Big Data using the Hadoop framework and related technologies such as HDFS, HBase, MapReduce, Hive, Pig, Flume, Oozie, Sqoop, Talend and ZooKeeper.
  • Around 2 years of experience with Apache Spark, Storm and Kafka.
  • Experience in data analysis using Hive, Pig Latin, HBase and custom MapReduce programs in Java.
  • Good experience with Cloudera and Hortonworks distributions.
  • Experience working with Flume and shell scripting to load log data from multiple sources directly into HDFS.
  • Worked on loading data from various sources (Oracle, MySQL, DB2, MS SQL Server, Cassandra, MongoDB, Hadoop) using Sqoop and Python scripts.
  • Excellent understanding of Hadoop (Gen-1 and Gen-2) and its components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode and ResourceManager (YARN).
  • Experienced with the Spark Streaming API to ingest data from Kafka into the Spark engine.
  • Developed analytical components using Scala, Spark, Storm and Spark Streaming.
  • Excellent understanding and knowledge of the NoSQL databases HBase and Cassandra.
  • Experience with AWS (including EMR and S3) and Azure.
  • Worked extensively with dimensional modeling, data migration, data cleansing, data profiling and ETL processes for data warehouses.
  • Implemented Hadoop-based data warehouses and integrated Hadoop with enterprise data warehouse systems. Extensive experience in ETL data ingestion, in-stream data processing, batch analytics and data persistence strategy.
  • Experienced with Informatica for ETL processing.
  • Experience creating Tableau dashboards over relational and multi-dimensional databases including Oracle, MySQL and Hive, gathering and manipulating data from various sources.
  • Experience in monitoring, performance tuning, SLAs, scaling and security in Big Data systems.
  • Designed and documented REST/HTTP and SOAP APIs, including JSON data formats and API versioning strategy.
  • Installed and configured Jenkins to automate deployments and provide automation solutions.
  • Experience in designing both time-driven and data-driven automated workflows using Oozie.
  • Experience in J2EE technologies such as Struts, JSP/Servlets and Spring.
  • Good exposure to web technologies such as JavaScript, AngularJS, jQuery and XML.
  • Created a Javadoc template for engineers to use when developing API documentation.
  • Expert in Java 1.8 lambdas, streams and type annotations.
  • Experience in all stages of the SDLC (Agile, Waterfall), including writing technical design documents, development, testing and implementation of enterprise-level data marts and data warehouses.
  • Extensive experience working with Oracle, DB2, SQL Server and MySQL databases.
  • Experience in software testing with JMeter, JUnit and Mockito, regression testing, and defect tracking and management using Quality Center. Wrote JUnit test cases for Controller, Service and DAO layers using Mockito and DbUnit.
  • Ability to work in high-pressure environments, delivering to and managing stakeholder expectations.
  • Applied structured methods to project scoping and planning, risks, issues, schedules and deliverables.
  • Strong analytical and problem-solving skills.
  • Good interpersonal skills and ability to work as part of a team. Exceptional ability to learn and master new technologies and to deliver outputs on short deadlines.


Technology: Hadoop Ecosystem, J2SE, J2EE, JDK 1.7/1.8, Databases

Operating Systems: Windows Vista/XP/NT/2000, Linux (Ubuntu, CentOS), UNIX

DBMS/Databases: DB2, MySQL, PL/SQL

Programming Languages: C, C++, Core Java, XML, JSP/Servlets, Struts, Spring, HTML, JavaScript, jQuery, Web services

Big Data Ecosystem: HDFS, MapReduce, Oozie, Hive, Pig, Sqoop, Flume, Splunk, ZooKeeper, Kafka and HBase

Methodologies: Agile, Waterfall

NoSQL Databases: HBase

Version Control Tools: SVN, CVS

ETL Tools: IBM DataStage 8.1, Informatica


Confidential, Sacramento, CA



  • Developed a data pipeline using Spark, Hive, Pig and HBase to ingest customer behavioral data and financial histories into a Hadoop cluster for analysis.
  • Collected data from an AWS S3 bucket in near real time using Spark Streaming, performed the necessary transformations and aggregations on the fly to build the common learner data model, and persisted the data in HDFS.
  • Explored the use of Spark to improve the performance and optimization of existing Hadoop algorithms using SparkContext, Spark SQL and Spark on YARN.
  • Developed Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
  • Worked on the Spark SQL and Spark Streaming modules and used Scala to write code for all Spark use cases.
  • Involved in converting JSON data into DataFrames and storing it in Hive tables.
  • Configured Flume to ingest log file data into HDFS.
  • Used Pig for transformations, event joins and some pre-aggregations before storing the data in HDFS.
  • Involved in using Sqoop to import and export data between RDBMS and HDFS.
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
  • Tuned Hive performance using map joins, cost-based optimization and column-level statistics.
  • Created logical views instead of tables to improve the performance of Hive queries.
  • Involved in developing Hive DDLs to create, alter and drop Hive tables.
  • Involved in loading data from the Linux file system into HDFS.
  • Installed and configured Hive and wrote Hive UDFs.
  • Involved in creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs.
  • Experience working with NoSQL databases such as HBase.

Environment: Spark, Hive, Pig, Spark SQL, Spark Streaming, HBase, Sqoop, Kafka, AWS EC2, S3, Cloudera, Scala IDE (Eclipse), Scala, Linux Shell Scripting, HDFS.

Confidential, GA

Hadoop Developer


  • Developed a data pipeline using Flume, Sqoop, Pig and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
  • Involved in using Sqoop and HDFS put or copyFromLocal to ingest data, and in MapReduce jobs.
  • Used Pig for transformations, event joins, bot traffic filtering and some pre-aggregations before storing the data in HDFS.
  • Extensive experience in ETL data ingestion, in-stream data processing, batch analytics and data persistence strategy.
  • Implemented Hadoop-based data warehouses and integrated Hadoop with enterprise data warehouse systems.
  • Designed and developed ETL workflows in Java for processing data in HDFS/HBase using Oozie.
  • Involved in developing Pig UDFs for functionality not available out of the box in Apache Pig.
  • Expertise with Hadoop ecosystem tools including Pig, Hive, HDFS, MapReduce, Sqoop, Kafka, YARN, Oozie and ZooKeeper, and with Hadoop architecture and its components.
  • Involved in integrating the Hadoop cluster with the Spark engine to perform batch and GraphX operations.
  • Explored Spark to improve the performance and optimization of existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, pair RDDs and Spark on YARN.
  • Imported data from different sources such as HDFS and HBase into Spark RDDs.
  • Developed Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data.
  • Developed Kafka producers and consumers, HBase clients, and Spark and Hadoop MapReduce jobs, along with components on HDFS and Hive.
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
  • Created action filters, parameters and calculated sets for preparing dashboards and worksheets in Tableau.
  • Experience converting CSV files to .tde files using the Tableau Extract API.
  • Involved in developing Hive DDLs to create, alter and drop Hive tables, and worked with Storm.
  • Created scalable, high-performance web services for data tracking.
  • Involved in loading data from the UNIX file system into HDFS. Installed and configured Hive, wrote Hive UDFs and set up cluster coordination services through ZooKeeper.
  • Involved in creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs.
  • Experienced in managing a Hadoop cluster using Cloudera Manager.
  • Involved in using HCatalog to access Hive table metadata from MapReduce or Pig code.
  • Computed metrics that define user experience using Java MapReduce.
  • Responsible for developing a data pipeline using Flume, Sqoop and Pig to extract data from weblogs and store it in HDFS.
  • Extracted and updated data in MongoDB using the mongoimport and mongoexport command-line utilities.
  • Involved in developing shell scripts to orchestrate execution of all other scripts (Pig, Hive and MapReduce) and move data files into and out of HDFS.
  • Involved in Hadoop testing; developed unit test cases using the JUnit, EasyMock and MRUnit testing frameworks.


Confidential, Bloomington, IL

Hadoop Developer


  • Designed a data warehouse using Hive.
  • Involved in analyzing system failures, identifying root causes and recommending courses of action.
  • Used Hive and MapReduce, and loaded data into HDFS.
  • Worked with systems engineering team to plan and deploy new Hadoop environments and expand existing Hadoop clusters.
  • Monitored workload, job performance and capacity planning using Cloudera Manager.
  • Worked extensively with Sqoop for importing metadata from Oracle.
  • Worked with application teams to install operating system and Hadoop updates, patches and version upgrades as required.
  • Extensively used Pig for data cleansing.
  • Created partitioned tables in Hive.
  • Worked with business teams and created Hive queries for ad hoc access.
  • Evaluated usage of Oozie for Workflow Orchestration.
  • Mentored the analyst and test teams in writing Hive queries.
  • Experience in writing MapReduce programs with the Java API to cleanse structured and unstructured data.
  • Experience with RDBMS such as Oracle and Teradata.
  • Worked on loading the data from MySQL to HBase where necessary using Sqoop.
  • Launched Amazon EC2 cloud instances using Amazon images (Linux/Ubuntu) and configured the launched instances for specific applications.
  • Gained very good business knowledge of claim processing, fraud suspect identification, the appeals process, etc.

Environment: Hadoop, MapReduce, HDFS, Hive, Pig, Sqoop, AWS, Java, Oozie, MySQL.

Confidential, Atlanta, GA

Hadoop Developer


  • Imported and exported data into HDFS and Hive using Sqoop.
  • Used Bash shell scripting, Sqoop, Avro, Hive, Pig, Java and MapReduce daily to develop ETL, batch processing and data storage functionality.
  • Used Pig for data transformations, event joins and some pre-aggregations before storing the data in HDFS.
  • Used the Hadoop MySQL connector to store MapReduce results in an RDBMS.
  • Analyzed large data sets to determine the optimal way to aggregate and report on them.
  • Worked on loading all tables from the reference source database schema through Sqoop.
  • Designed, coded and configured server-side J2EE components such as JSP and Java, and worked with AWS.
  • Collected data from different databases (e.g. Oracle, MySQL) into Hadoop.
  • Used Oozie and ZooKeeper for workflow scheduling and monitoring.
  • Designed and developed ETL workflows in Java for processing data in HDFS/HBase using Oozie.
  • Experienced in managing and reviewing Hadoop log files.
  • Involved in loading and transforming large sets of structured, semi-structured and unstructured data from relational databases into HDFS using Sqoop imports.
  • Extracted files from MySQL through Sqoop, placed them in HDFS and processed them.
  • Supported MapReduce programs running on the cluster.
  • Provided cluster coordination services through ZooKeeper.
  • Involved in loading data from the UNIX file system into HDFS.
  • Created several Hive tables, loaded them with data and wrote Hive queries that run internally as MapReduce jobs.
  • Developed simple to complex MapReduce jobs using Hive and Pig.

Environment: Apache Hadoop, AWS, MapReduce, HDFS, Hive, Pig, Sqoop, Oozie, HBase, ZooKeeper, Java (JDK 1.6), SQL, Flat files, Oracle 11g/10g, MySQL, Windows NT, UNIX.

Confidential, NY

JAVA/J2EE Developer


  • Designed Use Case and Sequence Diagrams according to UML standard using Rational Rose.
  • Implemented Model View Controller (MVC-2) architecture and developed Form classes, Action Classes for the entire application using Struts Framework.
  • Performed client-side validations using JavaScript and server-side validations using the built-in Struts Validation Framework.
  • Implemented the data persistence functionality of the application by using Hibernate to persist Java objects to the relational database.
  • Used Hibernate Annotations to reduce time at the configuration level and accessed Annotated bean from Hibernate DAO layer.
  • Worked on various SOAP and RESTful web services used in various internal applications.
  • Used SOAP UI tool for testing the RESTful web services.
  • Used HQL statements and procedures to fetch the data from the database.
  • Transformed, Navigated and Formatted XML documents using XSL, XSLT.
  • Used Java 1.8 lambda expressions extensively to remove boilerplate code and extend functionality.
  • Used a lambda expression to improve Sack Employees further and avoid the need for a separate class.
  • Used JMS for asynchronous exchange of messages between applications on different platforms.
  • Developed the view components using JSP, HTML, Struts Logic tags and Struts tag libraries.
  • Involved in designing and implementing the Session Facade, Business Delegate and Service Locator patterns to delegate requests to appropriate resources.
  • Used JUnit Testing Framework for performing Unit testing.

Environment: Struts 2.0, Hibernate 3.0, JSP, JDK 1.7, RAD, JMS, CVS, JavaScript, XSL, XSLT, lambda expressions, Servlets 2.5, WebSphere Application Server, Oracle 10g.


Java Developer


  • Involved in the analysis, design, development and testing phases of the Software Development Life Cycle (SDLC).
  • Designed and developed framework components and was involved in designing the MVC pattern using the Struts and Spring frameworks.
  • Responsible for developing Use case, Class diagrams and Sequence diagrams for the modules using UML and Rational Rose.
  • Developed the Action classes and Action Form classes, created JSPs using Struts tag libraries and configured them in the struts-config.xml and web.xml files.
  • Involved in deploying and configuring applications on WebLogic Server.
  • Used SOAP for exchanging XML based messages.
  • Used Microsoft Visio for developing Use Case Diagrams, Sequence Diagrams and Class Diagrams in the design phase.
  • Developed Custom Tags to simplify the JSP code. Designed UI screens using JSP and HTML.
  • Actively involved in designing and implementing Factory method, Singleton, MVC and Data Access Object design patterns.
  • Used web services to send and receive data between different applications via SOAP messages, then used a DOM XML parser for data retrieval.
  • Wrote JUnit test cases for Controller, Service and DAO layers using Mockito and DbUnit.
  • Developed unit test cases using a proprietary framework similar to JUnit.
  • Used the JUnit framework for unit testing of the application and Ant to build and deploy the application on WebLogic Server.

Environment: Java, J2EE, JDK 1.7, JSP, Oracle, VSAM, Eclipse, HTML, JUnit, MVC, Ant, WebLogic.
