
Sr. Hadoop/Spark Developer Resume



  • Have 10 years of programming experience, with skills in analysis, design, testing and deployment of various software applications, including 4+ years of strong work experience in the Hadoop ecosystem and Big Data analytics.
  • Contributed to the design and architecture of project data pipelines in addition to development work.
  • Built data pipelines with Spark from scratch on Hadoop, using YARN as the cluster management service.
  • Worked on real-time analytics projects with Apache Storm and Spark Streaming.
  • Expertise in Kafka and data pipelines; worked with Kafka producers and consumers and advanced Kafka features such as Streams and Connect.
  • Worked on NoSQL databases such as HBase, Cassandra and MongoDB.
  • Worked on data modeling for HBase and Cassandra in previous applications.
  • Used tools such as Ganglia and Nagios for monitoring big data applications.
  • Worked extensively with Java enterprise applications using Spring and the JBoss Application Server.
  • Worked on microservices using Spring Boot and Kafka.
  • Expertise in Big Data technologies and the Hadoop ecosystem: HDFS, YARN, the NameNode/DataNode architecture and the MapReduce programming paradigm.
  • Hands on experience in installing, configuring, and using Hadoop ecosystem components like Hadoop MapReduce, HDFS, HBase, Oozie, Hive, Sqoop, Pig, Zookeeper and Flume.
  • Have worked on a large multi-tenant distributed cluster with more than 100 data nodes.
  • Have expertise in optimizing and replicating petabytes of data and minimizing operational failures across the big data platform.
  • Worked on custom Pig loaders and storage classes to handle a variety of data formats such as JSON, compressed CSV, ORC and Avro, and to read data from sources such as HBase and Hive.
  • Development expertise with RDBMSs such as Oracle, SQL Server, Teradata and Netezza, and NoSQL databases such as HBase and Cassandra.
  • Implemented batch-processing solutions for large volumes of unstructured data using the Hadoop MapReduce framework.
  • Hands-on experience writing ad-hoc queries for moving data from HDFS to Hive and analyzing the data using HiveQL.
  • Working experience in importing and exporting data using Sqoop between Relational Database Management Systems (RDBMS) and HDFS.
  • Extended Hive and Pig core functionality by writing custom User Defined Functions (UDFs), User Defined Table-Generating Functions (UDTFs) and User Defined Aggregating Functions (UDAFs).
  • Experience in analyzing data using HiveQL, Pig Latin, Spark and custom MapReduce programs in Java.
  • Developed Pig Latin scripts for data cleansing and transformation.
  • Worked on HBase to load and retrieve data for real-time processing using a REST API.
  • Good Knowledge on Python Scripting.
  • Imported data from RDBMSs into Cassandra column families through a storage handler.
  • Good understanding of NoSQL databases such as HBase and Cassandra, and of the Cassandra Query Language (CQL).
  • Experience in implementing Spark using Scala and Spark SQL for faster analyzing and processing of data.
  • Involved in unit testing of MapReduce programs using Apache MRUnit.
  • Good understanding of XML methodologies (XML, XSL, XSD), including web services and SOAP.
  • Proficiency with the application servers like WebSphere, WebLogic, JBOSS and Tomcat.
  • Developed core modules in large cross-platform applications using Java, J2EE, Spring, Struts, Hibernate, JAX-WS web services and JMS.
  • Experienced with the build tools Maven and Ant and with continuous integration using Jenkins.
  • Experience in Data Analysis, Data Cleansing (Scrubbing), Data Validation and Verification, Data Conversion, Data Migrations and Data Mining.
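The Sqoop imports mentioned above typically take the following shape. This is a minimal sketch: the JDBC URL, credentials, table name and HDFS paths are hypothetical placeholders, not values from any actual engagement.

```shell
# Hypothetical Sqoop import: pull one RDBMS table into HDFS as text files.
# All connection details and names below are illustrative placeholders.
sqoop import \
  --connect "jdbc:mysql://db.example.com:3306/sales" \
  --username etl_user -P \
  --table orders \
  --target-dir /user/etl/orders \
  --num-mappers 4 \
  --as-textfile
```

`-P` prompts for the password at runtime rather than embedding it in the command, and `--num-mappers` controls how many parallel map tasks split the import.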


Big data/Hadoop Ecosystem: HDFS, MapReduce, HIVE, PIG, HBase, Sqoop, Flume, Oozie, Spark, Storm, Kafka, Impala.

Java / J2EE Technologies: Core Java, Servlets, JSP, JDBC, JNI, XML, REST, SOAP, WSDL

Programming Languages: C, C++, Java, Scala, Python, SQL, PL/SQL, Linux shell scripts.

NoSQL Databases: MongoDB, Cassandra, HBase.

Database: Oracle 11g/10g, DB2, MS-SQL Server, MySQL, Teradata.

Web Technologies: HTML, XML, JDBC, JSP, CSS, JavaScript, AJAX, SOAP, Angular JS

Frameworks: MVC, Hibernate 3, Spring 3/2/2.5.

Tools Used: Eclipse, IntelliJ, PuTTY, NetBeans, Tableau.

Operating System: Ubuntu (Linux), Windows 95/98/2000/XP, Mac OS, Red Hat.


Confidential, PA

Sr. Hadoop/Spark Developer


  • Developed a data pipeline with Kafka, Storm and Spark.
  • Contributed to designing the data pipeline with the Lambda architecture.
  • Phase I was developed to collect data for real-time analytics, flowing from Kafka through Storm into Cassandra.
  • Wrote the Storm topologies and saved the records in Cassandra.
  • Worked on Kafka Connect to move Kafka data to HDFS for historical data processing.
  • Performed hands-on data manipulation, transformation, hypothesis testing and predictive modeling.
  • Developed robust set of codes that are tested, automated, structured and efficient.
  • Evaluated, refined and continuously improved the efficiency and accuracy of existing predictive models using Netezza.
  • Extensively worked with all kinds of unstructured, semi-structured and structured data.
  • Developed Scala and SQL code to extract data from various databases.
  • Championed innovative ideas around data science and advanced analytics practices.
  • Creatively communicated and presented models to business customers and executives, utilizing a variety of formats and visualization methodologies.
  • Performed advanced procedures like text analytics and processing, using the in-memory computing capabilities of Spark using Scala.
  • Uploaded data to Hadoop Hive and combined new tables with existing databases.
  • Developed statistical models to forecast inventory and procurement cycles.
  • Used Impala to query the Hadoop data stored in HDFS.
  • Deployed the Cassandra cluster in cloud (Amazon AWS) environment with scalable nodes as per the business requirement.
  • Implemented the data backup strategies for the data in the Cassandra cluster.
  • Generated data cubes using Hive, Pig and Java MapReduce on a provisioned Hadoop cluster in AWS.
  • Implemented the ETL design to dump the Map-Reduce data cubes to Cassandra cluster.
  • Imported the data from relational databases into HDFS using Sqoop.
  • Implemented a POC using Apache Impala for data processing on top of Hive.
  • Worked on the Hortonworks 2.3 distribution.
  • Understanding of data storage and retrieval techniques, ETL and databases, including graph stores, relational databases, tuple stores, NoSQL, Hadoop, Pig, MySQL and Oracle databases.

Environment: Apache Spark, Cassandra, Hortonworks (HDP) 2.3, AWS, Scala, Spark SQL, Hive, Pig, Big Data.
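Querying Hive-managed data from Spark, as described in this engagement, generally follows the pattern below. This is a minimal Scala sketch under assumptions: the table and column names (`inventory`, `warehouse_id`, `qty`) are hypothetical, and a Hive metastore is presumed reachable from the Spark cluster.

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch: querying a Hive table through Spark SQL.
// Table and column names are hypothetical placeholders.
object InventoryReport {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("inventory-report")
      .enableHiveSupport()          // resolve tables via the Hive metastore
      .getOrCreate()

    val totals = spark.sql(
      """SELECT warehouse_id, SUM(qty) AS total_qty
        |FROM inventory
        |GROUP BY warehouse_id""".stripMargin)

    totals.show()                   // print the aggregate to stdout
    spark.stop()
  }
}
```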

Confidential, CA

Sr. Hadoop/Spark Developer


  • Involved in installation, configuration, support and management of Hadoop clusters; administration included commissioning and decommissioning of data nodes, capacity planning, slot configuration, performance tuning, cluster monitoring and troubleshooting on the Cloudera distribution.
  • Involved in complete Implementation lifecycle, specialized in writing custom MapReduce, Pig and Hive programs.
  • Extensively used Sqoop to get data from RDBMS sources like Teradata and Netezza.
  • Participated in development and execution of system and disaster recovery processes.
  • Worked on reading multiple data formats on HDFS using Scala.
  • Extensively used Hive (HiveQL) queries to search for particular strings in Hive tables in HDFS.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
  • Written Hive jobs to parse the logs and structure them in tabular format to facilitate effective querying on the log data.
  • Involved in converting MapReduce programs into Spark transformations using Spark RDDs in Scala.
  • Experience developing customized UDFs in Java to extend Hive and Pig Latin functionality.
  • Created HBase tables to store various data formats of data coming from different sources.
  • Worked with cloud services like Amazon Web Services (AWS).
  • Streamed real-time data using Spark with Kafka and stored the streamed data in HDFS using Scala.
  • Optimized Map/Reduce Jobs to use HDFS efficiently by using various compression mechanisms.
  • Implemented a POC using Apache Impala for data processing on top of Hive.
  • Supported Hadoop developers and assisted in optimizing MapReduce jobs, Pig Latin scripts, Hive scripts and HBase ingestion as required.
  • Prepared the Oozie workflow engine to run multiple Hive and Pig jobs triggered independently by time and data availability.
  • Migrated ETL jobs to Pig scripts to perform transformations, joins and pre-aggregations before storing the data in HDFS.
  • Implemented Flume, Spark and Spark Streaming frameworks for real-time data processing.
  • Developed an analytical component using Scala, Spark and Spark Streaming.
  • Tested Hadoop components on sample datasets in local pseudo-distributed mode.
  • Used Flume extensively in gathering and moving log data files from Application Servers to a central location in Hadoop Distributed File System (HDFS).
  • Involved in unit testing; delivered unit test plans and results documents using JUnit and MRUnit.

Environment: Java, UNIX, HDFS, Pig, Hive, Spark, Scala, MapReduce, Flume, Sqoop, Kafka, HBase, Cassandra, Cloudera Distribution, Oozie, Ambari, Yarn, Shell scripting.
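The Kafka-to-HDFS streaming flow described in this role can be sketched as below, using the spark-streaming-kafka-0-10 integration. The broker address, topic name, consumer group and output path are all hypothetical placeholders.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010._

// Sketch of a Kafka -> Spark Streaming -> HDFS pipeline.
// Broker, topic, group id and output path are illustrative placeholders.
object KafkaToHdfs {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("kafka-to-hdfs")
    val ssc  = new StreamingContext(conf, Seconds(30))   // 30-second micro-batches

    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "broker1:9092",
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "hdfs-sink")

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](Seq("events"), kafkaParams))

    // Persist each batch of raw message values to HDFS for historical processing.
    stream.map(_.value).saveAsTextFiles("hdfs:///data/events/batch")

    ssc.start()
    ssc.awaitTermination()
  }
}
```

Each micro-batch lands as a timestamped directory under the given prefix, which keeps the raw feed replayable for downstream batch jobs.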

Confidential, Conway, AR

Hadoop/Spark Developer


  • Build and maintain scalable data pipelines using the Hadoop ecosystem and other open source components like Hive and HBase.
  • Handle the data exchange between HDFS and RDBMS using Sqoop.
  • Responsible for building scalable distributed data solutions using Hadoop Hortonworks Distribution.
  • Developed several advanced Map Reduce programs to process data files received.
  • Developed Pig Scripts, Pig UDFs and Hive Scripts, Hive UDFs to load data files into Hadoop.
  • Close monitoring and analysis of the MapReduce job executions on cluster at task level.
  • Imported data from sources such as HDFS and HBase into Spark RDDs.
  • Bulk-loaded data into Cassandra using the SSTable loader.
  • Created Cassandra tables using CQL to load large sets of structured, semi-structured and unstructured data coming from UNIX, NoSQL and a variety of portfolios.
  • Developed a data pipeline using Kafka and Storm to store data into HDFS. Performed real time analysis on the incoming data.
  • Performed advanced procedures like text analytics and processing, using the in-memory computing capabilities of Spark using Scala.
  • Successfully integrated Hive tables and MongoDB collections and developed a web service that queries a MongoDB collection and returns the required data to the web UI.
  • Extensively worked on Hive for ETL Transformations and optimized Hive Queries.
  • Used Flume to collect, aggregate, and store the web log data from different sources like web servers, mobile and network devices and pushed to HDFS.
  • Generated data cubes using Hive, Pig and Java MapReduce on a provisioned Hadoop cluster in AWS.
  • Developed Scala and SQL code to extract data from various databases.
  • Installed Oozie workflow engine to run multiple Hive and Pig jobs.
  • Experienced with Solr for indexing and search.
  • Worked on the core and Spark SQL modules of Spark extensively.
  • Worked on different file formats like Sequence files, XML files and Map files using Map Reduce Programs.
  • Back-end Java developer for a Data Management Platform (DMP), building RESTful APIs to create dashboards and to let other groups build their own.
  • Worked closely with architect and clients to define and prioritize use cases and develop APIs.
  • Held regular discussions with other technical teams regarding upgrades, process changes, any special processing and feedback.

Environment: Hadoop, Java, Big Data, HDFS, MapReduce, Spark, Scala, Sqoop, Kafka, Oozie, HBase, Pig, Hive, Flume, Java Eclipse, LINUX.
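The CQL table design mentioned in this role might look like the following. This is a sketch, not an actual schema from the project: the keyspace, table and column names are hypothetical, chosen to show a typical partition key plus time-ordered clustering column layout for event data.

```sql
-- Hypothetical CQL schema for semi-structured event records.
-- Keyspace, table and column names are illustrative placeholders.
CREATE TABLE portfolio.events (
    source      text,        -- originating system (host, feed, portfolio)
    event_time  timestamp,   -- when the record was produced
    payload     text,        -- raw semi-structured record body
    PRIMARY KEY ((source), event_time)
) WITH CLUSTERING ORDER BY (event_time DESC);
```

Partitioning by `source` keeps each feed's rows on one partition, while the descending clustering order makes "latest events first" queries cheap.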

Confidential, Dublin, OH

Hadoop Developer


  • Developed solutions to process data into HDFS (Hadoop Distributed File System), process within Hadoop and emit the summary results from Hadoop to downstream systems.
  • Conducted a POC for Hadoop and Cassandra as part of a next-generation platform implementation, including connecting to the Hadoop cluster and Cassandra ring and executing sample programs on the servers.
  • Developed Java MapReduce programs for the analysis of sample log file stored in cluster.
  • Developed Map Reduce Programs for data analysis and data cleaning.
  • Developed Pig Latin scripts for the analysis of semi-structured data.
  • Used Hive and created Hive tables and involved in data loading and writing Hive UDFs.
  • Used Sqoop extensively to ingest data from various source systems into HDFS.
  • Bulk-loaded data into Cassandra using the SSTable loader.
  • Hive was used to produce results quickly based on the report that was requested.
  • Integrated data from multiple sources (SQL Server, DB2) into the Hadoop cluster and analyzed the data using Hive-HBase integration.
  • Handled importing of data from various data sources, performed transformations using Hive, MapReduce and Spark, and loaded the data into HDFS.
  • Developed Pig UDFs for needed functionality, such as a custom Pig loader known as the timestamp loader.
  • Oozie and Zookeeper were used to automate the flow of jobs and coordination in the cluster respectively.
  • Kerberos security was implemented to safeguard the cluster.
  • Tested the performance of the data sets on various NoSQL databases.
  • Understood complex data structures of different types (structured, semi-structured) and de-normalized them for storage in Hadoop.

Environment: Hadoop, HDFS, Pig, Hive, MapReduce, Spark, Scala, Sqoop, Kerberos, Java Eclipse, SQL Server, Oozie, Zookeeper, Shell Scripting.
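A Pig Latin script for the kind of semi-structured log analysis described above might be sketched as follows. The input path, field layout and report logic are hypothetical placeholders for illustration only.

```pig
-- Hypothetical Pig Latin script: count server errors per URL in access logs.
-- The path and the (host, ts, url, status) field layout are placeholders.
logs    = LOAD '/user/etl/logs/access.log'
          USING PigStorage(' ')
          AS (host:chararray, ts:chararray, url:chararray, status:int);
errors  = FILTER logs BY status >= 500;            -- keep server errors only
by_url  = GROUP errors BY url;
counts  = FOREACH by_url GENERATE group AS url, COUNT(errors) AS hits;
ordered = ORDER counts BY hits DESC;
STORE ordered INTO '/user/etl/reports/error_counts';
```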

Confidential, St. Louis, MO

Java/J2EE Developer.


  • Implemented the Model-View-Controller (MVC) architecture using Struts at the web tier to isolate each layer of the application, avoiding integration complexity and easing maintenance, along with the Validation Framework.
  • Involved in the SDLC (requirements gathering, analysis, design, development and testing) of an application developed using the Agile methodology.
  • Actively participated in object-oriented analysis and design sessions of the project, which is based on the MVC architecture using the Spring Framework.
  • Involved in daily Scrum meetings, sprint planning and task estimation for user stories; participated in retrospectives and presented the demo at the end of each sprint.
  • Involved in the full life cycle of application development in Java as per use case specification.
  • Responsible for RFW (Request for work) analysis and implementation.
  • Worked on JSP with Struts as a unified MVC framework to develop admin functionality; configured action mappings and form beans in the struts-config.xml file.
  • Used Spring Core for Dependency Injection (DI)/Inversion of Control (IoC) and Aspect-Oriented Programming (AOP).
  • Implemented rich authentication and authorization features to ensure application is fully controlled with sophisticated and dependable security.
  • Used Hibernate, ORM technology for the database operations. Wrote HQL (Hibernate Query Language) queries as required.
  • Developed the SOAP based web service using the contract first principle by defining the XML schema.
  • Used Maven for builds, with Jenkins for continuous integration and SVN for version control.
  • Added tasks for the Tax Payment website using Core Java.
  • Tested, debugged and troubleshot different applications and components developed by the team and ensured effective resolution.
  • Implemented Persistence layer using Hibernate to interact with Oracle.
  • Involved in WebLogic installation and configuration in GUI and interactive modes.
  • Used WebLogic Application Server for deploying various components of the application on AIX.
  • Used PL/SQL to write stored procedures.
  • Gained hands-on experience writing shell scripts in UNIX.

Environment: J2EE, Java, Struts framework, Hibernate 3.0, JSP, Log4j, Web Services, WSDL, Oracle 9i, WebLogic 8.1, Maven, Jenkins, Eclipse.
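The struts-config.xml action-mapping and form-bean wiring mentioned above generally follows this shape. The form-bean name, class names and JSP paths here are hypothetical placeholders, not artifacts from the actual project.

```xml
<!-- Sketch of Struts 1.x configuration; names and paths are placeholders. -->
<struts-config>
  <form-beans>
    <form-bean name="paymentForm"
               type="com.example.tax.web.PaymentForm"/>
  </form-beans>
  <action-mappings>
    <action path="/submitPayment"
            type="com.example.tax.web.PaymentAction"
            name="paymentForm"
            scope="request"
            validate="true"
            input="/jsp/payment.jsp">
      <forward name="success" path="/jsp/confirmation.jsp"/>
    </action>
  </action-mappings>
</struts-config>
```

With `validate="true"`, the Validation Framework runs the form bean's checks before the action executes, returning the user to the `input` JSP on failure.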


Java/J2EE Developer


  • Involved in analysis, design, development, integration and testing of application modules, following the Agile/Scrum methodology; participated in estimating the size of backlog items, daily Scrum, and translating backlog items into engineering designs and logical units of work (tasks).
  • Involved in developing a custom framework, similar to the Struts Framework but with more features, to meet business needs.
  • Performed requirements analysis, design, coding and implementation, team coordination, code review, testing and installation.
  • Developed server-side utilities using the J2EE technologies Servlets, JSP and Struts.
  • Developed presentation layers using JSP custom tags and JavaScript.
  • Implemented design patterns - Business Delegate, Singleton, Flow Controller, DAO and Value Object patterns.
  • Developed Role Based Access Control to restrict the users to access specific modules based on their roles.
  • Used Oracle as the backend database and the Hibernate Framework for O/R mapping.
  • Deployed the application on a WebSphere server using Eclipse as the IDE.
  • Used Tomcat server 5.5 and configured it with the Eclipse IDE.
  • Performed extensive Unit Testing for the application.

Environment: J2EE custom framework, WebSphere 5.1, Tomcat 5.0, Oracle 9i, Hibernate 3.0, SAS 9, Eclipse 3.2, JSP, JavaScript, Servlets, XML, Eclipse plug-ins (JUnit, Tomcat).


Jr. Java Developer


  • Involved in the analysis, design, implementation, and testing of the project.
  • Implemented the presentation layer with HTML and JavaScript.
  • Developed web components using JSP, Servlets and JDBC.
  • Implemented database using SQL Server.
  • Designed tables and indexes.
  • Wrote complex SQL and stored procedures.
  • Involved in fixing bugs and unit testing with test cases using JUnit.
  • Developed user and technical documentation.

Environment: Java, JSP, Servlets, JDBC, HTML, JavaScript, MySQL, JUnit, Eclipse IDE.
