
Sr. Big Data/ Spark Developer Resume

Baltimore, MD


  • Over 9 years of IT experience in software analysis, design, development and implementation of Big Data, Hadoop and Java/J2EE technologies.
  • Good experience with the ETL tool Informatica, and in managing and maintaining Hadoop clusters with Apache Ambari.
  • Worked on a migration project from Oracle DB to a Hadoop environment.
  • Installed and configured Hive, HDFS and NiFi, and implemented a CDH cluster. Assisted with performance tuning and monitoring.
  • Expertise in web development applications using Core Java, Servlets, JSP, EJB, JDBC, XML, XSD, XSLT, RMI, JNDI, Java Mail, XML Parsers (DOM and SAX), JAXP, JAXB, Java Beans etc.
  • Good understanding of RDBMS concepts through database design and writing queries against databases such as Oracle, SQL Server, DB2 and MySQL.
  • Experience in unit testing using JUnit, TDD and BDD.
  • Experience in modeling applications with UML, Rational Rose and Rational Unified Process (RUP).
  • Experience in using CVS and Rational ClearCase for version control.
  • Good working knowledge of Ant and Maven for project build/test/deployment, Log4j for logging and JUnit for unit and integration testing.
  • Expertise in loading data from different data sources (such as Teradata and DB2) into HDFS using Sqoop and loading it into partitioned Hive tables.
  • In-depth knowledge of Spark concepts and experience with Spark in data transformation and processing.
  • Extensive experience in developing Pig Latin Scripts for transformations and using Hive Query Language for data analytics.
  • Expertise in writing applications with the Apache Spark Streaming API on Big Data distributions in an active cluster environment.
  • Hands-on experience working with NoSQL databases including HBase and Cassandra, and their integration with Hadoop clusters.
  • Experienced in developing user interfaces using JSP, HTML, DHTML, CSS, JavaScript, AJAX, JQuery and AngularJS.
  • Implemented database-driven applications in Java using JDBC, XML APIs and the Hibernate framework.
  • Expertise in using J2EE application servers such as WebLogic 10.3 and WebSphere 8.2, and web servers such as Tomcat 6.x/7.
  • Strong knowledge of Spark Core, Spark SQL, MLlib, GraphX and Spark Streaming.
  • Experience in working with MapReduce programs, Pig scripts and Hive commands to deliver the best results.
  • Experience in installation, configuration, management and deployment of Big Data solutions and the underlying infrastructure of Hadoop Cluster.
  • Experienced with IBM WebSphere Application Server, Oracle WebLogic application servers and Apache Tomcat.
  • In-depth experience and good knowledge in using Hadoop ecosystem tools like MapReduce, HDFS, Pig, Hive, Kafka, YARN, Sqoop, Storm, Spark, Oozie, Elasticsearch and Zookeeper.
  • Good knowledge of Amazon Web Services (AWS) concepts like EMR and EC2, which provide fast and efficient processing of Teradata Big Data Analytics.
  • Worked with application teams to install operating system, Hadoop updates, patches and version upgrades as required.


Hadoop Ecosystem: Hadoop 3.0, MapReduce, Sqoop, Hive 2.3, Oozie, Pig 0.17, HDFS 1.2.4, Zookeeper, Flume 1.8, Impala 2.1, Spark 2.2, Storm, Hadoop distributions (Cloudera, Hortonworks and Pivotal).

NoSQL Databases: HBase 1.2, MongoDB 3.6 & Cassandra 3.11

Java/J2EE Technologies: Servlets, JSP, JDBC, JSTL, EJB, JAXB, JAXP, JMS, JAX-RPC, JAX-WS

Programming Languages: Java, Python, SQL, PL/SQL, HiveQL, Unix Shell Scripting, Scala 2.12

Cloud Platform: AWS EC2, AWS Configured and S3, Microsoft Azure.

Methodologies: Agile, RAD, JAD, RUP, Waterfall & Scrum

Database: Oracle 12c/11g, MySQL, SQL Server 2016/2014

Web/Application Servers: WebLogic, Tomcat, JBoss

Web Technologies: HTML5, CSS3, XML, JavaScript, JQuery, AJAX, WSDL, SOAP

Tools and IDE: Eclipse, NetBeans, Maven, DB Visualizer, SQL Server Management Studio

Version Control Tools: SVN, Git, GitHub, TFS, CVS and IBM Rational ClearCase

Major Skills: Java, J2EE, Spark, Spark SQL, Spark Streaming, Agile, SDLC, NiFi, Eclipse, HTML, JavaScript, JQuery, SVN, Git, Maven, MongoDB, HDFS, Hive, Hadoop, AWS, MapReduce, Scala, Sqoop, Pig, Python, PL/SQL, Core Java, Advanced Java, XML.


Confidential - Baltimore, MD

Sr. Big Data/ Spark Developer


  • Worked on Hadoop ecosystem components including Hive, MongoDB, Zookeeper and Spark Streaming with the MapR distribution.
  • Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
  • Involved in the analysis and development phases of the system, following Agile Scrum methodology.
  • Worked on MongoDB by using CRUD (Create, Read, Update and Delete), Indexing, Replication and Sharding features.
  • Designed the HBase row key to store text and JSON as key values in HBase tables, structured so that get/scan operations return rows in sorted order.
  • Used CloudWatch Logs to move application logs to S3 and created alarms based on selected exceptions raised by applications.
  • Used Kibana, an open-source, browser-based analytics and search dashboard for Elasticsearch.
  • Maintained Hadoop, Hadoop ecosystem components and databases through updates/upgrades, performance tuning and monitoring.
  • Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
  • Prepared data analytics processing, and data egress for availability of analytics results to visualization systems, applications, or external data stores.
  • Built large-scale data processing systems for data warehousing solutions and worked on unstructured data mining with NoSQL.
  • Responsible for design and development of Spark SQL Scripts based on Functional Specifications.
  • Used AWS services like EC2 and S3 for small data sets processing and storage.
  • Provisioned a Cloudera Director instance on AWS and added the Cloudera Manager repository to scale up the Hadoop cluster in AWS.
  • Converted Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
  • Worked with application teams to install operating system and Hadoop updates, patches and version upgrades as required.
  • Specified the cluster size, resource pool allocation and Hadoop distribution by writing the specifications in JSON format.
  • Developed Spark applications in Scala and Java, and implemented an Apache Spark data processing project to handle data from various RDBMS and streaming sources.
  • Wrote Hive Queries for analyzing data in Hive warehouse using Hive Query Language (HQL).
  • Created and maintained Technical documentation for launching Hadoop Clusters and for executing Hive queries.
  • Handled importing of data from various data sources, performed transformations using Hive, MapReduce, loaded data into HDFS and extracted the data from MySQL into HDFS using Sqoop.
  • Used Spark SQL on DataFrames to access Hive tables from Spark for faster data processing.
  • Configured Spark Streaming to receive real-time data from Kafka and store the stream data in HDFS using Scala.
  • Developed a data pipeline using Flume, Sqoop and Pig to extract data from weblogs and store it in HDFS.
  • Developed Pig Latin scripts to extract data from the web server output files to load into HDFS.
  • Developed data pipeline using MapReduce, Flume, Sqoop and Pig to ingest customer behavioral data into HDFS for analysis.
  • Used Spark modules including Spark Core, Spark SQL, Spark Streaming, Datasets and DataFrames.
  • Used Spark for interactive queries, processing of streaming data and integration with popular NoSQL databases for large volumes of data.
  • Designed and developed automation test scripts using Python.
  • Integrated Apache Storm with Kafka to perform web analytics and to move clickstream data from Kafka to HDFS.
  • Used the Spark-Cassandra Connector to load data to and from Cassandra.
  • Imported data from different data sources into HDFS using Sqoop, performed transformations using Hive and MapReduce, and loaded the results into HDFS.
  • Exported the analyzed data to the relational databases using Sqoop, to further visualize and generate reports for the BI team.
  • In the preprocessing phase of data extraction, used Spark to remove records with missing data and transform the data to create new features.
  • Developed batch scripts to fetch data from AWS S3 storage and perform the required transformations in Scala using the Spark framework.
  • Collected and aggregated large amounts of log data using Flume and staged the data in HDFS for further analysis.
  • Wrote Pig scripts to transform raw data from several data sources into baseline data.
  • Analyzed the SQL scripts and designed the solution for implementation in PySpark.
  • Analyzed the data by performing Hive queries (HiveQL) and running Pig scripts (Pig Latin) to study customer behavior.
  • Extracted large volumes of data feeds from different data sources, performed transformations and loaded the data into various targets.
  • Developed data-formatted web applications and deployed scripts using HTML, XHTML and CSS, with client-side scripting in JavaScript.
  • Loaded and transformed large sets of structured, semi-structured and unstructured data and analyzed them by running Hive queries and Pig scripts.
  • Assisted in cluster maintenance, monitoring and troubleshooting; managed and reviewed data backups and log files.

Environment: Hadoop 3.0, Spark 2.3, MapReduce, Java, MongoDB, HBase 1.2, JSON, Hive 2.3, Zookeeper 3.4, AWS, MySQL, Scala 2.12, Python, Cassandra 3.11, HTML5, JavaScript

Confidential - Bellevue, WA

Sr. Big data/Hadoop Developer


  • Extensively involved in the design phase and delivered design documents for the Hadoop ecosystem with HDFS, Hive, Pig, Sqoop and Spark with Scala.
  • Involved in the requirement gathering phase of the SDLC and, with the help of the team lead, helped the team break the project into modules.
  • Involved in story-driven agile development methodology and actively participated in daily scrum meetings.
  • Developed Pig scripts for data analysis and extended their functionality by developing custom UDFs written in Java or Python.
  • Gathered requirements from the client and estimated the timeline for developing complex queries using Hive and Impala for a logistics application.
  • Developed Shell and Python scripts to automate and provide Control flow to Pig scripts.
  • Worked on designing NoSQL Schemas on HBase.
  • Exported data from HDFS environment into RDBMS using Sqoop for report generation and visualization purpose.
  • Used the DataFrame API in Scala to work with distributed collections of data organized into named columns.
  • Involved in creating Data Vault by extracting customer's Big Data from various data sources into Hadoop HDFS. This included data from Excel, Flat Files, Oracle, SQL Server, MongoDB, Cassandra, HBase, Teradata, Netezza and also log data from servers.
  • Designed a data flow to pull data from a REST API using Apache NiFi with SSL context configuration enabled.
  • Integrated HBase with Spark to import data into HBase and performed CRUD operations on HBase.
  • Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
  • Ran MapReduce programs on Amazon Elastic MapReduce (EMR), using Amazon S3 for input and output.
  • Used the Teradata FastLoad/MultiLoad utilities to load data into tables.
  • Created Hive tables, loaded data into them and wrote Hive queries to analyze the data.
  • Used HCatalog to access Hive table metadata from MapReduce or Pig code.
  • Automated workflows using shell scripts and Control-M jobs to pull data from various databases into Hadoop Data Lake.
  • Worked with Zookeeper, Oozie, and Data Pipeline Operational Services for coordinating the cluster and scheduling workflows.
  • Performed data validation against source system data for analyzing the existing database source files and tables to ingest data into Hadoop Data Vault.
  • Used AWS to produce comprehensive architecture strategy for environment mapping.
  • Implemented Spark RDD transformations and actions to migrate MapReduce algorithms.
  • Developed Kafka producers and consumers, HBase clients, and Spark and Hadoop MapReduce jobs, along with components on HDFS and Hive.
  • Ingested data into HDFS using Sqoop and Flume from a variety of sources.
  • Developed stored procedures for fetching data from Greenplum and created workflows using Apache NiFi.
  • Used Java MapReduce to compute metrics that define user experience, revenue, etc.
  • Transferred data between HDFS and an Oracle database using Sqoop, and configured the Hive metastore with MySQL, which stores the metadata for Hive tables.
  • Wrote extensive MapReduce jobs in Java to train the cluster and developed Java MapReduce programs to analyze sample log files stored in the cluster.
  • Participated in Rapid Application Development and Agile processes to deliver new cloud platform services.

Environment: Hadoop 3.0, AWS, HDFS, Pig, Hive 2.3, MapReduce, AWS S3, Scala 2.1, Sqoop, Spark SQL, Spark Streaming, Spark, Linux, Teradata 14, Oracle 11g, Java, Python.

Confidential - Hillsboro, OR

Hadoop Developer


  • Extensively worked on Hadoop eco-systems including Hive, Spark Streaming with MapR distribution.
  • Implemented J2EE Design Patterns like DAO, Singleton, and Factory.
  • Managed connectivity using JDBC for querying/inserting & data management including triggers and stored procedures.
  • Worked on analyzing Hadoop cluster and different big data analytic tools including Pig, HBase database and Sqoop.
  • Upgraded the Hadoop Cluster from CDH3 to CDH4, setting up High Availability Cluster and integrating Hive with existing applications.
  • Supported NoSQL in enterprise production and loaded data into HBase using Impala and Sqoop.
  • Ran multiple MapReduce jobs in Pig and Hive for data cleaning and pre-processing.
  • Built Hadoop solutions for big data problems using MR1 and MR2 in YARN.
  • Imported data from various data sources, performed transformations using Hive and Pig, and loaded data into HDFS.
  • Transferred data between HDFS and relational database systems using Sqoop, and handled maintenance and troubleshooting.
  • Developed a Java/J2EE-based multi-threaded application built on top of the Struts framework.
  • Used the Spring MVC framework to enable interactions between the JSP/view layer and implemented different design patterns with J2EE and XML technology.
  • Explored Spark to improve the performance and optimization of existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames and pair RDDs.
  • Created Hive Tables, loaded claims data from Oracle using Sqoop and loaded the processed data into target database.
  • Involved in PL/SQL query optimization to reduce the overall run time of stored procedures.
  • Exported data from HDFS to RDBMS via Sqoop for Business Intelligence, visualization and user report generation.
  • Implemented the J2EE design patterns Data Access Object (DAO), Session Façade and Business Delegate.
  • Developed NiFi flows dealing with various data formats such as XML, JSON and Avro.
  • Implemented MapReduce jobs in Hive by querying the available data.
  • Proactively involved in ongoing maintenance, support and improvements in Hadoop cluster.
  • Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
  • Used Cloudera Manager for installation and management of Hadoop Cluster.
  • Collaborated with business users/product owners/developers to contribute to the analysis of functional requirements.
  • Implemented the application using MVC architecture, integrating the Hibernate and Spring frameworks.
  • Used JavaScript and JQuery libraries, Bootstrap and Ajax for form validation and other interactive features.
  • Converted HiveQL into Spark transformations using Spark RDDs and Scala.
  • Integrated Kafka with Spark Streaming for high throughput and reliability.
  • Tuned Hive and Pig scripts to improve performance and resolved performance issues in both.

Environment: Hadoop 3.0, Hive 2.1, J2EE, JDBC, Pig 0.16, HBase 1.1, Sqoop, NoSQL, Impala, Java, Spring, MVC, XML, Spark 1.9, PL/SQL, HDFS, JSON, Hibernate, Bootstrap, JQuery, JavaScript, Ajax

Confidential - Manchester, NH

Sr. Java/ Hadoop Developer


  • Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, managing and reviewing data backups, and managing and reviewing Hadoop log files.
  • Extracted files from Cassandra through Sqoop and placed them in HDFS and processed them.
  • Performed data modeling to connect data stored in Cassandra DB to the data processing layers and wrote queries in CQL.
  • Involved in the analysis, design, development and testing phases of the Software Development Life Cycle (SDLC) using Agile software development methodology.
  • Used Rational Rose for developing Use case diagrams, Activity flow diagrams, Class diagrams and Object diagrams in the design phase.
  • Analyzed Hadoop cluster and different Big Data analytic tools including Pig, Hive, HBase and Sqoop.
  • Created Hive tables, loaded the data and performed data manipulations using Hive queries in MapReduce execution mode.
  • Implemented Model View Controller (MVC) architecture using Spring Framework.
  • Worked on Java Beans and other business components for the application and implemented new functionalities for the ERIC application.
  • Developed various SQL queries and PL/SQL procedures in Oracle DB for the application.
  • Installed and configured Hadoop MapReduce and HDFS, and developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
  • Involved in implementation of the presentation layer (GUI) for the application using JSF, HTML4, CSS2/3 and JavaScript.
  • Developed multiple scripts for analyzing data using Hive and Pig and integrating with HBase.
  • Used Sqoop to import data into HDFS and Hive from other data systems.
  • Created reports for the BI team using Sqoop to export data into HDFS and Hive.
  • Automated all the jobs from pulling data from databases to loading data into SQL server using shell scripts.
  • Developed integration services using SOA, Mule ESB, Web Services, SOAP, and WSDL.
  • Designed UI screens using JSP 2.0 and HTML, with JavaScript for client-side validation.
  • Actively involved in designing and implementing the Singleton, MVC, Front Controller and DAO design patterns.
  • Used Log4j to log messages to the database.
  • Performed unit testing using the JUnit framework.
  • Created complex SQL queries, PL/SQL stored procedures and functions for the back end.
  • Used Hibernate to access the database, mapped POJO classes to the database tables and persisted the data into the database.
  • Used Spring Dependency Injection to set up dependencies between the objects.
  • Developed Spring-Hibernate and Struts integration modules.
  • Developed Pig scripts, Pig UDFs, Hive scripts and Hive UDFs to load data files.
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Loaded data from the edge node to HDFS using shell scripting.
  • Implemented scripts for loading data from UNIX file system to HDFS.
  • Integrated Struts application with Spring Framework by configuring Deployment descriptor file and application context file in Spring Framework.

Environment: Hadoop, Hive, HDFS, Sqoop, Spark, Java, Hibernate 4.0, Oracle 10g, HTML3, CSS2/3, SQL Server 2012, Spring 3.1 framework, Spring Model View Controller (MVC), Servlets 3.0, JDBC 4.0, AJAX, Web services, RESTful, JSON, JQuery, JavaScript


Java/ J2EE Developer


  • Analyzed business requirements and implemented the process using Agile (Scrum) methodology.
  • Followed Test Driven Development within Agile methodology to produce high-quality software.
  • Developed the J2EE application based on Service Oriented Architecture.
  • Developed Java and J2EE applications using Rapid Application Development (RAD), Eclipse.
  • Used Hibernate to access Oracle database for accessing customer information in this application.
  • Used Maven scripts to create WAR and EAR files while working on defects/bug fixes as per weekly sprint planning.
  • Worked on developing the REST web services and integrating with them from the front-end.
  • Designed and developed the communication tier to exchange data through JMS and XML over HTTP.
  • Used Object-oriented development techniques such as UML for designing Use case, Sequence, Activity and Class and Object diagrams.
  • Used Hibernate as ORM tool to store the persistence data into the MySQL database.
  • Developed the application using Spring MVC, JSTL (tag libraries) and AJAX on the presentation layer; the business layer is built using Spring and the persistence layer uses Hibernate.
  • Developed Web services for consuming Stock details and Transaction rates using JAX-WS and Web services Template.
  • Developed PL/SQL stored procedures and extensively used HQL.
  • Used Spring to develop lightweight business components and the core Spring framework for dependency injection.
  • Developed the project using Waterfall methodologies and Test Driven Development.
  • Conducted code reviews with the clients using the SmartBear tool.
  • Developed the presentation layer and GUI framework based on the Spring framework, involving JSP, HTML, JavaScript, AJAX and CSS.
  • Configured the different layers (presentation layer, server layer, persistence layer) of the application using Spring IoC and maintained the Spring Application Framework's IoC container.
  • Implemented Java classes to read data from XLS and CSV files and to store the data in backend tables using Web Frame APIs.
  • Configured faces-config.xml and navigation.xml to set all page navigations and created EJB Message Driven Beans to use asynchronous service to perform profile additions.
  • Used various Core Java concepts such as Exception Handling, Collection APIs to implement various features and enhancements.
  • Involved in coding, maintaining, and administering Servlets and JSP components to be deployed on a WebLogic Application server.
  • Used the Dojo toolkit to construct Ajax requests and build dynamic web pages using JSP, DHTML and JavaScript.
  • Used CVS as version control system for the source code and project documents.
  • Designed and developed a batch process for VAT.
  • Followed Test Driven Development (TDD) and Scrum concepts of the Agile methodology to produce high-quality software.
  • Actively participated in development of user interfaces and their deployment on the WebLogic application server.

Environment: Core Java 1.5, JSP 2.1, JQuery, JavaScript, AJAX, HTML, CSS, XML, WSDL 2.0, SOAP, JAX-WS, Struts 2.0, Spring Framework, Struts Tiles, Spring 2.5, Hibernate 3.5, SOA, EJB 2.0, MDB, JMS, RAD, WSAD 6.1, DB2, Ivy, UML, Rational Rose, UNIX, Log4j, JUnit, Ant, JSF.
