Sr. Hadoop Spark Developer Resume

Pleasanton, CA


  • Overall 7+ years of IT experience in a variety of industries, including 5+ years of experience building large-scale applications using Java/J2EE, Big Data, Scala and Spark technologies.
  • Excellent programming skills at a higher level of abstraction using Scala, Java and Python.
  • Excellent knowledge of Hadoop ecosystem components such as HDFS, MapReduce, Cloudera, Hortonworks, Mahout, HBase, Oozie, Hive, Sqoop, Pig, and Flume, and of the MapReduce programming paradigm.
  • Knowledge of implementing Big Data workloads on Amazon Elastic MapReduce (Amazon EMR) for processing and managing the Hadoop framework on dynamically scalable Amazon EC2 instances.
  • Experience with Amazon Web Services, AWS command line interface, and AWS data pipeline.
  • Experience in performance tuning of Spark applications using coalesce, repartitioning, broadcast variables and Spark executor memory tuning.
  • Good experience creating real-time data streaming solutions using Apache Spark/Spark Streaming, Apache Storm, Kafka and Flume.
  • Worked extensively with Spark using Scala on a cluster for computational analytics; installed it on top of Hadoop and built advanced analytical applications using Spark with Hive and SQL/Oracle.
  • Used Spark SQL and HQL queries to analyze data in HDFS.
  • Good understanding of NoSQL databases and hands-on experience writing applications on NoSQL databases such as HBase, Cassandra and MongoDB.
  • Experience with the Oozie workflow scheduler to manage Hadoop jobs as a Directed Acyclic Graph (DAG) of actions with control flows.
  • Hands-on experience using Scala, Spark Streaming and batch processing to process streaming and batch data.
  • Experienced in writing complex MapReduce programs that work with different file formats such as Text, Sequence, XML, JSON and Avro.
  • Working experience with the Cloudera Data Platform using VMware Player in a CentOS 6 Linux environment.
  • Expertise in Database Design, Creation and Management of Schemas, writing Stored Procedures, Functions, DDL, DML SQL queries.
  • Worked on HBase to load and retrieve data for real-time processing using the REST API.
  • Worked extensively with dimensional modeling, data migration, data cleansing, data profiling, and ETL processes for data warehouses.
  • Very good experience with partitioning and bucketing concepts in Hive; designed both managed and external tables in Hive to optimize performance.
  • Good working experience using Sqoop to import data into HDFS or Hive from RDBMS and to export data from HDFS or Hive back to RDBMS.
  • Worked with ETL tools such as Talend to simplify MapReduce jobs from the front end. Also has knowledge of Pentaho and Informatica as other ETL tools that work with Big Data.
  • Worked with BI tools like Tableau for report creation and further analysis from the front end.
  • Excellent Java development skills using J2EE, J2SE, Servlets, JSP, Spring, Hibernate, JDBC.
  • Good experience in working with various Python Integrated Development Environments like PyCharm, PyScripter, Spyder, PyStudio, PyDev, IDLE, NetBeans and Sublime Text.
  • Experience in database design using PL/SQL to write Stored Procedures, Functions, Triggers and strong experience in writing complex queries for Oracle.
  • Experience working with Build tools like Maven and Ant.
  • Experienced in both Waterfall and Agile (Scrum) development methodologies.
  • Strong problem-solving and analytical skills, with the ability to make balanced and independent decisions.
  • Experience in developing service components using JDBC.
  • Coordinated with the Offshore and Onshore teams for Production Releases.
  • Good analytical, problem-solving and communication skills, with the ability to work independently with little or no supervision or as a member of a team.


Hadoop Technologies: Apache Hadoop, Cloudera Hadoop Distribution (HDFS and MapReduce), HDFS, YARN, MapReduce, Hive, Pig, Sqoop, Flume, Spark, Kafka, ZooKeeper, and Oozie

Java/J2EE Technologies: Core Java, Servlets, Hibernate, Spring, Struts

NoSQL Databases: HBase, MongoDB

Programming Languages: Java, Scala, SQL, PL/SQL, Pig Latin, HiveQL, Unix, JavaScript, Shell Scripting

Web Technologies: HTML, J2EE, CSS, JavaScript, AJAX, Servlet, JSP, DOM, XML

Application Servers: WebLogic, WebSphere, JBoss

Cloud Computing tools: Amazon AWS.

Build Tools: Jenkins, Maven, ANT

Databases: MySQL, Oracle, DB2

Business Intelligence Tools: Splunk, Talend

Development Methodologies: Agile/Scrum, Waterfall.

Development Tools: Microsoft SQL Studio, Toad, Eclipse, NetBeans.

Operating Systems: WINDOWS, MAC OS, UNIX, LINUX.


Confidential, Pleasanton, CA

Sr. Hadoop Spark Developer


  • Responsible for building scalable distributed data solutions using Hadoop.
  • Managed jobs using the Fair Scheduler and developed job processing scripts using Oozie workflows.
  • Used Spark Streaming APIs to perform the necessary transformations and actions on the fly to build the common learner data model, which receives data from Kafka in near real time and persists it into Cassandra.
  • Configured, deployed and maintained multi-node Dev and Test Kafka clusters.
  • Developed Spark scripts using Scala shell commands as per requirements.
  • Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
  • Developed Scala scripts and UDFs using both DataFrames/SQL and RDD/MapReduce in Spark 1.6 for data aggregation, queries and writing data back into the OLTP system through Sqoop.
  • Experienced in performance tuning of Spark applications: setting the right batch interval, the correct level of parallelism and memory tuning.
  • Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
  • Optimized existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames and pair RDDs.
  • Implemented the ELK (Elasticsearch, Logstash, Kibana) stack to collect and analyze the logs produced by the Spark cluster.
  • Performed advanced procedures such as text analytics and processing using the in-memory computing capabilities of Spark with Scala.
  • Experienced in handling large datasets using partitions, Spark in-memory capabilities, broadcasts in Spark, efficient joins, transformations and other techniques during the ingestion process itself.
  • Worked on migrating MapReduce programs into Spark transformations using Spark and Scala.
  • Designed, developed and maintained data integration programs in a Hadoop and RDBMS environment with both traditional and non-traditional source systems, as well as RDBMS and NoSQL data stores for data access and analysis.
  • Worked extensively with Sqoop for importing metadata from Oracle.
  • Good experience working with Amazon AWS to set up Hadoop clusters.
  • Knowledge of Amazon EC2 Spot integration and Amazon S3 integration.
  • Optimized EMRFS for Hadoop to read and write directly to AWS S3 in parallel with good performance.
  • Involved in creating Hive tables, and loading and analyzing data using Hive queries.
  • Developed Hive queries to process the data and generate data cubes for visualization.
  • Implemented schema extraction for Parquet and Avro file formats in Hive.
  • Implemented partitioning, dynamic partitions and buckets in Hive.

Environment: Hadoop YARN, Spark-Core, Spark-Streaming, Spark-SQL, Scala, Python, Kafka, Hive, Sqoop, Amazon AWS, Elasticsearch, Impala, Cassandra, Cloudera, Oracle 10g, Linux.

Confidential, Houston, TX

Hadoop Spark Developer


  • Integrated Apache Spark with Hadoop components.
  • Extensive experience in writing HDFS and Pig Latin commands.
  • Developed complex queries using Hive and Impala.
  • Developed a data pipeline using Flume, Sqoop, Pig and Java MapReduce to ingest claim data and financial histories into HDFS for analysis.
  • Worked on moving data between HDFS and MySQL databases in both directions using Sqoop.
  • Implemented MapReduce jobs in Hive by querying the available data.
  • Configured the Hive metastore with MySQL, which stores the metadata for Hive tables.
  • Analyzed the data by performing Hive queries and running Pig scripts to study customer behavior.
  • Wrote Hive and Pig scripts as per requirements.
  • Responsible for developing multiple Kafka Producers and Consumers from scratch as per the software requirement specifications.
  • Developed Spark applications using Scala.
  • Implemented an Apache Spark data processing project to handle data from RDBMS and streaming sources.
  • Designed batch processing jobs using Apache Spark that ran ten times faster than the equivalent MR jobs.
  • Developed Spark SQL jobs to load tables into HDFS and run select queries on top of them.
  • Developed Spark code and Spark-SQL/Streaming for faster testing and processing of data.
  • Used Spark Streaming to divide streaming data into batches as an input to Spark engine for batch processing.
  • Highly skilled in integrating Kafka with Spark Streaming for high-speed data processing. Used Spark DataFrames and Spark SQL extensively.
  • Integrated Apache Storm with Kafka to perform web analytics. Uploaded clickstream data from Kafka to HDFS, HBase and Hive by integrating with Storm.
  • Designed the ETL process and created the high-level design document, including the logical data flows, source data extraction process, database staging and extract creation, source archival, job scheduling and error handling.
  • Tuned performance using partitioning and bucketing of Impala tables.
  • Experience with NoSQL databases such as HBase and MongoDB.
  • Successful in creating and implementing complex code changes.

Environment: Hadoop, AWS, Java, HDFS, MapReduce, Spark, Pig, Hive, Impala, Sqoop, Flume, Kafka, HBase, Oozie, SQL scripting, Linux shell scripting, Eclipse and Cloudera.

Confidential, NY

Hadoop Developer


  • Worked on analyzing the Hadoop cluster using different big data analytic tools, including Pig, Hive and MapReduce.
  • Managed a fully distributed Hadoop cluster as an additional responsibility. Was trained to take over the responsibilities of a Hadoop Administrator, including managing the cluster, upgrades and installation of tools that use the Hadoop ecosystem.
  • Worked on installing and configuring ZooKeeper to coordinate and monitor cluster resources.
  • Implemented test scripts to support test-driven development and continuous integration.
  • Loaded and transformed large sets of structured, semi-structured and unstructured data.
  • Involved in loading data from the Linux file system to HDFS.
  • Imported and exported data into and out of HDFS and Hive using Sqoop.
  • Implemented Partitioning, Dynamic Partitions, Buckets in Hive.
  • Extended Hive and Pig core functionality using custom User Defined Functions (UDF), User Defined Table-Generating Functions (UDTF) and User Defined Aggregating Functions (UDAF).
  • Used reporting tools such as Tableau, connected through the Hive ODBC connector, to generate daily reports.
  • Experienced in running Hadoop streaming jobs to process terabytes of XML-format data.
  • Involved in scheduling the Oozie workflow engine to run multiple Hive and Pig jobs.
  • Executed Hive queries on Parquet tables stored in Hive to perform data analysis to meet the business requirements.
  • Responsible for loading data files from various external sources such as Oracle and MySQL into a staging area in MySQL databases.
  • Actively involved in code review and bug fixing for improving the performance.
  • Involved in developing, building, testing and deploying to the Hadoop cluster in distributed mode.
  • Created Linux scripts to automate the daily ingestion of IVR data.
  • Processed the raw data using Hive jobs and scheduled them with crontab.
  • Helped the Analytics team with Aster queries using HCatalog.
  • Good experience with Apache Storm on a Hortonworks cluster.
  • Created HBase tables to store various data formats of incoming data from different portfolios.
  • Created Pig Latin scripts to sort, group, join and filter the enterprise-wide data.
  • Developed the verification and control process for daily load.
  • Experience in daily production support to monitor and troubleshoot Hadoop/Hive jobs.

Environment: Hadoop, HDFS, Pig, Hive, Sqoop, Solr, Shell Scripting, HBase, Kerberos, ZooKeeper, Ambari, Hortonworks, MySQL.

Confidential, Wilmington, DE

Hadoop Developer


  • Upgraded the Hadoop cluster from CDH3 to CDH4, set up a High Availability cluster and integrated Hive with existing applications.
  • Installed the Oozie workflow engine to run multiple Hive and Pig jobs.
  • Used the Scala collections framework to store and process complex consumer information.
  • Used Scala functional programming concepts to develop business logic.
  • Analyzed the data by performing Hive queries and running Pig scripts to understand user behavior.
  • Worked on analyzing the Hadoop cluster using different big data analytic tools, including Pig, Hive and MapReduce.
  • Managed and reviewed Hadoop log files. Installed and deployed IBM WebSphere.
  • Implemented the NoSQL database HBase and managed the other tools and processes running on YARN.
  • Importing and exporting data into HDFS and Hive using Sqoop.
  • Tested raw data and executed performance scripts. Shared responsibility for administration of Hadoop, Hive and Pig. Analyzed, validated and documented the changed records for the IBM web application.
  • Set up and benchmarked Hadoop/HBase clusters for internal use. Assisted the development team in installing single-node Hadoop 224 on local machines.
  • Managed workflow and scheduling for complex MapReduce jobs using Apache Oozie.
  • Enabled speedy reviews and first-mover advantages by using Oozie to automate data loading into the Hadoop Distributed File System and Pig to pre-process the data.
  • Wrote Hive queries for data analysis to meet the business requirements.
  • These new data items are used for further analytics and reporting, with Cognos reports as the BI component. Performed analysis with the data visualization tool Tableau. Wrote Pig scripts for data processing.
  • Experience in deploying applications on heterogeneous application servers: Tomcat, WebLogic, IBM WebSphere and Oracle Application Server.
  • Created Pig Latin scripts to sort, group, join and filter the enterprise-wide data.
  • Implemented Partitioning, Dynamic Partitions, Buckets in Hive.
  • Loaded and transformed large sets of structured, semi-structured and unstructured data.
  • Validated the data using MD5 hashes.
  • Experience in daily production support to monitor and troubleshoot Hadoop/Hive jobs.
  • Involved in configuring core-site.xml and mapred-site.xml for the multi-node cluster environment.
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting on the dashboard. Loaded the aggregated data onto DB2 for reporting on the dashboard.
  • Implemented Data Integrity and Data Quality checks in Hadoop using Hive and Linux scripts.
  • Used Avro and Parquet file formats for data serialization.

Environment: Big Data/Hadoop, Python, Java, Agile, Spark Streaming, HDFS, MapReduce, Hive, Pig, Sqoop, Flume, ZooKeeper, Oozie, DB2, NoSQL, HBase, IBM WebSphere, Tomcat and Tableau.


Java/J2EE Developer


  • Involved in SDLC requirements gathering, analysis, design, development and testing of an application developed using Agile methodology.
  • Implemented the persistence layer using Hibernate to interact with the Oracle database; used the Hibernate framework for object-relational mapping and persistence.
  • Actively participated in Object-Oriented Analysis and Design sessions of the project, which is based on MVC architecture using the Spring Framework.
  • Developed user interfaces using JSP and the JSF framework with AJAX, JavaScript, HTML, DHTML, and CSS.
  • Developed software in Java/J2EE, XML, Oracle EJB, Struts, and Enterprise Architecture.
  • Developed Servlets and JSPs based on MVC pattern using Spring Framework.
  • Developed web services using SOAP, SOA, WSDL and Spring MVC, and developed DTDs and XSD schemas for XML (parsing, processing, and design) to communicate with the Active Directory application using a RESTful API.
  • Used Struts Tiles libraries for web page layout and performed validations using the Struts validation framework.
  • Involved in daily Scrum meetings, sprint planning and estimation of tasks for user stories; participated in retrospectives and presented demos at the end of the sprint.
  • Experience in writing PL/SQL stored procedures, functions, triggers, Oracle reports and complex SQL queries.
  • Performed COTS evaluation and implementation for a reporting tool, which resulted in choosing Business Objects.
  • Experience in developing unit and integration tests with frameworks such as JUnit, Mockito, TestNG, Jersey Test and PowerMock.
  • Designed and developed entire application implementing MVC Architecture.
  • Developed the application front end using Bootstrap, JavaScript, and the Angular.js (Model, View, and Controller) framework.
  • Used Spring framework for implementing IOC/JDBC/ORM, AOP and Spring Security.
  • Worked with Java, J2EE, Spring 4.0, RESTful web services and WebSphere 5.0/6.0 in a fast-paced development environment.
  • Proficient in developing applications, with exposure to Java, JSP, UML, Servlets, Struts, Swing, DB2, Oracle (SQL, PL/SQL), HTML, JUnit, JSF, JavaScript and CSS.
  • Proactively found the issues and resolved them.
  • Established efficient communication between teams to resolve issues.
  • Proposed an innovative logging approach for all interdependent applications.

Environment: Java, J2EE, Spring, Hibernate, Struts, JSF, EJB, MySQL, Oracle, SQL Server, DB2, PL/SQL, JavaScript, jQuery, Servlets, JSP, HTML, CSS, Agile Methodology, Eclipse, WebLogic Application Server, UNIX, XML, JUnit, SOAP, RESTful web services, JDBC.


Java/J2EE Developer


  • Applied Java/J2EE application development skills with Object-Oriented Analysis and was extensively involved throughout the Software Development Life Cycle (SDLC).
  • Worked extensively with Core Java (Collections of Generics and Templates, Interfaces for passing data from the GUI layer to the business layer).
  • Developed the web interface for user modules using JSP, HTML, XML, CSS, JavaScript and AJAX.
  • Developed using J2EE design patterns like Command Pattern, Session Facade, Business Delegate, Service Locator, Data Access Object and value object patterns.
  • Analyzed, designed and implemented Online Enrollment Web Application using Struts, JSTL, Hibernate, UML, Design Patterns and Log4J.
  • Developed custom tags and JSTL to support custom user interfaces.
  • Designed the user interfaces using JSP.
  • Designed and implemented MVC architecture using the Struts framework; coding involved writing Action classes, custom tag libraries and JSPs.
  • Experienced in MS SQL Server 2005, writing stored procedures, SSIS packages, functions, triggers and views.
  • Developed Action Forms and Controllers in the Struts 1.2 framework. Utilized various Struts features such as Tiles, tag libraries and declarative exception handling via XML for the design.
  • Followed Scrum and iterative Agile methodologies for web application development.
  • Implemented business processes such as user authentication and account transfer using Session EJBs.
  • Worked with Oracle Database to create tables, procedures, functions and select statements.
  • Used Log4J to capture the log that includes runtime exceptions and developed WAR framework to alert the client and production support in case of application failures.
  • Developed the DAOs using SQL and DataSource objects.
  • Developed Stored Procedures, Triggers, Views, and Cursors using SQL Server 2005.
  • Development carried out under Eclipse Integrated Development Environment (IDE).
  • Used JBoss for deploying various components of application.
  • Used Ant for build scripts.
  • Used JUnit for testing and checking API performance.

Environment: Java EE 5, JSP 2.0, Java Bean, EJB 3.0, JDBC, Application Server, Eclipse, Java API, J2SDK 1.4.2, JDK 1.5, JMS, Message queues, Web services, UML, XML, HTML, XHTML, JavaScript, log4j, CVS, JUnit, Windows and Sun OS 2.7/2.8.