
Sr. Spark Developer & Hadoop Engineer Resume

San Jose, CA


  • IT professional with 8 years of overall experience in Big Data analytics as a Spark and Hadoop developer, and in the analysis, architectural design, prototyping, development, integration and testing of applications using Java/J2EE technologies.
  • Ability to design, develop, deploy and support solutions that leverage the Client big data platform using Agile Scrum methodology, with a good understanding of the various phases of the Software Development Life Cycle (SDLC).
  • Experience in building highly scalable Big Data solutions using Hadoop on multiple distributions (Cloudera, Hortonworks) and NoSQL platforms (HBase, Cassandra, Couchbase and MongoDB).
  • Strong knowledge of and experience in the Hadoop and Big Data ecosystem, including MapReduce, HDFS, Hive, Pig, Spark, Cloudera Navigator, Mahout, HBase, Zookeeper, Kafka, Storm, Sqoop, Flume, Oozie and Impala.
  • Strong experience with Spark Core, Spark Streaming, HiveContext, Spark SQL and MLlib for analyzing streaming data.
  • Expert in developing Pig Latin and HiveQL scripts for data analysis and ETL purposes, and in extending the default functionality by writing User Defined Functions (UDFs) for data-specific processing.
  • Experience with the Oozie Workflow Engine, running workflow jobs with actions that run Hadoop MapReduce and Pig jobs. Good knowledge of Hadoop HDFS admin shell commands. Proficient in Big Data ingestion tools like Flume, Kafka, Spark Streaming and Sqoop for streaming and batch data ingestion.
  • Experience in designing and developing enterprise applications using Java/J2EE technologies on Hadoop MapReduce, Hive, Pig, Sqoop, Flume, Oozie and Spark, with J2EE tools and technologies like JDBC, Spring, Struts, MVC, Hibernate, XML, JBoss and Apache Tomcat, and IDE tools Eclipse 3.0, MyEclipse and RAD.
  • Expertise in optimizing traffic across the network using Combiners, joining multiple schema datasets using Joins, and organizing data using Partitioners and Buckets.
  • Experience in converting between file formats like Avro, ORC, RC, CSV, JSON, Sequence and Parquet, and in compressing and uncompressing data for data analysis.
  • Experience in Data Extraction, Transformation and Loading (ETL) processes using SSIS, DTS Import/Export Data, Bulk Insert, BCP and DTS packages.
  • Hands-on experience with the NoSQL database HBase, which scales linearly to handle huge data sets with billions of rows and millions of columns and is used to provide read/write access to large datasets.
  • Proficiency in UNIX/Linux fundamentals in relation to UNIX scripting and administration, experience on Ubuntu, CentOS.
  • Expertise in using Apache Tomcat, JBoss, WebLogic and WebSphere.
  • Good knowledge of machine learning and related tools like Octave, Python and R libraries.
  • Strong experience with Scala, using Scala collections, singleton objects, anonymous objects and companion objects.
  • Technical skills encompass Java, J2EE (JDBC, Servlets, custom tags, EJB, JMS, JNDI, jQuery, Struts, Web Services (SOAP, RESTful), Spring & Hibernate frameworks, ORM, XML, HTML5, DHTMLX, UML, JSON, JSTL, Apache Log4j, Ant, Maven, shell script and JavaScript).


Big Data Technologies: HDFS, MapReduce, Pig, Hive, HBase, Sqoop, Flume, Oozie, Hadoop Streaming, Zookeeper, AWS, Kafka, Impala, Apache Spark, Apache Storm, YARN and Mahout

Hadoop Distributions: Cloudera (CDH4/CDH5), Hortonworks

Languages: Java, Scala, Python, C, C++, JavaScript, SQL

IDE Tools: Eclipse, NetBeans, IntelliJ IDEA, Microsoft Visual Studio

Framework: Hibernate, Spring, Struts, JUnit

Web Technologies: HTML5, CSS3, JavaScript, jQuery, AJAX, Servlets, JSP, JSON, XML, XHTML, JSF, AngularJS

Web Services: SOAP, REST, WSDL, JAXB, and JAXP

Operating Systems: Windows (XP, 7, 8), UNIX, LINUX, Ubuntu, CentOS

Application Servers: JBoss, Tomcat, WebLogic, WebSphere, GlassFish

Tools: Adobe, SQL Developer, Flume, Sqoop and Storm

J2EE Technologies: JSP, JavaBeans, Servlets, JPA 1.0, EJB 3.0, JDBC

Databases: Oracle, MySQL, DB2, Derby, PostgreSQL, NoSQL databases (HBase, Cassandra)


Confidential, San Jose, CA

Sr. Spark Developer & Hadoop Engineer

Roles & Responsibilities:

  • The main aim of the project was tuning the performance of existing Hive queries and preparing Spark jobs that are scheduled daily in Tez.
  • Member of Spark COE in Data Simplification project.
  • Extensively worked on SparkContext, Spark SQL, DataFrames, Pair RDDs and Spark on YARN.
  • Handled data skew in Spark SQL.
  • Implemented Spark jobs in Scala and Java, utilizing DataFrames and the Spark SQL API for faster processing of data.
  • Developed Spark code and Spark SQL/Streaming jobs for faster testing and processing of data.
  • Developed a data pipeline using Kafka, HBase, Spark and Hive to ingest, transform and analyze customer behavioral data.
  • Worked extensively on importing metadata into Hive and migrated existing tables and applications to work on Hive and Spark.
  • Developed Spark jobs and Hive Jobs to summarize and transform data.
  • Extensively used Spark for interactive queries, processing of streaming data and integration with popular NoSQL databases for huge volumes of data.
  • Worked with the MapR distribution and familiar with HDFS.
  • Migrated tables from SQL Server to Cassandra, which are still in active use to date.
  • Worked on ETL off-loading from Teradata to Hadoop.
  • Worked on real-time streaming of data using Spark with Kafka.
  • Handled importing data from different data sources into HDFS using Sqoop, performing transformations using Hive and MapReduce, and then loading the data into HDFS.
  • Collected and aggregated large amounts of log data using Flume and staged the data in HDFS for further analysis.
  • Designed and Maintained Tez workflows to manage the flow of jobs in the cluster.
  • Worked with the testing teams to fix bugs and ensure smooth and error-free code.
  • Involved in preparing documents such as the Functional Specification document and Deployment Instruction documents.
  • Fixed defects as needed during the QA phase, supported QA testing, troubleshot defects and identified their source.

Environment: Hadoop, Spark Core, Spark-SQL, Spark-Streaming, MapReduce, HDFS, Hive, Java, Scala, Hue, SQL, Teradata, Pig, Sqoop, Tez, HBase, Cassandra, Zookeeper, PL/SQL, MySQL, DB2

Confidential, Albany, NY

Sr. Spark Developer & Hadoop Engineer

Roles & Responsibilities:

  • Ingested data into the Indie-Data Lake using an open-source Hadoop distribution to process structured, semi-structured and unstructured datasets, using open-source Apache tools like Flume and Sqoop to load into the Hive environment (used the IBM BigInsights 4.1 platform).
  • Developed Spark code using Scala and Spark-SQL for faster testing and data processing.
  • Worked on batch processing of data sources using Apache Spark.
  • Developed predictive analytics using Apache Spark Scala APIs.
  • Developed MapReduce jobs in Java API to parse the raw data and store the refined data.
  • Developed Kafka producers and consumers, HBase clients, and Spark and Hadoop MapReduce jobs along with components on HDFS and Hive.
  • Imported millions of structured records from relational databases using Sqoop import, processed them using Spark and stored the data in HDFS in Parquet format.
  • Implemented Spark DataFrame transformations and actions to migrate MapReduce algorithms.
  • Explored Spark for improving the performance and optimization of the existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, Pair RDDs and Spark on YARN.
  • Used the DataFrame API in Java for converting the distributed collection of data organized into named columns, and developed solutions to pre-process large sets of structured data in different file formats (text files, Avro data files, Sequence files, XML and JSON files, ORC and Parquet).
  • Automated and scheduled the Sqoop jobs using Unix shell scripts.
  • Worked on Database designing, Stored Procedures, and PL/SQL.
  • Involved in identifying job dependencies to design workflow for Oozie & YARN resource management.
  • Responsible for managing existing data extraction jobs, and played a vital role in building new data pipelines from various structured and unstructured sources into Hadoop.
  • Effectively followed Agile Scrum methodology to design, develop, deploy and support solutions that leverage the Client big data platform.
  • Integrated Apache Storm with Kafka to perform web analytics. Uploaded clickstream data from Kafka to HDFS, HBase and Hive by integrating with Storm.
  • Designed and coded from specifications; analyzed, evaluated, tested, debugged and implemented complex software applications.
  • Developed Sqoop Scripts to extract data from DB2 EDW source databases into HDFS.
  • Worked on tuning Hive and Pig to improve performance, and solved performance issues in both kinds of scripts with an understanding of joins, grouping and aggregation and how they translate to MapReduce jobs.
  • Created partitions and buckets based on State for further processing using bucket-based Hive joins.
  • Implemented Cloudera Manager on the existing cluster.
  • Extensively worked with Cloudera Distribution Hadoop, CDH 5.x and CDH 4.x.
  • Designed & developed various SSIS packages (ETL) to extract & transform data & involved in Scheduling SSIS Packages.
  • Created ETL packages with different data sources (SQL Server, Flat Files, Excel source files, XML files etc.) and then loaded the data into destination tables by performing complex transformations using SSIS/DTS packages.
  • Debugged and fixed incorrect or missing data problems in both Oracle Database and MongoDB.

Environment: HDFS, MapReduce, Java API, JSP, JavaBean, Pig, Hive, Sqoop, Flume, Oozie, HBase, Kafka, Impala, Spark Streaming, Storm, YARN, Eclipse, Spring, PL/SQL, Unix Shell Scripting, Cloudera

Confidential, Camden, NJ

Hadoop Developer

Roles & Responsibilities:

  • Importing and exporting data into HDFS and Hive using Sqoop and Kafka.
  • Received an average of 60 GB of data daily. Overall, the project's data warehouse held 4 PB of data, and a 110-node cluster was used to process it.
  • Developed different components of the system, such as the Hadoop process that involves MapReduce and Hive.
  • Developed an interface for validating incoming data into HDFS before kicking off the Hadoop process.
  • Wrote optimized Hive queries using techniques like user-defined functions and customized Hadoop shuffle and sort parameters.
  • Along with the Infrastructure team, designed and developed a Kafka- and Storm-based data pipeline. This pipeline also involved Amazon Web Services EMR, S3 and RDS.
  • Worked on tuning Hive and Pig to improve performance and solve performance-related issues in Hive and Pig scripts, with a good understanding of joins, grouping and aggregation and how they translate to MapReduce jobs.
  • Developed MapReduce programs for different types of files using Combiners with UDFs and UDAFs.
  • Worked with a multi-node cluster tool that offers several commands to report HBase usage.
  • Created, dropped and altered tables at run time without blocking updates and queries, using HBase and Hive.
  • Used HCatalog to access Hive table metadata from MapReduce or Pig code.
  • Worked on pre-processing the logs and semi-structured content stored on HDFS using Pig.
  • Worked on structured data imports and exports into the Hive warehouse, which enables business analysts to write Hive queries.
  • Involved in managing and reviewing Hadoop log files.
  • Worked on UNIX shell scripts for business processes and for loading data from different interfaces to HDFS.
  • Responsible for developing a data pipeline using Flume, Sqoop and Pig to extract data from weblogs and store it in HDFS.
  • Designed and implemented various metrics that can statistically signify the success of the experiment.
  • Involved in developing Shell scripts to orchestrate execution of all other scripts (Pig, Hive, and MapReduce) and move the data files within and outside of HDFS.
  • Responsible for processing ingested raw data using MapReduce, Apache Pig and Hive.
  • Developed Pig Scripts for change data capture and delta record processing between newly arrived data and already existing data in HDFS.
  • Worked on pivoting HDFS data from rows to columns and columns to rows.
  • Involved in creating Hive tables and Pig tables, loading data, and writing Hive queries and Pig scripts.
  • Participated in a few workshops on Spark, RDDs and Spark Streaming.

Environment: Linux 6.7, CDH 5.5.2, MapReduce, Hive 1.1, Pig, HBase, YARN, Oozie, Shell Script, AWS, Sqoop 1.4.3, Eclipse, Java 1.8

Confidential, Minneapolis, MN

Java/J2EE Developer

Roles & Responsibilities:

  • Involved in all the Web module UI design and development using HTML, CSS, jQuery, JavaScript, Ajax.
  • Designed and modified User Interfaces using JSP, JavaScript, CSS and jQuery.
  • Developed UI screens using Bootstrap, CSS and jQuery.
  • Developed user interfaces using JSP and the JSF framework with AJAX, JavaScript, HTML, DHTML and CSS.
  • Implemented Spring AOP for admin services.
  • Involved in multi-tiered J2EE design utilizing MVC architecture with the Struts framework, Hibernate and EJB, deployed on WebSphere Application Server connecting to an Oracle database.
  • Developed software in Java/J2EE, XML, Oracle, EJB, Struts and Enterprise Architecture.
  • Developed and Implemented Web Services and used Spring Framework.
  • Implemented the caching mechanism in Hibernate to load data from Oracle database.
  • Implemented application level persistence using Hibernate and Spring.
  • Implemented the persistence layer using Hibernate to interact with the Oracle database; used the Hibernate framework for object-relational mapping and persistence.
  • Developed Servlets and JSPs based on the MVC pattern using the Spring Framework.
  • Maintained the business standards in EJBs and deployed them onto WebLogic Application Server.
  • Developed REST architecture-based web services to facilitate communication between clients and servers.
  • Developed AJAX scripting to process server-side JSP scripting.
  • Used Eclipse as the IDE; configured and deployed the application onto WebLogic Application Server using Maven.
  • Created applications and connection pools, and deployed JSPs, Servlets and EJBs in WebLogic.
  • Created SQL queries, PL/SQL Stored Procedures, Functions for the Database layer by studying the required business objects and validating them with Stored Procedures using DB2. Also used JPA with Hibernate provider.
  • Implemented an FTP utility program for copying the contents of an entire directory recursively up to two levels from a remote location using socket programming.
  • Wrote test cases using the JUnit testing framework and configured applications on WebLogic Server.

Environment: Java, J2EE, Spring, Hibernate, Struts, JSF, EJB, MySQL, Oracle, SQL Server, DB2, PL/SQL, JavaScript, jQuery, Servlets, JSP, HTML, CSS, Agile Methodology, Eclipse, WebLogic Application Server, UNIX, XML, JUnit, SOAP, RESTful Web Services, JDBC


Java/J2EE Developer

Roles & Responsibilities:

  • Developed front-end screens using JSP, HTML and CSS.
  • Developed server side code using Struts and Servlets.
  • Developed core java classes for exceptions, utility classes, business delegate, and test cases.
  • Developed SQL queries using MySQL and established connectivity.
  • Worked in Eclipse IDE using the Maven plugin.
  • Designed the user interface of the application using HTML5, CSS3, JSP, and JavaScript.
  • Tested the application functionality with JUnit Test Cases.
  • Developed all the user interfaces using JSP and wrote client-side validations using JavaScript.
  • Extensively used jQuery for developing interactive web pages.
  • Developed the user interface presentation screens using HTML, XML, and CSS.
  • Developed shell scripts to trigger the Java batch job and send a summary email of the batch job status.
  • Coordinated with the QA lead on development of the test plan, test cases, test code and actual testing; responsible for allocating defects and ensuring they were resolved.
  • Application was developed in Eclipse IDE and was deployed on Tomcat server.
  • Involved in Agile scrum methodology.
  • Provided support for bug fixes and functionality changes.

Environment: Java/J2EE, Oracle 10g, SQL, PL/SQL, JSP, Hibernate, WebLogic 8.0, HTML, AJAX, JavaScript, JDBC, XML, UML, JUnit, Eclipse


Java Associate

Roles & Responsibilities:

  • Actively participated in all phases of the Software Development Life Cycle (SDLC).
  • Extensively worked on Core Java (Collections, Generics and Templates, Interfaces for passing data from the GUI layer to the business layer).
  • Developed the web interface for user modules using JSP, HTML, XML, CSS, JavaScript and AJAX.
  • Developed using J2EE design patterns like Command Pattern, Session Facade, Business Delegate, Service Locator, Data Access Object and value object patterns.
  • Used JUnit test cases to test the application and performed random checks to analyze the portability, reliability and flexibility of the project.
  • Analyzed, designed and implemented Online Enrollment Web Application using Struts, JSTL, Hibernate, UML, Design Patterns and Log4J.
  • Used advanced jQuery, AJAX, JavaScript, CSS and pure CSS layouts, with database access through JDBC for Oracle.
  • Involved in writing application level code to interact with APIs, Web Services using AJAX, JSON and XML.
  • Created Servlets and JavaServer Pages, which route submittals to the appropriate Enterprise JavaBean (EJB).
  • Followed Scrum and iterative Agile methodologies as the development process for the web application.
  • Responsible for the performance of PL/SQL procedures and SQL queries.
  • Involved in deploying components on the WebLogic application server.
  • Deployed applications on Linux client machines.

Environment: Java EE 5, JSP 2.0, Java Bean, EJB 3.0, JDBC, Application Server, Eclipse, Java API, J2SDK 1.4.2, JDK 1.5, JMS, Message queues, Web services, UML, XML, HTML, XHTML, JavaScript, log4j, CVS, JUnit, Windows and Sun OS 2.7/2.8
