Hadoop And Spark Developer Resume

Fremont, CA


  • 6+ years of IT experience in the design, development, and implementation of big data solutions using Hadoop, MapReduce, Pig, Hive, Sqoop, NiFi, Oozie, ZooKeeper, and Flume on CDH 4 and CDH 5 distributions.
  • Replaced existing jobs with Spark data transformations to improve processing efficiency and performance.
  • Developed Hive UDFs (user-defined functions) to preprocess data for analysis.
  • Wrote Hive queries and Pig scripts for data analysis to meet requirements.
  • Used Hive to analyze partitioned and bucketed data and compute various metrics for reporting.
  • Experience developing RDDs, DataFrames, and SQL queries in Spark SQL.
  • Worked with different Hadoop distributions, including Cloudera and Hortonworks.
  • Good knowledge of Spark SQL and Spark using Scala.
  • Hands-on experience with Spark batch processing; built a proof of concept (POC) for a Kafka and Spark Streaming pipeline.
  • Experienced with IDEs and tools such as Eclipse, NetBeans, IntelliJ, GitHub, and Maven.
  • Monitored and managed Hadoop clusters using the Cloudera Manager web interface.
  • Hands-on experience with NoSQL databases such as HBase and MongoDB.
  • Knowledge of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, ResourceManager, NameNode, DataNode, and MapReduce concepts.
  • Java development experience using J2EE, J2SE, Servlets, JSP, EJB, and JDBC.
  • Implementation knowledge of enterprise, web, and client-server applications using Java and J2EE.
  • Experience analyzing data using HQL, Pig Latin, HBase, and custom MapReduce programs in Java.
  • Experienced with data formats such as JSON, Parquet, Avro, RC, and ORC.
  • Used Flume to collect log files and write them into HDFS.
  • Experience importing and exporting data between HDFS and RDBMS with Sqoop, and migrating data according to client requirements.
  • Worked with YARN, Mesos, and Spark's default scheduler.
  • Experience with ingestion, storage, processing, and analysis of big data.
  • Worked on loading and transforming large sets of structured, semi-structured, and unstructured data.
  • Used GitHub for version control, pushing changes and pulling updated code from the repository.


Hadoop and Spark Developer

Confidential - Fremont, CA


  • Worked with a large-scale Hadoop YARN cluster for distributed data processing and analysis using connectors, Spark Core, Spark SQL, Sqoop, Hive, and NoSQL databases.
  • Created partitions and buckets on Hive tables to improve processing performance and speed up data access.
  • Exported data from HDFS to an RDBMS with Sqoop export so the BI team could analyze it and generate reports.
  • Worked with a 35-node cluster containing 2 NameNodes and 33 DataNodes; each NameNode was 3 TB in size.
  • Worked in the Spark ecosystem, running Spark SQL queries on data formats such as text, CSV, and XML files.
  • Performed coalesce and repartition operations on DataFrames.
  • Used Snappy compression to save storage space in HDFS.
  • Used shell scripting to clean the base data.
  • Good knowledge of Spark SQL and Spark using Scala.
  • Created RDDs, DataFrames, and Datasets to process data using Spark.
  • Used Impala for faster data processing.
  • Integrated Spark and Hive to meet business requirements.
  • Worked with ORC, Parquet, Avro, and JSON file formats.
  • Used Hive joins across multiple tables to meet business requirements.
  • Replaced Hive QL queries with Spark SQL for better performance.
  • Worked with Agile Methodology.
  • Migrated the data from all existing jobs to Spark for better performance and shorter execution time.
  • Created Hive UDFs for Hive tables based on business requirements.
  • After processing the data with Hive and Spark, sent it to the BI team for analysis.
  • Used Oozie to automate tasks.
  • Created NiFi flows to trigger Spark jobs, with email notifications sent on any failure.
  • Created Pig scripts to transform HDFS data and loaded the results into Hive external tables.
  • Worked with NoSQL databases such as HBase to load large volumes of semi-structured data coming from different sources.
  • Monitored all NiFi flows to receive notifications when no data passed through a flow for longer than a specified time.
  • Exported data to an RDBMS using Sqoop and processed it for ETL operations.
  • Imported data from different sources such as HDFS and HBase into Spark RDDs.
  • Developed a POC on streaming data using Apache Kafka and Spark Streaming.

Environment: Hadoop, HDFS, Oozie, Hive, Impala, HBase, Spark, IntelliJ, Linux, Java, Hortonworks, NiFi, MapReduce, Sqoop, Shell Scripting, Apache Kafka

Hadoop Developer

Confidential - Fremont, CA


  • Developed automated scripts to install Hadoop clusters.
  • Worked on loading and transforming large sets of structured, semi-structured, and unstructured data.
  • Developed Hive user-defined functions to run aggregations over multiple rows of data loaded from HDFS.
  • Worked with a 25-node cluster containing 2 NameNodes and 23 DataNodes; each DataNode was 3 TB in size.
  • Moved log files generated from various sources to HDFS through Flume for further processing.
  • Used Bzip2 compression to compress files before loading them into Hive.
  • Loaded and retrieved unstructured data.
  • Used ZooKeeper for coordination across Hadoop clusters and Oozie to schedule job workflows.
  • Used Sqoop to export the analyzed data to a relational database for analysis by the data analytics team.
  • Implemented workflows using the Apache Oozie framework to automate tasks.
  • Used Sqoop to import and export data between HDFS and RDBMS.
  • Performed data analysis by running Hive queries.
  • Used Impala for faster data processing.
  • Developed Hive DDLs to create, alter, and drop Hive tables.
  • Loaded data from the edge node to HDFS using shell scripting.
  • Responsible for building scalable distributed data pipelines using Hadoop.
  • Assisted in exporting analyzed data to relational databases using Sqoop.
  • Implemented partitioning and bucketing on Hive tables to meet business requirements.
  • Loaded and transformed large structured and semi-structured datasets in different file formats, including Avro, Parquet, and SequenceFile.

Environment: Hadoop, MapReduce, HDFS, Pig, Hive, Oozie, Java, UNIX, Flume, Spark, Scala, HBase, Impala, Eclipse.

SQL Developer



  • Developed ER and dimensional diagrams for business requirements using ERwin.
  • Involved in data analysis and data discrepancy reduction for the source and target schemas.
  • Developed SQL scripts to perform nested queries, joins, subqueries, and insert, update, and delete operations on MySQL database tables.
  • Developed and implemented stored procedures, packages, and triggers.
  • Extracted data from XML files into SQL tables and produced data file reports using SQL Server 2008.
  • Built the data connection to the database using MS SQL Server.
  • Worked on advanced SQL queries, procedures, cursors, and triggers.

Environment: MySQL, SQL Server 2008 (SSRS & SSIS), PL/SQL, Visual Studio 2000/2005

Java Developer



  • Developed the application using the Struts framework, which leverages the classical Model-View-Controller (MVC) architecture.
  • Designed the user interfaces using JSPs, developed custom tags, and used the JSTL tag library.
  • Developed various Java business classes for handling different functions.
  • Developed controller classes using Struts and the Tiles API.
  • Involved in documentation and use case design using UML modeling, including development of class diagrams, sequence diagrams, and use case transaction diagrams.
  • Participated in design and code reviews.
  • Developed the user interface using AJAX in JSP and performed client-side validation.
  • Developed JUnit test cases for all developed modules. Used SVN for version control.

Environment: Java, J2EE, JSP, Struts 1.x, JNDI, DB2, HTML, XML, DOM, SAX, ANT, AJAX, Rational Rose, Eclipse Indigo 3.5, SOAP, Apache Tomcat, Oracle 10g, LOG4J, SVN.