
Hadoop/Spark Developer Resume



  • Over 8 years of professional experience in the IT industry developing, implementing, configuring, and testing Hadoop ecosystem components, and maintaining various web-based applications using Java, J2EE, and microservices implementations on the AWS platform.
  • Hands-on experience with Pig, Hive, HBase, Sqoop, Flume, Oozie, and the MapReduce framework, the major components of the Hadoop ecosystem; provided ETL solutions for data integration, data warehousing, and Hive data modeling.
  • Experience with S3 and other relevant AWS services required for implementing a microservices design strategy.
  • Excellent knowledge of Hadoop architecture and of the Hadoop daemons, such as the NameNode and DataNodes.
  • Knowledge of Apache NiFi for real-time analytical processing.
  • Created ETL/Talend jobs, covering both design and code, to process data into target databases.
  • Experience in writing Pig and Hive scripts and extending Hive and Pig core functionality by writing custom UDFs.
  • Created prototypes for large-data-set analytics using Python, Hive, and Amazon Web Services to improve business efficiency.
  • Wrote MapReduce programs and tuned Hadoop cluster performance by gathering and analyzing data on the existing infrastructure.
  • Experience with Oozie Workflow Engine in running workflow jobs with actions that run Hadoop Map/Reduce and Pig jobs.
  • Hands on experience in Importing and exporting data from different databases like MySQL, MongoDB, Cassandra, Oracle, Teradata and Netezza into HDFS and vice-versa using Sqoop.
  • Good experience creating real-time data streaming solutions using Apache Spark / Spark Streaming, Apache Storm, Kafka, and Flume.
  • Created Hive table views in Impala, validated the tables, and retrieved sample data.
  • Implemented custom Kafka encoders for a custom input format to load data into Kafka partitions; streamed data in real time using Spark with Kafka for faster processing.
  • Experienced in Java Application Development, Client/Server Applications, Internet/Intranet based applications using Core Java, J2EE patterns, Web Services, Oracle, SQL Server and other relational databases.
  • Automated tasks using Linux Bash shell scripting, PL/SQL, and SQL scripting for monitors and custom reports.
  • Extensive experience with scripting languages - Bash shell scripting and Python - and UNIX/Linux commands.
  • Experience in developing applications using Java, J2EE, the Spring framework, SOAP/RESTful web services, and MarkLogic.
  • Loaded and transformed large sets of structured data from Oracle/SQL Server into HDFS using Talend Big Data Studio.
  • Worked with data delivery teams to set up new Hadoop users, including setting up Linux users, creating Kerberos principals, and testing HDFS, Hive, Pig, and MapReduce access for the new users.
  • Experience in utilizing Java tools in business, Web, and client-server environments including Java Platform J2EE, EJB, JSP, Java Servlets, Struts, and Java database Connectivity (JDBC) technologies.
  • Good experience in developing and implementing web applications using Java, CSS, HTML, HTML5, XHTML, JavaScript, JSON, XML, and JDBC.
  • A great team player with the ability to communicate effectively with all levels of the organization, including technical staff, management, and customers.
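As a minimal illustration of the MapReduce framework referenced above, the classic word-count example can be sketched in plain Python, with no Hadoop cluster required (the input lines here are hypothetical):

```python
from collections import defaultdict
from itertools import chain

def map_phase(line):
    # Mapper: emit a (word, 1) pair for every word in the input line.
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    # Shuffle/sort: group intermediate values by key, as the framework does
    # between the map and reduce phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reducer: sum the counts emitted for each word.
    return {word: sum(counts) for word, counts in groups.items()}

lines = ["hadoop spark hive", "spark spark hive"]
pairs = chain.from_iterable(map_phase(line) for line in lines)
result = reduce_phase(shuffle(pairs))
# result == {"hadoop": 1, "spark": 3, "hive": 2}
```

In a real Hadoop job the mapper and reducer run as distributed tasks and the shuffle is handled by the framework; the three-phase structure is the same.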


Big Data Technologies: Hadoop, MapReduce, HDFS, Hive, Pig, HBase, Sqoop, Flume, ZooKeeper, Oozie, Kafka, YARN, Spark, Scala, MongoDB and Cassandra

Databases: Oracle, MySQL, Teradata, Microsoft SQL Server, MS Access, DB2 and NoSQL

Programming Languages: C, C++, Java, J2EE, Scala, SQL, PL/SQL, Unix shell scripts and Bash shell scripting

Frameworks: MVC, Struts, Spring, Junit and Hibernate

Development Tools: Eclipse, NetBeans, Toad, Maven and ANT

Web Languages: XML, HTML, HTML5, DHTML, DOM, JavaScript, AJAX, jQuery, JSON and CSS

Operating Systems & others: Linux (CentOS, Ubuntu), Unix, Windows XP, Windows Server 2003, PuTTY, WinSCP, FileZilla, AWS and Microsoft Office Suite


Confidential - NJ



  • Installed and Configured Apache Hadoop clusters for application development and Hadoop tools like Hive, Pig, Oozie, Zookeeper, HBase, Flume and Sqoop.
  • Implemented multiple MapReduce jobs in Java for data cleaning and pre-processing.
  • Worked in a team with a 30-node cluster and expanded it by adding nodes; the additional data nodes were configured through the Hadoop commissioning process.
  • Imported data from MySQL and Oracle into HDFS using Sqoop.
  • Involved in converting files in HDFS into RDDs in multiple data formats and performing data validation using RDD operations.
  • Used different Spark modules such as Spark Core, Spark RDDs, Spark DataFrames, and Spark SQL.
  • Loaded data from different sources (databases and files) into Hive using the Talend tool.
  • Used Pig as an ETL tool to perform transformations, joins, and some pre-aggregations before storing the data in HDFS.
  • Created a data pipeline to filter consumer response data for different drugs from mobile sources and load it via an AWS S3 bucket into Hive external tables at an HDFS location.
  • Used Apache Kafka to get data from the Kafka producer, which in turn pushes data to the broker.
  • Used various Hive optimization techniques such as partitioning, bucketing, and map joins.
  • Designed and built unit tests and executed operational queries on HBase.
  • Implemented a script to transmit information from Oracle to HBase using Sqoop.
  • Worked on migrating Python MapReduce programs into Spark transformations.
  • Knowledge of handling Hive queries using Spark SQL integrated with the Spark environment.
  • Experience working with the NoSQL database HBase for real-time data analytics using Apache Spark with Python.
  • Installed Oozie workflow engine to run multiple Map Reduce, HiveQL and Pig jobs.
  • Implemented a script to transmit information from Webservers to Hadoop using Flume.
  • Used Zookeeper to manage coordination among the clusters.
  • Used Apache Kafka and Apache Storm to gather log data and feed it into HDFS.
  • Developed Scala program for data extraction using Spark Streaming.
  • Wrote Spark MapReduce code in Java for data pre-processing when consuming unstructured text data.
  • Set up and managed Kafka for stream processing.
  • Created the producer, consumer, and ZooKeeper setup for Kafka replication.
  • Integrated Splunk with the AWS deployment using Puppet to collect data from all EC2 systems into Splunk.
  • Experienced with batch processing of data sources using Apache Spark and Elasticsearch.
  • Implemented Spark using Scala and Spark SQL for faster testing and processing of data.
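The file-to-RDD conversion and data-checking steps above can be illustrated without a cluster. A minimal pure-Python sketch of the same map-then-filter pattern, where the CSV record layout (id, drug, rating) and the sample lines are hypothetical:

```python
raw = [
    "1001,aspirin,4.5",
    "1002,ibuprofen,",   # missing rating -> dropped by validation
    "bad-record",        # malformed line -> dropped by validation
    "1003,naproxen,3.8",
]

def parse(line):
    # Map step: split a CSV line into (id, drug, rating); None if malformed.
    parts = line.split(",")
    if len(parts) != 3 or not parts[2]:
        return None
    return (int(parts[0]), parts[1], float(parts[2]))

# map + filter mirror rdd.map(parse).filter(lambda r: r is not None) in Spark.
records = [r for r in map(parse, raw) if r is not None]
# records == [(1001, "aspirin", 4.5), (1003, "naproxen", 3.8)]
```

In Spark the same chain runs lazily and in parallel across partitions; the cleaning logic itself is unchanged.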

Environment: Hadoop, MapReduce, HDFS, Hive, Cloudera, Core Java, Scala, SQL, Flume, Spark, Pig, Sqoop, Oozie, Impala, Python, AWS, HBase, Kafka, Cassandra, ETL, Oracle, Unix.




  • Designed and developed components of big data processing using HDFS, MapReduce, Pig, and Hive.
  • Analyzed data using the Hadoop components Hive and Pig.
  • Imported data from different sources such as HDFS and HBase into Kafka.
  • Wrote MapReduce jobs using Scala and Pig Latin.
  • Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.
  • Worked on NoSQL databases including HBase and Cassandra; configured a SQL database to store the Hive metastore data.
  • Produced reports and documentation for all automated testing efforts, results, activities, data, logging, and tracking.
  • Used SQL Profiler for troubleshooting, monitoring, and optimizing SQL Server, as well as non-production database code and T-SQL code from developers and QA.
  • Participated in the development and implementation of a Cloudera Impala Hadoop environment.
  • Used Sqoop to import data from MySQL into HDFS on a regular basis.
  • Used Pig as an ETL tool to perform transformations, joins, and some pre-aggregations before storing the data in HDFS.
  • Worked on developing applications with the Hadoop big data technologies Pig, Hive, MapReduce, and Oozie.
  • Worked with EC2, S3, ELB, Auto Scaling servers, Glacier, storage lifecycle rules, and Amazon EMR.
  • Configured, deployed, and maintained multi-node Dev and Test Kafka clusters.
  • Designed technical solutions for real-time analytics using Spark and HBase.
  • Created ETL (Informatica) jobs to generate and distribute reports from a MySQL database.
  • Involved in loading data from the Linux file system into HDFS using Sqoop, and exported the analyzed data to relational databases using Sqoop for visualization and report generation for the Business Intelligence (BI) team.
  • Installed and configured Flume, Hive, Pig, Sqoop and Oozie on the Hadoop cluster.
  • Extracted data from Teradata and Oracle into Hive using Sqoop.
  • Worked on Agile Methodology.
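The Pig-style ETL described above (a join followed by a pre-aggregation before storing to HDFS) can be sketched in plain Python; the orders and customers tables below are hypothetical stand-ins for Pig relations:

```python
from collections import defaultdict

orders = [("c1", 100), ("c1", 250), ("c2", 75)]  # (customer_id, amount)
customers = {"c1": "NJ", "c2": "NY"}             # customer_id -> state

# Equivalent of: JOIN orders BY customer_id, customers BY customer_id;
joined = [(customers[cid], amount) for cid, amount in orders if cid in customers]

# Equivalent of: GROUP joined BY state; then SUM(amount) per group --
# the pre-aggregation done before the result is stored to HDFS.
totals = defaultdict(int)
for state, amount in joined:
    totals[state] += amount
# dict(totals) == {"NJ": 350, "NY": 75}
```

Pig Latin expresses the same join and group-by declaratively and compiles it to MapReduce jobs; the relational logic is identical.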

Environment: Hadoop, MapReduce, HDFS, Hive, Cloudera, Core Java, Scala, SQL, Flume, Spark, Pig, Sqoop, Oozie, Impala, Ruby, AWS, HBase, Kafka, Cassandra, ETL, Oracle, Python, Unix.
