
Hadoop Big Data/Spark Developer Resume

Berkeley Heights, NJ


  • 5+ years of IT experience, including 2+ years with the Hadoop/Big Data ecosystem and Java technologies such as HDFS, MapReduce, Apache Pig, Hive, HBase, Spark, Kafka and Sqoop.
  • In-depth knowledge of Hadoop architecture and Hadoop daemons such as NameNode, Secondary NameNode, DataNode, JobTracker and TaskTracker.
  • Experience writing MapReduce programs with Apache Hadoop to analyze big data.
  • Hands-on experience writing ad hoc queries to move data from HDFS into Hive and analyzing the data with HiveQL.
  • Experience importing and exporting data between relational database systems and HDFS using Sqoop.
  • Experience writing Hadoop jobs in Pig Latin to analyze data.
  • Good knowledge of analyzing HBase data using Hive and Pig.
  • Working knowledge of NoSQL databases such as HBase and Cassandra.
  • Hands-on experience setting up automated monitoring and escalation infrastructure for Hadoop clusters using Ganglia and Nagios.
  • Experience designing and developing POCs in Scala, deploying them on a YARN cluster and comparing the performance of Spark against Hive and SQL/Teradata.
  • Good knowledge of Amazon Web Services (AWS) such as EMR, EC2, EBS, S3 and RDS for fast, efficient processing of big data.
  • Experience integrating BI tools such as Tableau and pulling the required data into the BI tool's in-memory store.
  • Experience launching EC2 instances for Amazon EMR using the AWS console.
  • Experience extending Hive and Pig core functionality by writing custom UDFs, UDAFs and UDTFs.
  • Experience with administrative tasks such as installing Hadoop and ecosystem components like Hive and Pig in distributed mode.
  • Experience using Apache Flume to collect, aggregate and move large amounts of data from application servers.
  • Passionate about working in big data and analytics environments.
  • Knowledge of reporting tools such as Tableau for analytics on data in the cloud.
  • Extensive experience with SQL, PL/SQL, shell scripting and database concepts.
  • Experience with front-end technologies such as HTML, CSS and JavaScript.
  • Experience working on Windows and UNIX/Linux platforms with technologies such as Big Data, SQL, XML, HTML, Core Java and shell scripting.


Database: DB2, MySQL, Oracle, MS SQL Server

Languages: Core Java, Pig Latin, SQL, HiveQL, Shell Scripting and XML

APIs/Tools: NetBeans, Eclipse, MySQL Workbench, Visual Studio

Web Technologies: HTML, XML, JavaScript, CSS


Operating Systems: Unix, Linux, Windows XP

Visualization Tools: Tableau, Zeppelin

Virtualization Software: VMware, Oracle VirtualBox

Cloud Computing Services: AWS (Amazon Web Services)


Confidential, Berkeley Heights, NJ

Hadoop Big Data/Spark Developer


  • Analyzed the requirements for setting up a cluster.
  • Developed MapReduce programs in Java to parse raw data and populate staging tables.
  • Created Hive queries to compare the raw data with EDW reference tables and perform aggregations.
  • Imported and exported data into and out of HDFS and Hive using Sqoop.
  • Wrote Pig scripts to process the data.
  • Developed and designed Hadoop, Spark and Java components.
  • Developed Spark programs to parse the raw data, populate staging tables and store the refined data in partitioned tables in the EDW.
  • Developed Spark code using Scala and Spark SQL for faster processing and testing.
  • Developed Pig Latin scripts to extract data from web server output files and load it into HDFS.
  • Involved in setting up HBase and storing data in HBase for further analysis.
  • Explored Spark to improve the performance of existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, pair RDDs and Spark on YARN, and converted Hive queries into Spark transformations using Spark RDDs.
  • Created Kafka applications that monitor consumer lag within Apache Kafka clusters; these applications are used in production by multiple companies.
  • Developed Unix/Linux Shell Scripts and PL/SQL procedures.
  • Worked on real-time data streaming solutions using Apache Spark/Spark Streaming and Kafka.
  • Performed Spark/Scala performance optimizations; diagnosed and resolved performance issues.
  • Installed and configured Hive and wrote Hive UDFs.
  • Involved in converting MapReduce programs into Spark RDD transformations using Scala and Python.
  • Involved in creating Hive tables, loading them with data and writing HiveQL queries that run internally as MapReduce jobs.
  • Loaded some of the data into Cassandra for fast retrieval.
  • Involved in developing Spark code using Scala and Spark SQL for faster testing and processing of data, and in exploring optimizations using SparkContext, Spark SQL, pair RDDs and Spark on YARN.
  • Exported the analyzed data to relational databases using Sqoop for visualization and report generation by the BI team.
  • Extracted files from Cassandra through Sqoop, placed them in HDFS and processed them.
  • Implemented big data solutions on the Hortonworks distribution and the AWS cloud platform.
  • Developed Pig Latin scripts for handling data formation.
  • Extracted data from MySQL into HDFS using Sqoop.
  • Managed and monitored the Hadoop cluster using Cloudera Manager.

Environment: Hadoop, Cloudera distribution, Hortonworks distribution, AWS, EMR, Azure cloud platform, HDFS, MapReduce, DocumentDB, Kafka, Pig, Hive, Sqoop, Flume, Oozie, ZooKeeper, Core Java, Impala, HiveQL, Spark, UNIX/Linux Shell Scripting.

Confidential, Newark, CA

Big Data Developer


  • Responsible for building scalable distributed data solutions using Hadoop.
  • Involved in loading data from the Linux file system into HDFS.
  • Worked with HDFS admin shell commands.
  • Experience with ETL methods for data extraction, transformation and loading in corporate-wide ETL solutions, and with data warehouse tools for reporting and data analysis.
  • Knowledge of Hadoop architecture and components such as HDFS, JobTracker, TaskTracker, NameNode and DataNode.
  • Developed Kafka producers and consumers, HBase clients, Apache Spark and Hadoop MapReduce jobs, along with HDFS and Hive components.
  • Used the Spark Streaming API with Kafka to build live dashboards; worked on RDD transformations and actions, Spark Streaming, pair RDD operations, checkpointing and SBT.
  • Used Kafka to transfer data from different data systems to HDFS.
  • Migrated complex MapReduce programs to Spark RDD transformations and actions.
  • Involved in developing a Spark Streaming application in Scala for one of the data sources, applying transformations to the incoming data.
  • Developed a Scala script to read all Parquet tables in a database and convert them to JSON files, and another script to load them as structured tables in Hive.
  • Implemented Spark RDD transformations to express business analysis logic and applied actions on top of those transformations.
  • Used Spark to parse XML files, extract values from tags and load them into multiple Hive tables.
  • Experience with Hadoop distributions such as Cloudera and Hortonworks.
  • Hands-on experience with Cassandra.
  • Analyzed large data sets by running Hive queries and Pig scripts.
  • Created Hive tables, loaded data into them and wrote Hive UDFs.
  • Hands-on use of Sqoop to import data from RDBMS into HDFS and export it back.
  • Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Used Sqoop, Avro, Hive, Pig, Java and MapReduce daily to develop ETL, batch processing and data storage functionality.
  • Supported implementation and execution of MapReduce programs in a cluster environment.

Environment: Hadoop, MapReduce, Hive, Pig, HBase, Sqoop, Kafka, Cassandra, Flume, Java, SQL, Cloudera Manager, Eclipse, Unix Script, YARN.

Confidential, Columbus, OH

Hadoop Developer


  • Wrote MapReduce code to parse data from various sources and store the parsed data in HBase and Hive.
  • Integrated MapReduce with HBase to bulk-import large amounts of data into HBase using MapReduce programs.
  • Imported data from different relational data sources like Oracle, Teradata to HDFS using Sqoop.
  • Worked on stand-alone as well as distributed Hadoop applications.
  • Used Scala to convert Hive/SQL queries into RDD transformations in Apache Spark.
  • Used Oozie and Zookeeper to automate the flow of jobs and coordination in the cluster respectively.
  • Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
  • Extensive knowledge of Pig scripts using bags and tuples, and of Pig UDFs, to pre-process data for analysis.
  • Used Amazon EMR to process big data across a Hadoop cluster of virtual servers.
  • Used Teradata in building the Hadoop project and as an ETL project.
  • Developed several shell scripts that act as wrappers to start these Hadoop jobs and set the configuration parameters.
  • Involved in writing Impala queries for faster data processing.
  • Involved in developing Impala scripts for extraction, transformation and loading of data into the data warehouse.
  • Experienced in migrating HiveQL to Impala to minimize query response time.
  • Involved in moving log files generated by various sources into HDFS through Flume for further processing.
  • Involved in collecting and aggregating large amounts of log data using Apache Flume and staging the data in HDFS for further analysis.
  • Developed testing scripts in Python, prepared test procedures, analyzed test result data and suggested improvements to the system and software.

Environment: HDFS, MapReduce, Python, CDH5, HBase, NoSQL, Hive, Pig, Hadoop, Sqoop, Impala, YARN, Shell Scripting, Ubuntu, Red Hat Linux.


Java / Hadoop Developer


  • Involved in design and development phases of Software Development Life Cycle (SDLC) using Scrum methodology.
  • Worked with application teams to install operating system, Hadoop updates, patches, version upgrades as required.
  • Involved in installing and configuring the Hadoop ecosystem and Cloudera Manager using the CDH4 distribution.
  • Collected log data from web servers and integrated it into HDFS using Flume.
  • Involved in creating Hive tables, loading them with data and writing Hive queries that invoke and run MapReduce jobs in the backend.
  • Worked on analyzing Hadoop cluster and different big data analytic tools including Map Reduce, Hive and Spark.
  • Developed MapReduce programs to parse the raw data and store the pre-aggregated data in partitioned tables.
  • Involved in start to end process of Hadoop cluster installation, configuration and monitoring
  • Responsible for building scalable distributed data solutions using Hadoop; involved in submitting and tracking MapReduce jobs using the JobTracker.
  • Responsible for troubleshooting issues in the execution of MapReduce jobs by inspecting and reviewing log files.
  • Worked with HBase, creating tables to load large sets of semi-structured data coming from various sources.
  • Created design documents and reviewed them with the team, in addition to assisting the business analyst/project manager in explaining them to the line of business.
  • Responsible for understanding the scope of the project and requirement gathering.
  • Involved in analysis, design, construction and testing of the application
  • Developed the web tier using JSP to show account details and summary.
  • Designed and developed the UI using JSP, HTML, CSS and JavaScript.
  • Used Tomcat web server for development purpose.
  • Involved in creation of Test Cases for JUnit Testing.
  • Used Oracle as the database and Toad for query execution; also wrote SQL scripts and PL/SQL procedures and functions.
  • Developed the application in Eclipse and used Maven as the build and deployment tool.

Environment: Hadoop, HBase, HDFS, Pig Latin, Sqoop, Hive, Java, J2EE Servlet, JSP, JUnit, AJAX, XML, JavaScript, Maven, Eclipse, Apache Tomcat, and Oracle.
