
Hadoop/Spark Developer Resume

Albany, New York


  • Overall 8 years of IT experience across a variety of industries, including hands-on experience in Big Data analytics and development.
  • Excellent knowledge of Hadoop ecosystem components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and the MapReduce programming paradigm.
  • Experience in installing, configuring, managing, supporting, and monitoring Hadoop using platforms and tools such as Apache Spark, Cloudera, and the AWS service console.
  • Experienced in writing complex MapReduce programs that work with different file formats such as Text, Sequence, XML, JSON, and Avro.
  • Good experience with Big Data technologies such as Hadoop, MapReduce, Hive, HBase, Pig, Sqoop, Spark, Kafka, Flume, ZooKeeper, Oozie, and Storm.
  • Hands-on experience using Spark Streaming and batch processing to process streaming and batch data.
  • Working experience with the Cloudera Data Platform using VMware Player in a CentOS 6 Linux environment. Strong experience with Hadoop distributions such as Cloudera and Hortonworks.
  • Good knowledge of the NoSQL databases MongoDB and HBase.
  • Expertise in database design, creation and management of schemas, and writing stored procedures, functions, and DDL/DML SQL queries.
  • Used Spark SQL and HiveQL (HQL) queries to analyze data in HDFS.
  • Worked with HBase to load and retrieve data for real-time processing using the REST API.
  • Very good experience with partitioning and bucketing in Hive; designed both managed and external Hive tables to optimize performance.
  • Good working experience using Sqoop to import data from RDBMS into HDFS or Hive and to export data from HDFS or Hive back to RDBMS.
  • Extended Hive and Pig core functionality with custom User Defined Functions (UDFs), User Defined Table-Generating Functions (UDTFs), and User Defined Aggregating Functions (UDAFs).
  • Worked with Big Data distributions such as Cloudera (CDH 3 and 4) and HDP 2.4.
  • Worked with ETL tools such as Talend to simplify MapReduce jobs from the front end. Also familiar with Pentaho and Informatica as ETL tools for Big Data.
  • Worked with BI tools like Tableau for report creation and further analysis from the front end.
  • Extensive knowledge of using SQL queries for backend database analysis.
  • Involved in unit testing of MapReduce programs using Apache MRUnit.
  • Worked on Amazon Web Services and EC2.
  • Experience working with CI/CD pipelines using tools such as Jenkins and Chef.
  • Created Docker containers and Docker consoles for managing the application life cycle.
  • Experience working with Build tools like Maven and Ant.
  • Excellent Java development skills using J2EE, J2SE, Servlets, JSP, Spring, Hibernate, JDBC.
  • Experienced in both Waterfall and Agile (Scrum) development methodologies.


  • Hadoop Technologies: Apache Hadoop, Cloudera Hadoop Distribution (HDFS and MapReduce)
  • Technologies: HDFS, YARN, MapReduce, Hive, Pig, Sqoop, Flume, Spark, Kafka, ZooKeeper, and Oozie
  • Java/J2EE Technologies: Core Java, Servlets, Hibernate, Spring, Struts, XML, HTML, XHTML, JNDI, HTML5, AJAX, jQuery, CSS, JavaScript, AngularJS, VBScript, WSDL, SOAP, JDBC, ODBC; REST and MVC architectures
  • NoSQL Databases: HBase, MongoDB
  • Programming Languages: Java, Scala, SQL, PL/SQL, Pig Latin, HiveQL, Unix, JavaScript, Shell Scripting
  • Web Technologies: HTML, J2EE, CSS, JavaScript, AJAX, Servlet, JSP, DOM, XML
  • Application Servers: WebLogic, WebSphere, JBoss
  • Cloud Computing Tools: Amazon AWS
  • Build Tools: Ant, Maven, Makefile, Hudson, Jenkins, Bamboo, CodeDeploy
  • Databases: MySQL, Oracle 9i/10g, DB2, MS SQL Server, MS Access, Teradata V2R5
  • Business Intelligence Tools: Splunk, Talend
  • Development Methodologies: Agile/Scrum, Waterfall
  • Development Tools: Microsoft SQL Studio, Toad, Eclipse, NetBeans
  • Operating Systems


Hadoop/Spark Developer

Confidential, Albany, New York


  • The main aim of the project was to tune the performance of existing Hive queries and prepare Spark jobs scheduled to run daily in Tez.
  • Developed Spark code using Spark SQL and Spark Streaming for faster testing and processing of data.
  • Used Spark for interactive queries, processing of streaming data, and integration with a popular NoSQL database for huge volumes of data.
  • Responsible for the design and development of Spark SQL scripts based on functional specifications.
  • Responsible for Spark Streaming configuration based on the type of input.
  • Streamed data in real time using Spark with SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
  • Wrote MapReduce jobs to parse web logs stored in HDFS.
  • Developed services to run MapReduce jobs as required.
  • Imported and exported data to and from HDFS, Hive, and Pig using Sqoop.
  • Responsible for managing data coming from different sources.
  • Monitored running MapReduce programs on the cluster.
  • Responsible for loading data from UNIX file systems to HDFS. Installed and configured Hive and wrote Pig/Hive UDFs.
  • Created Hive tables, loaded them with data, and wrote Hive queries that invoke and run MapReduce jobs in the backend.
  • Wrote MapReduce programs to convert text files into Avro and load them into Hive tables.
  • Implemented workflows using the Apache Oozie framework to automate tasks.
  • Worked with NoSQL databases such as HBase, creating HBase tables to load large sets of semi-structured data from various sources.
  • Developed design documents considering all possible approaches and identifying the best of them.
  • Loaded data into HBase using bulk and non-bulk loads.
  • Developed scripts and automated end-to-end data management and synchronization between all clusters.
  • Imported data from different sources such as HDFS and HBase into Spark RDDs.
  • Experienced with SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Scala, and Python.
  • Partitioned data streams using Kafka. Designed and configured a Kafka cluster to accommodate heavy throughput of 1 million messages per second. Used the Kafka producer 0.8.3 APIs to produce messages.
  • Involved in requirements gathering, design, development, and testing.
  • Followed agile methodology for the entire project.
  • Prepared technical design documents and detailed design documents.
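
As an illustration of the web-log parsing work listed above, here is a minimal sketch of the map, shuffle, and reduce steps in plain Python. It is not Hadoop code and not the project's actual job: the Common Log Format pattern, the choice to count requests per HTTP status code, and all function names are assumptions made for the example.

```python
import re
from collections import defaultdict

# Assumed Common Log Format; the real log layout is not given in this resume.
LOG_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\S+)$'
)


def map_line(line):
    """Map step: emit (status_code, 1) for each line that parses."""
    match = LOG_PATTERN.match(line.strip())
    if match:
        yield match.group("status"), 1


def shuffle(pairs):
    """Shuffle step: group emitted values by key, as the framework would."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(key, values):
    """Reduce step: sum the counts collected for one key."""
    return key, sum(values)


def count_status_codes(lines):
    """Run map, shuffle, and reduce over an iterable of log lines."""
    pairs = (pair for line in lines for pair in map_line(line))
    return dict(reduce_counts(k, v) for k, v in shuffle(pairs).items())
```

Lines that do not match the assumed pattern are skipped rather than failing the run, which mirrors how a mapper would typically tolerate malformed records.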

Environment: Hadoop, Spark Core, Spark SQL, Spark Streaming, HDFS, MapReduce, Hive, HBase, Flume, Java, Maven, Impala, Pig, Oozie, Oracle, YARN, GitHub, JUnit, Unix, Cloudera, Sqoop, Scala, Python.

Confidential, Wilmington, DE

Hadoop Developer


  • Upgraded the Hadoop cluster from CDH3 to CDH4, set up a High Availability cluster, and integrated Hive with existing applications.
  • Installed the Oozie workflow engine to run multiple Hive and Pig jobs.
  • Used the Scala collections framework to store and process complex consumer information, and Scala functional programming concepts to develop business logic.
  • Analyzed data by running Hive queries and Pig scripts to understand user behavior.
  • Analyzed the Hadoop cluster using big data analytics tools including Pig, Hive, and MapReduce.
  • Managed and reviewed Hadoop log files. Installed and deployed IBM WebSphere.
  • Implemented the NoSQL database HBase and managed the other tools and processes running on YARN.
  • Imported and exported data to and from HDFS and Hive using Sqoop.
  • Tested raw data and executed performance scripts. Shared responsibility for administration of Hadoop, Hive, and Pig. Analyzed, validated, and documented the changed records for the IBM web application.
  • Set up and benchmarked Hadoop/HBase clusters for internal use. Assisted the development team in installing single-node Hadoop 224 on a local machine.
  • Managed workflows and scheduling for complex MapReduce jobs using Apache Oozie.
  • Enabled speedy reviews and first-mover advantages by using Oozie to automate data loading into the Hadoop Distributed File System and Pig to pre-process the data.
  • Wrote Hive queries for data analysis to meet the business requirements.
  • The new data items are used for further analytics and reporting, with Cognos reports as the BI component. Performed analysis with the data visualization tool Tableau. Wrote Pig scripts for data processing.
  • Experience deploying applications on heterogeneous application servers: Tomcat, WebLogic, IBM WebSphere, and Oracle Application Server.
  • Created Pig Latin scripts to sort, group, join, and filter enterprise-wide data.
  • Implemented partitioning, dynamic partitions, and buckets in Hive.
  • Loaded and transformed large sets of structured, semi-structured, and unstructured data.
  • Validated data using the MD5 algorithm.
  • Provided daily production support to monitor and troubleshoot Hadoop/Hive jobs.
  • Configured core-site.xml and mapred-site.xml for the multi-node cluster environment.
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting on the dashboard. Loaded the aggregated data into DB2 for reporting on the dashboard.
  • Implemented data integrity and data quality checks in Hadoop using Hive and Linux scripts.
  • Used Avro and Parquet file formats for data serialization.
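
The MD5-based validation mentioned above can be sketched as follows. This is a minimal Python illustration, assuming the check compares a file's digest before and after a transfer. The function names and the 1 MiB chunk size are hypothetical and not taken from the project.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def files_match(source_path, copy_path):
    """Treat the copy as valid only if both files produce the same digest."""
    return md5_of_file(source_path) == md5_of_file(copy_path)
```

Reading in chunks keeps memory use flat for large files. MD5 is suitable here only for detecting accidental corruption, not for protecting against deliberate tampering.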

Environment: Big Data/Hadoop, Python, Java, Agile, Spark Streaming, HDFS, MapReduce, Hive, Pig, Sqoop, Flume, ZooKeeper, Oozie, DB2, NoSQL, HBase, IBM WebSphere, Tomcat, and Tableau.

Big Data/Hadoop Developer



  • Worked as a Big Data developer providing solutions for big data problems.
  • Worked in an Agile development environment in two-week sprint cycles, dividing and organizing tasks. Participated in daily scrum and other design-related meetings.
  • Defined the application architecture and design for the Big Data Hadoop initiative to maintain structured and unstructured data; created a reference architecture for the enterprise.
  • Developed a data pipeline on AWS to extract data from web logs and store it in Amazon EMR.
  • Designed and maintained scalable solutions on the big data analytics platform for the enterprise module.
  • Imported data from AWS S3 into Spark RDDs and performed transformations and actions on the RDDs.
  • Migrated the physical data center environment to AWS; also designed, built, and deployed a multitude of applications using almost all of the AWS stack (including EC2, S3, and RDS).
  • Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
  • Built real-time ingestion of structured and unstructured data into Hadoop and MemSQL using Kafka and Spark Streaming.
  • Populated dimension and fact tables and was involved in creating Talend mappings.
  • Began using Apache NiFi to copy data from the local file system to HDP.
  • Implemented solutions for ingesting data from various sources and processing it using Big Data technologies such as Hive, Spark, Pig, Sqoop, HBase, and MapR.
  • Read and wrote delimited files in HDFS using Talend Big Data Studio with Hadoop components such as Hive, Pig, and Spark.
  • Developed Scala scripts and UDFs using both DataFrames/SQL and RDD/MapReduce in Spark for data aggregation, queries, and writing data back into the OLTP system through Sqoop.
  • Loaded and transformed large sets of structured, semi-structured, and unstructured data from relational databases into HDFS using Sqoop imports.
  • Installed, configured, and operated ZooKeeper, Pig, Falcon, Sqoop, Hive, HBase, Kafka, and Spark for business needs.
  • Worked with the MapR Converged Data Platform, which was built with real-time data movement in mind.
  • Created tables in the RDBMS, inserted data, and then loaded the same tables into HDFS and Hive using Sqoop.
  • Worked with business stakeholders to translate business objectives and requirements into technical requirements and design.
  • Identified data sources, created source-to-target mappings, estimated storage, and supported Hadoop cluster setup and data partitioning.
  • Developed data ingestion scripts using Sqoop and Flume, wrote Spark SQL and Hive queries to analyze the data, and carried out performance optimization.
  • Wrote DDL and DML files to create and manipulate tables in the database.
  • Developed UNIX shell and Python scripts to create reports from Hive data.
  • Optimized MapReduce jobs to use HDFS efficiently by using various compression mechanisms.
  • Responsible for writing Hive queries to analyze data in the Hive warehouse using Hive Query Language (HQL).
  • Analyzed data using the Hadoop components Hive and Pig and created Hive tables for end users.
  • Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis.

Environment: Hadoop, MLlib, MapReduce, MySQL, MongoDB, HDFS, YARN, Hive, Pig, Sqoop 1.6, Flume, Amazon Web Services EC2, Spark, Scala, jQuery, Spring, JUnit, XML, Python.
