
Hadoop/Spark Developer Resume

Atlanta, GA


  • 7 years of professional experience in information technology, including 4 years developing Big Data and Hadoop ecosystem components.
  • Over 3 years of extensive experience in Java, J2EE technologies, database development, and data analytics.
  • Hands-on experience developing Big Data projects using Hadoop, Hive, Sqoop, Oozie, Pig, Flume, and MapReduce open-source tools and technologies.
  • Experience writing Pig Latin and HiveQL scripts and extending their functionality with user-defined functions (UDFs).
  • Hands-on experience with performance optimization techniques for data processing in Hive, Impala, Spark, Pig, and MapReduce.
  • Wrote complex MapReduce code implementing custom Writable and WritableComparable types for analysis of large datasets.
  • Good exposure to various file formats (Parquet, Avro, JSON) and compression codecs (Snappy, Bzip2, and Gzip).
  • Hands-on experience with Spark Core, Spark SQL, and the DataFrame/Dataset/RDD APIs.
  • Developed applications using Spark for data processing.
  • Replaced existing MapReduce jobs and Hive scripts with Spark DataFrame transformations and actions.
  • Capable of using AWS services such as EMR, S3, and CloudWatch to run and monitor Hadoop and Spark jobs on AWS.
  • Good knowledge of Spark architecture and real-time streaming with Spark.
  • Fluent in core Java concepts such as I/O, multithreading, exceptions, regular expressions, collections, data structures, and serialization.
  • Experience in Object-Oriented Analysis and Design (OOAD) and software development using UML methodology; good knowledge of J2EE and core Java design patterns.
  • Experience in Java, JSP, Servlets, WebLogic, WebSphere, JavaScript, jQuery, XML, and HTML.
  • Experience in writing stored procedures and complex SQL queries using relational databases like Oracle, SQL Server, and MySQL.
  • Knowledge of ETL methods for data extraction, transformation, and loading in corporate-wide ETL solutions, and of data warehouse tools for reporting and data analysis.
  • Hands-on experience in application development using Java, RDBMS, and Linux shell scripting.
  • Well versed in Agile and Waterfall methodologies.
  • Strong team player with good communication, analytical, presentation and interpersonal skills.


Big Data Ecosystem: Hadoop, MapReduce, HDFS, HBase, Cassandra, MongoDB, ZooKeeper, Hive, Pig, Sqoop, Flume, and Oozie.

Operating Systems: Windows, UNIX, Linux.

Programming Languages: C, Java, PL/SQL, Scala

Scripting Languages: JavaScript, Shell Scripting

Web Technologies: HTML, XHTML, XML, CSS, JavaScript, JSON, SOAP, WSDL.

Hadoop Distribution: Cloudera, Hortonworks.

Java/J2EE Technologies: Java, J2EE, JDBC.

Database: Oracle, MS Access, MySQL, SQL, NoSQL.

IDE: Eclipse, IntelliJ, SBT.

Methodologies: J2EE Design patterns, Scrum, Agile, Waterfall

Version Control: SVN, Git, GitHub, Bitbucket


Confidential, Atlanta, GA

Hadoop/Spark Developer


  • Developed Sqoop jobs to import data from an Oracle database in Avro file format and created Hive tables on top of it.
  • Created partitioned and bucketed Hive tables in Parquet file format with Snappy compression.
  • Ran the Hive scripts through Hive, Impala, and Hive on Spark, and some through Spark SQL using Scala.
  • Involved in performance tuning of Hive from design, storage, and query perspectives.
  • Collected JSON data from an HTTP source and developed Spark APIs that perform inserts and updates in Hive tables.
  • Developed Spark core and Spark SQL scripts using Scala for faster data processing.
  • Worked with the Spark SQL context to create DataFrames that filter input data for model execution.
  • Tuned the performance of Spark applications by setting the right batch interval, level of parallelism, and memory configuration.
  • Developed a Kafka consumer to consume data from Kafka topics.
  • Involved in designing and developing tables in HBase and storing aggregated data from Hive tables.
  • Integrated Hive with Tableau to generate reports for the end user.
  • Developed shell scripts for running Hive scripts in Hive and Impala.
  • Orchestrated a number of Sqoop and Hive scripts using Oozie workflows and scheduled them using the Oozie coordinator.
  • Used Jira for bug tracking and Bitbucket to check in and check out code changes.

Environment: HDFS, YARN, Hive, Sqoop, Flume, Oozie, HBase, Kafka, Impala, Spark SQL, Spark Streaming, Eclipse, Oracle, Teradata, PL/SQL, Linux Shell Scripting, Hortonworks.

Confidential, Minneapolis, MN

Hadoop Developer/Spark Developer


  • Imported and exported data into HDFS and Hive using Sqoop and Kafka.
  • Converted complex Teradata and Netezza SQL into HiveQL.
  • Developed ETL using Hive, Oozie, shell scripts, and Sqoop. Coded the components in Scala, making use of Scala pattern matching.
  • Used Flume to collect, aggregate, and store weblog data in HDFS.
  • Designed NoSQL schemas in HBase.
  • Developed MapReduce ETL in Java and Pig.
  • Loaded log data into HDFS using Flume.
  • Developed simple to complex MapReduce Jobs using Hive and Pig.
  • Created Hive tables, loaded data into them, and wrote Hive UDFs.
  • Implemented partitioning, dynamic partitioning, and bucketing in Hive.

Environment: MapReduce, HDFS, Hive, Pig, Sqoop, Scala, Oozie, SQL, Flume, Python, Shell Script, DataStage, Hortonworks.

Confidential, Tampa, FL

Hadoop Developer


  • Loaded the data using Sqoop from different RDBMS Servers like Teradata and Netezza to Hadoop HDFS Cluster.
  • Performed daily Sqoop incremental imports scheduled through Oozie.
  • Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
  • Optimized Hive queries using map-side joins, dynamic partitions, and bucketing.
  • Responsible for executing Hive queries using Hive Command Line under Cloudera Manager.
  • Implemented Hive generic UDFs to apply business logic around custom data types.
  • Used Pig as an ETL tool to perform transformations, event joins, and some pre-aggregations before storing the data in HDFS.
  • Coordinated the Pig and Hive scripts using Oozie workflow.
  • Loaded the data into HBase from HDFS.
  • Loaded and transformed large sets of structured, semi-structured, and unstructured data that includes Avro, sequence files, and XML files.

Environment: Hadoop, Cloudera, Big Data, HDFS, MapReduce, Sqoop, Oozie, Pig, Hive, Linux, Java, Eclipse.


Hadoop Developer


  • Involved in various phases of Software Development Life Cycle (SDLC) such as requirements gathering, analysis, design, and development.
  • Analyzed large datasets to provide strategic direction to the company.
  • Collected logs from the physical machines and ingested them into HDFS using Flume.
  • Involved in system and business analysis.
  • Developed SQL statements to improve back-end communications.
  • Loaded unstructured data into the Hadoop Distributed File System (HDFS).
  • Created reports and dashboards using structured and unstructured data.
  • Involved in importing data from MySQL to HDFS using Sqoop.
  • Involved in writing Hive queries to load and process data in HDFS.
  • Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
  • Involved in working with Impala for data retrieval process.
  • Performed sentiment analysis on product reviews from the client's website.
  • Developed custom Map-Reduce programs to extract the required data from the logs.
  • Tuned and troubleshot MapReduce jobs by analyzing and reviewing Hadoop log files.


Java/J2EE Developer


  • Involved in Full Life Cycle Development in Distributed Environment using Java and J2EE Framework.
  • Designed the application by implementing Struts Framework based on MVC Architecture.
  • Designed and developed the front end using JSP, HTML, JavaScript, and jQuery.
  • Implemented the web service client for login authentication, credit reports, and applicant information using Apache Axis2 web services.
  • Worked extensively on the user interface for a few modules using JSP and JavaScript.
  • Developed framework for data processing using Design patterns, Java, XML.
  • Used the lightweight container of the Spring Framework to provide architectural flexibility through Inversion of Control (IoC).
  • Used Hibernate ORM framework with Spring framework for data persistence and transaction management.
  • Designed and developed Session beans to implement the Business logic.
  • Developed EJB components deployed on WebLogic Application Server.
  • Wrote unit tests using the JUnit framework and implemented logging using the Log4j framework.
  • Designed and developed various configuration files for Hibernate mappings.
  • Designed and documented REST/HTTP APIs, including JSON data formats and API versioning strategy.
  • Developed Web Services for sending and getting data from different applications using SOAP messages.
  • Actively involved in code reviews and bug fixing.
  • Applied CSS (Cascading Style Sheets) across the entire site for standardization.

Environment: Java 5.0, Struts, Spring 2.0, Hibernate 3.2, WebLogic 7.0, Eclipse 3.3, Oracle 10g, JUnit 4.2, Maven, Windows XP, HTML, CSS, JavaScript, and XML.
