Sr. Hadoop Developer Resume

Atlanta, Georgia


  • Sr. Hadoop Developer with 8 years of experience in Information Technology, including 4+ years in Big Data and Hadoop ecosystem components and 4 years in Java development.
  • Solid hands-on experience with Apache Hadoop components such as HDFS, MapReduce, HBase, Pig, Hive, YARN, Sqoop, Oozie, Cassandra, Flume, ZooKeeper and Apache Spark.
  • IT experience in all phases of Hadoop Development, Java Development along with experience in Application Development & Data modelling through various roles over the years.
  • Proficient in NoSQL databases including HBase, Cassandra and MongoDB, and their integration with Hadoop clusters.
  • Experienced working with Spark Streaming, Spark SQL and Kafka for real-time data processing.
  • Strong experience troubleshooting Spark applications and tuning them for efficient memory handling.
  • Extensive knowledge in programming with Resilient Distributed Datasets (RDDs) and DataFrames in Spark.
  • Experience integrating Kafka with Spark for real-time data processing.
  • Developed custom Kafka producers and consumers for publishing and subscribing to Kafka topics.
  • Extensive experience working with enterprise Hadoop distributions, including Cloudera (CDH5) and Hortonworks, and good knowledge of Amazon's EMR (Elastic MapReduce).
  • Knowledge in end-to-end Hadoop Infrastructure including Pig, Hive, Sqoop, Oozie, Flume and Zookeeper.
  • Hands-on expertise in row key and schema design for NoSQL databases such as Cassandra, HBase and MongoDB 3.0.1.
  • Managed data coming from different sources and involved in HDFS loading of structured and unstructured data.
  • Experience in developing data pipelines using Sqoop and Flume to extract data from weblogs and store it in HDFS.
  • Accomplished in developing Pig Latin scripts and using Hive Query Language for data analytics.
  • Imported data from various sources, such as AWS S3 and the local file system, into Spark RDDs.
  • Experience in importing and exporting data between relational databases and HDFS using Sqoop on Linux systems. Developed Spark SQL jobs to load tables into HDFS.
  • Worked on loading CSV/Avro/Parquet files using Scala and Java in the Spark framework.
  • Expertise in writing Hadoop jobs for analysing data using HiveQL (queries), Pig Latin (data flow language) and custom MapReduce programs in Java.
  • Developed various shell scripts and Python scripts to automate Spark jobs and Hive scripts.
  • Created User Defined Functions (UDFs) and User Defined Aggregate Functions (UDAFs) in Pig and Hive.
  • Designed and implemented Hive and Pig UDFs using Python and Java for evaluation, filtering, loading and storing of data.
  • Experience in loading data from different data sources (such as Teradata and DB2) into HDFS using Sqoop and then into partitioned Hive tables.
  • Good understanding of service-oriented architecture (SOA) and web service technologies such as XML and SOAP.
  • Experience in object-oriented analysis and design (OOAD), Unified Modeling Language (UML) and design patterns.
  • Experience in complete project life cycle (design, development, testing and implementation) of Client Server and Web applications.
  • Worked in an Agile Scrum methodology on the client's big data platform, using Git 2.12.0 for version control.
  • Worked with various programming languages using IDEs such as Eclipse, NetBeans and IntelliJ.
  • Excelled in using version control tools such as SVN, VSS and Git. Built web-based UIs using JavaScript, jQuery, jQuery UI, CSS, HTML, HTML5 and XHTML.
  • Implemented web services to integrate different applications (internal and third-party components) using SOAP and RESTful services.


Operating System: Windows, Unix, Linux distributions such as Ubuntu, CentOS and Red Hat

Hadoop Distribution: Cloudera, Hortonworks

Languages: Java, Scala, Python, JavaScript

Data stores: MySQL, SQL Server

Big data: MapReduce, HDFS, Flume, Hive, Pig, Oozie, HBase, Sqoop, Spark, NiFi and Kafka

Amazon Stacks: AWS EMR, S3, EC2, Lambda, Route 53, EBS

RDBMS: Teradata, Oracle 9i, 10g, 11g, MS SQL Server, MySQL and DB2

ETL: Tableau, Talend and Informatica

Web Design Tools: HTML, DHTML, AJAX, JavaScript, jQuery, CSS, AngularJS, ExtJS and JSON

Development/Build tools: Eclipse, Ant, Maven, IntelliJ, JUnit and Log4j

NoSQL Databases: Cassandra, MongoDB, HBase

Java Technologies: Servlets, JavaBeans, JSP, JDBC, Spring and Struts


Confidential, Atlanta, Georgia

Sr. Hadoop Developer


  • Used Spark DataFrame operations to perform required validations on the data and to run analytics on Hive data.
  • Developed Apache Spark applications to process data from various streaming sources.
  • Migrated MapReduce jobs to Spark jobs to achieve better performance.
  • Extracted real-time feeds using Kafka and Spark Streaming, converted them to RDDs and processed the data as DataFrames.
  • Worked with Kafka and REST APIs to collect and load data into the Hadoop file system; also used Sqoop to load data from relational databases.
  • Wrote Spark Streaming applications to consume data from Kafka topics and write the processed streams to HBase.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs.
  • Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
  • Used Spark SQL on DataFrames to load Hive tables into Spark for faster data processing.
  • Created various Hive external tables and staging tables, and joined the tables as required. Implemented static partitioning, dynamic partitioning and bucketing.
  • Used Spark to improve the performance of, and optimize, existing algorithms in Hadoop using SparkContext, Spark SQL, Spark MLlib, DataFrames, pair RDDs and Spark on YARN.
  • Implemented usage of Amazon EMR for processing Big Data across a Hadoop cluster of virtual servers on Amazon Elastic Compute Cloud (EC2) and Amazon Simple Storage Service (S3).
  • Installed the application on AWS EC2 instances and configured storage on S3 buckets.
  • Imported data from different sources, such as AWS S3 and the local file system, into Spark.
  • Used AWS S3 and local hard disk as the underlying file system (HDFS) for Hadoop.
  • Stored data in AWS S3, used in the same way as HDFS, and ran EMR jobs on the stored data.
  • Performed data analysis with MongoDB using Hive external tables.
  • Exported the analysed data using Sqoop into Database to generate reports for the BI team.
  • Experience building batch, real-time and streaming analytics pipelines with data from event data streams, NoSQL and APIs.
  • Used Hue for running Hive queries. Created daily partitions in Hive to improve performance.
  • Developed Oozie workflows to run multiple Hive, Pig, Sqoop and Spark jobs.
  • Applied data modelling concepts such as star schema and snowflake schema in the project.
  • Worked on NoSQL database MongoDB in storing images and URIs.
  • Worked on MongoDB database concepts such as transactions, indexes, replication, locking and schema design.
  • Designed, configured and managed public/private cloud infrastructures utilizing AWS.
  • Worked on auto scaling the instances to design cost effective, fault tolerant and highly reliable systems.
  • Designed ETL workflows in Tableau, loaded data from various sources into HDFS and generated reports using Tableau.

Environment: Cloudera (CDH5), Spark, Hadoop (HDFS), AWS, UNIX Shell Scripting, Sqoop, Pig, Hive, Oozie, Java, Oracle 11g, Git, CentOS, Tableau, NiFi, Windows, Python, MongoDB

Confidential, Dayton, Ohio

Data Engineer


  • Built scalable distributed Hadoop cluster running Hortonworks Data Platform.
  • Developed data set processes for data modelling and mining. Recommended ways to improve data reliability, efficiency and quality.
  • Worked on data streaming processes with Kafka, Apache Spark, Hive, Pig, etc.
  • Imported and exported data into and out of HDFS using Sqoop, Flume and Kafka.
  • Used Flume interceptors to filter input data so that only the data needed for analytics was retained.
  • Involved in collecting, aggregating and moving data from servers to HDFS using Apache Flume.
  • Used Spark Streaming APIs to perform the necessary transformations and actions on data received from Kafka.
  • Developed a NiFi workflow to pick up data from an SFTP server and send it to a Kafka broker.
  • Analysed the Hadoop cluster and different big data analytics tools, including Pig and Hive.
  • Extracted files from MongoDB through Sqoop and placed them in HDFS for processing.
  • Configured Flume to extract the data from the web server output files to load into HDFS.
  • Used Flume to collect, aggregate and store the web log data from different sources like web servers, mobile and network devices and pushed into HDFS.
  • Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
  • Analysed the SQL scripts and designed a solution to implement them in Scala.
  • Used Spark SQL to load JSON data, create a SchemaRDD and load it into Hive tables; handled structured data using Spark SQL.
  • Tested Apache Tez for building high performance batch and interactive data processing applications on Pig and Hive jobs.
  • Implemented a messaging queue to integrate with Cassandra using Apache Kafka and ZooKeeper.
  • Implemented real-time analytics on Cassandra data using the Thrift API.
  • Designed column families in Cassandra, ingested data from RDBMS, performed transformations and exported the data to Cassandra.
  • Queried and analyzed data from Cassandra for quick searching, sorting and grouping through CQL.
  • Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.
  • Involved in the complete big data flow of the application, from ingesting upstream data into HDFS to processing and analysing the data in HDFS.
  • Worked on Apache NiFi to decompress and move JSON files from the local file system to HDFS.
  • Designed and developed automated processes using shell scripting for data movement.
  • Involved in loading data from UNIX file system to HDFS using Shell Scripting.

Environment: Java, J2EE 1.7, Eclipse, Apache Hive, HDFS, GitHub, Jenkins, NiFi, Python, Scala, Pig, Hadoop, AWS S3, EC2, Impala, Shell Scripting, Apache Web Server, Spark, Spark SQL, JIRA.

Confidential, Little Rock, AR

Big Data Developer


  • Wrote MapReduce code to parse data from various sources and store the parsed data in HBase and Hive.
  • Integrated MapReduce with HBase to bulk-import data into HBase using MapReduce programs.
  • Imported data from different relational data sources, such as Oracle and Teradata, to HDFS using Sqoop.
  • Worked on both stand-alone and distributed Hadoop applications.
  • Used Scala to convert Hive/SQL queries into RDD transformations in Apache Spark.
  • Developed Kafka producers and consumers, HBase clients, and Apache Spark and Hadoop MapReduce jobs, along with components on HDFS and Hive.
  • Used Oozie and Zookeeper to automate the flow of jobs and coordination in the cluster respectively.
  • Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
  • Extensive knowledge of Pig scripts using bags and tuples, and of Pig UDFs to pre-process data for analysis.
  • Implemented usage of Amazon EMR for processing Big Data across a Hadoop cluster of virtual servers.
  • Used Teradata in building the Hadoop project and as part of the ETL project.
  • Developed several shell scripts that act as wrappers to start these Hadoop jobs and set their configuration parameters.
  • Wrote Impala queries for faster data processing.
  • Involved in developing Impala scripts for extraction, transformation and loading of data into the data warehouse.
  • Experienced in migrating HiveQL queries to Impala to minimize query response time.
  • Involved in moving all log files generated from various sources to HDFS for further processing through Flume.
  • Involved in collecting and aggregating large amounts of log data using Apache and staging data in HDFS for further analysis.
  • Developed test scripts in Python, prepared test procedures, analyzed test result data and suggested improvements to the system and software.

Environment: HDFS, MapReduce, Python, CDH5, HBase, NoSQL, Hive, Pig, Hadoop, Sqoop, Impala, YARN, Shell Scripting, Ubuntu, Red Hat Linux.

Confidential, New York, NY

Jr. Hadoop Developer


  • Extensively involved in the design phase and delivered design documents. Gained experience in the Hadoop ecosystem with HDFS, Hive, Pig and Sqoop.
  • Migrated existing SQL queries to HiveQL queries to move to big data analytical platform.
  • Involved in moving all log files generated from various sources to HDFS for further processing through Flume.
  • Created Hive tables, loaded values and generated ad hoc reports using the table data.
  • Extended Hive and Pig core functionality using custom UDFs.
  • Managed and scheduled jobs on the Hadoop cluster using Oozie.
  • Extensive experience in writing Pig scripts to transform raw data from several data sources into baseline data.
  • Involved in writing MapReduce programs for analytics and for structuring the data coming from Flume sinks.
  • Wrote a Java program to retrieve data from HDFS and expose it through REST services.
  • Integrated multiple sources of data (SQL Server, DB2, MySQL) into Hadoop cluster and analysed data by Hive-HBase integration.
  • Worked with partitioning and bucketing in Hive and designed both managed and external tables in Hive for optimized performance.
  • Wrote optimized Pig scripts and was involved in developing and testing Pig Latin scripts.
  • Managed and monitored the Hadoop cluster through Cloudera Manager.

Environment: Java, Eclipse, HDFS, MapReduce, Apache Hadoop, Cloudera Distributed Hadoop, HBase, Hive, Flume, Apache Sqoop, MySQL, Linux, Apache Impala.


Java Developer


  • Involved in Requirement gathering, Analysis and Design using UML and OOAD.
  • Developed application on Struts MVC architecture utilizing Action Classes, Action Forms and validations.
  • Interacted with Developers to follow up on Defects and Issues.
  • Coded SQL and PL/SQL for backend processing and retrieval logic. Involved in building and deploying the application using Ant.
  • Deployed J2EE web applications in BEA WebLogic. Ported the Application onto MVC Model 2 Architecture in Struts Framework.
  • Developed extract, transform and load (ETL) processes, and maintained and supported the enterprise data warehouse system and corresponding data marts.
  • Participated in technical discussion for architecture design, database and code enhancement.
  • Used Hibernate, DAO and JDBC for data retrieval and modifications in the database.
  • Involved in the design and development of SOA services using web services.
  • Used Ant and Maven as build tools on Java projects to produce build artifacts from the source code.
  • Developed JUnit test cases for unit testing, as well as system and user test scenarios.
  • Applied software development best practices for object-oriented design throughout the object-oriented development cycle.
  • Responsible for coding, unit testing, functional testing and regression testing.
  • Implemented mid-tier business services to integrate UI requests to DAO layer commands.

Environment: Java JDK (1.5), Java J2EE, Servlets, Waterfall, JSPs, EJBs, DB2, XML, Web Server, JUnit, Hibernate, MS Access, Microsoft Excel, CSS, HTML, JavaScript, Struts, Spring MVC


Junior Java Developer


  • Extensively involved in requirement analysis and system implementation. Actively involved in SDLC phases such as analysis, design and development.
  • Developed user interface using JSP, Struts and JavaScript.
  • Developed web components using JSP, Servlets and JDBC, and coded JavaScript for AJAX and client-side data validation.
  • Assisted in designing and programming for the system, which includes development of Process Flow Diagram, Entity Relationship Diagram, Data Flow Diagram and Database Design.
  • Created Stored Procedures and Triggers using SQL/PL-SQL for data modification.
  • Gained skills in web-based REST APIs, SOAP APIs and Apache for real-time data streaming.
  • Developed user interface using JSP, JavaScript and CSS Technologies.
  • Created SQL queries, sequences and views for the backend Oracle database.
  • Used Java Beans to automate the generation of Dynamic Reports and for customer transactions.
  • Involved in Tool development, Testing and Bug Fixing. Performed unit testing for various modules.
  • Used the Log4j package for debug, info and error tracing.

Environment: Java, J2EE, Servlets, JSP, SQL, PL/SQL, HTML, JavaScript, Eclipse, CSS, Oracle 10g/11g, MySQL, MS SQL Server, JIRA, REST API, SOAP API, Windows, Linux.
