
Sr. Hadoop Developer Resume

San Diego, CA


  • 8+ years of experience with emphasis on Big Data technologies and the development and design of Java-based enterprise applications.
  • 4+ years of experience developing applications that perform large-scale distributed data processing using Big Data ecosystem tools such as Hadoop, Hive, Pig, Sqoop, HBase, Cassandra, Spark, Kafka, Oozie, Zookeeper, Flume, YARN and Avro.
  • Excellent understanding of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode and the MapReduce programming paradigm, and good hands-on experience with PySpark and SQL queries.
  • Experience in the successful implementation of an ETL solution between an OLTP and an OLAP database in support of decision support systems, with expertise in all phases of the SDLC.
  • Proficient in Apache Spark and Apache Storm for processing real-time data.
  • Extensive knowledge in programming with Resilient Distributed Datasets (RDDs).
  • Hands-on experience in migrating complex MapReduce programs to Apache Spark RDD transformations.
  • Experience with Big Data ML toolkits such as Mahout and Spark ML.
  • Experience in writing producers/consumers and creating messaging-centric applications using Apache Kafka.
  • Experienced in developing Spark applications using the Spark Core, Spark SQL and Spark Streaming APIs.
  • Experience with job workflow scheduling in Oozie and cluster coordination with Zookeeper.
  • Set up standards and processes for Hadoop based application design and implementation.
  • Migrated various Hive UDFs and queries to Spark SQL for faster query execution.
  • Developed data pipelines using Kafka and Spark Streaming to store data in HDFS and performed real-time analytics on the incoming data.
  • Exposure to using Apache Kafka to develop data pipelines that handle logs as a stream of messages using producers and consumers.
  • Experience in configuring Spark Streaming to receive real-time data from Kafka and store the stream data in HDFS using Scala.
  • Worked on NoSQL databases including HBase, Cassandra and MongoDB.
  • Good knowledge of performance troubleshooting and tuning of Cassandra clusters, and an understanding of Cassandra data modeling based on application needs.
  • Experience in integrating Apache Kafka with Apache Storm and creating Storm data pipelines for real-time processing.
  • Implemented Flume for collecting, aggregating and moving large amounts of server logs and streaming data to HDFS.
  • Delivery experience on major Hadoop ecosystem components such as Pig, Hive, Spark, Kafka, Elasticsearch and HBase, with monitoring through Cloudera Manager. Extensive working experience using Sqoop to import data into HDFS from RDBMS and vice versa.
  • Designed and created Hive external tables using a shared metastore instead of the embedded Derby database, with partitioning, dynamic partitioning and bucketing.
  • Hands-on experience with Hortonworks and Cloudera's Distribution Including Apache Hadoop (CDH).
  • Extending Hive and Pig core functionality by writing custom UDFs.
  • Good knowledge of AWS services such as EMR and EC2, which provide fast and efficient processing of Big Data.
  • Involved in best practices for Cassandra, migrating an application from the legacy platform to a Cassandra database for Choice, and a Cassandra 3 upgrade.
  • Expertise in database design, creation and management of schemas, writing Stored Procedures, Functions, DDL, DML, SQL queries & Modeling.
  • Good experience in integrating BI tools (Spotfire, Crystal Reports, Lumira, Tableau) with Hadoop.
  • Set up data in AWS using S3 buckets and configured instance backups to an S3 bucket.
  • Worked with various file formats, including delimited text files, clickstream log files, Apache log files, Avro files, JSON files and XML files.
  • Experience in transferring data from AWS S3 to AWS Redshift using the Informatica tool.
  • Experience in using design patterns, Java, JSP, Servlets, JavaScript, HTML, jQuery, AngularJS, jQuery Mobile, JBoss 4.2.3, XML, WebLogic, SQL, PL/SQL, JUnit, Apache Tomcat and Linux.
  • Hands-on experience using Solr to index files directly from HDFS for both structured and semi-structured data.
  • Experience in Object Oriented Analysis and Design (OOAD) and development of software using UML Methodology, good knowledge of J2EE design patterns and Core Java design patterns.
  • Experienced in integrating various data sources such as Java, RDBMS, shell scripts and spreadsheets.
  • Experience working both independently and collaboratively to solve problems and deliver high quality results in a fast-paced, unstructured environment.
  • Strong written and oral communication skills. Quick to learn and adapt to emerging technologies and paradigms.
  • Developed Java APIs for retrieval and analysis on NoSQL databases such as HBase and Cassandra.


Big Data Ecosystem: Hadoop, MapReduce, Pig, Hive, YARN, Kafka, Flume, Sqoop, Impala, Oozie, Zookeeper, Spark, Ambari, Mahout, MongoDB, Cassandra, Avro, Storm, Parquet and Snappy.

Programming Languages: Java, Python, C/C++, JavaScript, Ruby, SQL, HTML, DHTML, Scala and XML.

Hadoop Distributions: Cloudera (CDH3, CDH4 and CDH5), Hortonworks, MapR and Apache.

DB Languages: MySQL, PL/SQL, PostgreSQL and Oracle.

NoSQL Databases: Cassandra, MongoDB and HBase.

Java Technologies: Servlets, JavaBeans, JSP, JDBC, JNDI, EJB.

Development Methodology: Agile, Waterfall.

Web Design Tools: HTML, DHTML, AJAX, JavaScript, jQuery, CSS, AngularJS, ExtJS and JSON.

Development / Build Tools: Eclipse, Ant, Maven, IntelliJ, JUnit and Log4j.

Frameworks: Struts, Spring and Hibernate.

App/Web servers: WebSphere, WebLogic, JBoss and Tomcat.

RDBMS: Teradata, Oracle 9i/10g/11g, MS SQL Server, MySQL and DB2.

Operating systems: UNIX, Linux, macOS and Windows XP/Vista/7/8/10.

Data analytical tools: R, MATLAB and Tableau.

ETL Tools: Talend, Informatica, Pentaho.


Confidential - San Diego, CA

Sr. Hadoop Developer


  • Responsible for building scalable distributed data solutions using Hadoop.
  • Developed PySpark code to read data from Hive, group the fields and generate XML files. Enhanced the PySpark code to write the generated XML files to a directory and zip them into CDAs.
  • Implemented a REST call to submit the generated CDAs to the vendor website.
  • Implemented Impala to support JDBC/ODBC connections for HiveServer2.
  • Configured Spark Streaming to receive real-time data from Kafka and store the stream data in HDFS.
  • Integrated Apache Storm with Kafka to perform web analytics. Uploaded click stream data from Kafka to HDFS, HBase and Hive by integrating with Storm.
  • Performed real-time analysis of the incoming data using the Kafka consumer API, Kafka topics and Spark Streaming with Scala.
  • Developed Kafka producers and consumers, Cassandra clients and Spark components, along with components on HDFS and Hive.
  • Good understanding of the DAG for the entire Spark application flow, as displayed in the Spark application web UI.
  • Used the Spark Cassandra Connector to load data to and from Cassandra.
  • Wrote a Storm topology to emit data into Cassandra.
  • Experience with developing and maintaining applications written for Amazon Simple Storage Service (S3), AWS Elastic Beanstalk and AWS CloudFormation.
  • Implemented a Python script to call the Cassandra REST API, perform transformations and load the data into Hive.
  • Experienced in designing and deploying a Hadoop cluster and various Big Data components, including HDFS, MapReduce, Hive, Sqoop, Pig, Oozie and Zookeeper, in the Cloudera distribution.
  • Implemented advanced procedures such as text analytics and processing using the in-memory computing capabilities of Apache Spark, written in Scala.
  • Implemented custom Kafka partitioners to send data to different categorized topics.
  • Developed end-to-end data processing pipelines, from receiving data through the distributed messaging system Kafka to persisting the data in HBase.
  • Implemented a messaging system for different data sources using Apache Kafka and configured high-level consumers for online and offline processing.
  • Wrote shell scripts that run multiple Hive jobs to incrementally update Hive tables, which are used to generate Tableau reports for business use.
  • Loaded and transformed large sets of structured, semi-structured and unstructured data in various formats such as text, zip, XML and JSON.
  • Worked with data delivery teams to set up new Hadoop users. This included setting up Linux users, setting up Kerberos principals and testing HDFS and Hive.
  • Scaled the Cassandra cluster based on load patterns.
  • Worked with Kafka for the proof of concept for carrying out log processing on a distributed system.
  • Good understanding of Cassandra Data Modeling based on applications.
  • Worked on Talend metadata manager, analyzed and implemented different use cases for handling various types of metadata.
  • Created Pig script jobs while maintaining query optimization.
  • Involved in daily Scrum meetings to discuss development progress.
  • Planned, deployed, monitored and maintained AWS cloud infrastructure consisting of multiple EC2 nodes and VMware VMs as required in the environment.

Environment: Hadoop, Java, MapReduce, HDFS, Hive, Pig, Sqoop, Flume, Linux, Python, Spark, Impala, Scala, Kafka, Storm, Shell Scripting, XML, Eclipse, Cloudera, DB2, SQL Server, MySQL, Autosys, Talend, AWS, HBase.

Confidential - Houston, TX

Hadoop Developer


  • Responsible for planning, organizing, and implementation of complex business solutions, producing deliverables within stipulated time.
  • Developed scripts and batch jobs to schedule various Hadoop programs.
  • Created Hive generic UDFs to process business logic with HiveQL.
  • Optimized Hive queries and improved performance by configuring Hive query parameters.
  • Involved in converting MapReduce programs into Spark transformations using Spark RDD and Python.
  • Performed performance tuning and troubleshooting of MapReduce jobs by analyzing and reviewing Hadoop log files.
  • Retrieved transaction data from RDBMS into HDFS, computed the total transacted amount per user using MapReduce and saved the output in a Hive table.
  • Collected and aggregated large amounts of log data using Flume and staged the data in HDFS for further analysis.
  • Developed a Flume ETL job to handle data from an HTTP source with HDFS as the sink.
  • Worked with AWS services such as EMR and EC2 for fast and efficient processing of Big Data.
  • Responsible for maintaining and expanding AWS (Cloud Services) infrastructure using AWS (SNS, SQS).
  • Used Kafka to load data into HDFS and move data into NoSQL databases (Cassandra).
  • Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Developed multiple Spark jobs in Scala/Python for data cleaning, pre-processing and aggregation.
  • Developed Spark programs using Scala, created Spark SQL queries and developed Oozie workflows for Spark jobs.
  • Integrated Kafka with Storm for real-time data processing and wrote Storm topologies to store the processed data directly in MongoDB and HDFS.
  • Used the Avro data serialization system with Avro tools to handle Avro data files in MapReduce programs.
  • Implemented data validation using MapReduce programs to remove unnecessary records before moving data into Hive tables.
  • Experience in Hive partitioning and bucketing, performing joins on Hive tables and working with regex, JSON and Avro.
  • Optimized Hive analytics SQL queries, created tables/views, and wrote custom UDFs and Hive-based exception processing.
  • Implemented custom interceptors for Flume to filter data and defined channel selectors to multiplex the data into different sinks.
  • Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX, NoSQL and a variety of portfolios.
  • Experience implementing machine learning techniques in Spark by using Spark MLlib.
  • Extracted files from MongoDB through Sqoop, placed them in HDFS and processed them.
  • Wrote the user console page in Lift, along with its snippets in Scala. The product gives users access to all their credentials and privileges within the system.
  • Involved in indexing data, tuning relevance, developing custom tokenizers and filters, and adding functionality such as playlists and custom sorting.

Environment: Hadoop, AWS, MapReduce, Hive, Spark, Avro, Kafka, Storm, Linux, Sqoop, Shell Scripting, Oozie, Cassandra, Git, XML, Scala, Java, Maven, Eclipse, Oracle.

Confidential - Brentwood, CA

Hadoop Developer


  • Created a Hadoop design that replicates the current system design.
  • Developed Scala scripts and UDFs using both DataFrames/SQL and RDD/MapReduce in Spark for data aggregation, queries and writing data back into the OLTP system through Sqoop.
  • Coordinated with offshore and onsite teams to understand the requirements and prepared high-level and low-level design documents from the requirements specification.
  • Optimized Hive queries and improved performance by configuring Hive query parameters.
  • Loaded the data into Spark RDDs and performed in-memory computation to generate the output response.
  • Integrated Oozie with the rest of the Hadoop stack supporting several types of Hadoop jobs out of the box (such as MapReduce, Pig, Hive, and Sqoop) as well as system specific jobs (such as Java programs and shell scripts).
  • Wrote multiple MapReduce programs in Java for data analysis.
  • Collaborated with application teams to install operating system and Hadoop updates, patches and version upgrades.
  • Used Sqoop to transfer data between databases (Oracle & Teradata) and HDFS and used Flume to stream the log data from servers.
  • Optimized MapReduce algorithms using combiners and partitioners to deliver the best results, and worked on application performance optimization for an HDFS cluster.
  • Involved in loading data from UNIX file system to HDFS.
  • Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, and managing and reviewing data backups and log files.
  • Supported the existing MapReduce programs running on the cluster.
  • Wrote shell scripts to monitor the health of Hadoop daemon services and respond to any warning or failure conditions.
  • Involved in collecting and aggregating large amounts of log data using Storm and staging data in HDFS for further analysis.
  • Exported the analyzed patterns back into Teradata using Sqoop. Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
  • Tuned the performance of Pig queries and loaded data from the Linux file system into HDFS. Imported and exported data into and out of HDFS using Sqoop (version 1.4.3) and Kafka.
  • Optimized Hive analytics SQL queries, created tables/views, and wrote custom UDFs and Hive-based exception processing.
  • Developed data integration programs in a Hadoop environment with NoSQL data store Cassandra for data access and analysis.
  • Developed shell and Python scripts to automate and provide control flow for Pig scripts.
  • Used Spark Streaming with Scala to receive real-time data from Kafka and store the stream data in HDFS and in databases such as HBase.
  • Monitored systems and services through the Ambari dashboard to keep the clusters available for the business.
  • Worked on the YARN Capacity Scheduler, creating queues to guarantee resources to specific groups.

Environment: Hadoop, Flume, MapReduce, Hive, Spark, Scala, Java, Python, HBase, Oracle, Linux, Sqoop, Shell Scripting, Oozie, Cassandra, Git, YARN.

Confidential - Chicago, IL

Hadoop Developer


  • Worked on analyzing Hadoop cluster using different big data analytic tools including Pig, Hive, and MapReduce.
  • Involved in the full life cycle of the project, covering design, analysis, logical and physical architecture modeling, development, implementation and testing.
  • Responsible for managing data coming from different sources; involved in HDFS maintenance and loading of structured and unstructured data.
  • Used Pig as an ETL tool to perform transformations, event joins and some pre-aggregations before storing the data in HDFS.
  • Worked with Zookeeper, Oozie, and Data Pipeline Operational Services for coordinating the cluster and scheduling workflows.
  • Involved in cluster maintenance, including adding and removing cluster nodes, cluster monitoring and troubleshooting, and reviewing and managing data backups and Hadoop log files.
  • Used Hive join queries to join multiple tables of a source system and load them into Elasticsearch tables.
  • Developed Pig Latin scripts to extract the data from the web server output files to load into HDFS.
  • Collected and aggregated large amounts of web log data from different sources such as web servers, mobile and network devices using Apache Flume and stored the data into HDFS for analysis.
  • Designed and modified database tables and used HBase queries to insert and fetch data from tables.
  • Involved in moving all log files generated from various sources to HDFS for further processing through Flume.
  • Created HBase tables to store variable data formats coming from different portfolios.
  • Performed real-time analytics on HBase using the Java API and REST API.
  • Participated in development/implementation of Cloudera Hadoop environment.
  • Worked extensively with Sqoop for importing and exporting data between HDFS and relational database systems, and for loading data into HDFS.
  • Involved in the setup and benchmarking of Hadoop/HBase clusters for internal use.
  • Wrote Hive jobs to parse the logs and structure them in tabular format to facilitate effective querying of the log data.
  • Involved in importing and exporting the data from RDBMS to HDFS and vice versa using Sqoop.
  • Involved in running MapReduce jobs for processing millions of records.

Environment: Hadoop, Big Data, HDFS, MapReduce, Sqoop, Oozie, Pig, Hive, HBase, Flume, Linux, Java, Eclipse, Cassandra, Cloudera Hadoop distribution, PL/SQL, Windows and UNIX Shell Scripting.


Java Developer


  • Coded front-end components using HTML, JavaScript and jQuery; back-end components using Java, Spring and Hibernate; service-oriented components using RESTful and SOAP-based web services; and rules-based components using JBoss Drools.
  • Involved in design and development phases of Software Development Life Cycle (SDLC).
  • Proficient in writing SQL queries and stored procedures for multiple databases, including Oracle and SQL Server 2005.
  • Used Spring dependency injection across the layers of the application, together with the Hibernate O/R mapping tool, for rapid development and ease of maintenance.
  • Wrote stored procedures using PL/SQL. Performed query optimization to achieve faster indexing and make the system more scalable.
  • Worked with Struts MVC objects like Action Servlet, Controllers, validators, Web Application Context, Handler Mapping, Message Resource Bundles, Form Controller, and JNDI for look-up for J2EE components.
  • Implemented connectivity to the database server using JDBC.
  • Developed RESTful web services using Spring IoC to give users a way to run the job and generate a daily status report.
  • Developed and exposed SOAP web services using JAX-WS, WSDL, Axis, JAXP and JAXB.
  • Configured domains in production, development and testing environments using the configuration wizard.
  • Created a SOAP handler to enable authentication and audit logging during web service calls.
  • Created service-layer APIs and domain objects using Struts.
  • Used AJAX and JavaScript for validations and integrating business server-side components on the client side within the browser.
  • Used RESTful services to interact with the client by providing RESTful URL mappings.
  • Implemented the project using the Agile Scrum methodology; involved in daily stand-up meetings, sprint showcases and sprint retrospectives.
  • Developed the user interface using JSP, JSP tag libraries and JavaScript to simplify the complexities of the application.
  • Developed a Dojo-based front end, including forms and controls, and programmed event handling.
  • Used XSLT to transform XML data structures into HTML pages.
  • Deployed EJB components on Tomcat. Used the JDBC API for interaction with the Oracle database.
  • Developed the UI panels using JSF, XHTML, CSS, Dojo and jQuery.

Environment: Java 6 (JDK 1.6), JEE, Spring 3.1 framework, Spring Model View Controller (MVC), JavaServer Pages (JSP) 2.0, Servlets 3.0, JDBC 4.0, AJAX, Web services, REST API, JSON, JavaBeans, jQuery, JavaScript, Oracle 10g, JUnit, HtmlUnit, XSLT, HTML/DHTML.


Java Developer


  • Involved in the design, development and deployment of the Application using Java/J2EE Technologies.
  • Performed requirements gathering and analysis and prepared the requirements specification document.
  • Provided high-level system design specifying the class diagrams, sequence diagrams and activity diagrams.
  • Designed interactive web pages for the front end of the web application using web technologies such as HTML, JavaScript, AngularJS and AJAX, and implemented CSS for a better look and feel.
  • Integrated AEM to the existing web application and created AEM components using JavaScript, CSS and HTML.
  • Programmed Oracle SQL and T-SQL stored procedures, functions, triggers and packages as back-end processes to create and update staging tables, log and audit tables, and to create primary keys.
  • Provided ongoing maintenance and support, working with the client to solve their problems, including major bug fixes.
  • Deployed and tested the application using Tomcat web server.
  • Analyzed the specifications provided by the clients.
  • Developed JavaBean components utilizing AWT and Swing classes.
  • Extensively used Transformations like Aggregator, Router, Joiner, Expression, Lookup, Update Strategy, and Sequence Generator.
  • Used exception handling and multi-threading for optimum application performance.
  • Used core Java concepts to implement the business logic.
  • Provided on-call support based on the priority of the issues.
  • Designed and implemented a generic parser framework using the SAX parser to parse XML documents that store SQL.
  • Performed functional testing, performance testing, integration testing, regression testing, smoke testing and User Acceptance Testing (UAT).

Environment: Core Java, Servlets, Struts, JSP, XML, XSLT, JavaScript, Apache, Oracle 10g/11g.
