
Sr. Hadoop Developer Resume

Austin, TX


  • About 9 years of experience in application analysis, design, development, maintenance, and support of web and client-server applications in Java/J2EE technologies, including 6+ years of experience with Big Data and Hadoop-related components such as HDFS, MapReduce, Pig, Hive, YARN, Sqoop, Flume, Spark, Scala, and Kafka.
  • Experience with multiple Hadoop distributions such as Cloudera and Hortonworks.
  • Excellent understanding of NoSQL databases like HBase, Cassandra and MongoDB.
  • Experience working with structured and unstructured data in various file formats such as Avro data files, XML files, JSON files, and sequence files using MapReduce programs.
  • Work experience with cloud configurations such as Amazon Web Services (AWS).
  • Implemented custom business logic and performed join optimization, secondary sorting, and custom sorting using MapReduce programs.
  • Experienced in testing and running MapReduce pipelines.
  • Expertise in data ingestion using Sqoop, Apache Kafka, Spark Streaming, and Flume.
  • Implemented business logic using Pig scripts. Wrote custom Pig UDFs to analyze data.
  • Performed ETL operations using Pig to join, clean, aggregate, and analyze data.
  • Hands-on experience streaming live data from DB2 into HBase tables using Spark Streaming and Apache Kafka.
  • Experience in designing, configuring, and installing DataStax Cassandra.
  • Good understanding of Conceptual, Logical and Physical Data Modeling.
  • Experience with the Oozie workflow engine to automate and parallelize Hadoop, MapReduce, and Pig jobs.
  • Extensive experience writing SQL queries using HiveQL to perform analytics on data.
  • Experience in performing data validation using Hive dynamic partitioning and bucketing.
  • Experienced in importing and exporting data between HDFS and relational databases, including Teradata, using Sqoop.
  • Experienced in handling streaming data such as web server logs using Flume.
  • Good knowledge of analyzing data using Python scripting with Hadoop Streaming.
  • Worked extensively with Spark DataFrames, Spark SQL, and Spark MLlib.
  • Experience in implementing Spark applications using Scala and Spark SQL for faster data processing.
  • Extensive hands-on experience accessing and performing CRUD operations on HBase data using the Java API and implementing time series data management.
  • Hands-on experience with message brokers such as Apache Kafka.
  • Involved in planning the stages of migrating data from RDBMS to Cassandra.
  • Expertise in benchmarking and load testing a Cassandra cluster using the cassandra-stress tool.
  • Involved in various data mining tasks such as pattern mining, classification, and clustering.
  • Experienced in J2EE, Spring, Hibernate, SOAP/REST web services, JMS, JNDI, and EJB.
  • Expertise with application servers and web servers such as Oracle WebLogic, IBM WebSphere, Apache Tomcat, JBoss, and VMware.
  • Experienced in developing unit test cases using MRUnit and JUnit.
  • Experience in using Maven and Ant for build automation.
  • Experience working in environments using Agile (SCRUM) and Waterfall methodologies.
  • Expertise in database modeling, administration, and development using SQL and PL/SQL in Oracle (8i, 9i and 10g), MySQL, DB2, and SQL Server environments.


Big Data / Hadoop: HDFS / MapReduce / Hive / Pig / HBase / YARN / Sqoop / Flume / Oozie / Scala / Kafka / Apache Spark / Spark SQL / AWS / Talend.

Databases / NoSQL: Cassandra / MongoDB / HBase / Hive / SQL / PL/SQL / Oracle.

Web Technologies: HTML / CSS / AJAX / JavaScript / jQuery.

Web Services: SOAP / REST / XML / XSD.

J2EE Frameworks: Hibernate / Spring / JMS / JSF.

Operating Systems: Windows / Unix / Linux.

Methodologies: Agile, Waterfall.

IDEs / Tools: Eclipse / NetBeans / Microsoft Visio.

Build Tools: Maven / Apache Ant / Log4j.


Confidential, Austin, TX

Sr. Hadoop Developer


  • Designed a pipeline to collect, clean, and prepare data for analysis using MapReduce, Spark, Pig, Hive, and HBase, with reporting in Tableau.
  • Developed a script to send large amounts of data to any HTTP server, configurable by number of users, operations, and date range.
  • Created Tableau reports from HiveQL queries.
  • Created and modified UDFs and UDAFs for Hive and Pig as needed.
  • Developed a data pipeline using Kafka and Storm to store data into HDFS.
  • Used Apache Kafka to handle log messages consumed by multiple systems.
  • Performed data staging and validation using Talend.
  • Involved in unit testing and integration testing with Hue.
  • Worked extensively with Spark DataFrames, Spark SQL, and Spark MLlib.
  • Worked with the Data Science team to develop Spark MLlib applications for various predictive models.
  • Worked with Kafka streaming to fetch data in real time.
  • Hands-on experience in importing and exporting data from relational databases to HDFS and vice versa using Sqoop.
  • Worked on Impala for creating tables and querying data.
  • Implemented daily workflow for extraction, processing and analysis of data with Oozie.
  • Transformed source data into structured data and stored it in Cassandra.
  • Explored Spark to improve the performance and optimization of existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
  • Developed Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data.
  • Involved in creating Hive tables and loading them with data.

Environment: Hortonworks HDP, Java, Kafka, Pig, Hive, HDFS, Cassandra, UNIX, Spark, Scala, HBase, HiveQL, AWS, Tableau.

Confidential, Atlanta, GA

Hadoop Developer


  • Worked on analyzing and writing Hadoop MapReduce jobs using the MapReduce API, Pig, and Hive.
  • Used Sqoop to import data from MySQL into HDFS on a regular basis.
  • Developed scripts and batch jobs to schedule various Hadoop programs.
  • Involved in benchmarking the Cassandra cluster for performance using the cassandra-stress tool.
  • Supported MapReduce programs running on the cluster.
  • Installed and configured Pig and wrote Pig Latin scripts.
  • Wrote Hive queries for data analysis to meet the business requirements.
  • Suggested data modeling and performance tuning techniques.
  • Configured SSL encryption for internode communication between Cassandra nodes and for client connections.
  • Developed Java MapReduce programs to transform mainframe data into a structured format.
  • Involved in installing Hadoop Ecosystem components.
  • Created Hive external tables, loaded data into them, and queried the data using HQL.
  • Developed optimal strategies for distributing the mainframe data over the cluster. Imported and exported the stored mainframe data into HDFS and Hive.
  • Implemented Hive generic UDFs to incorporate business logic into Hive queries.
  • Used the HBase API to store data from Hive tables into HBase tables.
  • Involved in Agile methodologies, daily scrum meetings, and sprint planning.
  • Created Hive tables and worked with them using HiveQL.
  • Conducted POC for Hadoop and Spark as part of NextGen platform implementation.
  • Involved in Cassandra database schema design.
  • Worked on migrating data from relational databases to Cassandra.
  • Used Storm to automatically retry downloading and processing data after transient failures.
  • Used Storm to analyze large amounts of non-unique data points with low latency and high throughput.
  • Held weekly meetings with technical collaborators and actively participated in code review sessions with senior and junior developers.

Environment: CDH4, Java, MapReduce, HDFS, Hive, Spark, Scala, Cassandra, Pig, Linux, XML, MySQL, MySQL Workbench, Cloudera, Maven, Java 6, Eclipse, PL/SQL, SQL connector.

Confidential, Long Beach, CA

Hadoop Developer


  • Used Avro, Parquet, RCFile, and JSON file formats and developed UDFs for Hive and Pig.
  • Developed a custom MapReduce InputFormat to read a specific data format.
  • Developed and maintained workflow scheduling jobs in Oozie.
  • Used Sqoop to transfer data from external sources to HDFS.
  • Understanding of data storage and retrieval techniques, ETL, and databases, including graph stores, relational databases, tuple stores, NoSQL, Hadoop, Pig, MySQL, and Oracle.
  • Imported and exported data into HDFS and Hive using Sqoop.
  • Experience in loading and transforming large sets of structured, semi-structured, and unstructured data.
  • Loaded unstructured and semi-structured data from different sources into the Hadoop cluster using Flume.
  • Worked on different file formats like XML files, Sequence files, CSV and Map files.
  • Continuously monitored and managed Hadoop cluster using Cloudera Manager.
  • Performed POCs using technologies such as Spark, Kafka, and Scala.
  • Created Hive tables, loaded them with data, and wrote Hive queries.
  • Involved in collecting, aggregating and moving data from servers to HDFS using Apache Flume.
  • Experience in managing and reviewing Hadoop log files.
  • Analyzed web logs using Hadoop tools for operational and security-related activities.
  • Used Pig's complex data types for handling data.
  • Developed efficient MapReduce programs in Java for filtering out the unstructured data.
  • Supported MapReduce programs running on the cluster.
  • Managed and reviewed Hadoop log files to identify issues when jobs fail.

Environment: Hadoop, MapReduce, HDFS, HBase, Hive, Pig, Java, XML, SQL, MySQL, Scala, Sqoop, Oozie

Confidential, Cupertino, CA

Hadoop Consultant


  • Involved in creating Hive tables, loading them with data, and writing Hive queries.
  • Involved in data ingestion into HDFS using Sqoop and Flume from a variety of sources.
  • Developed MapReduce programs to parse the raw data, populate tables and store the refined data in partitioned tables.
  • Installed and configured Hadoop and the Hadoop stack on a 4-node cluster.
  • Experienced in managing and reviewing application log files.
  • Ingested application logs into HDFS and processed them using MapReduce jobs.
  • Created and maintained the Hive warehouse for Hive analysis.
  • Generated test cases for new MapReduce jobs.
  • Led and programmed the recommendation logic for various clustering and classification algorithms using Java.
  • Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
  • Responsible for design and creation of Hive tables, partitioning, bucketing, loading data, and writing Hive queries.
  • Created HBase tables to store personally identifiable information (PII) in various data formats from different portfolios.
  • Involved in managing and reviewing Hadoop log files.
  • Installed Oozie workflow engine to run multiple Hive and Pig jobs.

Environment: HDFS, Hive, Scala, MapReduce, Storm, Java, HBase, Pig, Sqoop, Shell Scripts, Oozie, MySQL, Tableau, Eclipse, Web Services, Oracle 11g/SQL, JDBC and WebSphere Applications.

Confidential, Fremont, CA

Software Engineer


  • Worked on marshalling and unmarshalling XML using the JiBX parser.
  • Interpreted and modified Spring and Hibernate configuration files.
  • Worked on JMS and Message Queue (MQ) configurations.
  • Designed and developed GUI Screens for user interfaces using JSP, JavaScript, XSLT, AJAX, XML, HTML, CSS, JSON.
  • Configured, designed, implemented, and monitored Kafka clusters and connectors.
  • Generated class diagrams and sequence diagrams extensively for the entire process flow using RAD.
  • Consumed external web services from different development centers by creating service contracts through WSRR.
  • Worked on SOAP-based web services and tested them using SoapUI.
  • Used Jenkins to build the application on the server.
  • Developed documentation for QA Environment.
  • Loaded records from the legacy database into Cassandra.
  • Synchronized record creates, updates, and deletes between the legacy database and Cassandra.
  • Created stored procedures, SQL statements, and triggers for effective storage and retrieval of data in the database.
  • Developed the application using Agile Scrum and iterative methodologies.
  • Used Apache Log4j logging API to log errors and messages.
  • Deployed applications on Unix environments for Dev and QA-Smoke.
  • Unit tested the application using JUnit and EasyMock.

Environment: JDK, Spring Framework, XML, HTML, Cassandra, JSP, Hibernate, Ant, JavaScript, XSLT, CSS, AJAX, JMS, SOAP Web Services, WebSphere Application Server, PL/SQL, JUnit, Log4j, Shell scripting, UNIX.

Confidential, Long Beach, CA

Java/J2EE Developer


  • Developed front-end screens using JSP, HTML and CSS.
  • Developed server-side code using Struts and Servlets.
  • Developed core Java classes for exceptions, utility classes, business delegates, and test cases.
  • Developed SQL queries using MySQL and established connectivity.
  • Worked with Eclipse using Maven plugin for Eclipse IDE.
  • Designed the user interface of the application using HTML5, CSS3, JSF 2.0, JSP, and JavaScript.
  • Tested the application functionality with JUnit Test Cases.
  • Developed all the User Interfaces using JSP and Struts framework.
  • Wrote client-side validations using JavaScript.
  • Extensively used jQuery for developing interactive web pages.
  • Developed web services for production systems using SOAP and WSDL.
  • Developed the user interface presentation screens using HTML, XML, and CSS.
  • Worked with Spring using AOP, IoC, and JdbcTemplate.
  • Developed shell scripts to trigger the Java batch job and send a summary email with the batch job status and processing summary.
  • Developed the application in Eclipse IDE and deployed it on a Tomcat server.
  • Supported bug fixes and functionality changes.

Environment: Java, Struts 1.1, Servlets, JSP, HTML, CSS, JavaScript, Eclipse 3.2, Tomcat, Maven, MySQL, Windows and Linux, JUnit.
