
Hadoop Developer Resume

Tampa, FL


  • 8+ years of total IT experience, including Java development, web application development, database management, and Big Data ecosystem technologies.
  • 5+ years of experience developing applications with Big Data technologies such as Hadoop, Spark, MapReduce, YARN, Flume, Hive, Pig, Kafka, Storm, Sqoop, HBase, Cassandra, Hortonworks, Cloudera, Mahout, Avro, and Scala.
  • Hands-on experience with Hadoop, including administration, configuration management, monitoring, debugging, and performance tuning.
  • Skilled in programming with the MapReduce framework and the Hadoop ecosystem.
  • Strong experience designing and implementing MapReduce jobs for distributed processing of large data sets on Hadoop clusters.
  • Extracted and loaded data into MongoDB using the mongoimport and mongoexport command-line utilities.
  • Worked on a live 60-node Hadoop cluster running Cloudera CDH4.
  • Worked with 90 TB of highly unstructured and semi-structured data (270 TB with a replication factor of 3).
  • Commissioned and decommissioned nodes in existing clusters.
  • Extracted data from relational databases (SQL, Oracle, DB2) into HDFS using Sqoop.
  • Created and ran Sqoop jobs with incremental loads to populate Hive external tables.
  • Extensive experience writing Pig scripts to transform raw data from multiple sources into baseline data.
  • In-depth understanding of Spark architecture, including Spark Core, Spark SQL, DataFrames, Spark Streaming, and Spark MLlib.
  • Developed Hive scripts to meet end-user and analyst requirements for ad hoc analysis.
  • Strong understanding of partitioning and bucketing in Hive; designed both managed and external Hive tables to optimize performance.
  • Solved performance issues in Hive and Pig scripts by understanding how joins, grouping, and aggregation translate into MapReduce jobs.
  • Developed UDFs in Java as needed for Pig and Hive queries.
  • Experience using SequenceFile, RCFile, Avro, and HAR file formats.
  • Developed Oozie workflows for scheduling and orchestrating ETL processes.
  • Implemented authentication using Kerberos and authorization using Apache Sentry.
  • Worked with the admin team to design and carry out the upgrade from CDH 3 to CDH 4.
  • Good knowledge of Amazon Web Services (AWS) components such as EC2, EMR, and S3.
  • Experience with NoSQL databases such as HBase, Cassandra, and MongoDB.
  • Handled administration, installation, upgrades, and management of Cassandra distributions.
  • Advanced knowledge of performance troubleshooting and tuning for Cassandra clusters.
  • Scaled Cassandra clusters based on load patterns.
  • Good understanding of application-driven Cassandra data modeling.
  • Heavily involved in the development and implementation of the Cassandra environment.
  • Good working knowledge of BI tools such as Tableau, Spotfire, and QlikView.
  • Extensive experience with SQL, PL/SQL, and database concepts.
  • Participated in requirements analysis, reviews, and working sessions to understand requirements and system design.
  • Experience integrating Apache Kafka with Apache Storm and building Storm data pipelines for real-time processing.
  • Experience developing internet applications using Java, J2EE, JSP, MVC, Servlets, Struts, Hibernate, JDBC, JSF, EJB, XML, AJAX, and web-based development tools.
  • Experience with Enterprise JavaBeans (EJB) components, with demonstrated technical expertise in J2EE frameworks such as Struts (MVC framework).
  • Experience developing MVC architectures using Servlets, JSP, and the Struts, Hibernate, and Spring frameworks.
  • Good knowledge of data modeling, data warehousing and OLAP concepts, star and snowflake schemas, and entity-relationship diagrams.
  • Proficient with version control tools such as SVN and Git.
  • Proactive and well organized, with effective time-management and problem-solving skills.
  • Good interpersonal skills and the ability to work as part of a team. Exceptional ability to learn and master new technologies and to deliver on short deadlines.


Hadoop/Big Data: Hadoop (YARN), HDFS, Scala, MapReduce, Spark, Hive, Pig, Sqoop, Flume, Kafka, Storm, ZooKeeper, Oozie, Tez, Impala, Mahout, Ganglia, Nagios.

Java/J2EE Technologies: JavaBeans, JDBC, Servlets, RMI, and Web services

Development Tools: Eclipse, IBM DB2 Command Editor, TOAD, SQL Developer, Microsoft Office (Word, Excel, PowerPoint, Access), VMware

Web/Application Servers: Apache Tomcat, WebLogic, WebSphere Application Server.

Frameworks: Hibernate, EJB, Struts, Spring

Programming/Scripting Languages: Java, SQL, Unix Shell Scripting, Python, AngularJS.

Databases: Oracle 11g/10g/9i, MySQL, SQL Server 2005/2008

NoSQL Databases: HBase, Cassandra, MongoDB

ETL Tools: Informatica

Visualization: Tableau and MS Excel.

Modeling Languages: UML (use case, class, sequence, deployment, and component diagrams).

Version Control Tools: Subversion (SVN), Concurrent Versions System (CVS), and IBM Rational ClearCase.

Methodologies: Agile/Scrum, Rational Unified Process, and Waterfall.

Operating Systems: Windows 98/2000/XP/Vista/7/8/10, Macintosh, Unix, Linux, and Solaris.


Confidential, Tampa, FL

Hadoop Developer


  • Involved in the high-level design of the Hadoop architecture for the existing data structures and problem statement; set up the 64-node cluster and configured the entire Hadoop platform.
  • Implemented a data interface to retrieve customer information through a REST API, preprocessed the data using MapReduce 2.0, and stored it in HDFS (Hortonworks).
  • Developed efficient MapReduce programs to filter out unstructured data, and multiple MapReduce jobs to perform data cleaning and preprocessing on Hortonworks.
  • Used NoSQL databases (Cassandra and MongoDB) and assisted in troubleshooting and optimizing MapReduce jobs and Pig Latin scripts.
  • Worked extensively with Cloudera Distribution of Hadoop (CDH 5.x and CDH 4.x).
  • Worked on a live 60-node Hadoop cluster running Cloudera CDH4.
  • Created Airflow scheduling scripts in Python to automate Sqoop imports of a wide range of data sets.
  • Migrated MapReduce programs to Spark transformations using Scala, after an initial implementation in Python (PySpark).
  • Queried data using Spark SQL on the Spark engine for faster processing of data sets.
  • Moved files between HDFS and AWS S3 and worked extensively with S3 buckets.
  • Created a data pipeline for ingesting, aggregating, and loading consumer response data from an AWS S3 bucket into Hive external tables at an HDFS location, to serve as a feed for Tableau dashboards.
  • Imported metadata into Hive using Python, migrated existing tables and applications to Hive and the AWS cloud (S3), and made the data available in Athena and Snowflake.
  • Imported data from sources such as AWS S3 and the local file system into Spark RDDs.
  • Designed, developed, and maintained data integration programs in Hadoop and RDBMS environments, working with both traditional and non-traditional source systems as well as RDBMS and NoSQL data stores for data access and analysis.
  • Developed Scala scripts using both DataFrames/SQL/Datasets and RDDs/MapReduce in Spark for data aggregation and queries, and for writing data back into the OLTP system through Sqoop.
  • Created a complete processing engine based on Cloudera's distribution, tuned for performance.
  • Implemented partitioning, dynamic partitions, and bucketing in Hive.
  • Used Pig extensively for data cleansing, processing, and transformations.
  • Used Mahout machine learning algorithms for efficient data processing.
  • Developed and configured Oozie workflows for scheduling and managing Pig, Hive, and Sqoop jobs.
  • Used ZooKeeper for centralized configuration management.
  • Used Apache Tez as the execution engine for batch and interactive Pig and Hive jobs.

Environment: Hadoop, AWS, CDH, Elastic MapReduce, Hive, Spark, Airflow, Zeppelin, Sourcetree, Bitbucket, SQL Workbench, GenieLogs, Snowflake, Athena, Jenkins.

Confidential, Farmers Branch, TX

Hadoop/Scala Developer


  • Extracted, updated, and loaded data from different data sources into HDFS using the Sqoop import/export command-line utility.
  • Loaded data from the UNIX file system into HDFS, created Hive tables, and loaded and analyzed data using Hive queries.
  • Loaded data back into Teradata for Basel reporting and for business users to analyze and visualize in Datameer.
  • Developed UDFs in Java for Hive and Pig, and read multiple data formats on HDFS using Scala.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
  • Developed multiple POCs using Scala, deployed them on the YARN cluster, and compared the performance of Spark with Hive and SQL/Teradata.
  • Analyzed the SQL scripts and designed a solution to implement them in Scala.
  • Defined Pig UDFs for financial functions such as swaps, hedging, speculation, and arbitrage.
  • Created end-to-end Spark-Solr applications in Scala to perform data cleansing, validation, transformation, and summarization according to requirements.
  • Optimized Hive tables using techniques such as partitioning and bucketing to improve Hive query performance.
  • Created custom Python/shell scripts to import data via Sqoop from Oracle databases.
  • Created big data workflows using Oozie to ingest data from various sources into Hadoop; these workflows comprise heterogeneous jobs such as Hive, Sqoop, and Python scripts.
  • Worked with SparkContext, Spark SQL, pair RDDs, and Spark on YARN.
  • Moved data from various data sources and performed transformations using Pig.
  • Created Pig Latin scripts to sort, group, join, and filter enterprise-wide data.
  • Improved search focus and quality in Elasticsearch by using aggregations.
  • Worked with Elastic MapReduce and set up a Hadoop environment on AWS EC2 instances.
  • Worked with AWS cloud services such as EC2, S3, EBS, RDS, and VPC.
  • Migrated an existing on-premises application to AWS, using EC2 and S3 for processing and storing small data sets, and maintained the Hadoop cluster on AWS EMR.
  • Imported data from AWS S3 into Spark RDDs and performed transformations and actions on them.
  • Used Spark Streaming APIs to perform transformations and actions on the fly to build a common learner data model that consumes data from Kafka in near real time and persists it to Cassandra.

Environment: Hadoop, MapReduce, Sqoop, HDFS, HBase, Hive, Pig, Oozie, Spark, Kafka, Cassandra, AWS, Elasticsearch, Java, Oracle 10g, MySQL, Ubuntu, HDP.

Confidential, San Ramon, CA

Hadoop Developer


  • Implemented data movement processes and migrated large amounts of data from RDBMS to Hadoop and vice versa using Sqoop.
  • Implemented the workflows using Apache Oozie framework to automate tasks.
  • Worked with NoSQL databases such as HBase, creating HBase tables to load large sets of semi-structured data from various sources.
  • Used Akka as a framework to create reactive, distributed, parallel and resilient concurrent applications in Scala.
  • Worked in AWS environment for development and deployment of custom Hadoop applications.
  • Worked extensively with Elastic MapReduce (EMR) and set up Hadoop environments on Amazon EC2 instances.
  • Created custom Python/shell scripts to import data via Sqoop from various SQL databases such as Teradata, SQL Server, and Oracle.
  • Collected data from an AWS S3 bucket in near real time using Spark Streaming, performed the necessary transformations and aggregations to build the data model, and persisted the data in HDFS.
  • Created big data workflows using Oozie to ingest data from various sources into Hadoop; these workflows comprise heterogeneous jobs such as Hive, Sqoop, and Python scripts.
  • Used Slick to query and store data in the database in idiomatic Scala, using the Scala collections framework.
  • Wrote Storm bolts to emit data into HBase, HDFS, and RabbitMQ Web STOMP.
  • Deployed an Apache Solr search engine server to speed up the search process.
  • Created a wrapper library to help the rest of the team use the Solr database.
  • Customized Apache Solr to handle fallback searching and provide custom functions.
  • Involved in managing and reviewing Hadoop log files for any warnings or failures.
  • Created a customized BI tool for the management team that performs query analytics using HiveQL.
  • Designed the ETL process and created the high-level design document, including the logical data flows, source data extraction process, database staging, job scheduling, and error handling.
  • Designed and developed ETL jobs using Talend Integration Suite (Talend 5.2.2).
  • Created ETL mappings with Talend Integration Suite to pull data from sources, apply transformations, and load data into the target database.

Environment: Hadoop, Cloudera (CDH 4), HDFS, Hive, HBase, Flume, Sqoop, Pig, Kafka, Java, Eclipse, Teradata, AWS, Tableau, Talend, MongoDB, Ubuntu, UNIX, and Maven.

Confidential, Atlanta, GA

Hadoop Developer/Admin


  • Installed, configured, and maintained Hadoop clusters for application development, along with Hadoop tools such as Hive, Pig, HBase, ZooKeeper, and Sqoop.
  • Worked extensively with Cloudera Distribution of Hadoop (CDH 5.x and CDH 4.x).
  • Extensively involved in capacity planning, hardware planning, installation, and performance tuning of the Hadoop cluster.
  • Developed cost-effective and fault-tolerant systems using AWS EC2 instances, Auto Scaling, and Elastic Load Balancing.
  • Participated in configuring a Hadoop cluster on AWS.
  • Integrated SAS and Tableau with Hadoop for easier access to data in HDFS and Hive.
  • Set up Hortonworks clusters and installed all ecosystem components both through Ambari and manually from the command line.
  • Responsible for cluster maintenance, monitoring, commissioning and decommissioning DataNodes, troubleshooting, cluster planning, and managing and reviewing data backups and log files.
  • Monitored multiple Hadoop cluster environments using Ganglia and Nagios, and monitored workload, job performance, and capacity planning using Ambari.
  • Optimized HDFS manually to decrease network utilization and increase job performance.
  • Set up automated processes to archive and clean unwanted data on the cluster, particularly on HDFS and the local file system.
  • Managed Hadoop services such as NameNode, DataNode, JobTracker, and TaskTracker.
  • Performed monthly Linux server maintenance, including shutting down essential Hadoop NameNode and DataNode services.
  • Implemented data ingestion using Pig and Hive in the production environment.

Environment: Hadoop 1.x and 2.x, MapReduce, HDFS, Hive, SQL, AWS, Cloudera Manager, Pig, Sqoop, Oozie, CDH3 and CDH4, Apache Hadoop.


Java Developer


  • Designed and developed the server-side layer using XML, JSP, JDBC, JNDI, EJB, and DAO patterns in the Eclipse IDE.
  • Involved in the design and development of a front-end application using JavaScript, CSS, HTML, JSPs, and Spring MVC for registering and managing new entries, and configured it to connect to the database using Hibernate.
  • Developed JavaBeans and JSPs using Spring and JSTL tag libraries for supplements.
  • Developed EJBs, Servlets, and JSP files implementing business rules and security options on IBM WebSphere.
  • Developed and tested the Efficiency Management module using EJB, Servlets, JSP, and core Java components in WebLogic Application Server.
  • Used the Spring Framework as the middle-tier application framework, with Spring's Hibernate support as the persistence strategy for database integration.
  • Implemented Hibernate in the data access object layer to access and update information in the Oracle Database.
  • Configured the Hibernate deployment descriptors to achieve object-relational mapping.
  • Involved in developing stored procedures, queries, and functions.
  • Wrote SQL queries to pull information from the backend.
  • Designed and implemented the project architecture using OOAD, UML, and design patterns.
  • Worked extensively with HTML, CSS, JavaScript, and Ajax for client-side development and validation.
  • Involved in creating tables, stored procedures in SQL for data manipulation and retrieval using SQL Server, Oracle and DB2.
  • Participated in requirement gathering and converting the requirements into technical specifications.

Environment: Java, JSF Framework, Eclipse IDE, Ajax, Apache Axis, WebLogic, JavaScript, HTML, XML, CSS, SQL Server, Oracle, Web services, Spring, OOAD, UML, Windows.


Java Developer


  • Developed UI using HTML, JavaScript, and JSP, and developed Business Logic and Interfacing components using Business Objects, XML, and JDBC.
  • Designed the user interface and implemented validations using JavaScript.
  • Designed JSPs and Servlets for navigation between modules.
  • Developed various EJBs for handling business logic and data manipulation in the database.
  • Managed JDBC connectivity for querying, inserting, and data management, including triggers and stored procedures.
  • Developed SQL queries and stored procedures using PL/SQL to retrieve and insert data across multiple database schemas.
  • Developed the XML Schema and Web services for data maintenance and structures.
  • Wrote test cases in JUnit for unit testing of classes.
  • Provided technical support for production environments: resolved issues, analyzed defects, and provided and implemented solutions.
  • Built and deployed Java applications into multiple UNIX-based environments and produced unit and functional test results along with release notes.
  • Developed the presentation layer using CSS and HTML based on Bootstrap for cross-browser support.

Environment: Java, Spring, JSP, Hibernate, XML, HTML, JavaScript, JDBC, CSS, SOAP Web services.
