We provide IT Staff Augmentation Services!

Hadoop Developer Resume

Santa Clara, CaliforniA


  • Over 8 years of IT experience as a Developer, Designer & quality Tester with cross platform integration experience using Hadoop Ecosystem, Java and Software Functional Testing.
  • Hands on experience in installing, configuring and using Hadoop Ecosystem - HDFS, MapReduce, Pig, Hive, Oozie, Flume, HBase, Spark, Sqoop, Flume and Oozie.
  • Strong understanding of various Hadoop services, MapReduce and YARN architecture.
  • Responsible for writing Map Reduce programs.
  • Experienced in importing-exporting data into HDFS using SQOOP.
  • Experience loading data to Hive partitions and creating buckets in Hive.
  • Developed Map Reduce jobs to automate transfer the data from HBase.
  • Expertise in analysis using PIG, HIVE and MapReduce.
  • Experienced in developing UDFs for Hive, PIG using Java.
  • Strong understanding of NoSQL databases like HBase, MongoDB & Cassandra.
  • Scheduling all Hadoop/hive/Sqoop/HBase jobs using Oozie.
  • Experience in setting cluster in Amazon EC2 & S3 including the automation of setting & extending the clusters in AWS Amazon cloud.
  • Good understanding of Scrum methodologies, Test Driven Development and continuous integration.
  • Major strengths are familiarity with multiple software systems, ability to learn quickly new technologies, adapt to new environments, self-motivated, team player, focused adaptive and quick learner with excellent interpersonal, technical and communication skills.
  • Experience in defining detailed application software test plans, including organization, participant, schedule, test and application coverage scope.
  • Experience in gathering and defining functional and user interface requirements for software applications.
  • Experience in real time analytics with Apache Spark (RDD, Data Frames and Streaming API).
  • Used Spark Data Frames API over Cloudera platform to perform analytics on Hive data.
  • Experience in integrating Hadoop with Kafka. Expertise in uploading Click stream data from Kafka to HDFS.
  • Expert in utilizing Kafka for messaging and publishing subscribe messaging system.


Hadoop/Big Data: Hadoop, Map Reduce, HDFS, Zookeeper, Kafka, Hive, Pig, Sqoop, Oozie, Flume, Yarn, HBase, Spark with Scala

No SQL Databases: HBase, Cassandra, mongo DB

Languages: Java, Python, UNIX shell scripts

Java/J2EE Technologies: Applets, Swing, JDBC, JNDI, JSON, JSTL

Frameworks: MVC, Struts, Spring, Hibernate

Operating Systems: Red Hat Linux, Ubuntu Linux and Windows XP/Vista/7/8

Web Technologies: HTML, DHTML, XML

Web/Application servers: Apache Tomcat, WebLogic, JBoss

Databases: SQL Server, MySQL

Tools: and IDE: Eclipse, IntelliJ.


Confidential, Santa Clara, California

Hadoop Developer


  • Responsible for building scalable distributed data solutions using Hadoop.
  • Worked on analyzing Hadoop cluster and different big data analytic tools including MapReduce, Hive and Spark
  • Involved in loading data from LINUX file system, servers, Java web services using Kafka Producers, partitions.
  • Implemented Kafka Custom encoders for custom input format to load data into Kafka Partitions.
  • Implemented Storm topologies to pre-process data before moving into HDFS system.
  • Implemented Kafka High level consumers to get data from Kafka partitions and move into HDFS.
  • Implemented POC to migrate MapReduce programs into Spark transformations using Spark and Scala.
  • Migrated complex MapReduce programs into Spark RDD transformations, actions.
  • Implemented SparkRDD transformations to map business analysis and apply actions on top of transformations.
  • Involved in creating Hive tables, loading with data and writing hive queries which runs internally in MapReduce way.
  • Developed the MapReduce programs to parse the raw data and store the pre-Aggregated data in the partitioned tables.
  • Developed MapReduce programs in Java for parsing the raw data and populating staging Tables.
  • Experienced in developing custom input formats and data types to parse and process unstructured and semi structured input data and mapped them into key value pairs to implement business logic in MapReduce.
  • Involved in using HCATALOG to access Hive table metadata for MapReduce code
  • Experience in implementing custom sterilizer, interceptor, source and sink as per the requirement in flume to ingest data from multiple sources.
  • Experience in setting up Fan-out workflow in flume to design v shaped architecture to take data from many sources and ingest into single sink.
  • Developed Shell, Perl and Python scripts to automate and provide Control flow to Pig scripts.
  • Exporting of result set from HIVE to MySQL using Sqoop export tool for further processing.
  • Evaluated usage of Oozie for Workflow Orchestration.
  • Converted unstructured data to structured data by writing Spark code.
  • Indexed documents using Apache Solr.
  • Set up Solr Clouds for distributing indexing and search.
  • Automation of all the jobs starting from pulling the Data from different Data Sources like MySQL and pushing the result dataset to Hadoop Distributed File System and running MR, PIG, and Hive jobs using Kettle and Oozie (Work Flow management)
  • Worked on No-SQL databases like Cassandra, MongoDB for POC purpose in storing images and URIs.
  • Integrating bulk data into Cassandra file system using MapReduce programs.
  • Used Talend ETL tool to develop multiple jobs and in setting workflows.
  • Created Talend jobs to copy the files from one server to another and utilized Talend FTP components.
  • Worked on MongoDB for distributed storage and processing.
  • Designed and implemented Cassandra and associated RESTful web service.
  • Created partitioned tables in Hive, mentored analyst and SQA team for writing Hive Queries.
  • Developed Pig Latin scripts to extract the data from the web server output files to load into HDFS.
  • Involved in cluster setup, monitoring, test benchmarks for results.
  • Involved in build/deploy applications using Maven and integrated with CI/CD server Jenkins.
  • Involved in agile methodologies, daily scrum meetings, Spring planning's.
  • Handling All Azure Management Tools on Daily basis
  • Involved in Analyzing system failures, identifying root causes, and recommended course of actions. Documented the systems processes and procedures for future references.
  • Worked with systems engineering team to plan and deploy new Hadoop environments and expand existing Hadoop clusters.
  • Administering and Maintaining Cloudera Hadoop Clusters Provision physical Linux systems, patch, and maintain them.
  • Utilized data fabrics to
  • Configure data fabric which provides seamless, real-time integration and access across the multiple data silos of a big data system
  • Enable the processing, management, storage and analysis of data using data fabric.
  • Use data mesh to make predictions future sales and predictions of the company
  • Leverage the data and utilized machine learning algorithm.

Environment: Hadoop, Confluent Kafka, Hortonworks HDF, HDP, NIFI, Linux, Splunk, Yarn, Cloudera 5.13, Spark, Tableau, Microsoft Azure, Data fabric, Data Mesh.

Confidential, Detroit, Michigan

Hadoop developer


  • Worked on analyzing Hadoop cluster and different big data analytic tools including Pig, Hive and Sqoop.
  • Created POC on Hortonworks and suggested the best practice in terms HDP, HDF platform
  • Experience in understanding the security requirements for Hadoop and integrating with Kerberos authentication infrastructure- KDC server setup, managing. Management and support of Hadoop Services including HDFS, Hive, Impala, and SPARK.
  • Installing, Upgrading and Managing Hadoop Cluster on Cloudera.
  • Troubleshooting many cloud related issues such as Data Node down, Network failure, login issues and data block missing.
  • Worked as Hadoop Admin and responsible for taking care of everything related to the clusters total of 100 nodes ranges from POC (Proof-of-Concept) to PROD clusters on Cloudera (CDH 5.5.2) distribution.
  • Responsible for Cluster maintenance, Monitoring, commissioning and decommissioning Data nodes, Troubleshooting, Manage and review data backups, Manage & review log files.
  • Day to day responsibilities includes solving developer issues, deployments moving code from one environment to other environment, providing access to new users and providing instant solutions to reduce the impact and documenting the same and preventing future issues.
  • Collaborating with application teams to install operating system and Hadoop updates, patches, version upgrades.
  • Strong experience and knowledge of real time data analytics using Spark Streaming, Kafka and Flume.
  • Migrated Flume with Spark for real time data and developed the Spark Streaming Application with java to consume the data from Kafka and push them into Hive.
  • Configured Kafka for efficiently collecting, aggregating and moving large amounts of click stream data from many different sources to HDFS. Monitored workload, job performance and capacity planning using Cloudera Manager.
  • Involved in Analyzing system failures, identifying root causes, and recommended course of actions.
  • Interacting with Cloudera support and log the issues in Cloudera portal and fixing them as per the recommendations.
  • Imported logs from web servers with Flume to ingest the data into HDFS.
  • Using Flume and Spool directory loading the data from local system to HDFS.
  • Retrieved data from HDFS into relational databases with Sqoop.
  • Parsed cleansed and mined useful and meaningful data in HDFS using Map-Reduce for further analysis Fine tuning hive jobs for optimized performance.
  • Scripting Hadoop package installation and configuration to support fully-automated deployments.
  • Involved in chef-infra maintenance including backup/security fix on Chef Server.
  • Deployed application updates using Jenkins. Installed, configured, and managed Jenkins
  • Triggering the SIT environment build of client remotely through Jenkins.
  • Deployed and configured Git repositories with branching, forks, tagging, and notifications.
  • Experienced and proficient deploying and administering GitHub
  • Deploy builds to production and work with the teams to identify and troubleshoot any issues.
  • Worked on MongoDB database concepts such as locking, transactions, indexes, Shading, replication, schema design.
  • Consulted with the operations team on deploying, migrating data, monitoring, analyzing, and tuning MongoDB applications.
  • Viewing the selected issues of web interface using SonarQube.
  • Developed a fully functional login page for the company's user facing website with complete UI and validations.
  • Installed, Configured and utilized AppDynamics (Tremendous Performance Management Tool) in the whole JBoss Environment (Prod and Non-Prod).
  • Reviewed OpenShift PaaS product architecture and suggested improvement features after conducting research on Competitors products.
  • Migrated data source passwords to encrypted passwords using Vault tool in all the JBoss application servers
  • Participated in Migration undergoing from JBoss 4 to Web logic or JBoss 4 to JBoss 6 and its respective POC.
  • Responsible for upgradation of SonarQube using upgrade center.
  • Resolving tickets submitted by users, P1 issues, troubleshoot the error documenting, resolving the errors.
  • Installed and configured Hive in Hadoop cluster and help business users/application teams fine tune their HIVE QL for optimizing performance and efficient use of resources in cluster.
  • Conduct performance tuning of the Hadoop Cluster and map reduce jobs. Also, the real-time applications with best practices to fix the design flaws.
  • Implemented Oozie work-flow for ETL Process for critical data feeds across the platform.
  • Configured Ethernet bonding for all Nodes to double the network bandwidth
  • Implementing Kerberos Security Authentication protocol for existing cluster.
  • Built high availability for major production cluster and designed automatic failover control using Zookeeper Failover Controller (ZKFC) and Quorum Journal nodes.

Environment: HDFS, Map Reduce, Hive 1.1.0, Kafka, Hue 3.9.0, Pig, Flume, Oozie, Sqoop, Apache Hadoop 2.6, Spark, SOLR, Storm, Cloudera Manager, Red Hat, MySQL, Prometheus, Docker, Puppet.

Confidential, Bentonville, Arkansas

Hadoop Developer


  • Launching Amazon EC2 Cloud Instances using Amazon Web Services (Linux/ Ubuntu/RHEL) and Configuring launched instances with respect to specific applications.
  • Installed application on AWS EC2 instances and also configured the storage on S3 buckets.
  • Performed S3 buckets creation, policies and also on the IAM role based polices and customizing the JSON template.
  • Implemented and maintained the monitoring and alerting of production and corporate servers/storage using AWS Cloud watch.
  • Managed servers on the Amazon Web Services (AWS) platform instances using Puppet, Chef Configuration management.
  • Developed PIG scripts to transform the raw data into intelligent data as specified by business users.
  • Worked in AWS environment for development and deployment of Custom Hadoop Applications.
  • Worked closely with the data modelers to model the new incoming data sets.
  • Involved in start to end process of Hadoop jobs that used various technologies such as Sqoop, PIG, Hive, Map Reduce, Spark and Shell scripts (for scheduling of few jobs.
  • Expertise in designing and deployment of Hadoop cluster and different Big Data analytic tools including Pig, Hive, Oozie, Zookeeper, SQOOP, flume, Spark, Impala, Cassandra with Horton work Distribution.
  • Involved in creating Hive tables, Pig tables, and loading data and writing hive queries and pig scripts
  • Assisted in upgrading, configuration and maintenance of various Hadoop infrastructures like Pig, Hive, and HBase.
  • Exploring with the Spark improving the performance and optimization of the existing algorithms in Hadoop using Spark Context, Spark-SQL, Data Frame, Pair RDD's, Spark YARN.
  • Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data. Configured deployed and maintained multi-node Dev and Test Kafka Clusters.
  • Performed transformations, cleaning and filtering on imported data using Hive, Map Reduce, and loaded final data into HDFS.
  • Worked on tuning Hive and Pig to improve performance and solve performance related issues in Hive and Pig scripts with good understanding of Joins, Group and aggregation and how it does Map Reduce jobs
  • Exploring with the Spark improving the performance and optimization of the existing algorithms in Hadoop using Spark Context, Spark-SQL, Data Frame, Pair RDD's, Spark YARN.
  • Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
  • Import the data from different sources like HDFS/HBase into Spark RDD.
  • Developed a data pipeline using Kafka and Storm to store data into HDFS.
  • Performed real time analysis on the incoming data.
  • Used Spark Streaming to divide streaming data into batches as an input to Spark engine for batch processing.
  • Implemented Spark using Scala and Sparks for faster testing and processing of data.

Environment: Apache Hadoop, HDFS, MapReduce, Sqoop, Flume, Pig, Hive, HBASE, Oozie, Scala, Spark, Linux.

Confidential, New York, New York



  • Responsible for implementation and ongoing administration of Hadoop infrastructure and setting up infrastructure
  • Analyzed technical and functional requirements documents and design and developed QA Test Plan/Test cases, Test Scenario by maintaining E2E flow of process.
  • Developed testing script for internal brokerage application that is utilized by branch and financial market representatives to recommend and manage customer portfolios; including international and capital markets.
  • Designed and Developed Smoke and Regression automation script and Automation of functional testing framework for all modules using Selenium and WebDriver.
  • Created Data Driven scripts for adding multiple customers, checking online accounts, user interfaces validations, and reports validations.
  • Performed cross verification of trade entry between mainframe system, its web application and downstream system.
  • Extensively used Selenium WebDriver API (XPath and CSS locators) to test the web application.
  • Configured Selenium WebDriver, TestNG, Maven tool, Cucumber, and BDD Framework and created Selenium automation scripts in java using TestNG.
  • Performed Data-Driven testing by developing Java based library to read test data from Excel & Properties files.
  • Extensively performed DB2 database testing to validate the trade entry from mainframe to backend system. \ Developed data driven framework with Java, Selenium WebDriver and Apache POI which is used to do the multiple trade order entry.
  • Developed internal application using Angular.js and Node.js connecting to Oracle on the backend.
  • Expertise in debugging issues occurred in front end part of web-based application which is developed using HTML5, CSS3, Angular JS, Node.JS and Java.
  • Developed smoke automation test suite for regression test suite.
  • Applied various testing technique in test cases to cover all business scenario for quality coverage.
  • Interacted with development team to understand design flow, code review, discuss unit test plan.
  • Executed tests in System & integration Regression testing In Testing environment.
  • Conducted Defect triage meeting, Defect root cause analysis, track defect in HP ALM Quality Center, manage defect by follow up open items, and retest defects with regression testing.
  • Provide QA/UAT sign off after closely reviewing all the test cases in Quality Center along with receiving the Policy sign off the project.

Environment: HP ALM, Selenium WebDriver, JUnit, Cucumber, Angular JS, Node.JS Jenkins, GitHub, Windows, UNIX, Agile, MS SQL, IBM DB2, Putty, WinSCP, FTP Server, Notepad++, C#, DB Visualizer.

Hire Now