Hadoop Developer Resume

Santa Clara, CA


  • 5+ years of IT industry experience, including 4 years working with Apache Hadoop ecosystem components such as HDFS, MapReduce, Spark, Hive, Pig, Sqoop, Oozie, Zookeeper and HBase, as well as Cassandra, MongoDB and Amazon Web Services.
  • 3+ years of experience in the Application Development and Maintenance of SDLC projects using Java technologies.
  • Good experience working with the Hortonworks, Cloudera and MapR distributions.
  • Very good understanding/knowledge of Hadoop Architecture and various components such as HDFS, Job Tracker, Task Tracker, Name Node, Data Node, Secondary Namenode, and MapReduce concepts.
  • Developed applications for distributed environments using Hadoop, MapReduce and Python.
  • Experience in data extraction and transformation using MapReduce jobs.
  • Proficient in working with Hadoop and HDFS, and in writing Pig and Sqoop scripts.
  • Performed data analysis using Hive and Pig.
  • Expert in creating Pig and Hive UDFs using Java in order to analyze the data efficiently.
  • Experience in importing and exporting data using Sqoop from relational database systems to HDFS and vice versa.
  • Strong understanding of NoSQL databases like HBase, MongoDB & Cassandra.
  • Strong understanding of Spark Streaming and Spark SQL, with experience loading data from external sources such as MySQL and Cassandra into Spark applications.
  • Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX, NoSQL and a variety of portfolios.
  • Well versed in job workflow scheduling and monitoring tools such as Oozie.
  • Developed MapReduce jobs to automate transfer of data from HBase.
  • Practical knowledge of integrating Kafka with third-party systems such as Spark and Hadoop.
  • Loaded streaming log data from various webservers into HDFS using Flume.
  • Experience in using Sqoop, Oozie and Cloudera Manager.
  • Hands-on experience in application development using RDBMS and Linux shell scripting.
  • Experience working with Amazon EMR and EC2 Spot Instances.
  • Experience integrating Hadoop with Ganglia, with a good understanding of Hadoop metrics and their visualization in Ganglia.
  • Support development, testing, and operations teams during new system deployments.
  • Solid understanding of relational database concepts.
  • Very good experience in complete project life cycle (design, development, testing and implementation) of Client Server and Web applications.
  • Good knowledge of Hadoop cluster architecture and cluster monitoring.
  • Hands-on experience using Tableau to generate reports on Hadoop data.
  • Good team player who works efficiently across multiple teams and products, and adapts easily to new systems and environments.


Programming languages: C, C++, Java, Python, Scala, R

Hadoop/Big Data: MapReduce, Spark, SparkSQL, PySpark, SparkR, Pig, Hive, Sqoop, HBase, Flume, Kafka, Cassandra, YARN, Oozie, Zookeeper, Elasticsearch

Databases: MySQL, PL/SQL, MongoDB, HBase, Cassandra.

Operating Systems: Windows, Unix, Linux, Ubuntu.

Web Development: HTML, JSP, JavaScript, jQuery, CSS, XML, AJAX.

Reporting Tools: Tableau

Web/Application Servers: Apache Tomcat, Sun Java Application Server

IDE Tools: IntelliJ, Eclipse, NetBeans

Scripting: BASH, JavaScript

Version Control: Git, SVN

Cloud Services: Amazon Web Services

Monitoring Tools: Nagios, Ganglia

Build Tools: Maven


Confidential, Santa Clara, CA

Hadoop Developer


  • Implemented workflows to process around 400 messages per second and push the messages to the DocumentDB as well as Event Hubs.
  • Developed a custom message producer which can produce about 4000 messages per second for scalability testing.
  • Implemented call-back architecture and notification architecture for real time data.
  • Implemented Spark Streaming in Scala to process the JSON messages and push them to the Kafka topic.
  • Created custom dashboards in Azure using Application Insights and the Application Insights query language to process the metrics sent to Application Insights.
  • Created real-time streaming dashboards in Power BI, using Stream Analytics to push datasets to Power BI.
  • Developed a custom message consumer to consume data from the Kafka producer and push the messages to Azure Service Bus and Azure Event Hubs.
  • Wrote auto-scaling functions that consume data from Azure Service Bus or Azure Event Hubs and send it to DocumentDB.
  • Wrote a Spark application that captures the change feed from DocumentDB using the Java API and writes updates to a new DocumentDB.
  • Implemented zero-downtime deployment for the entire production pipeline in Azure.
  • Implemented CI/CD pipelines to build and deploy the projects in the Hadoop environment.
  • Experienced in implementing the pipelines in Jenkins.
  • Used custom receivers, socket streams, file streams and directory streams in Spark Streaming.
  • Used Application Insights, DocumentDB, Service Bus, Azure Data Lake Store, Azure Blob Storage, Event Hubs and Azure Functions.
  • Used Python to run the Ansible playbook that deploys the Logic Apps to Azure.

Environment: Hadoop, Hive, HDFS, Azure, AWS, Spark Streaming, Spark SQL, Scala, Python, Java, web servers, Maven, Jenkins, Ansible.

Confidential, San Francisco

Hadoop Developer


  • Collaborating with our CD team to design, deploy, manage and operate scalable, highly available, and fault tolerant systems on AWS.
  • Ensure data integrity and data security within our production system.
  • Develop and deliver within our Continuous Delivery Framework.
  • Shifting legacy applications to AWS.
  • Handle billions of log lines coming from several clients and analyze those using big data technologies like Hadoop (HDFS), Apache Kafka and Apache Storm.
  • Continuous improvement of code to handle more events coming into the cluster.
  • Scaling the cluster accordingly to handle sudden spike in the incoming logs.
  • Monitoring the entire cluster in Ambari and troubleshooting the Storm supervisors, Kafka brokers and ZooKeeper.
  • Query huge data sets for event generation in Vertica, a columnar analytic database.
  • Query for feeds/regexes in MS SQL for URL module in the cluster.
  • Implement log metrics using log management tools such as Elasticsearch, Logstash and Kibana (ELK stack) and visualize them in dashboards such as Grafana and Kibana.
  • Use of YAML templating to send the metrics through Filebeat.
  • Migrate existing architecture to Amazon Web Services, using technologies such as Kinesis, Redshift, AWS Lambda and CloudWatch metrics.
  • Query alerts arriving in S3 buckets with Amazon Athena to compare alert generation between the Kafka cluster and the Kinesis cluster.
  • Extensive use of Python for managing services in AWS using the boto library.
  • Use of cloud orchestration technologies such as Terraform to spin up the clusters.
  • Use Terraform to set up security groups and CloudWatch metrics in AWS.
  • Thorough unit testing of Java code using JUnit and Mockito.

Environment: AWS, MongoDB, HDFS, Sqoop, Oozie, Maven, IntelliJ, GIT, UNIX Shell scripting, Linux, Agile development

Confidential, Stamford, CT

Hadoop Developer


  • Responsible for gathering all required information and requirements for the project.
  • Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis.
  • Streamed data in real time using Spark with Kafka.
  • Configured Spark Streaming to receive real-time data from Kafka and store the stream data in HDFS using Scala.
  • Worked on debugging, performance tuning of Hive & Pig Jobs.
  • Involved in loading data from the Linux file system to HDFS.
  • Importing and exporting data into HDFS using Sqoop and Kafka.
  • Experience working on processing unstructured data using Spark and Hive.
  • Involved in writing custom Pig Loaders and Storage classes to work with a variety of data formats such as JSON, Compressed CSV, etc.
  • Created Hive tables and implemented Partitioning, Dynamic Partitions, Buckets on the tables.
  • Supported MapReduce programs running on the cluster.
  • Gained experience in managing and reviewing Hadoop log files.
  • Involved in writing code with Scala which has support for functional programming.
  • Involved in scheduling Oozie workflows to run multiple Pig jobs.
  • Automated all the jobs for pulling data from FTP server to load data into Hive tables using Oozie workflows.
  • Involved in using HCatalog to access Hive table metadata from MapReduce or Pig code.
  • Monitored and scheduled the UNIX scripting jobs.
  • Gained knowledge in NoSQL database with Cassandra and MongoDB.
  • Experience in Agile development and in completing tasks to meet deadlines.
  • Exported the result set from Hive to MySQL using Shell scripts.
  • Actively involved in code review and bug fixing for improving the performance.

Environment: Hadoop Cloudera Distribution (CDH4), Java 7, Hadoop 2.5.2, Spark, SparkSQL, MLlib, R programming, Scala, Cassandra, IoT, MapReduce, Apache Pig 0.14.0, Apache Hive 1.0.0, HDFS, Sqoop, Oozie, Kafka, Maven, Eclipse, Nagios, Ganglia, Zookeeper, Git, UNIX Shell scripting, Oracle 11g/10g, Linux, Agile development.


Java Developer


  • Involved in Analysis, Design, Implementation and Bug Fixing Activities.
  • Designed the initial Web-WAP pages for a better UI as per the requirements.
  • Involved in reviewing Functional and Technical Specification documents as well as code.
  • Underwent domain knowledge training.
  • Involved in designing basic Class Diagrams, Sequence Diagrams and Event Diagrams as part of the documentation.
  • Held discussions and meetings with the Business Analysts to understand the functionality involved in Test Case reviews.
  • Developed SQL queries and Stored Procedures using PL/SQL to retrieve and insert into multiple database schemas.
  • Prepared the Support Guide containing the complete functionality.

Environment: Core Java, Apache Tomcat 5.1, Oracle 9i, JavaScript, HTML, PL/SQL, Rational Rose, Windows XP, UNIX.
