
Hadoop Developer Resume

New York, NY


  • Around 6 years of IT experience in analysis, implementation, and testing of enterprise-wide applications, data warehouses, client-server technologies, and web-based applications.
  • Over 3 years of experience in administrative tasks such as multi-node Hadoop installation and maintenance.
  • Experience in deploying Hadoop 2.0 (YARN) and administering HBase, Hive, Sqoop, and HDFS.
  • Installed, configured, supported, and managed Apache Ambari on Hortonworks Data Platform 2.5 and Cloudera Distribution Hadoop 5.x across Linux, Rackspace, and AWS cloud infrastructure.
  • Understood the security requirements for Hadoop and integrated clusters with Kerberos infrastructure.
  • Good knowledge of Kerberos security; successfully maintained secured clusters while adding and removing nodes.
  • Hands-on experience in Linux administration activities.
  • Experience in minor and major upgrades of Hadoop and the Hadoop ecosystem.
  • Monitored Hadoop clusters using tools such as Nagios, Ganglia, Ambari, and Cloudera Manager.
  • Performed Hadoop cluster capacity planning, performance tuning, cluster monitoring, and troubleshooting.
  • Involved in benchmarking Hadoop/HBase cluster file systems under various batch jobs and workloads.
  • Set up Linux environments: configured password-less SSH, created file systems, disabled firewalls, and installed Java.
  • Experienced in job scheduling using Oozie.
  • Hands-on experience in analyzing log files for Hadoop ecosystem services and finding root causes.
  • Experience in AWS cloud administration; actively involved in building highly available, scalable, cost-effective, and fault-tolerant systems using multiple AWS services.
  • Involved with file transmission and electronic data interchange; trade capture, verification, processing, and routing operations; banking report generation; and operational management.
  • Experience in managing Hadoop clusters and integrating ecosystem components such as Hive, HBase, Sqoop, Spark, and Oozie.
  • Experienced in AWS CloudFront, including creating and managing distributions to provide access to S3 buckets or HTTP servers running on EC2 instances.
  • Performed systems analysis for several information systems, documenting and identifying performance and administrative bottlenecks.
  • Monitored platform health, generated performance reports, and provided continuous improvements.


Hadoop Developer

Confidential, New York, NY


  • Responsible for building scalable distributed data solutions using Hadoop.
  • Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
  • Developed MapReduce pipeline jobs to process the data and create the necessary HFiles.
  • Loaded the generated HFiles into HBase for fast access to a large customer base without taking a performance hit.
  • Worked in an AWS environment for development and deployment of custom Hadoop applications.
  • Involved in designing and creating data-ingest pipelines using technologies such as Kafka.
  • Developed Spark scripts using the Scala shell as per requirements.
  • Created HBase tables to store various data coming from different portfolios.
  • Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis.
  • Involved in managing and reviewing Hadoop log files.
  • Responsible for managing data coming from different sources.
  • Transferred the data using Informatica tool from AWS S3 to AWS Redshift. Involved in file movements between HDFS and AWS S3.
  • Created a complete processing engine, based on the Hortonworks distribution, tuned for performance.
  • Provided a batch-processing solution for large volumes of unstructured data using the Hadoop MapReduce framework.
  • Developed Spark code using Scala and Spark SQL for faster processing and testing.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
  • Worked on Python scripts to load data from CSV, JSON, MySQL, and Hive sources into the Neo4j graph database.
  • Handled administration of Cassandra: installing, upgrading, and managing distributions.
  • Assisted in performing unit testing of MapReduce jobs using MRUnit.
  • Assisted in exporting data into Cassandra and writing column families to provide fast listing outputs.
  • Used the Oozie scheduler to automate the pipeline workflow and orchestrate the MapReduce jobs that extract the data in a timely manner.
  • Integrated Apache Kafka with Apache Storm and created Storm data pipelines for real-time processing.
  • Experienced in working with the Spark ecosystem using Scala and Hive queries on different data formats such as text files and Parquet.
  • Exposure to Apache Kafka for developing data pipelines of logs as streams of messages using producers and consumers.
  • Worked with the Hue GUI for job scheduling, file browsing, and job browsing.
  • Worked with Talend on a POC for integration of data from the data lake.
  • Involved in development/implementation of Cassandra environment.
  • Involved in story-driven agile development methodology and actively participated in daily scrum meetings.
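The data-cleaning MapReduce bullets above follow the classic map/shuffle/reduce pattern. The actual jobs were written in Java against the Hadoop API; as an illustration only, here is a minimal pure-Python sketch of the same pattern, with hypothetical record fields (portfolio, amount):

```python
from collections import defaultdict

def map_phase(records):
    """Map step: clean each raw record and emit (key, value) pairs."""
    for line in records:
        parts = line.strip().split(",")
        if len(parts) != 2:          # drop malformed records (the cleaning step)
            continue
        portfolio, amount = parts
        yield portfolio.strip().lower(), float(amount)

def reduce_phase(pairs):
    """Shuffle + reduce: group values by key, then aggregate each group."""
    groups = defaultdict(list)
    for key, value in pairs:         # shuffle: group values by key
        groups[key].append(value)
    return {key: sum(values) for key, values in groups.items()}  # reduce

raw = ["equities, 10.0", "bonds, 5.5", "equities, 2.5", "bad-record"]
totals = reduce_phase(map_phase(raw))
print(totals)  # {'equities': 12.5, 'bonds': 5.5}
```

In Hadoop the shuffle between the two phases is done by the framework across the cluster; here it is a single in-memory grouping.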

Environment: Hadoop, Confluent Kafka, Hortonworks HDF, HDP, NiFi, Linux, Splunk, YARN, Cloudera 5.13, Spark, Tableau, Microsoft Azure, Data Fabric, Data Mesh.


Confidential, New York, NY


  • Worked on analyzing Cloudera Hadoop and Hortonworks clusters and different big data analytic tools, including Hive and Sqoop.
  • Installed and configured a CDH cluster, using Cloudera Manager for easy management of the existing Hadoop cluster.
  • Responsible for managing data coming from different sources; involved in HDFS maintenance and loading of structured and unstructured data.
  • Developed a data pipeline using Flume, Sqoop, and MapReduce to ingest behavioral data into HDFS for analysis.
  • Extracted files from MongoDB through Sqoop, placed them in HDFS, and processed them.
  • Created a customized BI tool for the management team to perform query analytics using HiveQL.
  • Created partitions and buckets based on state for further processing using bucket-based Hive joins.
  • Experienced in using Kafka as a data pipeline between JMS and Spark Streaming applications.
  • Created storage with Amazon S3 for storing data. Worked on transferring data from Kafka topic into AWS S3 storage.
  • Estimated the hardware requirements for the NameNode and DataNodes and planned the cluster.
  • Created generic Hive UDFs, UDAFs, and UDTFs in Java to process business logic that varies by policy.
  • Moved relational database data into Hive dynamic-partition tables using Sqoop and staging tables.
  • Consolidated customer data from lending, insurance, trading, and billing systems into a data warehouse, and subsequently into marts, for business intelligence reporting.
  • Optimized Hive queries using partitioning and bucketing techniques to control data distribution.
  • Loaded streaming data into HDFS using the Kafka messaging system.
  • Used the Spark-Cassandra Connector to load data to and from Cassandra.
  • Worked with the NoSQL database HBase to create tables and store data.
  • Used Pig as an ETL tool to perform transformations, event joins, filtering, and some pre-aggregations before storing the data in HDFS.
  • Designed technical solutions for real-time analytics using Kafka and HBase.
  • Created UDFs to store specialized data structures in HBase and Cassandra.
  • Collaborated with Business users for requirement gathering for building Tableau reports per business needs.
  • Upgraded Apache Ambari, CDH, and HDP clusters.
  • Configured and maintained different topologies in the Storm cluster and deployed them on a regular basis.
  • Imported structured data, tables into HBase.
  • Involved in Backup, HA, and DR planning of applications in AWS.
  • Used Impala to read, write, and query Hadoop data in HDFS, HBase, or Cassandra.
  • Experienced with different kinds of compression techniques, such as LZO, Gzip, and Snappy.
  • Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs, such as Java MapReduce, Hive, Pig, and Sqoop.
  • Used AWS Patch Manager to select and deploy operating system and software patches across EC2 instances.
  • Created a data pipeline of MapReduce programs using chained mappers.
  • Configured Spark Streaming to receive real-time data from Kafka and store the streamed data in HDFS using Scala.
  • Implemented Spark RDD transformations and actions to migrate MapReduce algorithms.
  • Set up, configured, and optimized the Cassandra cluster; developed a real-time Java-based application to work along with the Cassandra database.
  • Implemented optimized joins across different data sets to get top claims by state using MapReduce.
  • Converted queries to Spark SQL, using Parquet files as the storage format.
  • Developed an analytical component using Scala, Spark, and Spark Streaming.
  • Wrote Spark programs in Scala and ran Spark jobs on YARN.
  • Integrated Hive and HBase with Solr to build a full pipeline for data analysis.
  • Wrote a Storm topology to emit data into Cassandra.
  • Implemented MapReduce programs to perform map-side joins using the distributed cache in Java. Developed unit test cases using JUnit, EasyMock, and MRUnit testing frameworks.
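The partitioning and bucketing bullets above both come down to routing rows by key: a partition groups rows by a column's value (e.g. state), while a bucket spreads rows across a fixed number of files by key hash, which is what makes bucket-based joins cheap. In Hive this is declared in DDL; purely as an illustration, here is a pure-Python sketch of the routing logic, with hypothetical columns (state, customer_id, amount):

```python
NUM_BUCKETS = 4

def route(rows):
    """Assign each row a partition (by column value) and a bucket (by key hash),
    mirroring how Hive lays out a partitioned, bucketed table on disk."""
    layout = {}
    for state, customer_id, amount in rows:
        bucket = hash(customer_id) % NUM_BUCKETS  # like CLUSTERED BY (customer_id) INTO 4 BUCKETS
        key = (state, bucket)                     # like PARTITIONED BY (state)
        layout.setdefault(key, []).append((customer_id, amount))
    return layout

rows = [("NY", 1, 10.0), ("NY", 5, 20.0), ("VA", 1, 7.5)]
layout = route(rows)
# Rows with the same customer_id land in the same bucket number in every
# partition, so a bucket-based join only has to pair up matching bucket files.
```

Partition pruning then skips whole (state, bucket) groups that a query's predicate rules out, which is the "controlling the data distribution" effect the bullets describe.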

Environment: Hadoop, HDFS, Pig, Sqoop, Shell Scripting, Ubuntu, Red Hat Linux, Spark, Scala, Hortonworks, Cloudera Manager, Apache YARN, Python, Machine Learning, Microsoft Azure.


Confidential, McLean, VA


  • Launched and configured Amazon EC2 Cloud Instances and S3 buckets using AWS, Ubuntu Linux and RHEL
  • Installed application on AWS EC2 instances and configured the storage on S3 buckets
  • Implemented and maintained the monitoring and alerting of production and corporate servers/storage using AWS Cloud watch.
  • Worked closely with the data modelers to model the new incoming data sets.
  • Involved in the end-to-end process of Hadoop jobs that used various technologies such as Sqoop, Pig, Hive, MapReduce, Spark, and shell scripts (for scheduling of a few jobs).
  • Expertise in designing and deploying Hadoop clusters and different big data analytic tools, including Pig, Hive, Oozie, Zookeeper, Sqoop, Flume, Spark, Impala, and Cassandra, with the Hortonworks distribution.
  • Involved in creating Hive and Pig tables, loading data, and writing Hive queries and Pig scripts.
  • Explored Spark for improving the performance and optimization of the existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
  • Developed Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data. Configured, deployed, and maintained multi-node Dev and Test Kafka clusters.
  • Performed transformations, cleaning, and filtering on imported data using Hive and MapReduce, and loaded the final data into HDFS.
  • Imported data from different sources like HDFS/HBase into Spark RDDs.
  • Developed a data pipeline using Kafka and Storm to store data into HDFS.
  • Performed real time analysis on the incoming data.
  • Used Spark Streaming to divide streaming data into batches as an input to Spark engine for batch processing.
  • Implemented Spark using Scala and Spark SQL for faster testing and processing of data.
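Spark Streaming, referenced in the bullets above, is a micro-batching model: an unbounded stream is chopped into small batches that are each handed to the batch engine. The real jobs used the Spark Streaming API in Scala; as a toy illustration, here is a pure-Python sketch of that idea, with a hypothetical batch size and aggregation:

```python
from itertools import islice

def micro_batches(stream, batch_size):
    """Chop an unbounded iterator into fixed-size batches, the way Spark
    Streaming slices a stream into per-interval RDDs for the batch engine."""
    it = iter(stream)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch

def process(batch):
    """Stand-in for a Spark batch job: aggregate one micro-batch."""
    return sum(batch)

events = range(10)  # stand-in for a Kafka event stream
results = [process(b) for b in micro_batches(events, batch_size=4)]
print(results)  # [6, 22, 17]
```

In Spark the batch boundary is a time interval rather than a count, but the processing model (each slice runs through the same batch engine) is the same.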

Environment: Apache Hadoop, HDFS, MapReduce, Sqoop, Flume, Pig, Hive, HBASE, Oozie, Scala, Spark, Linux.
