
Senior Big Data/Hadoop Developer Resume

Buffalo, New York


  • Around 7 years of experience in Information Technology, including Big Data and the Hadoop ecosystem (HDFS, MapReduce, YARN, Pig, Hive, HBase, Sqoop, Oozie, Flume, ZooKeeper).
  • Cloudera Certified Administrator for Apache Hadoop (CCAH)
  • 5 years of experience installing, configuring, and testing Hadoop ecosystem components.
  • Experience writing MapReduce programs in Java.
  • Strong experience with data migration, preprocessing, validation, and analysis of 5 TB of data in HDFS.
  • Experience managing scalable Hadoop clusters, including cluster design, provisioning, custom configuration, monitoring, and maintenance, on the Cloudera CDH and Hortonworks HDP distributions.
  • Strong knowledge of installing and administering the Hadoop ecosystem on multi-node clusters.
  • Experience developing custom UDFs in Java to extend Hive and Pig Latin functionality.
  • Good understanding of HDFS design, daemons, federation, and HDFS high availability (HA).
  • Experience managing Hadoop clusters using Cloudera Manager and Ambari.
  • Experience importing and exporting data with Sqoop between relational database systems (RDBMS) and HDFS.
  • Hands-on experience with the Java build tool Apache Maven.
  • Knowledge of UNIX and shell scripting.
  • Good knowledge of integrating various data sources such as RDBMS, spreadsheets, text files, and XML files.
  • In-depth knowledge of object-oriented programming (OOP) concepts such as inheritance, polymorphism, exception handling, and templates, and development experience with Java technologies.
  • Experienced in configuring workflow scheduling with Oozie.
  • Developed Spark applications in Scala to ease the transition from existing Hadoop jobs.
  • Configured Spark Streaming to receive real-time data from Kafka and store the streamed data in HDFS using Scala.
  • Used Flume extensively to gather and move log files from application servers to a central location in the Hadoop Distributed File System (HDFS).
  • Designed and implemented a hybrid cloud virtual data center on AWS providing servers, storage, and networking, along with high availability, backup and disaster recovery, demand forecasting, capacity planning, and performance management.
  • Worked on NoSQL databases including HBase, Cassandra, and MongoDB.
  • Planned and engineered proof-of-concept (POC) and production clusters.
  • Good experience optimizing MapReduce algorithms using mappers, reducers, combiners, and partitioners to deliver the best results on large datasets.
  • Used Apache Kafka to track data ingestion into the Hadoop cluster.


Operating systems: Windows, Linux (Fedora, CentOS), UNIX

Languages and Technologies: C, C++, Java, SQL, PL/SQL

Scripting Languages: Shell scripting

Databases: Oracle, MySQL, PostgreSQL

IDE: Eclipse, NetBeans, SBT

Application Servers: Apache Tomcat, Apache HTTP Server

Hadoop Ecosystem: Hadoop MapReduce, HDFS, Flume, Sqoop, Hive, Pig, Oozie, Cloudera Manager, ZooKeeper, AWS EC2

Apache Spark: Spark, Spark SQL, Spark Streaming, Scala, Spark with Python (PySpark)

Cluster Mgmt. & Monitoring: Cloudera Manager, Hortonworks Ambari, Ganglia, and Nagios.

Security: Kerberos.


Confidential, Buffalo, New York

Senior Big Data/Hadoop Developer


  • Hands-on experience installing, configuring, and using ecosystem components such as MapReduce, HDFS, Pig, Hive, Sqoop, and Flume.
  • Used the Hortonworks Data Platform (HDP).
  • Worked with cloud services such as Amazon Web Services (AWS) on ETL, data integration, and migration.
  • Monitored workload, job performance and capacity planning using Cloudera Manager.
  • Worked in an AWS environment to develop and deploy custom Hadoop applications.
  • Experience working with Apache NiFi.
  • Developed Pig UDFs to pre-process data for analysis.
  • Implemented ETL/ELT processes with MapReduce, Pig, and Hive.
  • Used Sqoop to import customer information from a MySQL database into HDFS for processing.
  • Wrote Hive queries in Hive Query Language (HQL) to analyze data in the Hive warehouse.
  • Worked with the SequenceFile, Avro, and Parquet file formats; managed and reviewed Hadoop log files.
  • Handled high volumes of data in the cluster.
  • Collected log data from web servers and ingested it into HDFS using Apache Flume.
  • Tuned the developed ETL jobs for better performance.
  • Imported logs from web servers into HDFS using Flume.
  • Commissioned and decommissioned nodes on a CDH5 Hadoop cluster running Red Hat Linux.
  • Good troubleshooting skills with Hue, which provides a GUI for developers and business users in their day-to-day activities.
  • Optimized existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, and pair RDDs.
  • Used Oozie and ZooKeeper for workflow scheduling and monitoring.
  • Created Hive managed and external tables with static and dynamic partitions.
  • Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
  • Used Spark Streaming to ingest web server log files.
  • Integrated Apache Spark with Kafka to perform web analytics, and loaded clickstream data from Kafka into HDFS, HBase, and Hive through integration with Storm.
  • Developed multiple Kafka producers and consumers from scratch using the low-level and high-level APIs.
  • Designed and created Hive external tables with partitioning, dynamic partitioning, and bucketing, using a shared metastore instead of Derby.


Confidential, Des Moines, IA

Senior Hadoop Developer


  • Worked on cluster installation, commissioning and decommissioning of DataNodes, NameNode high availability, capacity planning, and slot configuration.
  • Good expertise in ETL and reporting tools (Ab Initio, Informatica, Talend).
  • Responsible for cluster maintenance, monitoring, commissioning and decommissioning DataNodes, troubleshooting, managing and reviewing data backups, and managing and reviewing log files.
  • Hands-on experience writing MapReduce jobs to cleanse data and copy it from our cluster to the AWS cluster.
  • Used the Hortonworks Data Platform (HDP).
  • Experienced in adding, installing, and removing components through Ambari.
  • Monitored systems and services through the Ambari dashboard to keep the clusters available to the business.
  • Analyzed data and wrote Hadoop MapReduce jobs using the Java API, Pig, and Hive.
  • Very good understanding of partitioning and bucketing in Hive; designed both managed and external Hive tables for optimized performance.
  • Developed scripts and batch jobs to schedule an Oozie bundle (a group of coordinators) consisting of various Hadoop programs.
  • Worked on ETL workflows and big data analysis, and loaded the data into the Hadoop cluster.
  • Analyzed the data by running Hive queries and Pig scripts to study customer behavior.
  • Developed Spark applications using Scala.
  • Loaded and transformed large sets of structured, semi-structured, and unstructured data.
  • Configured a MySQL database to store Hive metadata.
  • Performed advanced procedures such as text analytics and processing using Spark's in-memory computing capabilities in Scala.
  • Developed Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data.
  • Responsible for loading unstructured data into the Hadoop Distributed File System (HDFS).
  • Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX, NoSQL and a variety of portfolios.
  • Created tables and stored procedures in SQL for data manipulation and retrieval, and modified databases using SQL, PL/SQL, stored procedures, triggers, and views in Oracle 9i.
  • Created reports for the BI team, using Sqoop to import data into HDFS and Hive.
  • Involved in collecting and aggregating large amounts of log data using Apache Flume and staging data in HDFS for further analysis.
  • Developed Pig Latin scripts to extract data from web server output files and load it into HDFS.
  • Worked on MapReduce joins to query multiple semi-structured datasets as analytics required.
  • Automated the extraction of data from warehouses and weblogs by developing workflows and coordinator jobs in Oozie.


Confidential, Seattle, WA

Hadoop Developer


  • Installed, configured, and maintained Apache Hadoop clusters for application development, along with Hadoop tools such as Hive, HBase, ZooKeeper, and Sqoop.
  • Extensively involved in installing and configuring Cloudera's Distribution of Hadoop (CDH): NameNode, Secondary NameNode, ResourceManager, NodeManager, and DataNodes.
  • Collected log data from web servers and integrated it into HDFS using Flume.
  • Worked on cluster installation, commissioning and decommissioning of DataNodes, NameNode recovery, capacity planning, and slot configuration.
  • Installed the Oozie workflow engine to run multiple Hive jobs.
  • Worked with Kafka on a proof of concept for log processing on a distributed system.
  • Developed a data pipeline using Flume, Sqoop, and Java MapReduce to ingest customer behavioural data and financial histories into HDFS for analysis.
  • Configured property files such as core-site.xml, hdfs-site.xml, yarn-site.xml, mapred-site.xml, and hadoop-env.sh based on job requirements.
  • Used the Hue interface to query data.
  • Automated system tasks using Puppet.
  • Created Hive tables to store the processed results in a tabular format.
  • Used ZooKeeper for cluster coordination services.
  • Moved relational database data into Hive dynamic-partition tables with Sqoop, using staging tables.
  • Collected metrics for Hadoop clusters using Ganglia and Nagios.
  • Configured Sqoop and imported/exported data into and out of HDFS.
  • Configured NameNode high availability and NameNode federation.
  • Loaded data from the UNIX local file system into HDFS.
  • Managed and scheduled jobs on a Hadoop cluster using Oozie.
  • Used Sqoop to import and export data between HDFS and relational databases.
  • Performed data analysis by running Hive queries.
  • Generated reports using the Tableau report designer.

Environment: HDFS, Cloudera Manager, MapReduce, Spark, Pig, Hive, HBase, Sqoop, Flume, Oozie, ZooKeeper, Puppet, Tableau, and Java.


Java Developer


  • Re-architected all the applications to use the latest infrastructure within three months and helped the developers implement them successfully.
  • Designed Hadoop jobs to create product recommendations using collaborative filtering.
  • Designed the COSA pretest utility framework using JSF MVC, JSF validation, tag libraries, and JSF backing beans.
  • Integrated the Order Capture system with Sterling OMS using a JSON web service.
  • Configured the ESB to transform the Order Capture XML into Sterling messages.
  • Configured and implemented Jenkins, Maven, and Nexus for continuous integration.
  • Mentored developers on and implemented test-driven development (TDD) strategies.
  • Loaded data from Oracle into HDFS (Hadoop) using Sqoop.
  • Developed data transformation scripts using Hive and MapReduce.
  • Designed and implemented the Open API using Spring REST web services.
  • Proposed the integration pipeline testing strategy using Cargo.
