
Hadoop Admin Resume

New York, New York


  • Over 8 years of professional IT experience, including around 4 years of hands-on experience with Hadoop using Cloudera and Hortonworks; working environment includes MapReduce, HDFS, HBase, ZooKeeper, Oozie, Hive, Sqoop, Pig, Cassandra, and Flume.
  • Hands-on experience in installing, configuring, and using Hadoop ecosystem components such as HDFS, MapReduce, YARN, ZooKeeper, Sqoop, Flume, Hive, HBase, Spark, Pig, and Oozie.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs.
  • Experience in installing and configuring Flume, Hive, Pig, Sqoop and Oozie on the Hadoop cluster.
  • Experience in deploying a Hadoop cluster using Cloudera 5.X integrated with Ambari for monitoring and alerting.
  • Experience in launching and setting up of Hadoop Cluster on AWS as well as physical servers, which includes configuring different components of Hadoop.
  • Experience in developing and monitoring Puppet Configuration Manager to automate the configuration files of Hadoop Ecosystem.
  • Experience in configuring files such as core-site.xml, hdfs-site.xml, and mapred-site.xml based on job requirements.
  • Experience in installing and configuring ZooKeeper to coordinate the Hadoop daemons.
  • Working knowledge in importing and exporting data into HDFS using Sqoop.
  • Experience in defining batch job flows with Oozie.
  • Experience in Loading log data directly into HDFS using Flume.
  • Experienced in managing and reviewing Hadoop log files to troubleshoot issues.
  • Experience in following standard backup measures to ensure high availability of the cluster.
  • Experience in Implementing Rack Awareness for data locality optimization.
  • Experience in scheduling volume snapshots for backup, performing root cause analysis of failures, documenting bugs and fixes, and scheduling downtime and cluster maintenance.
  • Experience in database imports; worked with imported data to populate tables in Hive.
  • Exposure to exporting data from relational databases to the Hadoop Distributed File System.
  • Experience in cluster maintenance, commissioning and decommissioning the data nodes.
  • Experience in monitoring systems and services, architecture design and implementation of Hadoop deployment, configuration management, backup, and disaster recovery systems and procedures.
  • Experience working with systems engineering team to plan and deploy new Hadoop environments and expand existing Hadoop clusters.
  • Experience in monitoring multiple Hadoop clusters environments using Ganglia and Nagios as well as workload, job performance and capacity planning using Cloudera Manager.
  • Experience in installing and configuring Kerberos for the authentication of users and Hadoop daemons.
  • Hands-on experience in Linux Hadoop activities on RHEL and CentOS.
  • Knowledge on Cloud technologies like AWS Cloud.
  • Experience in Benchmarking, Backup and Disaster Recovery of Name Node Metadata.
  • Experience in performing minor and major Upgrades of Hadoop Cluster.
  • Experience working with popular frameworks such as Spring MVC and Hibernate.
  • Experience with Source Code Management tools and proficient in GIT.
  • Excellent interpersonal and communication skills, creative, research-minded with problem solving skills.
  • Ensure that critical customer issues are addressed quickly and effectively.
  • Apply troubleshooting techniques to provide solutions to our customer's individual needs.
  • Troubleshoot, diagnose, and potentially escalate customer inquiries during their engineering and operations efforts.
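The configuration-file work mentioned above (core-site.xml, hdfs-site.xml) typically looks like the following; this is an illustrative fragment only, with placeholder paths and values, not settings from an actual cluster:

```xml
<!-- hdfs-site.xml: illustrative fragment; the property names are standard
     Hadoop settings, but the values here are placeholder assumptions -->
<configuration>
  <!-- HDFS block replication factor -->
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <!-- Local directory where the NameNode stores its metadata -->
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/data/hadoop/namenode</value>
  </property>
</configuration>
```

Rack awareness, also mentioned above, is usually enabled in core-site.xml by pointing `net.topology.script.file.name` at a site-specific topology script.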


Confidential, New York, New York

Hadoop Admin


  • Worked on developing architecture documents and proper guidelines.
  • Worked on installing Kafka on virtual machines.
  • Created topics for different users.
  • Installed ZooKeeper, brokers, Schema Registry, and Control Center on multiple machines.
  • Set up ACL/SSL security for different users and assigned users to multiple topics.
  • Developed SSL security so that users can connect securely.
  • Assigned access for multiple user logins.
  • Experience in performing backup, recovery, failover and DR practices on multiple platforms.
  • Installed and configured a Hadoop multi-node cluster and maintained it using Nagios.
  • Install and configure different Hadoop ecosystem components such as Spark, HBase, Hive, Pig etc. as per requirement.
  • Configured HA for services such as NameNode, ResourceManager, and Hue as required to maintain the organization's SLA.
  • Ran benchmark tools to test cluster performance.
  • Performed POC to assess the workloads and evaluate the resource utilization and configure the Hadoop properties based on the benchmark result.
  • Tuning the cluster based on the POC and benchmark results.
  • Commissioning and de-commissioning of cluster nodes.
  • Monitored system metrics and logs for problems.
  • Experience with AWS Cloud (EC2, S3 & EMR)
  • Expertise building Cloudera, Hortonworks Hadoop clusters on bare metal and Amazon EC2 cloud.
  • Experienced in installation, configuration, troubleshooting and maintenance of Kafka & Spark clusters.
  • Experience in setting up Kafka cluster on AWS EC2 Instances.
  • Monitored system metrics and logs for problems using the Checkmk monitoring tool.
  • User provisioning (creation and deletion) of users on Prod and Non-Prod clusters according to client requests.
  • Ensure that critical customer issues are addressed quickly and effectively.
  • Apply troubleshooting techniques to provide solutions to our customer's individual needs.
  • Troubleshoot, diagnose and potentially escalate customer inquiries during their engineering and operations efforts.
  • Investigate product related issues both for individual customers and for common trends that may arise.
  • Resolve customer problems via telephone, email or remote access.
  • Maintain customer loyalty through integrity and accountability.
  • Research customer issues in a timely manner and follow up directly with the customer with recommendations and action plans.
  • Created tables in Hive and used static and dynamic partitioning as a data-slicing mechanism.
  • Working experience with monitoring the cluster, identifying risks, and establishing good practices to follow in a shared environment.
  • Good understanding of cluster configurations and resource management using YARN.
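The ACL work described above can be sketched with the standard `kafka-acls` CLI from the Confluent distribution; the broker address, principal, and topic names below are illustrative placeholders:

```shell
# Grant a user read access to a topic over the SSL listener
# (broker address, principal, and topic are placeholders)
kafka-acls --bootstrap-server broker1:9092 \
  --command-config client-ssl.properties \
  --add --allow-principal User:analyst1 \
  --operation Read --topic user-events

# Verify by listing the ACLs on that topic
kafka-acls --bootstrap-server broker1:9092 \
  --command-config client-ssl.properties \
  --list --topic user-events
```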

Environment: Hadoop, Confluent Kafka, Cloudera, Cloudera Manager, HDFS, MapReduce, Yarn, Hive, Pig, Sqoop, Oozie, Flume, Zookeeper, Red Hat/CentOS 7.6.


Hadoop Admin


  • Worked on analyzing Hadoop cluster and different big data analytic tools including Pig, Hive and Sqoop.
  • Created a POC on Hortonworks and suggested best practices for the HDP and HDF platforms.
  • Set up Hortonworks infrastructure, from configuring clusters down to individual nodes.
  • Installed Ambari server in the cloud.
  • Set up security using Kerberos and AD on Hortonworks and Cloudera CDH clusters.
  • Assigned access for multiple user logins.
  • Installed and configured CDH cluster, using Cloudera manager for easy management of existing Hadoop cluster.
  • Installed/Configured/Maintained Apache Hadoop clusters for application development and Hadoop tools like Hive, Pig, HBase, Zookeeper and Sqoop.
  • Extensively used Cloudera Manager for managing multiple clusters with petabytes of data.
  • Knowledgeable in documenting processes and server diagrams and preparing server requisition documents.
  • Set up machines with network control, static IPs, disabled firewalls, and swap memory.
  • Managed the cluster configuration to meet the needs of analysis, whether I/O-bound or CPU-bound.
  • Worked on setting up high availability for major production cluster. Performed Hadoop version updates using automation tools.
  • Working on setting up a 100-node production cluster and a 400-node backup cluster at two different data centers.
  • Performance tune and manage growth of the O/S, disk usage, and network traffic
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Involved in loading data from LINUX file system to HDFS.
  • Perform architecture design, data modeling, and implementation of Big Data platform and analytic applications for the consumer products
  • Analyze latest Big Data Analytic technologies and their innovative applications in both business intelligence analysis and new service offerings.
  • Worked on installing cluster, commissioning & decommissioning of data node, name node recovery, capacity planning, and slots configuration.
  • Implemented test scripts to support test driven development and continuous integration.
  • Worked on tuning the performance of MapReduce Jobs.
  • Responsible for managing data coming from different sources.
  • Loaded and transformed large sets of structured, semi-structured, and unstructured data.
  • Experience in managing and reviewing Hadoop log files.
  • Job management using Fair scheduler.
  • Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Used Pig's predefined functions to convert fixed-width files to delimited files.
  • Worked on tuning Hive and Pig to improve performance and resolve performance issues in Hive and Pig scripts, with a good understanding of joins, grouping, and aggregation and how they map to MapReduce jobs.
  • Involved in scheduling the Oozie workflow engine to run multiple Hive and Pig jobs.
  • Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, manage and review data backups, manage and review Hadoop log files.
  • Managed datasets using pandas DataFrames and MySQL; queried the MySQL database from Python using the MySQLdb connector package to retrieve information.
  • Developed various algorithms for generating several data patterns. Used JIRA for bug tracking and issue tracking.
  • Developed Python/Django application for Analytics aggregation and reporting.
  • Used Django configuration to manage URLs and application parameters.
  • Generated Python Django Forms to record data of online users
  • Used Python and Django creating graphics, XML processing, data exchange and business logic
  • Created Oozie workflows to run multiple MR, Hive and pig jobs.
  • Supported in setting up QA environment and updating configurations for implementing scripts with Pig and Sqoop.
  • Developed Spark code using Scala and Spark SQL for faster testing and data processing.
  • Involved in the development of a Spark Streaming application for one of the data sources using Scala and Spark.
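The Sqoop export step mentioned above (analyzed data pushed to a relational database for the BI team) can be sketched as follows; the JDBC URL, credentials, table name, and HDFS path are illustrative placeholders:

```shell
# Export an analyzed dataset from HDFS to MySQL for BI reporting
# (connection details, table, and paths are placeholders)
sqoop export \
  --connect jdbc:mysql://dbhost:3306/reports \
  --username report_user -P \
  --table daily_metrics \
  --export-dir /user/hive/warehouse/daily_metrics \
  --input-fields-terminated-by '\001'   # Hive's default field delimiter
```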

Environment: Hadoop, HDFS, Pig, Sqoop, Shell Scripting, Ubuntu, Linux Red Hat, Spark, Scala, Hortonworks, Cloudera Manager, Apache Yarn, Python.


Hadoop Admin


  • Launched Amazon EC2 cloud instances using Amazon Web Services (Linux/Ubuntu/RHEL) and configured the launched instances for specific applications.
  • Installed applications on AWS EC2 instances and configured storage on S3 buckets.
  • Performed S3 bucket creation, set up bucket and IAM role-based policies, and customized the JSON templates.
  • Implemented and maintained monitoring and alerting of production and corporate servers/storage using AWS CloudWatch.
  • Installed, configured, monitored, and maintained HDFS, Yarn, HBase, Flume, Sqoop, Oozie, Pig, Hive.
  • Worked on scripting Hadoop package installation and configuration to support fully automated deployments.
  • Supported Hadoop developers and assisted in optimizing MapReduce jobs, Pig Latin scripts, Hive scripts, and HBase ingest as required.
  • Defined job flows and managed and reviewed Hadoop and HBase log files.
  • Ran Hadoop streaming jobs to process terabytes of text data.
  • Loaded and transformed large sets of structured, semi-structured, and unstructured data.
  • Supported MapReduce programs running on the cluster.
  • Loaded data from UNIX file system to HDFS.
  • Installed and configured Hive and wrote HiveQL scripts.
  • Managed data coming from different sources.
  • Created Hive tables, loaded them with data, and wrote Hive queries that run internally as MapReduce jobs.
  • Built and configured log data loading into HDFS using Flume.
  • Implemented Partitioning, Dynamic Partitions, Buckets in HIVE.
  • Performed Importing and exporting data into HDFS and Hive using Sqoop.
  • Managed several Hadoop clusters in production, development, Disaster Recovery environments.
  • Troubleshot many cloud-related issues such as DataNode failures, network failures, and missing data blocks.
  • Managed cluster coordination services through ZooKeeper.
  • Implemented Kerberos for authenticating all the services in Hadoop Cluster.
  • Worked on System/cluster configuration and health check-up.
  • Monitored and managed the Hadoop cluster through Ambari.
  • Created user accounts and granted users access to the Hadoop cluster.
  • Resolved tickets submitted by users; troubleshot, documented, and resolved the errors.
  • Used Spark Streaming to divide streaming data into batches as an input to Spark engine for batch processing.
  • Implemented Spark using Scala for faster testing and processing of data.
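The Flume-based log loading described above is configured through an agent properties file; a minimal sketch, with placeholder agent, path, and host names:

```properties
# Illustrative Flume agent: tail an application log into HDFS
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# Source: follow a local log file (path is a placeholder)
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /var/log/app/app.log
a1.sources.r1.channels = c1

# Channel: in-memory buffer between source and sink
a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000

# Sink: write events into date-partitioned HDFS directories
a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1
a1.sinks.k1.hdfs.path = hdfs://namenode:8020/flume/logs/%Y-%m-%d
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.useLocalTimeStamp = true
```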

Environment: Hadoop, HDFS, Map Reduce, Hive, Pig, Puppet, Zookeeper, HBase, Flume, Ganglia, Sqoop, Linux, CentOS, Ambari.

Confidential, New York, New York

Selenium Tester


  • Responsible for the implementation and ongoing administration of Hadoop infrastructure, including initial setup.
  • Analyzed technical and functional requirements documents and designed and developed the QA test plan, test cases, and test scenarios, maintaining the end-to-end process flow.
  • Developed testing script for internal brokerage application that is utilized by branch and financial market representatives to recommend and manage customer portfolios; including international and capital markets.
  • Designed and Developed Smoke and Regression automation script and Automation of functional testing framework for all modules using Selenium and WebDriver.
  • Created Data Driven scripts for adding multiple customers, checking online accounts, user interfaces validations, and reports validations.
  • Performed cross verification of trade entry between mainframe system, its web application and downstream system.
  • Extensively used Selenium WebDriver API (XPath and CSS locators) to test the web application.
  • Configured Selenium WebDriver, TestNG, Maven tool, Cucumber, and BDD Framework and created Selenium automation scripts in java using TestNG.
  • Performed Data-Driven testing by developing Java based library to read test data from Excel & Properties files.
  • Extensively performed DB2 database testing to validate trade entry from the mainframe to the backend system.
  • Developed a data-driven framework with Java, Selenium WebDriver, and Apache POI, used for multiple trade order entry.
  • Developed internal application using Angular.js and Node.js connecting to Oracle on the backend.
  • Expertise in debugging issues occurred in front end part of web-based application which is developed using HTML5, CSS3, Angular JS, Node.JS and Java.
  • Developed smoke automation test suite for regression test suite.
  • Applied various testing techniques in test cases to cover all business scenarios for quality coverage.
  • Interacted with development team to understand design flow, code review, discuss unit test plan.
  • Executed system, integration, and regression tests in the testing environment.
  • Conducted defect triage meetings and defect root cause analysis, tracked defects in HP ALM Quality Center, managed defects by following up on open items, and retested defects during regression testing.
  • Provided QA/UAT sign-off after closely reviewing all the test cases in Quality Center and receiving the policy sign-off for the project.
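The TestNG runs mentioned above are driven by a suite definition file; a minimal illustrative testng.xml, with placeholder class names:

```xml
<!-- testng.xml: illustrative suite definition; class names are placeholders -->
<suite name="RegressionSuite">
  <test name="SmokeTests">
    <classes>
      <class name="com.example.tests.LoginSmokeTest"/>
      <class name="com.example.tests.TradeEntrySmokeTest"/>
    </classes>
  </test>
</suite>
```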

Environment: HP ALM, Selenium WebDriver, JUnit, Cucumber, Angular JS, Node.JS, Jenkins, GitHub, Windows, UNIX, Agile, MS SQL, IBM DB2, Putty, WinSCP, FTP Server, Notepad++, C#, DB Visualizer.
