
Hadoop Administrator Resume

San Ramon, CA


  • Over 7 years of experience in IT, including 3 years of Hadoop administration across diverse industries, with hands-on experience in Big Data ecosystem technologies.
  • Extensive knowledge and experience in Big Data with MapReduce, HDFS, Hive, Pig, Impala, Sentry, and Sqoop.
  • Good knowledge of Hadoop architecture and components such as HDFS, JobTracker, TaskTracker, ResourceManager, NameNode, DataNode, and MapReduce (MRv1 and YARN) concepts.
  • Excellent understanding of Hadoop architecture and underlying framework including storage management.
  • Hands-on experience in installation, configuration, management, and development of big data solutions using MapR, Azure, Cloudera (CDH4, CDH5), and Hortonworks distributions.
  • Good experience with design, management, configuration, and troubleshooting of distributed production environments based on Apache Hadoop, HBase, etc.
  • Experience in building new OpenStack deployments with Puppet and managing them in production.
  • Experience designing and implementing complete end-to-end Hadoop infrastructure.
  • Set up automated 24x7 monitoring and escalation infrastructure for Hadoop clusters using Nagios and Ganglia.
  • Good experience designing, configuring, and managing backup and disaster recovery for Hadoop data.
  • In-depth knowledge of the changes required for cluster setup and maintenance: static IP (interfaces), hosts files, passwordless SSH, and Hadoop configuration.
  • Experienced in using Sqoop to import data from RDBMS into HDFS and to export it back.
  • Responsibilities included collecting information from, and configuring, network devices such as servers, printers, hubs, switches, and routers on an Internet Protocol (IP) network.
  • Experience in understanding Hadoop security requirements and integrating with Kerberos authentication infrastructure: KDC server setup and realm/domain creation.
  • Experience in administration, installation, configuration, troubleshooting, security, backup, performance monitoring, and fine-tuning of Red Hat Linux and CentOS.
  • Good understanding of HDFS design, daemons, and HDFS high availability (HA).
  • Implemented a Continuous Integration and Continuous Delivery framework using Jenkins, Puppet, Maven, and Nexus in a Linux environment. Integrated Maven/Nexus, Jenkins, UrbanCode Deploy with Patterns/Release, Git, Confluence, Jira, and Cloud Foundry.
  • Extensive experience in data analysis using tools such as Syncsort and HZ, along with shell scripting and UNIX.


Big Data Technologies: HDFS, Hive, MapReduce, Cassandra, Pig, HCatalog, Phoenix, Falcon, Sqoop, Flume, ZooKeeper, Mahout, Oozie, Avro, HBase, Storm, CDH 5.3, CDH 5.4

Scripting Languages: Shell Scripting, Puppet, Python, Bash, CSH, Ruby, PHP

Databases: Oracle 11g, MySQL, MS SQL Server, HBase, Cassandra, MongoDB


Monitoring Tools: Cloudera Manager, Solr, Ambari, Nagios, Ganglia

Application Servers: Apache Tomcat, WebLogic Server, WebSphere

Security: Kerberos

Analytic Tools: Elasticsearch, Logstash, Kibana (ELK)


Hadoop Administrator

Confidential, San Ramon, CA


  • Worked on the CIP Rating project (4 clusters; data acquisition, anomaly detection, rating, and ML).
  • Worked on the Datalake project, a multitenant analytics platform where different small businesses bring in their data and work on their own use cases.
  • Worked on the Hadoop stack, ETL tools such as Talend, reporting tools such as Tableau, security with Kerberos, user provisioning with LDAP, and many other Big Data technologies across multiple use cases.
  • Responsible for cluster maintenance, monitoring, commissioning and decommissioning DataNodes, troubleshooting, cluster planning, and managing and reviewing data backups and log files.
  • Worked with the Data Science team to gather requirements for various data mining projects.
  • Installed 5 Hadoop clusters for different teams and developed a data lake that serves as the base layer for developers to store data and run analytics. Provided services to developers: installing their custom software, upgrading Hadoop components, resolving their issues, and troubleshooting their long-running jobs. Served as L3 and L4 support for the data lake and also managed clusters for other teams.
  • Built automation frameworks in Python and Scala for data ingestion, processing, and storage, using NoSQL and SQL databases along with Chef, Puppet, Kibana, Elasticsearch, Tableau, GoCD, and Red Hat infrastructure.
  • Worked in a combined DevOps and Hadoop administration role: handled L3 issues, installed new components as requirements arose, automated as much as possible, and implemented a CI/CD model.
  • Helped implement Kerberos security on the Hortonworks Hadoop cluster, working with the operations team to move from a non-secured cluster to a secured one.
  • Responsible for upgrading Hortonworks Hadoop HDP 2.2.0 and MapReduce 2.0 with YARN in a multi-cluster node environment. Handled importing data from various sources, performed transformations using Hive, MapReduce, and Spark, and loaded data into HDFS.
  • Set up Hadoop security using MIT Kerberos, AD integration (LDAP), and Sentry authorization.
  • Migrated services from a managed hosting environment to AWS including: service design, network layout, data migration, automation, monitoring, deployments and cutover, documentation, overall plan, cost analysis, and timeline.
  • Managed Amazon Web Services (AWS) infrastructure with automation and configuration management tools such as Chef, Ansible, Puppet, or custom-built tools; designed cloud-hosted solutions, with specific AWS product suite experience.
  • Performed a major production upgrade from HDP 1.3 to HDP 2.2. As an admin, followed standard backup policies to ensure high availability of the cluster.
  • Monitored multiple Hadoop cluster environments using Ganglia and Nagios. Monitored workload, job performance, and capacity planning using Ambari. Installed and configured Hortonworks and Cloudera distributions on single-node clusters for POCs.
  • Implemented a Continuous Delivery framework using Jenkins, Puppet, Maven, and Nexus in a Linux environment. Integrated Maven/Nexus, Jenkins, UrbanCode Deploy with Patterns/Release, Git, Confluence, Jira, and Cloud Foundry.
  • Ran Hadoop jobs processing millions of records of text data. Troubleshot build issues in the Jenkins build process. Used Docker to create containers for Tomcat servers and Jenkins.
  • Worked with application teams to install operating system and Hadoop updates, patches, and version upgrades as required.
  • Used ServiceNow and JIRA to track issues; managed and reviewed log files as part of administration for troubleshooting purposes, meeting SLAs on time.

Environment: Hortonworks Hadoop, Cassandra, flat files, Oracle 11g/10g, MySQL, Toad 9.6, Windows NT, Sqoop, Hive, Oozie, Ambari, SAS, SPSS, Unix shell scripts, ZooKeeper, SQL, MapReduce, Pig.

Hadoop Administrator

Confidential, San Ramon, CA


  • Worked on a live 300-node Big Data Hadoop production environment.
  • Involved in upgrading the Hadoop cluster from CDH4 to CDH5.
  • Worked on cluster installation, commissioning and decommissioning DataNodes, NameNode recovery, capacity planning, and slot configuration.
  • Installed and configured Flume and Oozie on the Hadoop cluster, and defined, scheduled, and managed jobs on it.
  • Worked with the MapR Distribution for Apache Hadoop, which speeds up MapReduce jobs with an optimized shuffle algorithm, direct disk access, built-in compression, and code written in C++.
  • Responsible for cluster maintenance, monitoring, commissioning and decommissioning DataNodes, troubleshooting, and managing and reviewing data backups and Hadoop log files.
  • Day-to-day responsibilities included solving developer issues, deploying code from one environment to another, providing access to new users, providing quick fixes to reduce impact, and documenting those fixes to prevent future issues.
  • Added, installed, and removed components through Cloudera Manager.
  • Responsible for implementation and ongoing administration of Hadoop infrastructure.
  • Installed, configured, and maintained Apache Hadoop clusters for application development, along with Hadoop tools such as Hive, Pig, HBase, ZooKeeper, and Sqoop.
  • Helped implement Kerberos security on the Hortonworks Hadoop cluster, working with the operations team to move from a non-secured cluster to a secured one.
  • Installed, configured, supported, and monitored Hadoop clusters using Apache and Cloudera distributions and AWS.
  • Involved in architecting Hadoop clusters using major Hadoop distributions (CDH4 and CDH5).
  • Bootstrapped instances using Chef and integrated them with auto scaling.
  • Managed the configurations of more than 40 servers using Chef.
  • Monitored systems and services; handled architecture design and implementation of Hadoop deployments, configuration management, backup, and disaster recovery systems and procedures.
  • Aligned with the systems engineering team to propose and deploy new hardware and software required for Hadoop and to expand existing environments.
  • Worked with data delivery teams to set up new Hadoop users, including creating Linux users, setting up Kerberos principals, and testing HDFS, Hive, Pig, and MapReduce access for the new users.
  • Used Informatica PowerCenter to create mappings, mapplets, user-defined functions, workflows, worklets, sessions, and tasks.
  • Developed an automated testing framework for Elasticsearch index validation using Java and MySQL.
  • Created user-defined types to store specialized data structures in Cloudera.
  • Followed standard backup policies to ensure high availability of the cluster.
  • Analyzed system failures, identified root causes, and recommended courses of action.
  • Documented system processes and procedures for future reference.
  • Performance tuning of Hadoop clusters and Hadoop MapReduce routines.
  • Screened Hadoop cluster job performance and performed capacity planning.
  • Built, stood up, and delivered a Hadoop cluster in pseudo-distributed mode with the NameNode, Secondary NameNode, JobTracker, and TaskTracker running successfully, with ZooKeeper installed and configured and Apache Accumulo (a NoSQL store modeled on Google's Bigtable) stood up in a single-VM environment.
  • Monitored Hadoop cluster connectivity and security, and was involved in managing and monitoring Hadoop log files.
  • Assembled Puppet Master, Agent, and database servers on Red Hat Enterprise Linux platforms.

Environment: Hortonworks Hadoop, Cassandra, flat files, Oracle 11g/10g, MySQL, Toad 9.6, Windows NT, Sqoop, Hive, Oozie, Cloudera, SAS, SPSS, Unix shell scripts, ZooKeeper, SQL, MapReduce, Pig.

Hadoop Administrator

Confidential, Mountain View, CA


  • Administered Hadoop clusters, including installation, configuration, and management.
  • Designed and developed Hadoop system to analyze the SIEM (Security Information and Event Management) data using MapReduce, HBase, Hive, Sqoop and Flume.
  • Developed custom Writable MapReduce Java programs to load web server logs into HBase using Flume.
  • Worked on the Hadoop upgrade from CDH3 to CDH4.
  • Integrated Oozie with the rest of the Hadoop stack, supporting several types of Hadoop jobs out of the box (such as MapReduce, Pig, Hive, and Sqoop) as well as system-specific jobs.
  • Developed entire data transfer model using Sqoop framework.
  • Worked with Kafka's explicit support for partitioning messages across Kafka servers and distributing consumption over a cluster of consumer machines while maintaining per-partition ordering semantics.
  • Integrated Kafka with Flume in a sandbox environment using the Kafka source and Kafka sink.
  • Configured a Flume agent with a Flume syslog source to receive data from syslog servers.
  • Implemented Hadoop NameNode HA to make Hadoop services highly available.
  • Transferred data from RDBMS to Hive and HDFS, and from Hive and HDFS back to RDBMS, using Sqoop.
  • Installed and managed multiple Hadoop clusters: production, staging, and development.
  • Performed performance tuning of infrastructure and Hadoop settings for optimal job performance and throughput.
  • Analyzed system failures, including on lab clusters, identified root causes, and recommended courses of action.
  • Designed cluster tests to run before and after upgrades to validate cluster status.
  • Performed regular maintenance, commissioning and decommissioning nodes as disk failures occurred, using Cloudera Manager.
  • Documented and prepared run books of system processes and procedures for future reference.
  • Performed benchmarking and performance tuning on the Hadoop infrastructure.
  • Automated data loading between the production and disaster recovery clusters.
  • Migrated the Hive schema from the production cluster to the DR cluster.
  • Worked on migrating applications from relational database systems, including proofs of concept (POCs).
  • Helped users and teams with incidents related to administration and development.
  • Onboarded new users migrated to our clusters and trained them on best practices.
  • Guided users in development and worked closely with developers to prepare a data lake.
  • Migrated data from SQL Server to HBase using Sqoop.
  • Processed and analyzed log data stored in HBase and imported it into the Hive warehouse, enabling business analysts to write HQL queries.
  • Replicated the Jenkins build server to a test VM using Packer, VirtualBox, Vagrant, Chef, Perlbrew, and Serverspec.
  • Built reusable Hive UDF libraries that enabled business analysts to use these UDFs in Hive queries.
  • Created partitioned Hive external tables to load the parsed data.
  • Developed various workflows using custom MapReduce, Pig, and Hive, and scheduled them using Oozie.
  • Responsible for installing, setting up, and configuring Apache Kafka and Apache ZooKeeper.
  • Extensive knowledge in troubleshooting code-related issues.
  • Developed a suite of unit test cases for Mapper, Reducer, and Driver classes using the MR testing library.
  • Automatically populated HBase tables with data.
  • Designed and coded application components in an agile environment using a test-driven development approach.

Environment: Hadoop, HDFS, MapReduce, Shell Scripting, Spark, Splunk, Solr, Pig, Hive, HBase, Sqoop, Flume, Oozie, ZooKeeper, cluster health, monitoring, security, Red Hat Linux, Impala, Cloudera Manager, Hortonworks.

Linux/Unix System Administrator



  • Day-to-day administration of Sun Solaris and RHEL 4/5, including installation, upgrades, patch management, and package management.
  • Responsible for monitoring the overall project and reporting status to stakeholders.
  • Developed project user guides to help with knowledge transfer to new testers, and a solution repository document that gives quick resolution of previously seen issues, reducing the number of invalid defects.
  • Identified recurring production issues by analyzing production tickets after each release, and strengthened the system testing process to keep those issues from reaching production, improving customer satisfaction.
  • Designed and coordinated creation of manual test cases based on requirements and executed them to verify the functionality of the application.
  • Manually tested the navigation steps and basic functionality of the web-based applications.
  • Experience interpreting physical database models and understanding relational database concepts such as indexes, primary and foreign keys, and constraints using Oracle.
  • Wrote, optimized, and troubleshot dynamically created SQL within procedures.
  • Created database objects such as tables, indexes, views, sequences, primary and foreign keys, constraints, and triggers.
  • Responsible for creating virtual environments for rapid development.
  • Handled tickets raised by end users, including package installation, login issues, access issues, and user management (adding, modifying, deleting, and grouping users).
  • Responsible for preventive maintenance of the servers on a monthly basis.
  • Configured RAID for the servers and managed resources using disk quotas.
  • Responsible for change management of releases scheduled by service providers.
  • Generated weekly and monthly reports on the tickets worked and sent them to management.
  • Managed systems operations with final accountability for smooth installation, networking, operation, and troubleshooting of hardware and software in a Linux environment.
  • Identified the operational needs of various departments and developed customized software to improve system productivity.
  • Established and implemented firewall rules, and validated them with vulnerability scanning tools.
  • Proactively detected computer security violations, collected evidence, and presented results to management.
  • Implemented system and e-mail authentication using an LDAP enterprise database.
  • Implemented a database-enabled intranet website using Linux, Apache, and a MySQL database backend.
  • Installed CentOS on multiple servers using Preboot Execution Environment (PXE) boot and the Kickstart method. Monitored system metrics and logs for problems.

Environment: Windows 2008/2007 server, Unix Shell Scripting, SQL Manager Studio, Red Hat Linux, Microsoft SQL Server 2000/2005/2008, MS Access, NoSQL, Linux/Unix, PuTTY Connection Manager, PuTTY, SSH.
