
Sr. Bigdata Administrator Resume

Franklin, TN


  • Around 8 years of IT experience, including 3+ years in Big Data technologies.
  • Extensive experience in Designing, Installing, Configuring and Tuning Hadoop core and ecosystem components
  • Well versed in Hadoop MapReduce, HDFS, Pig, Hive, HBase, Sqoop, Flume, YARN, ZooKeeper, Spark and Oozie.
  • In-depth understanding of Hadoop architecture and its components, such as HDFS, NameNode, JobTracker, DataNode and TaskTracker, and of MapReduce concepts.
  • Experience in installation, configuration, support and management of a Hadoop Cluster.
  • Experience in task automation using Oozie, cluster coordination through Pentaho and MapReduce job scheduling using the Fair Scheduler.
  • Worked on both Hadoop distributions: Cloudera and Hortonworks
  • Experience in performing minor and major upgrades and applying patches for Ambari and Cloudera Clusters
  • Strong knowledge of Hadoop cluster capacity planning, performance tuning and cluster monitoring.
  • Capability to configure scalable infrastructures for HA (High Availability) and disaster recovery
  • Experience in analyzing data using HiveQL, Pig Latin.
  • Experience in developing and scheduling ETL workflows, data scrubbing and processing data in Hadoop using Oozie.
  • Experience in balancing the cluster after adding/removing nodes or major data cleanup.
  • Experience in setting up data ingestion tools such as Flume, Sqoop and NDM.
  • General Linux system administration including design, configuration, installs, automation.
  • Experience on Oracle, Hadoop, MongoDB, AWS Cloud, Greenplum.
  • Experience in configuring Zookeeper to coordinate the servers in clusters.
  • Strong knowledge of using NFS (Network File System) for backing up NameNode metadata.
  • Experience in setting up NameNode high availability for a major production cluster.
  • Experience in designing automatic failover control using ZooKeeper and Quorum Journal Nodes.
  • Experience in creating, building and managing public and private cloud Infrastructure
  • Experience in working with different file formats and compression techniques in Hadoop
  • Experience in analyzing existing Hadoop clusters, understanding performance bottlenecks and providing performance tuning solutions accordingly.
  • Extensive experience in installation, configuration, maintenance, design, development, implementation, and support on Linux.
  • Experience in Ansible and related tools for configuration management.
  • Experience in working in large environments and leading infrastructure support and operations.
  • Migrated applications from existing systems such as MySQL, Oracle, DB2 and Teradata to Hadoop.
  • Benchmarked Hadoop clusters to validate the hardware before and after installation and tuned the configurations for better performance.
  • Experience in administering Linux systems to deploy and monitor Hadoop clusters.
  • Experience in commissioning, decommissioning, balancing and managing nodes, and in tuning servers for optimal cluster performance.


Big Data Technologies: HDFS, Hive, MapReduce, Cassandra, Pig, HCatalog, Phoenix, Falcon, Sqoop, Flume, ZooKeeper, Mahout, Oozie, Avro, HBase, Storm, HDP 2.4, HDP 2.5.

Monitoring Tools: Ambari, Cloudera Manager, Ganglia, Nagios, CloudWatch.

Scripting Languages: Shell Scripting, Puppet, Python, Bash, CSH, Ruby, PHP

Programming Languages: C, Java, SQL, and PL/SQL.

Front End Technologies: HTML, XHTML, XML.

Application Servers: Apache Tomcat, WebLogic Server, WebSphere

Databases: Oracle 11g, MySQL, MS SQL Server, IBM DB2.

NoSQL Databases: HBase, Cassandra, MongoDB

Operating Systems: Linux, UNIX, Mac, Windows NT/98/2000/XP/Vista, Windows 7, Windows 8.


Security: Kerberos, Ranger, Ranger KMS, Knox.


Sr. Bigdata Administrator



  • Installed, configured, upgraded, and applied patches and bug fixes for Prod, Test and Dev Servers.
  • Installed/Configured/Maintained Hadoop clusters in dev/test/UAT/Prod environments.
  • Installed, configured and administered HDFS, Hive, Ranger, Pig, HBase, Oozie, Sqoop, Spark and YARN.
  • Worked on the installation and configuration of Hadoop HA Cluster.
  • Involved in capacity planning and design of Hadoop clusters.
  • Set up alerts in Ambari to monitor Hadoop clusters.
  • Set up authentication using Kerberos.
  • Created and dropped users, and granted and revoked user permissions and policies as required, using Ranger.
  • Commissioned and decommissioned DataNodes in the cluster.
  • Wrote and modified UNIX shell scripts to manage HDP environments.
  • Installed and configured Flume, Hive, Sqoop and Oozie on the Hadoop cluster.
  • Administered, configured and performance-tuned Spark applications.
  • Created directories and set up appropriate permissions for different applications.
  • Backed up HBase tables to HDFS directories using the Export utility.
  • Involved in planning and implementation of Hadoop cluster Upgrade.
  • Installation, Configuration and administration of HDP on Red Hat Enterprise Linux 6.6
  • Used Sqoop to import data into HDFS from Oracle database.
  • Detailed analysis of system and application architecture components per functional requirements.
  • Reviewed and monitored system and instance resources to ensure continuous operations (database storage, memory, CPU, network usage and I/O contention).
  • Provided 24x7 on-call support for production job failures and resolved issues in a timely manner.
  • Developed UNIX scripts for scheduling the delta loads and master loads using the AutoSys scheduler.
  • Deep and thorough understanding of ETL tools and how they can be applied in a Big Data environment.
  • Troubleshot problems with databases, applications and development tools.

Technical Environment: Hortonworks 2.4/2.5, Ambari, HDP, Ranger, HUE, Sqoop, Kerberos, Hive, Informatica 9.6.1, Oracle 11g/10g, DB2, LINUX, AWS, UNIX - AIX, Autosys.

Bigdata Administrator

Confidential, Franklin, TN


  • Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, managing and reviewing data backups, and managing and reviewing Hadoop log files.
  • Helped decide the hardware configurations for the cluster together with other teams in the company.
  • Resolved tickets submitted by users and P1 issues; troubleshot, documented and resolved errors.
  • Added new DataNodes when needed and ran the balancer.
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Continuously monitored and managed the Hadoop cluster through Ganglia and Nagios.
  • Performed major and minor upgrades to the Hadoop cluster.
  • Performed stress testing, performance testing and benchmarking of the cluster.
  • Working closely with both internal and external cyber security customers.
  • Research effort to tightly integrate Hadoop and HPC systems.
  • Compared Hadoop to commercial big-data appliances from Netezza, XtremeData and LexisNexis. Published and presented results.
  • Deployed and administered a 300+ node Hadoop cluster. Administered two larger clusters.
  • Deployed and administered Hadoop clusters.
  • Worked on developing Linux scripts for Job Automation.
  • Developed machine-learning capability via Apache Mahout.

Technical Environment: Hadoop, MapReduce, HDFS, Hive, Oracle 11g, Java, Struts, Servlets, HTML, XML, SQL, J2EE, Tomcat 6.

Hadoop Administrator

Confidential, Palo Alto, California


  • Worked on a Hadoop implementation project covering, among other things, Hadoop cluster management, writing MapReduce programs and Hive queries (HQL), and using Flume to ingest log files for analysis.
  • Installation, configuration and management of Bigdata 2.x cluster (Hortonworks Data Platform 1.3/2.2).
  • Involved and played a key role in the development of an ingestion framework based on Oozie and another framework using Java.
  • Developed Data Quality checks to match the ingested data with source in RDBMS using Hive.
  • Conducted a proof of concept (POC) for data ingestion using ETL tools including Talend, DataStage and Toad Data Point.
  • Created custom component detection and monitoring using ZooKeeper APIs.
  • Supported Bigdata 1.x cluster (HDP 1.3) with issues related to jobs and cluster-utilization.
  • Deep understanding and related experience with Hadoop, HBase, Hive, and YARN/Map-Reduce.
  • Enabled high availability for the NameNode and set up a fencing mechanism to guard against split-brain scenarios.
  • Enabled high availability for the Resource Manager and several ecosystem components, including HiveServer2, Hive Metastore and HBase.
  • Configured YARN queues based on the Capacity Scheduler for resource management.
  • Configured node labels for YARN to isolate resources at the node level, dedicating separate nodes to YARN applications and to HBase.
  • Configured CGroups to collect CPU utilization stats.
  • Set up BucketCache on HBase-specific slave nodes for improved performance.
  • Set up rack awareness for the Bigdata cluster with a rack-topology script for improved fault tolerance.
  • Set up HDFS ACLs to restrict/enable access to HDFS data.
  • Evaluated Hadoop/YARN performance using TestDFSIO and TeraSort.
  • Evaluated the performance of Hive 14 with Tez using HiveTestBench.
  • Configured Talend, DataStage and Toad Data Point for ETL activities on Hadoop/Hive databases.
  • Backed up HBase data and HDFS using HDFS snapshots and evaluated the performance overhead.
  • Created several recipes for automation of configuration parameters/scripts using Chef.
  • Managed and configured the retention period of log files for all services across the cluster.
  • Involved in development of ETL processes with Hadoop, YARN and Hive.
  • Developed Hadoop monitoring processes (capacity, performance, consistency) to assure processing issues are identified and resolved swiftly.
  • Coordinated with the Operations/L2 team for knowledge transfer.
  • Set up HDFS quotas and replication factors for user/group directories to keep disk usage under control.

Technical Environment: Hadoop 1.2.1, MapReduce, Hive 0.10.0, Pig 0.11.1, Oozie 3.3.0, HBase 0.94.11, Sqoop 1.4.4, Flume 1.4.0, Java, SQL, PL/SQL, Oracle 10g, Eclipse HTTP, Jama.

Hadoop Administrator

Confidential, Santa Clara, CA


  • Performed end-to-end migration of Hadoop 1.x to Hadoop 2.x.
  • Performed the installation and configuration of a Hadoop cluster using Ambari 2.0 (Hortonworks HDP 2.2).
  • Monitored the health of the Hadoop ecosystem (HDFS, YARN, Hive, HBase, Sqoop, Hue and Slider).
  • Monitored disk, memory, heap and CPU utilization on all master and slave machines and took the necessary measures to keep the cluster up and running 24/7.
  • Configured Capacity Scheduler to provide service-level agreements for multiple users of a cluster.
  • Performed benchmark tests for Hive, HBase and HDFS, including NameNode and MapReduce benchmarks.
  • Created Hive internal and external tables defined with appropriate static and dynamic partitions, intended for efficiency.
  • Worked on cluster installation and on commissioning and decommissioning DataNodes.
  • Involved in implementing High Availability and automatic failover infrastructure to overcome single point of failure for NameNode, Resource Manager, HiveServer2, and HBase Master.
  • Implemented Kerberos for Hadoop Security.
  • Created and deployed Kerberos keytab files; created principals and realms.
  • Imported logs from web servers with Flume to ingest the data into HDFS.
  • Created and managed HBase clusters dynamically using Slider.
  • Upgraded Apache Ambari from version 1.7 to 2.0.
  • Involved in collecting metrics for Hadoop clusters using Ganglia.
  • Worked with developers on new job deployments, jobs throwing exceptions and data-related issues.

Technical Environment: MapReduce, HDFS, Pig, Hive, HBase, Flume, Slider, Sqoop, Oozie, Nagios, ZooKeeper, Ganglia and Hortonworks Ambari.

Linux System Administrator



  • Designed and configured Red Hat Enterprise Linux 5/4/3 user administration, management and archiving.
  • Expertise in performance monitoring and performance tuning using top, prstat, sar, vmstat, ps, iostat, etc.
  • Expertise in package management, including creating, installing and configuring packages using Red Hat RPM.
  • Used RPM in several Linux distributions such as Red Hat Enterprise Linux, SUSE Linux Enterprise and Fedora.
  • Extensive experience in building servers using the JumpStart and Kickstart processes.
  • Configured Red Hat Satellite server and managed, configured and maintained customer entitlements, including upgrading and patching of Linux servers.
  • Used the RHN Satellite exporter command to export package channels and deploy RPM packages.
  • Configuring NFS, NIS, DNS, Auto Mount & Disk Space Management on SUN Servers.
  • Experience in Configuring and Managing SAN Disks, Disk Mirrors & RAID 0, 1 & 5 Levels.
  • Maintained and modified hardware and software components, content and documentation.
  • Provided guidance for equipment checks and supported processing of security requests.
  • Created and expanded file systems in Linux (volume groups and logical volumes) and Solaris using volume managers.
  • Experience in backup and restore operations with built-in Linux utilities.
  • Installed, configured and troubleshot Red Hat Linux 6.0.
  • Configured user and security administration, backup, recovery and maintenance activities.
  • Advanced-level experience in researching OS issues, applying patches and opening vendor tickets.
  • Documented all errors and logs that were new to the environment and shared them with the team.
  • Installed and monitored VMware virtual environments with ESX 4, ESX 3.x, ESXi servers & Virtual Center 2.x.
  • Configured and administered Sendmail and tested the mail servers.
  • Experience in adding and configuring devices like hard disks and backup devices etc.
  • Monitored daily NetBackup activity and reports to proactively avoid issues.
  • Improved automation and productivity of backups through scripting enhancements.
  • Modified and optimized backup schedules.
  • Used VMware for testing various applications on different operating systems.
  • Provided 24x7 support on a pager rotation.

Environment: RHEL 5/4/3, SUSE, LVM, VMware, RAID
