
Sr. Hadoop Administrator Resume

Atlanta, GA


  • Qualified technocrat and seasoned professional with over 7 years of IT experience in database management, Linux administration, developing MapReduce applications, and designing, building, and administering large-scale Hadoop production clusters and Java EE applications.
  • 3 years of experience in big data technologies: Hadoop HDFS, MapReduce, Pig, Hive, Oozie, Flume, HCatalog, Sqoop, ZooKeeper, NoSQL.
  • Experience in deploying and managing multi-node development, testing, and production Hadoop clusters with different Hadoop components (Hive, Pig, Sqoop, Oozie, Flume, HCatalog, HBase, ZooKeeper) using Cloudera Manager and Hortonworks Ambari.
  • Strong knowledge of Hadoop HDFS architecture, YARN, and the MapReduce framework.
  • Strong knowledge of the Apache Hive data warehouse, data cubes, Hive server, partitioning, bucketing, clustering, and writing UDFs, UDAFs, and UDTFs in Java for Hive.
  • Experience in administering the Linux systems to deploy Hadoop cluster and monitoring the cluster using Nagios and Ganglia.
  • Experience in benchmarking and in performing backup and disaster recovery of NameNode metadata and sensitive data residing on the cluster.
  • Experience in performing minor and major upgrades, commissioning and decommissioning of data nodes on Hadoop cluster.
  • Strong knowledge in configuring Name Node High Availability.
  • Familiar with writing Oozie workflows and job controllers for job automation: shell, Hive, and Sqoop automation.
  • Familiar with importing and exporting data using Sqoop from RDBMSs (MySQL, Oracle, Teradata), including the use of fast loaders and connectors.
  • Experience in using Flume to stream data into HDFS from various sources. Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs, such as Java MapReduce, Hive, and Sqoop, as well as system-specific jobs.
  • Experience in installing and administering PXE Server with kick start, setting up FTP, DHCP, DNS servers and Logical Volume Management.
  • Experience in configuring and managing storage devices: NAS (file-level access, NFS) and SAN (block-level access, iSCSI).
  • Experience in understanding the security requirements for Hadoop and integrating with Kerberos authentication infrastructure: KDC server setup, creating realms/domains, managing principals, generating a keytab file for each service, and managing keytabs using keytab tools.
  • Experience in handling multiple relational databases: MySQL, SQL Server.
  • Familiar with Agile Methodology (SCRUM) and Software Testing.
  • Solid background in object-oriented analysis and design.
  • Good knowledge of SQL-based Hive authorization, Ranger, and Knox.
  • In-depth understanding of Data Structure and Algorithms.
  • Hands-on experience in provisioning and managing multi-tenant Hadoop clusters on public cloud environments (VMware, Amazon Web Services (AWS) EC2) and on private cloud infrastructure (OpenStack cloud platform).
  • Experience in storage management, including JBOD, RAID levels 1, 5, 6, and 10, logical volumes, volume groups, and partitioning.
  • Effective problem solving skills and outstanding interpersonal skills. Ability to work independently as well as within a team environment. Driven to meet deadlines. Ability to learn and use new technologies quickly.


Hadoop Ecosystem: HDFS, MapReduce, YARN, Hive, Pig, Sqoop, Oozie, Flume, ZooKeeper, Cloudera Manager, Ambari

Security: Kerberos

Scripting Languages: Shell Scripting, Puppet

Monitoring Tools: Cloudera Manager, Ambari, Nagios, Ganglia

Operating Systems: Linux RHEL/Ubuntu/CentOS, Windows (XP/7/8)


Confidential, Atlanta, GA

Sr. Hadoop Administrator


  • Worked with the Linux administration team to prepare and configure the systems to support Hadoop deployment.
  • Responsible for building a cluster on HDP 2.1.
  • Performed the pre-installation configuration, including networking and iptables, hostname resolution, user accounts, file permissions, and passwordless SSH login.
  • Managed nodes on the HDP 2.1 cluster using Ambari 2.0 on Linux RHEL 6.6.
  • Set up Hadoop cluster high availability for NameNode, YARN, and MySQL.
  • Worked on setting up high availability for a major production cluster and designed automatic failover control using ZooKeeper and quorum journal nodes.
  • Created scripts for users, groups, data distribution, capacity planning and system monitoring.
  • Managed Hadoop operations on a multi-node HDFS cluster using Ambari. Monitored the cluster with Ganglia.
  • Implemented the Fair Scheduler on the JobTracker to allocate a fair share of resources to small jobs.
  • Used Hive queries to process the data for analysis by imposing a read-only structure on the stream data.
  • Transferred data between MySQL and HDFS in both directions using Sqoop.
  • Created and managed logical volumes. Used Java JDBC to load data into MySQL.
  • Developed MapReduce jobs for the users. Maintained, updated, and scheduled the periodic jobs, which ranged from updates to periodic MapReduce jobs to creating ad hoc jobs for business users.
  • Benchmarked MapReduce with TeraSort, NNBench, MRBench, and Gridmix2.
  • Used Pig to apply transformations, cleaning, and deduplication to data from raw data sources.
  • Configured and deployed the Hive metastore using MySQL and the Thrift server.
  • Continuously monitored and managed the Hadoop cluster through Ganglia, Nagios, and Icinga.
  • Ensured data recovery by implementing system- and application-level backups.
  • Scheduled jobs using the Tidal job scheduler.
  • Installed and maintained cluster security using Ranger and created Ranger policies.
  • Implemented Kerberos Security Authentication protocol for existing cluster.
  • Created Hive queries that helped market analysts spot emerging trends by comparing fresh data with EDW reference tables and historical metrics.
  • Managed server performance, user creation, file access permissions, and RAID.
  • Resolved hardware-related issues and assessed tickets on a daily basis.
  • Automated administration tasks through scripting and job scheduling using cron.


Confidential, Atlanta, GA

Hadoop Administrator


  • Installed and configured a 90-node Hadoop cluster on the Cloudera distribution (CDH3).
  • Responsible for building a cluster for storing 380 TB of transactional data with an inflow of 10 GB of data every day.
  • Performed various configurations, including networking and iptables, hostname resolution, user accounts and file permissions, HTTP, FTP, and passwordless SSH login.
  • Performed a major upgrade of the Hadoop cluster from CDH3 to CDH4.
  • Deployed high availability on the Hadoop cluster using quorum journal nodes.
  • Implemented automatic failover using ZooKeeper and the ZooKeeper Failover Controller.
  • Tuned the cluster by commissioning and decommissioning DataNodes.
  • Configured Oozie for workflow automation and coordination.
  • Implemented the rack aware topology in Hadoop Cluster.
  • Performed data backups between clusters using DistCp.
  • Benchmarked the Hadoop cluster using TeraSort, TestDFSIO, NNBench, and MRBench.
  • Configured Ganglia, including installing the gmond and gmetad daemons, which collect metrics from across the distributed cluster and present them in real-time dynamic web pages to help with debugging and maintenance.
  • Deployed a Network File System (NFS) mount for NameNode metadata backup.
  • Involved in creating Hive tables and in loading and analyzing data using Hive queries.
  • Designed and allocated HDFS quotas for multiple groups.
  • Configured and deployed the Hive metastore using MySQL and the Thrift server.
  • Deployed Sqoop server to perform imports from heterogeneous data sources to HDFS.
  • Used Flume to move data from web logs onto HDFS.
  • Created a local YUM repository for installing and updating packages.
  • Configured Flume agents to stream log events into HDFS for analysis.
  • Implemented an authentication service using the MIT Kerberos authentication protocol.
  • Wrote custom monitoring scripts for Nagios to monitor the daemons and the cluster status.
  • Wrote custom shell scripts to automate repetitive tasks on the cluster.
  • Monitored and configured a test cluster on Amazon Web Services for further testing and gradual migration.





  • Installed and configured Linux for the new build environment.
  • Created virtual servers on a Citrix XenServer host and installed operating systems on the guest servers.
  • Set up Preboot Execution Environment (PXE) boot and Kickstart on multiple servers for remote installation of Linux.
  • Deep understanding of monitoring and troubleshooting mission critical Linux machines.
  • Experience with Linux internals, virtual machines, and open source tools/platforms.
  • Monitored system activity, performance, and resource utilization.
  • Made extensive use of LVM, creating volume groups and logical volumes.
  • Performed RPM and YUM package installations, patching, and other server management.
  • Performed scheduled backup and necessary restoration.
  • Configured Domain Name System (DNS) for hostname-to-IP resolution.
  • Troubleshot and fixed issues at the user, system, and network levels using various tools and utilities. Scheduled backup jobs with cron during non-business hours.
  • Developed an automation script for replication failover: if a database fails during replication, the script brings the system up to date within 5 minutes without manual intervention.
  • Implemented file sharing on the network by configuring NFS to share essential resources.
  • Performed reorganization of disk partitions, file systems, hard disk addition, and memory upgrade.





  • Installed and updated packages using YUM.
  • Installed and maintained the Linux servers.
  • Created volume groups, logical volumes, and partitions on the Linux servers and mounted file systems.
  • Deep understanding of monitoring and troubleshooting mission critical Linux machines.
  • Improved system performance by working with the development team to analyze, identify, and resolve issues quickly.
  • Ensured data recovery by implementing system and application level backups.
  • Performed various configurations, including networking and iptables, hostname resolution, and passwordless SSH login.
  • Managed disk file systems, server performance, user creation, file access permissions, and RAID configurations.
  • Automated administration tasks through scripting and job scheduling using cron.
  • Monitored system metrics and logs for problems.
  • Ran cron jobs to back up data.
  • Added, removed, and updated user account information, reset passwords, etc.
  • Used Java JDBC to load data into MySQL.
  • Maintained the MySQL server and granted database access to the required users.
  • Supported pre-production and production support teams in the analysis of critical services and assisted with maintenance operations.

