Hadoop Administrator Resume

St Louis, MO


  • Around 8 years of experience in IT, including over 4 years of hands-on experience as a Hadoop Administrator.
  • Hands-on experience deploying and managing multi-node development, testing and production Hadoop clusters with different Hadoop components (Hive, Pig, Sqoop, Oozie, Flume, HCatalog, ZooKeeper, HBase) using Cloudera Manager and Hortonworks Ambari.
  • Hands-on experience with Big Data technologies and frameworks such as Hadoop, HDFS, YARN, MapReduce, HBase, Hive, Pig, Sqoop, NoSQL, Flume and Oozie.
  • Experienced in deploying, maintaining and troubleshooting applications on Microsoft Azure cloud infrastructure.
  • Proficient with application servers such as WebSphere, WebLogic, JBoss and Tomcat.
  • Performed administrative tasks on Hadoop Clusters using Cloudera/Hortonworks.
  • Hands-on experience with Hadoop clusters on Hortonworks (HDP), Cloudera (CDH3, CDH4), Oracle Big Data and YARN-based distribution platforms.
  • Experience designing, configuring and managing backup and disaster recovery for Hadoop data.
  • Experience administering Tableau and Greenplum database instances in various environments.
  • Experience in administration of Kafka and Flume streaming using Cloudera Distribution.
  • Good experience in creating various database objects like tables, stored procedures, functions, and triggers using SQL, PL/SQL, and DB2.
  • Hands-on experience configuring a Hadoop cluster in a professional environment and on Amazon Web Services (AWS) using an EC2 instance.
  • Experience in managing the Hadoop MapR infrastructure with MCS.
  • Good understanding of deploying Hadoop clusters using automated Puppet scripts.
  • Worked on NoSQL databases including HBase, Cassandra and MongoDB.
  • Designed and implemented security for Hadoop clusters with Kerberos authentication.
  • Hands-on experience with the Nagios and Ganglia cluster monitoring tools.
  • Strong experience in System Administration, Installation, Upgrading, Patches, Migration, Configuration, Troubleshooting, Security, Backup, Disaster Recovery, Performance Monitoring and Fine-tuning on Linux (RHEL) systems.
  • Worked with the Linux administration team to prepare and configure the systems to support Hadoop deployment.


Big Data Tools: HDFS, MapReduce, YARN, Hive, Pig, Sqoop, Flume, Oozie, Spark, Kafka, Hortonworks, Ambari, Knox, Phoenix, Impala, Storm.

Hadoop Distribution: Cloudera Distribution of Hadoop (CDH), Chef, Nagios, NiFi.

Operating Systems: UNIX, Linux, Windows XP, Windows Vista, Windows 2003 Server

Servers: WebLogic Server, WebSphere and JBoss.

Programming Languages: Java, PL/SQL, Shell Script, Perl, Python.

Tools: Interwoven TeamSite, GMS, BMC Remedy, Eclipse, Toad, SQL Server Management Studio, Jenkins, GitHub, Ranger, TestNG, JUnit.

Database: MySQL, NoSQL, Couchbase, InfluxDB, Greenplum, Teradata, HBase, MongoDB, Cassandra, Oracle.

Processes: Incident Management, Release Management, Change Management.


Confidential, St. Louis, MO

Hadoop Administrator


  • Responsible for architecting Hadoop clusters with the Hortonworks distribution platform HDP 1.3.2. Managed and scheduled jobs on Hadoop clusters using Apache and Cloudera (CDH3, CDH4) distributions.
  • Successfully upgraded Hortonworks Hadoop distribution stack from 2.3.4 to 2.5.
  • Installed application on AWS EC2 instances and configured the storage on S3 buckets.
  • Used keytabs to authenticate to various remote systems in a Kerberized environment.
  • Used Borland StarTeam to check in code to the development environment and to avoid merge issues.
  • Created event-processing data pipelines and handled messaging services using Apache Kafka.
  • Worked in an Agile/Scrum environment and used Jenkins and GitHub for continuous integration and deployment.
  • Knowledge of supporting data analysis projects using Elastic MapReduce on the Amazon Web Services (AWS) cloud.
  • Extracted files from the Cassandra database through Sqoop, placed them in HDFS and processed them.
  • Responsible for developing a data pipeline using HDInsight, Flume, Sqoop and Pig to extract data from weblogs and store it in HDFS.
  • Created a POC to store server log data in Cassandra to identify system alert metrics, and implemented the Cassandra connector for Spark in Java.
  • Designed and implemented Azure cloud infrastructure using ARM templates.
  • Loaded log data into HDFS using Flume and Kafka and performed ETL integrations.
  • Maintained the operations, installation and configuration of 250+ node clusters with the MapR distribution.
  • Worked with Kafka on a proof of concept for log processing on a distributed system. Worked with the NoSQL database HBase to create tables and store data.
  • Secured Hadoop clusters and CDH applications for user authentication and authorization using a Kerberos deployment.
  • Involved in creating a Spark cluster in HDInsight by creating Azure compute resources with Spark installed and configured.
  • Worked with cloud services such as Amazon Web Services (AWS) and was involved in ETL, data integration and migration, and installation of Kafka.
  • Installed and configured Hadoop, MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning; upgraded Cloudera from version 5.5 to 6.0.
  • Automated repetitive tasks, deployed critical applications and managed change on several servers using Puppet.
  • Troubleshot and rectified platform and network issues using Splunk / Wireshark.
  • Involved in the pilot of Hadoop cluster hosted on Amazon Web Services (AWS).
  • Created tables, secondary indexes, join indexes and views in Teradata development Environment for testing.
  • Developed several productivity improvements using Jenkins automation for checking all the environments.
  • Responsible for building scalable distributed data solutions using a Hadoop cluster environment with the Hortonworks distribution.
  • Worked on Apache Ranger to manage HDFS, HBase and Hive access and permissions for users through Active Directory.
  • Established connectivity with various data sources such as Oracle, Greenplum, SQL Server and Sybase.
  • Experience in managing the Hadoop MapR infrastructure with MCS.
  • Worked as Hadoop Administrator for clusters on the Hortonworks distribution.
  • Used AWS S3 and local hard disks as the underlying file system (HDFS) for Hadoop.
  • Implemented complex business logic: security and position linking, various security relationship and linking logic, best-pricing logic, exception raising and override logic, smart dummy security creation, and automated cleanup of duplicate and expired securities.
  • Used Impala queries to determine whether free shipping should be offered so that customers would complete their purchases.
  • Monitored and managed a 200-node Hadoop cluster in production with 24x7 on-call support.
  • Involved in identifying job dependencies to design workflows for Oozie and YARN resource management.
  • Provided support on Kerberos-related issues and coordinated Hadoop installations, upgrades and patch installations in the environment.

Environment: Hive, Pig, HBase, ZooKeeper, Sqoop, ETL, Spark, Azure, AWS, Impala, Ambari 2.0, Linux CentOS, Splunk, Greenplum, S3, EC2, DevOps, MongoDB, MapR, Hortonworks 2.3, Teradata, Puppet, Kafka, Cassandra, Ganglia, Cloudera Manager, Agile/Scrum.

Confidential, San Francisco, CA

Hadoop/Cloudera Admin


  • The project involved building and setting up a Big Data environment and supporting its operations. Managed and monitored the Hadoop cluster (152 nodes) through Cloudera Manager.
  • Worked on a live Big Data Hadoop production environment with 200 nodes.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
  • Designed Azure storage for the Kafka topics, then merged and loaded the data into Couchbase with constant query components.
  • Managed mission-critical Hadoop clusters and Kafka at production scale, primarily on the Cloudera distribution.
  • Created and managed Azure Web Apps and granted access permissions to Azure AD users.
  • Worked with Kafka for the proof of concept for carrying out log processing on a distributed system.
  • Used the NoSQL databases Cassandra and MongoDB; designed the table architecture and developed the DAO layer.
  • Created event-processing data pipelines and handled messaging services using Apache Kafka.
  • Performed upgrades, patches and installation of ecosystem products through Ambari.
  • Automated the configuration management for several servers using Chef and Puppet.
  • Installed and configured the Hortonworks distribution HDP 2.2.x/2.3.x with Ambari.
  • Involved in installing and configuring Kerberos for the authentication of users and Hadoop daemons. Deployed a scalable Hadoop cluster on AWS using S3 as the underlying file system for Hadoop.
  • Experience in methodologies such as Agile, Scrum and test-driven development.
  • Worked with the Couchbase support team on sizing the Couchbase cluster.
  • Implemented MapR token-based security.
  • Created principals for new users in Kerberos; implemented and maintained the Kerberos cluster and integrated it with Active Directory (AD).
  • Developed a data pipeline using Flume, Sqoop and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
  • Created the AWS VPC network for the installed instances and configured the security groups and Elastic IPs accordingly.
  • Integrated Oozie with the rest of the Hadoop Data stack supporting several types of Hadoop jobs out of the box (such as Map-Reduce, Pig, Hive, and Sqoop) as well as system specific jobs (such as Java programs and shell scripts).
  • Involved in migrating the Java test framework to Python Flask.
  • Scripted the requirements using BigSQL and provided time statistics for running jobs.
  • Responsible for executing Pig from Oozie to read HBase tables in a Kerberized cluster.
  • Worked with operational analytics and log management using ELK and Splunk.
  • Moved data from Oracle, Teradata and MySQL into HDFS using Sqoop and imported various formats of flat files into HDFS. Created a separate Couchbase database cluster to store flight control data log information.
  • Commissioned and decommissioned Hadoop cluster nodes, including load balancing of HDFS block data.
  • Good knowledge of adding security to the cluster using Kerberos and Sentry.
  • Performed Cloudera Hadoop upgrades, patches and installation of ecosystem products through Cloudera Manager, along with Cloudera Manager upgrades.
  • Designed and implemented Azure cloud infrastructure using ARM templates.
  • Monitored multiple Hadoop cluster environments using Ganglia and Nagios. Monitored workload and job performance and carried out capacity planning using Cloudera Manager.
  • Analyzed escalated incidents within the Azure SQL database.
  • Performed regular maintenance, commissioning and decommissioning nodes as disk failures occurred, using the MapR File System.
  • Experience in managing and analyzing Hadoop log files to troubleshoot issues.
  • Responsible for architecting Hadoop clusters with Hortonworks distribution platform HDP 1.3.2 and Cloudera CDH4.

Environment: Hadoop, YARN, Hive, HBase, Spark, BigSQL, Flume, Kafka, Oozie, Sqoop, Linux, MapReduce, HDFS, Teradata, Splunk, Java, Jenkins, GitHub, MySQL, Hortonworks, MapR, NoSQL, MongoDB, Shell Script, Python.


Hadoop/Linux Administrator


  • Installed, configured and maintained Apache Hadoop and Cloudera Hadoop clusters for application development, along with Hadoop tools such as Hive, Pig, HBase, ZooKeeper and Sqoop.
  • Executed the Phoenix Project (built new functioning systems with parts derived from scrapped systems).
  • Wrote shell scripts to monitor the health of Hadoop daemon services and respond to any warning or failure conditions.
  • Created solutions for Phoenix problems, documented processes and participated in ongoing processes for improvement.
  • Managed and scheduled jobs on a Hadoop cluster. Involved in the configuration and installation of Couchbase 2.5.1 NoSQL instances on AWS EC2 instances (Amazon Web Services).
  • Installed and configured Hadoop cluster in Development, Testing and Production environments.
  • Performed both major and minor upgrades to the existing CDH cluster.
  • Installed, configured and operated the Hadoop stack, i.e. Hive, HBase, Pig, Sqoop, ZooKeeper, Oozie, Flume and Mahout, on the Hadoop cluster.
  • Migrated a SQL Server 2008 database to Windows Azure SQL Database and updated the connection strings accordingly.
  • Implemented Cassandra connector for Spark 1.6.1 in Java.
  • Synchronized the Hive tables with BigSQL using HCatalog and queried the tables using Data Server Manager (DSM).
  • Installed and configured Flume agents with well-defined sources, channels and sinks.
  • Configured safety valve to create active directory filters to sync the LDAP directory for Hue.
  • Developed scripts to delete the empty Hive tables existing in the Hadoop file system.
  • Studied the existing enterprise data warehouse setup and provided design and architecture suggestions for converting it to Hadoop using MapReduce, Hive, Sqoop and Pig Latin.
  • Implemented NameNode backup using NFS for high availability.
  • Worked on importing and exporting data from Oracle and DB2 into HDFS and Hive using Sqoop.
  • Wrote UNIX shell scripts for automated installations and log extraction using Bash, Perl, Ksh and Python.
  • Collected log data from web servers and integrated it into HDFS using Flume.
  • Implemented FIFO schedulers on the JobTracker to share cluster resources among the MapReduce jobs submitted by users.
  • Involved in data modeling sessions to develop models for Hive tables.

Environment: Apache Hadoop, CDH4, Hive, Hue, Pig, HBase, MapReduce, Sqoop, Red Hat, CentOS, Flume, MySQL, Greenplum, NoSQL, MongoDB, Java, Linux.

Confidential, Los Angeles, CA

Hadoop Admin


  • Worked with systems engineering team to plan and deploy new Hadoop environments and expand existing Hadoop clusters.
  • Installed and configured Hadoop and ecosystem components in Cloudera and Hortonworks environments. Configured Hadoop, Hive and Pig on Amazon EC2 servers.
  • Managed a 350+ node HDP 2.2.4 cluster with 4 petabytes of data using Ambari 2.0 and Linux CentOS 6.5.
  • Installed and configured Hive, Pig, Sqoop and Oozie on the HDP 2.2 cluster and implemented Sentry for the dev cluster.
  • Configured MySQL Database to store Hive metadata.
  • Developed Bash scripts to fetch log files from the FTP server and process them for loading into Hive tables.
  • Involved in running Hadoop streaming jobs to process terabytes of text data.
  • Worked with Linux systems and MySQL database on a regular basis.
  • Used Java to develop User Defined Functions (UDF) for Pig Scripts.
  • Supported MapReduce programs that ran on the cluster.
  • Installed and configured Oracle 10g Agent to report Oracle 11g databases in Oracle Enterprise Manager/Grid Control used by INFO1.
  • Involved in loading data from UNIX file system to HDFS.
  • Worked with application teams to install operating system and Hadoop updates, patches and version upgrades as required.
  • Worked with Systems Analyst and business users to understand requirements.

Environment: HDFS, Java, Hive, Pig, Sentry, Kerberos, LDAP, Grid, YARN, Cloudera Manager and Ambari.


Linux/Systems Administrator


  • Installation, configuration, upgrade and administration of Windows, Sun Solaris and Red Hat Linux.
  • Performed various configurations, including networking and iptables, hostname resolution and passwordless SSH login.
  • Managed crontab jobs, batch processing and job scheduling.
  • Worked on Linux Kickstart OS integration, DDNS, DHCP, SMTP, Samba, NFS, FTP, SSH and LDAP integration.
  • Managed disk file systems, server performance, user creation, file access permissions and RAID configurations.
  • Designed, developed and validated user interfaces using HTML, JavaScript, XML and CSS.
  • Supported pre-production and production support teams in the analysis of critical services and assisted with maintenance operations.
  • Automated administration tasks through scripting and job scheduling using cron.
  • Set up alerts and alert levels for MySQL (uptime, users, replication information, alerts based on different queries).
  • Estimated MySQL database capacities; developed methods for monitoring database capacity and usage.
  • Developed and optimized the physical design of MySQL database systems.
  • Supported development and testing environments to measure performance before deploying to production.



Linux Administrator


  • Storage management using JBOD, RAID levels 0 and 1, logical volumes, volume groups and partitioning.
  • Analyzed the performance of Linux systems to identify memory, disk I/O and network problems.
  • Performed reorganization of disk partitions, file systems, hard disk addition, and memory upgrade.
  • Administration of Red Hat 4.x and 5.x, including installation, testing, tuning, upgrading and loading patches, and troubleshooting both physical and virtual server issues.
  • Created and cloned Linux virtual machines and templates using VMware Virtual Client 3.5 and migrated servers between ESX hosts and Xen servers.
  • Log and resource monitoring via scripts on Linux servers.
  • Maintained and monitored the local area network, hardware support and virtualization on RHEL servers (through Xen and KVM).
  • Administration of VMware virtual Linux servers and resizing of LVM disk volumes as required.
  • Responded to all Linux system problems 24x7 as part of the on-call rotation and resolved them in a timely manner.

Environment: Linux, TCP/IP, LVM, RAID, Networking, Security, user management.
