
Hadoop Administrator Resume


McLean, VA

SUMMARY

  • 8+ years of IT industry experience in administering Linux, managing databases, developing MapReduce applications, and designing, building, and administering large-scale production Hadoop clusters.
  • 3 years of experience in big data technologies: Hadoop HDFS, MapReduce, Tez, Pig, YARN, Hive, Spark, Oozie, Flume, Kafka, Sqoop, ZooKeeper, and the NoSQL databases Cassandra and HBase.
  • Experience in deploying and managing multi-node development, testing, and production Hadoop clusters with different Hadoop components (Hive, Spark, Pig, Sqoop, Oozie, Flume, HCatalog, HBase, ZooKeeper) using Hortonworks Ambari.
  • Strong knowledge of the Hadoop HDFS architecture and the MapReduce framework.
  • Experience in administering Linux systems to deploy Hadoop clusters and monitoring clusters using Ambari Metrics.
  • Experience in benchmarking and in performing backup and disaster recovery of NameNode metadata and sensitive data residing on the cluster.
  • Experience in performing minor and major upgrades and in commissioning and decommissioning DataNodes on a Hadoop cluster.
  • Strong knowledge of configuring NameNode High Availability and NameNode Federation.
  • Familiar with writing Oozie workflows and job controllers for automating shell, Hive, and Sqoop jobs.
  • Familiar with importing and exporting data with Sqoop between HDFS and relational databases (MySQL, Oracle, Teradata), including fast loaders and connectors; a representative import/export sketch follows this summary.
  • Experience in using Flume to stream data into HDFS from various sources.
  • Hands-on experience in provisioning and managing multi-tenant Hadoop clusters on public cloud infrastructure: Amazon Web Services (AWS) EC2.
  • Experience in installing and administering a PXE server with Kickstart, setting up FTP, DHCP, and DNS servers, and Logical Volume Management.
  • Experience in configuring and managing NAS (file-level access via NFS) and SAN (block-level access via iSCSI) storage.
  • Experience in storage management including JBOD, RAID levels 1, 5, 6, and 10, logical volumes, volume groups, and partitioning.
  • Exposure to Maven/Ant and Git, along with shell scripting, for build and deployment processes.
  • Experience in maintaining and distributing configuration files using Puppet.
  • Experience in understanding Hadoop security requirements and integrating with a Kerberos authentication infrastructure: KDC server setup, creating realms/domains, managing principals, generating a keytab file for each service, and managing keytabs with keytab tools.
  • Experience in handling multiple relational databases: MySQL, SQL Server.
  • Familiar with Agile Methodology (SCRUM) and Software Testing.
  • Effective problem-solving skills and outstanding interpersonal skills. Ability to work independently as well as within a team environment. Driven to meet deadlines. Ability to learn and use new technologies quickly.
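
For illustration, a minimal sketch of the kind of Sqoop import and export referenced above. The hostname, database, tables, credentials, and HDFS paths are hypothetical placeholders, not details from any actual engagement.

  # Import a table from MySQL into HDFS (placeholder connection details).
  sqoop import \
    --connect jdbc:mysql://db01.example.com:3306/sales \
    --username etl_user -P \
    --table orders \
    --target-dir /data/raw/orders \
    --num-mappers 4 \
    --fields-terminated-by '\t'

  # Export processed results from HDFS back to the RDBMS.
  sqoop export \
    --connect jdbc:mysql://db01.example.com:3306/sales \
    --username etl_user -P \
    --table orders_summary \
    --export-dir /data/processed/orders_summary \
    --num-mappers 4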

TECHNICAL SKILLS

Hadoop Ecosystem: HDFS, Sqoop, Flume, MapReduce, Hive, Pig, Oozie, ZooKeeper

NoSQL Database: HBase, Cassandra

Security: Kerberos

Database: MySQL, SQL Server

Cluster management Tools: Cloudera Manager, Ambari

OS: Linux (CentOS, RHEL), Windows, macOS

PROFESSIONAL EXPERIENCE

Confidential

Hadoop Administrator

Responsibilities:

  • Built and maintained a >14 PB production environment.
  • Installed and maintained 300+ node Hadoop clusters using Hortonworks Hadoop (HDP 2.2, 2.3, 2.5, and 2.6).
  • Monitored workload, job performance and capacity planning using Ambari.
  • Performed both major and minor upgrades to the existing Ambari Hadoop cluster.
  • Integrated Hadoop with Active Directory and enabled Kerberos and Knox for authentication.
  • Applied patches and bug fixes on Hadoop Clusters.
  • Performance tuned and optimized Hadoop clusters to achieve high performance.
  • Implemented the Capacity Scheduler on YARN to share cluster resources among users' MapReduce and Tez jobs.
  • Monitored Hadoop clusters using the Ambari UI.
  • Designed and implemented a disaster recovery plan for the Hadoop cluster.
  • Extensive hands-on experience with Hadoop file system commands for file handling operations.
  • Integrated HiveServer2 with Tableau and Informatica.
  • Provided user and application support on the Hadoop infrastructure.
  • Involved in business requirements gathering and analysis of business use cases.
  • Reviewed the existing enterprise data warehouse setup and provided design and architecture suggestions for converting it to Hadoop using MapReduce, Hive, and Sqoop.
  • Added new hosts and balanced data onto the new DataNodes by running the HDFS Balancer; a representative invocation is sketched after this list.
  • Ran shell scripts through cron jobs to monitor the cluster and alert administrators about failing jobs.
  • Added non-Ambari-managed ETL hosts to the Hadoop environment and deployed their configurations using Puppet.
  • Analyzed system failures, identified root causes, and recommended courses of action.
  • Imported logs from web servers with Flume to ingest the data into HDFS.
  • Deployed and maintained Ambari Views.
  • Fine-tuned the HBase service and HBase jobs.
  • Fine-tuned Hive jobs for better performance.
  • Involved in extracting the data from various sources into Hadoop HDFS for processing.
  • Fine-tuned Spark jobs for better resource utilization.
  • Effectively used Sqoop to transfer data between databases and HDFS.
  • Created Hive tables as per requirements, as internal or external tables defined with appropriate static and dynamic partitions for efficiency.
  • Used Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs such as Java map-reduce, Hive and Sqoop as well as system specific jobs.
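
As referenced above, a representative sequence for rebalancing after adding DataNodes; the threshold and bandwidth values are illustrative assumptions rather than the exact settings used.

  # Confirm the newly added DataNodes are live and reporting.
  hdfs dfsadmin -report

  # Optionally raise the per-DataNode balancer bandwidth (bytes/sec) before rebalancing.
  hdfs dfsadmin -setBalancerBandwidth 104857600

  # Rebalance until each DataNode's utilization is within 10% of the cluster average.
  hdfs balancer -threshold 10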

Environment: HDFS, Sqoop, Flume, MapReduce, Hive, Pig, Oozie, ZooKeeper, HBase, Ambari

Confidential, McLean, VA

Hadoop Administrator

Responsibilities:

  • Installed and maintained 100+ node Hadoop clusters using HDP
  • Performed both major and minor upgrades to the existing Ambari Hadoop cluster.
  • Applied patches and bug fixes on Hadoop Clusters.
  • Performance tuned and optimized Hadoop clusters to achieve high performance.
  • Monitored Hadoop clusters using the Ambari UI.
  • Designed and implemented a disaster recovery plan for the Hadoop cluster.
  • Extensive hands-on experience with Hadoop file system commands for file handling operations.
  • Involved in business requirements gathering and analysis of business use cases.
  • Reviewed the existing enterprise data warehouse setup and provided design and architecture suggestions for converting it to Hadoop using MapReduce, Hive, and Sqoop.
  • Added new hosts and balanced data onto the new DataNodes by running the HDFS Balancer.
  • Ran shell scripts through cron jobs to monitor the cluster and alert administrators about failing jobs.
  • Worked with developers to fine-tune Hive jobs for better performance.
  • Involved in extracting the data from various sources into Hadoop HDFS for processing.
  • Effectively used Sqoop to transfer data between databases and HDFS.
  • Created Hive tables as per requirements, as internal or external tables defined with appropriate static and dynamic partitions for efficiency; a representative DDL sketch follows this list.
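
As noted above, an illustrative sketch of an external, partitioned Hive table with a dynamic-partition load. The table, columns, staging table, and HDFS location are hypothetical placeholders.

  hive -e "
    SET hive.exec.dynamic.partition=true;
    SET hive.exec.dynamic.partition.mode=nonstrict;

    CREATE EXTERNAL TABLE IF NOT EXISTS web_logs (
      user_id    STRING,
      url        STRING,
      event_time TIMESTAMP
    )
    PARTITIONED BY (event_date STRING)
    STORED AS ORC
    LOCATION '/data/web_logs';

    -- Dynamic-partition insert from a raw staging table.
    INSERT OVERWRITE TABLE web_logs PARTITION (event_date)
    SELECT user_id, url, event_time, to_date(event_time) AS event_date
    FROM web_logs_staging;
  "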

Environment: Hive, HBase, MapReduce, Oozie, Sqoop, MySQL, PL/SQL, Linux, HDP.

Confidential, San Jose, CA

Hadoop Administrator

Responsibilities:

  • Involved in the end-to-end process of Hadoop cluster setup: installation, configuration, and monitoring of the Hadoop cluster.
  • Responsible for cluster maintenance, commissioning and decommissioning DataNodes, cluster monitoring, troubleshooting, managing and reviewing data backups, and managing and reviewing Hadoop log files.
  • Monitored systems and services; handled architecture design and implementation of the Hadoop deployment, configuration management, backup, and disaster recovery systems and procedures.
  • Configured property files such as core-site.xml, hdfs-site.xml, and mapred-site.xml based on job requirements.
  • Importing and exporting data into HDFS using Sqoop.
  • Experienced in defining job flows with Oozie.
  • Loaded log data directly into HDFS using Flume; a representative agent configuration is sketched after this list.
  • Experienced in managing and reviewing Hadoop log files.
  • Installed various Hadoop ecosystem components and Hadoop daemons.
  • Installed and configured Sqoop, Flume, and HBase.
  • Managed and reviewed Hadoop log files as part of administration for troubleshooting purposes; communicated and escalated issues appropriately.
  • As an administrator, followed standard backup policies to ensure high availability of the cluster.
  • Analyzed system failures, identified root causes, and recommended courses of action; documented system processes and procedures for future reference.
  • Worked with the systems engineering team to plan and deploy new Hadoop environments and expand existing Hadoop clusters.
  • Monitored multiple Hadoop cluster environments using Ganglia and Nagios; monitored workload, job performance, and capacity planning using Cloudera Manager.
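
As referenced above, a minimal sketch of a Flume agent that tails a web-server log into HDFS, followed by the command to launch it. The agent name, configuration path, log path, and HDFS path are hypothetical placeholders.

  # Contents of a hypothetical /etc/flume/conf/weblog-agent.conf:
  a1.sources  = r1
  a1.channels = c1
  a1.sinks    = k1

  # Tail the web-server access log as the source.
  a1.sources.r1.type     = exec
  a1.sources.r1.command  = tail -F /var/log/httpd/access_log
  a1.sources.r1.channels = c1

  # Buffer events in memory between source and sink.
  a1.channels.c1.type     = memory
  a1.channels.c1.capacity = 10000

  # Write events to date-partitioned HDFS directories.
  a1.sinks.k1.type                   = hdfs
  a1.sinks.k1.channel                = c1
  a1.sinks.k1.hdfs.path              = /data/weblogs/%Y-%m-%d
  a1.sinks.k1.hdfs.fileType          = DataStream
  a1.sinks.k1.hdfs.useLocalTimeStamp = true

  # Launch the agent with that configuration.
  flume-ng agent --name a1 --conf /etc/flume/conf \
    --conf-file /etc/flume/conf/weblog-agent.conf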

Environment: HDFS, Pig, Hive, HBase, Sqoop, Spark, Oozie, Flume, Kafka, AWS, Linux shell scripting.

Confidential, Salt Lake City

Linux Admin/ Hadoop Admin

Responsibilities:

  • Worked with the Linux administration team to prepare and configure the systems to support Hadoop deployment.
  • Performed various configurations, including networking and iptables, hostname resolution, user accounts and file permissions, HTTP, FTP, and passwordless (key-based) SSH login.
  • Implemented an authentication service using the Kerberos authentication protocol.
  • Created volume groups, logical volumes and partitions on the Linux servers and mounted file systems on the created partitions.
  • Configured master node disks with RAID 1+0.
  • Performed benchmarking on the Hadoop cluster using different benchmarking mechanisms.
  • Tuned the cluster by Commissioning and decommissioning the Data Nodes.
  • Upgraded the Hadoop cluster.
  • Implemented the Fair Scheduler on the JobTracker to allocate a fair share of resources to small jobs.
  • Deployed high availability on the Hadoop cluster using quorum journal nodes.
  • Implemented automatic failover with ZooKeeper and the ZooKeeper Failover Controller.
  • Configured Ganglia, including installing the gmond and gmetad daemons, which collect metrics across the distributed cluster and present them in real-time dynamic web pages that aid debugging and maintenance.
  • Implemented Kerberos for authenticating all the services in Hadoop Cluster.
  • Deployed a network file system (NFS) mount for NameNode metadata backup.
  • Performed cluster backups using DistCp, Cloudera Manager BDR, and parallel ingestion.
  • Designed and allocated HDFS quotas for multiple groups; representative quota commands are sketched after this list.
  • Configured and deployed the Hive metastore using MySQL and the Thrift server.
  • Used Hive schemas to create relations in Pig via HCatalog.
  • Developed Pig scripts for handling raw data for analysis.
  • Deployed Sqoop server to perform imports from heterogeneous data sources to HDFS.
  • Deployed and configured flume agents to stream log events into HDFS for analysis.
  • Deployed YARN, which allows multiple applications to run on the cluster.
  • Configured Oozie for workflow automation and coordination.
  • Wrote custom monitoring scripts for Nagios to monitor the daemons and cluster status.
  • Wrote custom shell scripts for automating repetitive tasks on the cluster.
  • Worked with BI teams in generating the reports and designing ETL workflows on Pentaho.
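
As noted above, illustrative HDFS quota commands for a group directory; the path and limits are hypothetical placeholders.

  # Cap the number of names (files + directories) under the group's directory.
  hdfs dfsadmin -setQuota 1000000 /user/analytics

  # Cap the raw space consumed (counts all replicas).
  hdfs dfsadmin -setSpaceQuota 50t /user/analytics

  # Verify the quotas and current usage.
  hdfs dfs -count -q -h /user/analytics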

Environment: Linux, HDFS, Sqoop, Flume, MapReduce, Hive, Pig, Oozie, ZooKeeper

Confidential

Linux System Administrator

Responsibilities:

  • Handled day-to-day user access and permissions, and installed and maintained Linux servers.
  • Installed CentOS on multiple servers using Preboot Execution Environment (PXE) boot and the Kickstart method, performing remote Linux installations over PXE.
  • Created groups, added user IDs to groups as primary or secondary groups, removed user IDs from groups, and added users to the sudoers file.
  • Monitored system activity, performance, and resource utilization.
  • Used RPM to install, update, verify, query, and erase packages on Linux servers.
  • Extensive use of LVM, creating volume groups and logical volumes; a representative command sequence is sketched after this list.
  • Mounted file systems using AutoFS and configured the fstab file.
  • Performed RPM and YUM package installations, patch and other server management.
  • Performed scheduled backup and necessary restoration.
  • Configured NFS
  • Developed Shell Scripts for automation of daily tasks
  • Setting up cron schedules for backups and monitoring processes
  • Configured Domain Name System (DNS) for hostname to IP resolution
  • Troubleshot and fixed issues at the user, system, and network levels using various tools and utilities; scheduled backup jobs via cron during non-business hours.
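
As referenced above, a representative LVM workflow; the device names, sizes, and mount point are hypothetical placeholders.

  pvcreate /dev/sdb1                  # initialize the physical volume
  vgcreate vg_data /dev/sdb1          # create a volume group on it
  lvcreate -L 50G -n lv_app vg_data   # carve out a 50 GB logical volume
  mkfs.ext4 /dev/vg_data/lv_app       # create a filesystem on the logical volume
  mkdir -p /app
  mount /dev/vg_data/lv_app /app      # mount it
  # Persist the mount across reboots.
  echo '/dev/vg_data/lv_app /app ext4 defaults 0 0' >> /etc/fstab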

Environment: Linux (CentOS/RHEL)
