Hadoop Administrator Resume
McLean, VA
SUMMARY
- 8+ years of IT industry experience in Linux administration, database management, developing MapReduce applications, and designing, building, and administering large-scale Hadoop production clusters.
- 3 years of experience in big data technologies: Hadoop HDFS, MapReduce, Tez, Pig, YARN, Hive, Spark, Oozie, Flume, Kafka, Sqoop, ZooKeeper, and the NoSQL databases Cassandra and HBase.
- Experience in deploying and managing multi-node development, testing, and production Hadoop clusters with different Hadoop components (Hive, Spark, Pig, Sqoop, Oozie, Flume, HCatalog, HBase, ZooKeeper) using Hortonworks Ambari.
- Strong knowledge of Hadoop HDFS architecture and the MapReduce framework.
- Experience in administering Linux systems to deploy Hadoop clusters and monitoring them using Ambari Metrics.
- Experience in benchmarking and in performing backup and disaster recovery of NameNode metadata and sensitive data residing on the cluster.
- Experience in performing minor and major upgrades and in commissioning and decommissioning data nodes on Hadoop clusters.
- Strong knowledge of configuring NameNode High Availability and NameNode Federation.
- Familiar with writing Oozie workflows and job controllers for job automation: shell, Hive, and Sqoop actions.
- Familiar with importing and exporting data using Sqoop from RDBMSs (MySQL, Oracle, Teradata), including experience with fast loaders and connectors.
- Experience in using Flume to stream data into HDFS from various sources.
- Hands-on experience in provisioning and managing multi-tenant Hadoop clusters on a public cloud environment: Amazon Web Services (AWS) EC2.
- Experience in installing and administering a PXE server with Kickstart, and in setting up FTP, DHCP, and DNS servers and Logical Volume Management.
- Experience in configuring and managing storage devices: NAS (file-level access, NFS) and SAN (block-level access, iSCSI).
- Experience in storage management, including JBOD, RAID levels 1, 5, 6, and 10, logical volumes, volume groups, and partitioning.
- Exposure to Maven/Ant and Git, along with shell scripting for the build and deployment process.
- Experience in maintaining and distributing configuration files using Puppet.
- Experience in understanding security requirements for Hadoop and integrating with a Kerberos authentication infrastructure: KDC server setup, creating realms/domains, managing principals, generating a keytab file for each service, and managing keytabs with keytab tools (see the sketch after this summary).
- Experience in handling multiple relational databases: MySQL, SQL Server.
- Familiar with Agile methodology (Scrum) and software testing.
- Effective problem-solving skills and outstanding interpersonal skills. Ability to work independently as well as within a team environment. Driven to meet deadlines. Ability to learn and use new technologies quickly.
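A minimal sketch of that keytab workflow, assuming an MIT KDC; the realm EXAMPLE.COM, host master01.example.com, and keytab path are hypothetical:

```bash
# Create a service principal with a random key (realm and host are hypothetical).
kadmin.local -q "addprinc -randkey nn/master01.example.com@EXAMPLE.COM"

# Export the principal's keys into a keytab for the NameNode service to use.
kadmin.local -q "ktadd -k /etc/security/keytabs/nn.service.keytab nn/master01.example.com@EXAMPLE.COM"

# Verify the keytab, then restrict it to the service account.
klist -kt /etc/security/keytabs/nn.service.keytab
chown hdfs:hadoop /etc/security/keytabs/nn.service.keytab
chmod 400 /etc/security/keytabs/nn.service.keytab
```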
TECHNICAL SKILLS
Hadoop Ecosystem: HDFS, Sqoop, Flume, MapReduce, Hive, Pig, Oozie, ZooKeeper
NoSQL Databases: HBase, Cassandra
Security: Kerberos
Database: MySQL, SQL Server
Cluster Management Tools: Cloudera Manager, Ambari
OS: Linux (CentOS, RHEL), Windows, macOS
PROFESSIONAL EXPERIENCE:
Confidential
Hadoop Administrator
Responsibilities:
- Built and maintained a >14 PB production environment.
- Installed and maintained 300+ node Hadoop clusters using Hortonworks Hadoop (HDP 2.2, HDP 2.3, HDP 2.5, and HDP 2.6).
- Monitored workload, job performance and capacity planning using Ambari.
- Performed both major and minor upgrades to the existing Ambari-managed Hadoop cluster.
- Integrated Hadoop with Active Directory and enabled Kerberos and Knox for authentication.
- Applied patches and bug fixes on Hadoop clusters.
- Tuned and optimized Hadoop clusters for high performance.
- Implemented the Capacity Scheduler on YARN to share cluster resources among users' MapReduce and Tez jobs.
- Monitored Hadoop clusters using the Ambari UI.
- Designed and implemented a disaster recovery plan for the Hadoop cluster.
- Extensive hands-on experience with Hadoop file system commands for file handling operations.
- Worked on integration of HiveServer2 with Tableau and Informatica.
- Provided user support and application support on the Hadoop infrastructure.
- Involved in business requirements gathering and analysis of business use cases.
- Studied the existing enterprise data warehouse setup and provided design and architecture suggestions for converting it to Hadoop using MapReduce, Hive, and Sqoop.
- Added new hosts and balanced data onto the new data nodes by running the HDFS Balancer.
- Ran shell scripts through cron jobs to monitor and alert admins about bad jobs (see the sketch after this list).
- Added non-Ambari-managed ETL hosts to the Hadoop environment and deployed the configs using Puppet.
- Analyzed system failures, identified root causes, and recommended courses of action.
- Imported logs from web servers with Flume to ingest the data into HDFS.
- Deployed and maintained Ambari Views.
- Fine-tuned the HBase service and HBase jobs.
- Fine-tuned Hive jobs for better performance.
- Extracted data from various sources into Hadoop HDFS for processing.
- Fine-tuned Spark jobs for better resource utilization.
- Effectively used Sqoop to transfer data between databases and HDFS.
- Created Hive tables per requirements as internal or external tables, defined with appropriate static and dynamic partitions for efficiency.
- Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs, such as Java MapReduce, Hive, and Sqoop, as well as system-specific jobs.
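A minimal sketch of the cron-driven job-monitoring approach referenced above; the threshold, alert address, and Start-Time parsing are assumptions and may need adjusting per Hadoop version:

```bash
#!/usr/bin/env bash
# Cron-driven check for YARN applications running past a threshold.
# MAX_HOURS and ALERT_TO are illustrative values.
MAX_HOURS=6
ALERT_TO="hadoop-admins@example.com"
NOW_MS=$(( $(date +%s) * 1000 ))

# List RUNNING application IDs (first column of `yarn application -list`).
yarn application -list -appStates RUNNING 2>/dev/null |
awk '/^application_/ {print $1}' |
while read -r app_id; do
  # `yarn application -status` reports Start-Time in epoch milliseconds;
  # the field parsing below may vary by version.
  start_ms=$(yarn application -status "$app_id" 2>/dev/null |
             awk -F' : ' '/Start-Time/ {print $2}')
  [ -z "$start_ms" ] && continue
  hours=$(( (NOW_MS - start_ms) / 3600000 ))
  if [ "$hours" -ge "$MAX_HOURS" ]; then
    echo "$app_id has been running for ${hours}h" |
      mail -s "Long-running YARN job: $app_id" "$ALERT_TO"
  fi
done
```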
Environment: HDFS, Sqoop, Flume, MapReduce, Hive, Pig, Oozie, ZooKeeper, HBase, Ambari
Confidential, McLean, VA
Hadoop Administrator
Responsibilities:
- Installed and maintained 100+ node Hadoop clusters using HDP.
- Performed both major and minor upgrades to the existing Ambari-managed Hadoop cluster.
- Applied patches and bug fixes on Hadoop clusters.
- Tuned and optimized Hadoop clusters for high performance.
- Monitored Hadoop clusters using the Ambari UI.
- Designed and implemented a disaster recovery plan for the Hadoop cluster.
- Extensive hands-on experience with Hadoop file system commands for file handling operations.
- Involved in business requirements gathering and analysis of business use cases.
- Studied the existing enterprise data warehouse setup and provided design and architecture suggestions for converting it to Hadoop using MapReduce, Hive, and Sqoop.
- Added new hosts and balanced data onto the new data nodes by running the HDFS Balancer.
- Ran shell scripts through cron jobs to monitor and alert admins about bad jobs.
- Worked with developers to fine-tune Hive jobs for better performance.
- Extracted data from various sources into Hadoop HDFS for processing.
- Effectively used Sqoop to transfer data between databases and HDFS (see the sketch after this list).
- Created Hive tables per requirements as internal or external tables, defined with appropriate static and dynamic partitions for efficiency.
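A minimal sketch of the Sqoop transfers described above; the JDBC URL, credentials, tables, and HDFS paths are hypothetical:

```bash
# Import a MySQL table into HDFS (connection details are illustrative).
sqoop import \
  --connect jdbc:mysql://dbhost.example.com:3306/sales \
  --username etl_user -P \
  --table orders \
  --target-dir /data/raw/orders \
  --num-mappers 4 \
  --fields-terminated-by '\t'

# Export processed results back to the database.
sqoop export \
  --connect jdbc:mysql://dbhost.example.com:3306/sales \
  --username etl_user -P \
  --table order_summary \
  --export-dir /data/processed/order_summary \
  --input-fields-terminated-by '\t'
```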
Environment: Hive, HBase, MapReduce, Oozie, Sqoop, MySQL, PL/SQL, Linux, HDP
Confidential, San Jose, CA
Hadoop Administrator
Responsibilities:
- Involved in the end-to-end process of Hadoop cluster setup: installation, configuration, and monitoring of the Hadoop cluster.
- Responsible for cluster maintenance, commissioning and decommissioning data nodes (see the sketch after this list), cluster monitoring, troubleshooting, managing and reviewing data backups, and managing and reviewing Hadoop log files.
- Monitored systems and services; handled architecture design and implementation of the Hadoop deployment, configuration management, backup, and disaster recovery systems and procedures.
- Configured various property files, such as core-site.xml, hdfs-site.xml, and mapred-site.xml, based on job requirements.
- Imported and exported data into HDFS using Sqoop.
- Defined job flows with Oozie.
- Loaded log data directly into HDFS using Flume.
- Installed various Hadoop ecosystem components and Hadoop daemons.
- Installed and configured Sqoop, Flume, and HBase.
- Managed and reviewed Hadoop log files as part of administration for troubleshooting purposes; communicated and escalated issues appropriately.
- As an admin, followed standard backup policies to ensure high availability of the cluster.
- Analyzed system failures, identified root causes, and recommended courses of action. Documented system processes and procedures for future reference.
- Worked with the systems engineering team to plan and deploy new Hadoop environments and expand existing Hadoop clusters.
- Monitored multiple Hadoop cluster environments using Ganglia and Nagios. Monitored workload, job performance, and capacity planning using Cloudera Manager.
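A minimal sketch of a graceful DataNode decommission as referenced above, assuming dfs.hosts.exclude points at the exclude file shown; the hostname and path are hypothetical:

```bash
# Add the host to the exclude file referenced by dfs.hosts.exclude
# (path and hostname are illustrative).
echo "datanode07.example.com" >> /etc/hadoop/conf/dfs.exclude

# Ask the NameNode to re-read include/exclude lists; the node enters
# "Decommission In Progress" while its blocks re-replicate elsewhere.
hdfs dfsadmin -refreshNodes

# Retire the NodeManager too if YARN is configured with an exclude list.
yarn rmadmin -refreshNodes

# Watch until the node reports "Decommissioned" before powering it off.
hdfs dfsadmin -report | grep -A 2 "datanode07"
```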
Environment: HDFS, Pig, Hive, HBase, Sqoop, Spark, Oozie, Flume, Kafka, AWS, Linux shell scripting
Confidential, Salt Lake City
Linux Admin/ Hadoop Admin
Responsibilities:
- Worked with the Linux administration team to prepare and configure systems to support the Hadoop deployment.
- Performed various configurations, including networking and iptables, hostname resolution, user accounts and file permissions, HTTP, FTP, and key-based (passwordless) SSH login.
- Implemented authentication using the Kerberos authentication protocol.
- Created volume groups, logical volumes, and partitions on the Linux servers and mounted file systems on the created partitions.
- Configured master node disks with RAID 1+0.
- Benchmarked the Hadoop cluster using different benchmarking mechanisms.
- Tuned the cluster by commissioning and decommissioning data nodes.
- Upgraded the Hadoop cluster.
- Implemented the Fair Scheduler on the JobTracker to allocate a fair share of resources to small jobs.
- Deployed high availability on the Hadoop cluster using quorum journal nodes.
- Implemented automatic failover using ZooKeeper and the ZooKeeper Failover Controller.
- Configured Ganglia, installing the gmond and gmetad daemons, which collect the metrics from across the distributed cluster and present them in real-time dynamic web pages that help with debugging and maintenance.
- Implemented Kerberos for authenticating all the services in the Hadoop cluster.
- Deployed a network file system for NameNode metadata backup.
- Performed cluster backups using DistCp, Cloudera Manager BDR, and parallel ingestion.
- Designed and allocated HDFS quotas for multiple groups (see the sketch after this list).
- Configured and deployed the Hive metastore using MySQL and the Thrift server.
- Used Hive schemas to create relations in Pig using HCatalog.
- Developed Pig scripts for handling raw data for analysis.
- Deployed a Sqoop server to perform imports from heterogeneous data sources to HDFS.
- Deployed and configured Flume agents to stream log events into HDFS for analysis.
- Deployed YARN, which lets multiple applications run on the cluster.
- Configured Oozie for workflow automation and coordination.
- Wrote custom monitoring scripts for Nagios to monitor the daemons and cluster status.
- Wrote custom shell scripts for automating redundant tasks on the cluster.
- Worked with BI teams on generating reports and designing ETL workflows on Pentaho.
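A minimal sketch of the HDFS quota allocation described above; the directory and limits are illustrative:

```bash
# Cap the number of names (files + directories) under a group's directory.
hdfs dfsadmin -setQuota 1000000 /user/team_analytics

# Cap raw space consumed, which counts replication (50t = 50 terabytes).
hdfs dfsadmin -setSpaceQuota 50t /user/team_analytics

# Review usage: QUOTA, REM_QUOTA, SPACE_QUOTA, REM_SPACE_QUOTA columns.
hdfs dfs -count -q -h /user/team_analytics

# Clear a quota when the allocation changes.
hdfs dfsadmin -clrSpaceQuota /user/team_analytics
```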
Environment: Linux, HDFS, Sqoop, Flume, MapReduce, Hive, Pig, Oozie, ZooKeeper
Confidential
Linux System Administrator
Responsibilities:
- Handled day-to-day user access and permissions; installed and maintained Linux servers.
- Installed CentOS on multiple servers using Preboot Execution Environment (PXE) boot and the Kickstart method; performed remote installation of Linux using PXE boot.
- Created groups, added user IDs to groups as primary or secondary members, removed user IDs from groups, and added users to the sudoers file.
- Monitored system activity, performance, and resource utilization.
- Used RPM to install, update, verify, query, and erase packages on Linux servers.
- Extensive use of LVM, creating volume groups and logical volumes (see the sketch after this list).
- Mounted file systems using AutoFS and configured the fstab file.
- Performed RPM and YUM package installations, patching, and other server management.
- Performed scheduled backups and necessary restorations.
- Configured NFS.
- Developed shell scripts for automation of daily tasks.
- Set up cron schedules for backups and monitoring processes.
- Configured Domain Name System (DNS) for hostname-to-IP resolution.
- Troubleshot and fixed issues at the user, system, and network levels using various tools and utilities; scheduled backup jobs with cron during non-business hours.
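A minimal sketch of the LVM workflow described above; device names, sizes, and mount point are illustrative:

```bash
# Initialize physical volumes on the raw disks (device names are illustrative).
pvcreate /dev/sdb /dev/sdc

# Group them into a volume group.
vgcreate vg_data /dev/sdb /dev/sdc

# Carve out a logical volume and put a file system on it.
lvcreate -L 500G -n lv_data vg_data
mkfs.ext4 /dev/vg_data/lv_data

# Mount it now and persist the mount across reboots via fstab.
mkdir -p /data
mount /dev/vg_data/lv_data /data
echo '/dev/vg_data/lv_data /data ext4 defaults 0 0' >> /etc/fstab
```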
Environment: Linux (CentOS/RHEL)