Hadoop Admin Resume
Salt Lake City, UT
SUMMARY
- 7 years of IT experience, including 2.5+ years administering the Hadoop ecosystem and 3+ years of Linux administration.
- Hands-on experience installing, configuring, supporting and managing Hadoop clusters using Apache, Cloudera, Hortonworks and MapR distributions.
- Strong knowledge of Hadoop HDFS architecture and the MapReduce framework.
- Involved in capacity planning for the Hadoop cluster in production.
- Experience administering Linux systems to deploy Hadoop clusters and monitoring the clusters using Nagios and Ganglia.
- Developed MapReduce jobs.
- Experience in using Hive Query Language for data analytics.
- Experience in performing backup and disaster recovery of NameNode metadata and sensitive data residing on the cluster.
- Architected and implemented automated server provisioning using Puppet.
- Experience in performing minor and major upgrades.
- Experience in commissioning and decommissioning DataNodes on Hadoop clusters.
- Strong knowledge of configuring NameNode High Availability and NameNode Federation.
- Familiar with writing Oozie workflows and job controllers for automating shell, Hive and Sqoop jobs.
- Familiar with importing and exporting data using Sqoop from RDBMSs such as MySQL, Oracle and Teradata, including the use of fast loaders and connectors.
- Experience in understanding the security requirements for Hadoop and integrating with Kerberos authentication infrastructure: KDC server setup, creating realms/domains, managing principals, generating a keytab file for each service and managing keytabs using keytab tools.
- Implemented Knox and Ranger in the Hadoop cluster.
- Worked with system engineering team to plan and deploy Hadoop hardware and software environments.
- Worked on disaster management for the Hadoop cluster.
- Built a data transformation framework using MapReduce and Pig.
- Experience in monitoring and managing an 80-node Hadoop cluster.
- Experience in deploying Hadoop clusters on public and private cloud environments such as Amazon AWS and OpenStack.
- Experience in deploying and managing multi-node development, testing and production Hadoop clusters with different Hadoop components (Hive, Pig, Sqoop, Oozie, Flume, HCatalog, HBase, ZooKeeper) using Cloudera Manager and Hortonworks Ambari.
- Supported MapReduce programs running on the cluster.
- Developed a MapReduce program for parsing information and loading it into HDFS.
- Managed and reviewed Hadoop log files.
- Performed stress testing, performance testing and benchmarking of the cluster.
- Built an ingestion framework using Flume for streaming logs and aggregating the data into HDFS.
- Hands-on experience with “productionalizing” Hadoop applications, including administration, configuration management, debugging and performance tuning.
- Worked with the application team via Scrum to provide operational support and install Hadoop updates, patches and version upgrades as required.
- Knowledge of MapR.
- Prototyped the proof-of-concept with Hadoop 2.0 (YARN).
TECHNICAL SKILLS
Hadoop Ecosystem Components: HDFS, MapReduce, Pig, Hive, Oozie, Sqoop, Flume, ZooKeeper, Splunk
Languages and Technologies: Core Java, C, C++, data structures and algorithms.
Operating Systems: Windows, Linux & Unix.
Scripting Languages: Shell scripting, Puppet.
Networking: TCP/IP Protocol, Switches & Routers, OSI Architecture, HTTP, NTP & NFS.
Databases: SQL and NoSQL (Cassandra).
PROFESSIONAL EXPERIENCE
Confidential, Salt Lake City, UT
Hadoop Admin
Environment: MapReduce, HDFS, Hive, Pig, Flume, Splunk, Sqoop, UNIX Shell Scripting, Nagios, Kerberos, ZooKeeper, Java
Responsibilities:
- Worked with the Linux administration team to prepare and configure the systems to support Hadoop deployment.
- Performed various configurations, including networking and iptables, hostname resolution, user accounts and file permissions, HTTP, FTP and SSH keyless login.
- Involved in creating workflows to run multiple Hive and Pig jobs, which run independently based on time and data availability.
- Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Implemented an authentication and authorization service using the Kerberos authentication protocol.
- Performed benchmarking on the Hadoop cluster using different benchmarking mechanisms.
- Transferred data to HDFS using Splunk.
- Tuned the cluster by commissioning and decommissioning DataNodes.
- Implemented the Fair Scheduler on the JobTracker to allocate a fair share of resources to small jobs.
- Upgraded the Hadoop cluster from CDH3 to CDH4.
- Performed a major upgrade from CDH4 to CDH 5.2.
- Deployed high availability on the Hadoop cluster using quorum journal nodes.
- Implemented automatic failover using ZooKeeper and the ZooKeeper Failover Controller.
- Configured Ganglia, including installing the gmond and gmetad daemons, which collect metrics from the distributed cluster and present them in real-time dynamic web pages that help with debugging and maintenance.
- Implemented Kerberos for authenticating all the services in the Hadoop cluster.
- Deployed Network File System (NFS) for NameNode metadata backup.
- Assisted the development team.
- Performed a POC on cluster backup using DistCp, Cloudera Manager BDR and parallel ingestion.
- Configured and deployed the Hive metastore using MySQL and the Thrift server.
- Developed Pig scripts for handling raw data for analysis.
- Provided cluster coordination services through ZooKeeper.
- Maintained, audited and built new clusters for testing purposes using Cloudera Manager.
- Deployed and configured Flume agents to stream log events into HDFS for analysis.
- Configured Oozie for workflow automation and coordination.
- Developed MapReduce programs for data analysis and data cleaning.
- Wrote custom monitoring scripts for Nagios to monitor the daemons and the cluster status.
- Wrote custom shell scripts to automate repetitive tasks on the cluster.
- Involved in loading data from the UNIX file system into HDFS.
- Defined time-based Oozie workflows to copy data from different sources to Hive as it becomes available.
Confidential, Philadelphia, PA
Linux Administrator / Hadoop Admin
Environment: Linux, MapReduce, HDFS, Hive, Pig, Sqoop, Flume, Ganglia, Nagios, Kerberos, Java
Responsibilities:
- Performed both major and minor upgrades to the existing cluster, as well as rollbacks to the previous version.
- Commissioned and decommissioned DataNodes, killed unresponsive TaskTrackers and dealt with blacklisted TaskTrackers.
- Exported data from HDFS to a MySQL database and vice versa using Sqoop.
- Ran MapReduce jobs through Hive by querying the available data.
- Used Ganglia and Nagios to monitor the cluster around the clock.
- Implemented NFS, NAS and HTTP servers on Linux servers.
- Created a local YUM repository for installing and updating packages.
- Copied data from one cluster to another using DistCp, and automated the procedure using shell scripts.
- Used Flume to push large amounts of data from different sources to HDFS.
- Designed a shell script for backing up important metadata.
- Implemented NameNode high availability (HA) to avoid a single point of failure.
- Designed the cluster so that only one Secondary NameNode daemon could run at any given time.
- Implemented NameNode backup using NFS for high availability.
- Supported data analysts in running MapReduce programs.
- Worked on analyzing data with Hive and Pig.
- Experienced in using Splunk.
- Scheduled data backups using crontab.
- Involved in creating UDFs in Java.
- Configured Ganglia, including installing the gmond and gmetad daemons, which collect metrics from the distributed cluster and present them in real-time dynamic web pages that help with debugging and maintenance.
- Implemented Kerberos for authenticating all the services in the Hadoop cluster.
- Deployed Sqoop server to perform imports from heterogeneous data sources to HDFS.
- Designed and allocated HDFS quotas for multiple groups.
- Configured iptables rules to allow application servers to connect to the cluster, set up the NFS exports list and blocked unwanted ports.
- Configured Flume for efficiently collecting, aggregating and moving large amounts of log data from many different sources to HDFS.
- Worked with application teams to install operating system and Hadoop updates, patches and version upgrades as required.
- Managed data coming from different sources.
Confidential
Linux Administrator
Responsibilities:
- Administered RHEL 4.x and 5.x, including installation, testing, tuning, upgrading, loading patches and troubleshooting both physical and virtual server issues.
- Created and cloned Linux virtual machines and templates using VMware Virtual Client 3.5, and migrated servers between ESX hosts and Xen servers.
- Installed Red Hat Linux using Kickstart and applied security policies to harden servers according to company policies.
- Installed AIX/Linux patches and updates and verified that they were applied to the servers.
- Installed and administered Red Hat on Xen- and KVM-based hypervisors.
- Performed RPM and YUM package installations, patching and other server management.
- Managed routine system backups, scheduled jobs (including disabling and enabling cron jobs), and enabled system and network logging of servers for maintenance, performance tuning and testing.
- Performed data center operations, including rack mounting and cabling.
- Installed, configured and maintained WebLogic 10.x and Oracle 10g on Solaris and Red Hat Linux.
- Set up user and group login IDs, printing parameters, network configuration, passwords, and user and group quotas; resolved permission issues.
- Configured multipath, added SAN storage and created physical volumes, volume groups and logical volumes.
- Installed, configured and supported Apache on Linux production servers.
- Troubleshot Linux network and security-related issues and captured packets using tools such as iptables, firewall, TCP wrappers and Nmap.
- Resolved production issues, documented root cause analyses and updated tickets using BMC Remedy.
