Hadoop Administrator Resume
Atlanta, GA
SUMMARY
- A qualified Technocrat and a seasoned professional offering 7+ years of IT experience in Administration and Product Development.
- 2.5 years of experience in big data technologies: Hadoop HDFS, Hive, Oozie, Flume, HCatalog, Sqoop, ZooKeeper, and the NoSQL databases Cassandra and HBase.
- Experience with the complete Software Development Life Cycle, including design, development, testing and implementation of moderately to highly complex systems.
- Hands-on experience in installing, configuring, supporting and managing Hadoop clusters using Apache, Cloudera, Hortonworks, MapR and Pivotal distributions.
- Backup configuration and recovery from a NameNode failure.
- Decommissioning and commissioning nodes on a running Hadoop cluster.
- Installation of various Hadoop ecosystem components and Hadoop daemons.
- Installation and configuration of Sqoop and Flume.
- Experience in deploying and managing multi-node development, testing and production Hadoop clusters with different Hadoop components (HIVE, PIG, SQOOP, OOZIE, FLUME, HCATALOG, ZOOKEEPER) using Cloudera Manager.
- Experience in understanding the security requirements for Hadoop and integrating with Kerberos authentication infrastructure: KDC server setup, creating the realm/domain, managing principals, generating a keytab file for each service, and managing keytabs using keytab tools.
- Worked on setting up NameNode high availability for a major production cluster and designed automatic failover control using ZooKeeper and quorum journal nodes.
- Experience in importing and exporting data from different databases like MySQL into HDFS and Hive using Sqoop.
- Worked on streaming data into HDFS from web servers using Flume.
- Setting up automated 24x7 monitoring and escalation infrastructure for the Hadoop cluster using Nagios and Ganglia.
- Experience in benchmarking and in performing backup and disaster recovery of NameNode metadata and important sensitive data residing on the cluster.
- Familiar with writing Oozie workflows and job controllers for job automation: shell, Hive and Sqoop automation.
- Scheduling all Hadoop/Hive/Sqoop/HBase jobs using Oozie.
- Rack-aware configuration for quick availability and processing of data.
- Hands-on experience in Linux administration activities.
TECHNICAL SKILLS
Hadoop Ecosystem: HDFS, MapReduce, YARN, Hive, Pig, Sqoop, Oozie, Flume, ZooKeeper, Cloudera Manager
Security: Kerberos
Programming Languages: Java, C#, C, SQL, JavaScript, HTML
Scripting Languages: Shell Scripting, Puppet
IDE Tools: Eclipse, NetBeans, Visual Studio, Microsoft SQL Server, MS Office
Monitoring Tools: Nagios, Ganglia, Cloudera Manager
Operating Systems: Linux RHEL/Ubuntu/CentOS, Windows (XP/7/8)
Virtualization technologies: VMware vSphere, Citrix XenServer
PROFESSIONAL EXPERIENCE
Confidential - Atlanta, GA
Hadoop Administrator
Responsibilities:
- Installation of various Hadoop ecosystem components and Hadoop daemons.
- Installation and configuration of Sqoop, Flume and HBase.
- Managed and reviewed Hadoop log files as part of administration for troubleshooting purposes. Communicated and escalated issues appropriately.
- As an administrator, followed standard backup policies to ensure high availability of the cluster.
- Decommissioning and commissioning nodes on a running Hadoop cluster.
- Experience in benchmarking and in performing backup and disaster recovery of NameNode metadata and important sensitive data residing on the cluster.
- Experience in minor upgrades of Hadoop from CDH4U2 to CDH4U7.
- Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs such as Java MapReduce, Hive and Sqoop, as well as system-specific jobs.
- Effectively used Sqoop to transfer data between databases and HDFS.
- Extended the functionality of Hive and Pig with custom UDFs and UDAFs.
- Fine-tuned Hive jobs for better performance.
- Involved in extracting the data from various sources into Hadoop HDFS for processing.
- Worked on streaming data into HDFS from web servers using Flume.
- Implemented custom interceptors for Flume to filter data and defined channel selectors to multiplex the data into different sinks.
- Used the Hive data warehouse tool to analyze the unified historical data in HDFS to identify issues and behavioral patterns.
- Worked with application teams to install operating system and Hadoop updates, patches and version upgrades as required.
- Responsible for managing data coming from different sources.
- Created Hive tables as per requirements, as internal or external tables defined with appropriate static and dynamic partitions for efficiency.
- Implemented UDFs, UDAFs and UDTFs in Java for Hive to handle processing that cannot be performed with Hive's built-in functions.
- Used the RegEx, JSON and Avro SerDes packaged with Hive for serialization and deserialization to parse the contents of streamed log data, and implemented custom Hive UDFs.
- Implemented Pig UDFs for filtering, loading and storing of data.
Environment: HADOOP HDFS, MAPREDUCE, HIVE, PIG, FLUME, OOZIE, SQOOP, ECLIPSE, CLOUDERA MANAGER.
Confidential - Dallas, TX
System Admin / Hadoop Admin
Responsibilities:
- Responsible for building a cluster for storing 1.5 PB of transactional data.
- Performed operating-system-level configuration, including DNS resolution, user accounts and file permissions, networking, and passwordless SSH login.
- Created LVM partitions on Linux Servers and mounted file systems on partitions.
- Deployed a CDH3u4 Hadoop cluster and its ecosystem components.
- Used Nagios to monitor the daemons and the cluster status, using custom monitoring scripts.
- Imported and exported data between RDBMS (Oracle, MySQL) and HDFS using Sqoop.
- Managed the day-to-day operations of the cluster for backup and support.
- Performed operating system installation and Hadoop version updates using deployment tools like Chef and Puppet.
- Implemented Kerberos on the cluster to authenticate all services.
- Deployed NFS for NameNode metadata backup.
- Benchmarked the cluster using tools such as TeraSort and TestDFSIO.
- Performed a minor upgrade from CDH3u4 to CDH3u6.
- Performed a major upgrade from CDH3 to CDH4.
- Implemented the Fair Scheduler to share cluster resources among MapReduce jobs.
- Configured Ganglia, including the gmond and gmetad daemons, which collect metrics from across the distributed cluster and present them in real-time dynamic web pages to aid debugging and maintenance.
- Implemented Rack Topology on the Hadoop cluster.
- Regularly commissioned and decommissioned nodes depending on data volume.
- Monitored and configured a test cluster on Amazon Web Services for further testing and gradual migration.
- Configured Flume agents to stream log events into HDFS for analysis.
- Configured Oozie for workflow automation and coordination.
- Wrote custom shell scripts to automate repetitive tasks on the cluster.
- Handled day-to-day user access and permissions, and installed and maintained Linux servers.
- Installed CentOS on multiple servers using Preboot Execution Environment (PXE) boot and the Kickstart method, including remote installation of Linux using PXE boot.
- Monitored system activity, performance and resource utilization.
- Responsible for maintenance of RAID groups and LUN assignments as per agreed design documents. Performed all system administration tasks such as cron jobs and installing packages and patches.
- Made extensive use of LVM, creating volume groups and logical volumes.
- Performed RPM and YUM package installations, patching and other server management.
- Configured Domain Name System (DNS) for hostname to IP resolution.
- Troubleshot and fixed issues at the user, system and network levels using various tools and utilities. Scheduled backup jobs with cron to run during non-business hours.
Environment: LINUX, HDFS, MAPREDUCE, KDC, NAGIOS, GANGLIA, OOZIE, SQOOP, CLOUDERA MANAGER
Confidential
Linux System Engineer
Responsibilities:
- Installation and configuration of Linux for a new build environment.
- Created virtual servers on Citrix XenServer-based hosts and installed operating systems on guest servers.
- Configured NFS and DNS.
- Updated the YUM repository and Red Hat Package Manager (RPM).
- Created RPM packages using rpmbuild, verified the newly built packages and distributed them.
- Configured distributed file systems, administered NFS servers and NFS clients, and edited auto-mount maps as per system and user requirements.
- Installation, configuration and maintenance of FTP servers, NFS, RPM and Samba.
- Configured Samba to provide access to Linux shared resources from Windows.
- Created volume groups, logical volumes and partitions on the Linux servers and mounted file systems on the created partitions.
- Deep understanding of monitoring and troubleshooting mission critical Linux machines.
- Experience with Linux internals, virtual machines, and open source tools/platforms.
- Improved system performance by working with the development team to analyze, identify and resolve issues quickly.
- Ensured data recoverability by implementing system and application level backups.
- Performed various configurations, including networking and iptables, hostname resolution, and passwordless SSH login.
- Managed disk file systems, server performance, user creation, file access permissions and RAID configurations.
- Supported pre-production and production support teams in the analysis of critical services and assisted with maintenance operations.
- Automated administration tasks through scripting and job scheduling with cron.
Environment: LINUX, CITRIX XEN SERVER 5.0
