Hadoop Administrator Resume
Irvine, CA
SUMMARY
- Around 8 years of administration experience, including 4 years with Big Data technologies such as Hadoop, Cassandra, Hive, Sqoop, Flume, and Pig.
- Involved in capacity planning for the Hadoop cluster in production, using different Hadoop distributions: Apache Hadoop, Cloudera, and Hortonworks.
- Setting up automated 24x7 monitoring and escalation infrastructure for Hadoop cluster using Nagios and Ganglia.
- Analyzing clients' existing Hadoop infrastructure to identify performance bottlenecks and provide performance tuning accordingly.
- Worked with Sqoop and Flume to import and export data between HDFS/Hive and databases such as MySQL and Oracle (a sample Sqoop invocation follows this summary).
- Defining job flows in Hadoop environment using tools like Oozie for data scrubbing and processing.
- In-depth knowledge of DataStax Cassandra and experience with installing, configuring, and monitoring clusters using DataStax OpsCenter.
- Excellent knowledge of CQL (Cassandra Query Language) for querying data stored in Cassandra (a cqlsh example follows this summary).
- Used DataStax OpsCenter for maintenance operations and for keyspace and table management.
- Experience in building scalable and fault-tolerant Cassandra production database systems.
- Superior knowledge of Cassandra architecture, with a solid understanding of the read and write paths, including the SSTable, Memtable, and commit log.
- Experience in configuring Zookeeper to provide Cluster coordination services.
- Loading logs from multiple sources directly into HDFS using tools like Flume.
- Excellent working knowledge of CDH4.0, including High Availability, YARN, data streaming, security, and application deployment.
- Performed stress and performance testing and benchmarking for the cluster.
- Familiar with commissioning and decommissioning of nodes on a Hadoop cluster.
- Worked on disaster recovery for Hadoop clusters.
- Worked with Puppet for application deployment.
- Strong knowledge of Hadoop HDFS architecture and the MapReduce framework.
- Experience in understanding security requirements for Hadoop and integrating with Kerberos authentication infrastructure: KDC server setup, creating realms/domains, and ongoing management.
- Worked on setting up NameNode high availability for a major production cluster and designed automatic failover using ZooKeeper and quorum journal nodes.
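A minimal sketch of the Sqoop imports described above; the hostname, database, credentials, and table names are placeholders:

```
# Hypothetical MySQL source; import a table into HDFS with 4 parallel mappers
sqoop import \
  --connect jdbc:mysql://dbhost.example.com:3306/sales \
  --username etl_user -P \
  --table orders \
  --target-dir /user/etl/orders \
  --num-mappers 4

# Same source, imported straight into a Hive table instead
sqoop import \
  --connect jdbc:mysql://dbhost.example.com:3306/sales \
  --username etl_user -P \
  --table orders \
  --hive-import --hive-table sales.orders
```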
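And a small cqlsh example of the kind of CQL querying mentioned above; the node, keyspace, and table names are illustrative:

```
# Run an ad-hoc CQL query against a node
cqlsh cass-node1 9042 -e "SELECT user_id, last_login FROM app_ks.users WHERE user_id = 42;"

# Inspect the schema of a keyspace
cqlsh cass-node1 -e "DESCRIBE KEYSPACE app_ks;"
```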
TECHNICAL SKILLS
Big Data Ecosystem: Hadoop, Cassandra, MapReduce, HDFS, HBase, Zookeeper, Hive, Pig, Sqoop, Oozie, Flume
Security: Kerberos
UNIX tools: Apache, Yum, RPM
Databases: Oracle, MySQL, SQL Server
NoSQL Databases: Cassandra
Cluster management tools: OpsCenter, Cloudera Manager, Ambari, Ganglia, Nagios
Scripting languages: Shell scripting, Puppet, Chef
ETL Tools: Informatica, SSIS
BI Reporting Tools: Cognos, OBIEE
Operating Systems: Windows, Linux (RedHat, CentOS, Ubuntu)
PROFESSIONAL EXPERIENCE
Confidential, Irvine CA
Hadoop Administrator
Responsibilities:
- Responsible for architecting Hadoop clusters; translated functional and technical requirements into detailed architecture and design.
- Installed and configured a multi-node, fully distributed Hadoop cluster spanning a large number of nodes.
- Provided Hadoop, OS, and hardware optimizations.
- Set up the machines with network control, static IPs, disabled firewalls, and swap memory configuration.
- Used tc (Linux traffic control) for network bandwidth control.
- Installed and configured Cloudera Manager for easy management of the existing Hadoop cluster.
- Performed Hadoop cluster version upgrades.
- Worked on setting up high availability for a major production cluster and designed automatic failover using ZooKeeper and quorum journal nodes.
- Implemented the Fair Scheduler on the JobTracker to allocate a fair share of resources to small jobs.
- Performed operating system installations and Hadoop version updates using automation tools.
- Configured Oozie for workflow automation and coordination.
- Implemented rack-aware topology on the Hadoop cluster.
- Imported and exported structured data between relational databases and HDFS/Hive using Sqoop.
- Configured ZooKeeper to provide node coordination and clustering support.
- Configured Flume for efficiently collecting, aggregating and moving large amounts of log data from many different sources to HDFS.
- Involved in collecting and aggregating large amounts of streaming data into HDFS using Flume and defined channel selectors to multiplex data into different sinks.
- Developed scripts for benchmarking the cluster with TeraGen/TeraSort (sample commands follow this list).
- Implemented the Kerberos security authentication protocol for the existing cluster.
- Troubleshot production-level issues in the cluster and its functionality.
- Backed up data on a regular basis to a remote cluster using DistCp (sample command follows this list).
- Regularly commissioned and decommissioned nodes depending on the amount of data.
- Monitored and configured a test cluster on Confidential for further testing.
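A sketch of the TeraGen/TeraSort benchmarking referenced above; the examples-jar path varies by distribution, and the data size is illustrative:

```
# Generate ~100 GB of synthetic rows (10^9 rows x 100 bytes each)
hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar \
  teragen 1000000000 /benchmarks/teragen

# Sort it, then validate that the output is globally ordered
hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar \
  terasort /benchmarks/teragen /benchmarks/terasort
hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar \
  teravalidate /benchmarks/terasort /benchmarks/teravalidate
```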
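And the DistCp-style remote backup mentioned above, sketched with placeholder NameNode hostnames:

```
# Mirror a production path to the backup cluster; -update copies only changed files
hadoop distcp -update \
  hdfs://prod-nn.example.com:8020/data/warehouse \
  hdfs://backup-nn.example.com:8020/data/warehouse
```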
Confidential, Columbus, OH
Hadoop/Cassandra Administrator
Responsibilities:
- Responsible for architecting Hadoop clusters; translated functional and technical requirements into detailed architecture and design, and maintained both Hadoop and Cassandra clusters.
- Installed a multi-datacenter cluster consisting of Cassandra rings.
- Created the data model for Cassandra from the existing Oracle data model.
- Evaluated, benchmarked, and tuned the data model by running endurance tests using JMeter, the Cassandra Stress Tool, and OpsCenter (sample stress runs follow this section).
- Administered and maintained a multi-datacenter Cassandra cluster using OpsCenter.
- Worked closely with developers on choosing the right compaction strategies and consistency levels.
- Implemented consistency levels for reads and writes based on the use case.
- Installed and configured a multi-node, fully distributed Hadoop cluster spanning a large number of nodes.
- Provided Hadoop, OS, and hardware optimizations.
- Used tc (Linux traffic control) for network bandwidth control.
- Installed and configured Cloudera Manager for easy management of the existing Hadoop cluster.
- Upgraded the Hadoop cluster from CDH4 to CDH5.
- Worked on setting up high availability for a major production cluster and designed automatic failover using ZooKeeper and quorum journal nodes.
- Implemented the Fair Scheduler on the JobTracker to allocate a fair share of resources to small jobs.
- Configured Oozie for workflow automation and coordination.
- Implemented rack-aware topology on the Hadoop cluster.
- Imported and exported structured data between relational databases and HDFS/Hive using Sqoop.
- Configured ZooKeeper to provide node coordination and clustering support.
- Configured Flume for efficiently collecting, aggregating and moving large amounts of log data from many different sources to HDFS.
- Implemented the Kerberos security authentication protocol for the existing cluster.
- Backed up data on a regular basis to a remote cluster using DistCp.
- Regularly commissioned and decommissioned nodes depending on the amount of data.
- Wrote custom monitoring scripts for Nagios to monitor the daemons and cluster status (a minimal plugin sketch follows the environment line below).
Environment: Hadoop, MapReduce, Hive, HDFS, Pig, Sqoop, Oozie, Cloudera, Flume, HBase, Puppet, ZooKeeper, CDH4, Cassandra, Nagios, NoSQL and Unix/Linux.
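A sketch of the stress-tool endurance runs mentioned above, using cassandra-stress syntax from later releases; the node name, operation counts, and thread counts are illustrative:

```
# Write-heavy run: one million inserts across 50 client threads
cassandra-stress write n=1000000 -rate threads=50 -node cass-node1

# Mixed workload at a 3:1 read/write ratio against the same schema
cassandra-stress mixed ratio\(write=1,read=3\) n=1000000 -node cass-node1
```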
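And a minimal sketch of a custom Nagios plugin like those mentioned above, assuming the standard Nagios exit-code convention (0 OK, 2 CRITICAL); the script name and process pattern are illustrative:

```
#!/bin/bash
# check_datanode.sh (hypothetical name): alert when the DataNode JVM is down
if pgrep -f 'org.apache.hadoop.hdfs.server.datanode.DataNode' >/dev/null; then
    echo "OK - DataNode process running"
    exit 0
else
    echo "CRITICAL - DataNode process not found"
    exit 2
fi
```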
Confidential, Santa Clara CA
Linux & Hadoop Administrator
Responsibilities:
- Installed the NameNode, Secondary NameNode, JobTracker, DataNodes, and TaskTrackers.
- Deployed a Hadoop cluster using CDH3, integrated with Nagios and Ganglia.
- Extensively involved in cluster capacity planning, hardware planning, installation, and performance tuning of the Hadoop cluster.
- Performed installation and configuration of the Hadoop cluster with the Cloudera distribution, CDH3.
- Implemented commissioning and decommissioning of DataNodes, killed unresponsive TaskTrackers, and dealt with blacklisted TaskTrackers.
- Implemented Rack Awareness for data locality optimization.
- Dumped data from a MySQL database to HDFS and vice versa using Sqoop.
- Created a local YUM repository for installing and updating packages (a setup sketch follows this section).
- Dumped data from one cluster to another using DistCp, and automated the dumping procedure with shell scripts.
- Implemented NameNode backup using NFS.
- Performed various configurations, including networking and iptables, hostname resolution, user accounts and file permissions, HTTP, FTP, and SSH key-based (passwordless) login.
- Worked with the Linux administration team to prepare and configure the systems to support Hadoop deployment.
- Created volume groups, logical volumes, and partitions on the Linux servers and mounted file systems on the created partitions.
- Implemented the Capacity Scheduler on the JobTracker to share cluster resources among the users' MapReduce jobs.
- Worked on importing and exporting data into HDFS and Hive using Sqoop.
- Worked on analyzing data with Hive and Pig.
- Helped in setting up rack topology in the cluster.
- Performed a minor upgrade from CDH3u4 to CDH3u6.
- Upgraded the Hadoop cluster from CDH3 to CDH4.
- Implemented the Fair Scheduler on the JobTracker to allocate a fair share of resources to small jobs.
- Implemented Kerberos for authenticating all the services in the Hadoop cluster.
- Deployed a Network File System (NFS) mount for NameNode metadata backup.
- Designed and allocated HDFS quotas for multiple groups (sample quota commands follow this section).
Environment: Cloudera Manager, CDH3, MapReduce, HDFS, Sqoop, Flume, Linux, Oozie, Hadoop, Pig, Hive, HBase, Nagios, Ganglia
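A sketch of the local YUM repository setup mentioned above; hostnames and paths are placeholders:

```
# On the repo host: generate repo metadata over a directory of RPMs and serve it
yum install -y createrepo httpd
createrepo /var/www/html/repos/cdh3
service httpd start

# On each client: point YUM at the local mirror
cat > /etc/yum.repos.d/local-cdh.repo <<'EOF'
[local-cdh]
name=Local CDH mirror
baseurl=http://repohost.example.com/repos/cdh3
enabled=1
gpgcheck=0
EOF
yum clean all && yum makecache
```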
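And the HDFS quota allocation mentioned above, sketched with placeholder paths and limits using CDH3-era `hadoop dfsadmin` syntax:

```
# Name quota caps the number of files/directories; space quota caps raw bytes
# (the space quota counts all replicas)
hadoop dfsadmin -setQuota 100000 /user/analytics
hadoop dfsadmin -setSpaceQuota 10t /user/analytics

# Check current usage against both quotas
hadoop fs -count -q /user/analytics
```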
Confidential
Linux Administrator
Responsibilities:
- Installing and updating packages using YUM.
- Installing and maintaining the Linux servers.
- Created volume groups, logical volumes, and partitions on the Linux servers and mounted file systems on the created partitions (see the LVM sketch at the end of this section).
- Deep understanding of monitoring and troubleshooting mission-critical Linux machines.
- Improved system performance by working with the development team to analyze, identify, and resolve issues quickly.
- Ensured data recovery by implementing system and application level backups.
- Performed various configurations, including networking and iptables, hostname resolution, and SSH key-based (passwordless) login.
- Managed disk file systems, server performance, user creation, granting of file access permissions, and RAID configurations.
- Automated administration tasks through scripting and job scheduling using cron.
- Monitored system metrics and logs for any problems.
- Ran crontab jobs to back up data (a sample cron entry follows this list).
- Adding, removing, or updating user account information, resetting passwords, etc.
- Used Java JDBC to load data into MySQL.
- Maintained the MySQL server and granted database access to required users.
- Supported pre-production and production support teams in the analysis of critical services and assisted with maintenance operations.
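A sketch of the LVM work described above; the device, volume names, and sizes are placeholders:

```
# Initialize a disk for LVM, create a volume group and a 50G logical volume
pvcreate /dev/sdb1
vgcreate datavg /dev/sdb1
lvcreate -L 50G -n datalv datavg

# Make a filesystem on it and mount it
mkfs.ext4 /dev/datavg/datalv
mkdir -p /data
mount /dev/datavg/datalv /data
```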
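And a sample cron entry for the scheduled backups mentioned above; paths and timing are illustrative (note that % must be escaped in crontab):

```
# /etc/cron.d/nightly-backup (hypothetical file): 02:30 daily tarball of /etc and /home
30 2 * * * root tar czf /backup/$(hostname)-$(date +\%F).tar.gz /etc /home
```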