Hadoop Administrator Resume
San Francisco
OBJECTIVE:
Seeking an opportunity to leverage a deep understanding of big data technology gained over roughly 7 years in the IT industry, including 3 years of experience as a Hadoop Administrator and 4 years as a Linux Administrator, backed by a Master's in Computer Science.
SUMMARY:
- Installing, configuring, and using Hadoop ecosystem components such as Apache Ranger, HBase, HDFS, Hive, Kafka, MapReduce, Oozie, Pig, Sqoop, YARN, and ZooKeeper on Linux, Hortonworks Ambari, and the Cloudera platform.
- Designing and maintaining system tools for scripts and automation processes, and monitoring capacity planning.
- Experience in monitoring and managing 100+ node Hadoop clusters.
- Identifying, assessing, and recommending appropriate solutions to advise customers on cluster requirements, and applying industry best practices and continuity planning to address backup and recovery.
- Deploying Hadoop clusters in public and private cloud environments using AWS and Hortonworks Ambari.
- Enabling High Availability for HDFS, YARN, Hive, and Oozie services to improve redundancy against failures.
- Good understanding of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, YARN, and MapReduce.
- Performing analysis and providing input to support capacity planning and solution design, cluster maintenance with patching/upgrades/migration, user provisioning, automation of routine tasks, and re-processing of failed jobs.
- Practical knowledge of Hadoop daemon functionality, the interactions between daemons, resource utilization, and dynamic tuning to keep clusters available and efficient.
- Importing and exporting data using Sqoop between HDFS and relational database systems (see the Sqoop sketch after this list).
- Optimizing queries, partitioning tables, and tuning the Hive configuration file for the cluster to improve performance.
- Enabling table and column-family access control via Access Control Lists and implementing authorization to restrict user access to table data in HBase.
- Writing Pig scripts and Hive queries for processing and analyzing large volumes of data.
- Running Oozie jobs using workflows and coordinators for job automation.
- Managing ZooKeeper and ZKFC and configuring automatic failover for NameNode failure situations.
- Creating job pools, assigning users to pools and restricting production job submissions based on pool.
- Creating groups of cluster nodes and assigning each group its own set of configuration files.
- Administering Kafka as a distributed, partitioned, replicated commit log service that provides the functionality of a messaging system.
- Authorizing, authenticating, auditing, and encrypting user data and administering users with Apache Ranger and Apache Knox on a Kerberized cluster.
- Provided 24x7 support to ensure round-the-clock availability.
- Performing system storage management/LVM tasks such as creating volume groups, physical volumes, logical volumes, and JFS/JFS2 file systems, mirroring, and mounting file systems.
- Scheduling cron jobs in Linux; expertise in installing, configuring, and managing Red Hat Linux 5 and 6.
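The Sqoop work listed above can be illustrated with a minimal Python sketch that wraps the Sqoop CLI; the JDBC URL, credentials file, table, and HDFS target directory below are hypothetical placeholders, not details from any actual engagement.

```python
import subprocess

def sqoop_import(jdbc_url, username, password_file, table, target_dir):
    """Run a basic Sqoop import from an RDBMS table into an HDFS directory."""
    cmd = [
        "sqoop", "import",
        "--connect", jdbc_url,             # e.g. jdbc:mysql://dbhost:3306/sales (hypothetical)
        "--username", username,
        "--password-file", password_file,  # HDFS path holding the database password
        "--table", table,
        "--target-dir", target_dir,        # HDFS directory to receive the imported data
        "--num-mappers", "4",              # parallelism of the import
    ]
    subprocess.run(cmd, check=True)

if __name__ == "__main__":
    sqoop_import("jdbc:mysql://dbhost:3306/sales", "etl_user",
                 "/user/etl/.sqoop-pass", "orders", "/data/staging/orders")
```

The reverse direction works the same way with `sqoop export` and an `--export-dir` pointing at the HDFS data.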
TECHNICAL SKILLS:
Hadoop ecosystem tools: HBase, HDFS, Hive, Hue, Kafka, MapReduce, Oozie, Pig, Spark-SQL, Sqoop, YARN, ZooKeeper.
Cluster Management Tools: Cloudera Manager, Hortonworks Ambari.
Security: Active Directory, Apache Ranger, Kerberos, Knox.
Monitoring Tools: Nagios, Ganglia.
Operating Systems: Ubuntu, Unix/Linux, Windows, Mac.
Databases: MySQL, PostgreSQL, Oracle.
Cloud Platform: AWS.
Virtualization Technologies: VMware vSphere, Citrix Xen-Server.
PROFESSIONAL EXPERIENCE:
Confidential, San Francisco
Hadoop Administrator
- Deploying worker nodes onto the cluster through Cloudera Manager.
- Continuously monitoring and managing the Hadoop clusters and the health status of files and folders, and fixing issues from the Linux platform.
- Backing up Hadoop files using snapshots, DistCp, and replication to protect against system failures (see the sketch at the end of this section).
- Creating resource pools and assigning users to specific queues to improve resource utilization.
- Adding users in Active Directory and at the OS level, and assigning them to specific groups.
- Restricting user access to data using Centrify and RADA.
- Resolving tickets based on priority in ServiceNow, HPSM, and JIRA.
- Providing users a platform to run code in Jupyter and visualize data in Qlik Sense.
- Creating database tables and views in HUE.
Environment: Centos 6, Cloudera Manager, HDFS, Hive, HUE, Impala, Kerberos, MapReduce, Oozie, Pig, Sqoop, YARN, Zookeeper.
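As a rough illustration of the snapshot-and-DistCp backup work above, here is a minimal sketch; it assumes the source directory has already been made snapshottable (hdfs dfsadmin -allowSnapshot), and the paths and backup NameNode URI are hypothetical placeholders.

```python
import subprocess
from datetime import datetime

def backup_hdfs_dir(src_dir, backup_uri):
    """Snapshot an HDFS directory and copy the snapshot to a backup cluster."""
    snap_name = "backup-" + datetime.now().strftime("%Y%m%d")
    # Take a point-in-time snapshot of the (already snapshottable) directory.
    subprocess.run(["hdfs", "dfs", "-createSnapshot", src_dir, snap_name], check=True)
    # Copy the frozen snapshot contents to the backup cluster with DistCp.
    snap_path = f"{src_dir}/.snapshot/{snap_name}"
    subprocess.run(["hadoop", "distcp", snap_path, backup_uri], check=True)

if __name__ == "__main__":
    backup_hdfs_dir("/data/warehouse", "hdfs://backup-nn:8020/backups/warehouse")
```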
Confidential, Santa Clara
Hadoop Administrator
- Installing Hadoop ecosystem components from the Linux platform and upgrading them to the latest versions.
- Deploying clusters from the Linux platform onto Hortonworks Ambari.
- Running and scheduling MapReduce, Hive, Pig, and cron jobs using Oozie.
- Providing immediate solutions to reduce impact, documenting them, and preventing future issues.
- Configuring Hadoop properties to achieve high performance.
- Assigning storage policies and creating storage pools and directives for users.
- Setting up multiple NFS Gateways so clients can mount HDFS to browse, download, and upload files, increasing scalability.
- Rebalancing the clusters to improve performance and reduce data loss after hardware failures, uneven availability of data servers, and the addition of new worker nodes.
- Installing Apache Ranger via Ambari and setting up Ranger plugins for Hadoop components: HDFS, Hive, HBase, YARN, Knox.
- Increasing the Hive Java heap size when the Hive Server 2 service crashed.
- Collecting and processing large amounts of streaming data into HDFS using Kafka and producing messages with Kafka producers (see the Kafka sketch at the end of this section).
- Configuring queue administrator ACLs and default queue mappings with or without override enabled.
- Setting up a secure environment by restricting access to services and limiting which services run as root.
- Setting up the Hadoop components (HBase, HDFS, Hive, Sqoop) to run on a Kerberized cluster and creating appropriate Ranger authorization policies for access.
- Creating and maintaining MySQL databases, setting up users, and maintaining backups of cluster metadata databases with cron jobs.
Environment: Ambari Hortonworks, Apache Knox, Apache Ranger, Centos 6, HBase, HDFS, Hive, Kafka, Kerberos, MapReduce, Oozie, Pig, Sqoop, YARN, Zookeeper.
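As a loose illustration of the Kafka ingestion bullet above, here is a minimal producer sketch. It assumes the kafka-python client; the broker addresses, topic name, and message payload are hypothetical.

```python
import json
from kafka import KafkaProducer  # assumes the kafka-python package is installed

# Send JSON-encoded events to a Kafka topic; brokers and topic are placeholders.
producer = KafkaProducer(
    bootstrap_servers=["broker1:9092", "broker2:9092"],
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    acks="all",  # wait for all in-sync replicas to acknowledge each write
)

for event in [{"host": "node01", "metric": "cpu", "value": 73}]:
    producer.send("cluster-metrics", value=event)

producer.flush()   # block until buffered messages are delivered
producer.close()
```

A separate consumer process (or an HDFS sink) would then land the topic data in HDFS, as described in the bullet.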
Confidential, NY
Hadoop Administrator
- Creating instances and deploying clusters on AWS in standalone and fully distributed modes.
- Installed/Configured/Maintained Apache Hadoop clusters for application development and Hadoop tools like Hive, Pig, HBase, Zookeeper and Sqoop.
- Controlling container, application, and user limits and reservations using YARN Queue Manager.
- Providing connectivity between MapReduce, Pig, and Hive so users can read and write data using HCatalog.
- Limiting the number of concurrent users accessing resources to reduce overall HiveServer2 resource consumption.
- Using the framed and compact Thrift transport protocols and specifying a maximum frame size to prevent the HBase Thrift server from crashing when invalid data is received.
- Performing minor and major upgrades, commissioning and decommissioning of nodes on Hadoop cluster.
- Managing large-scale Hadoop cluster environments, design, cluster setup, performance tuning, ongoing monitoring and capacity planning.
- Proactively involved in ongoing maintenance, support, and improvements in Hadoop cluster.
- Starting and stopping the services in Ambari.
- Adding new worker nodes to maintain capacity and removing failed worker nodes from the cluster.
- Setting up cluster and memory sizes based on requirements, defining queues to run jobs according to their capacities and node labels, and enabling those labels for the job queues.
- Running jobs with a combination of default, per-site, per-node, and per-job configuration.
- Setting up MySQL master-slave replication and helping business applications maintain their data in MySQL servers (see the replication sketch at the end of this section).
Environment: Ambari HortonWorks, AWS, Centos 6, HBase, HDFS, Hive, Kafka, Kerberos, MapReduce, MySQL, Oozie, Pig, Sqoop, YARN.
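The MySQL replication bullet above could look roughly like the following sketch, which points a slave at its master using the mysql-connector-python client; the host names, credentials, and binlog coordinates are hypothetical and would normally be taken from SHOW MASTER STATUS on the master.

```python
import mysql.connector  # assumes the mysql-connector-python package

# Connect to the replica and point it at the master; all connection details
# and binlog coordinates below are placeholders.
conn = mysql.connector.connect(host="mysql-slave01", user="root", password="****")
cur = conn.cursor()
cur.execute(
    "CHANGE MASTER TO "
    "MASTER_HOST='mysql-master01', "
    "MASTER_USER='repl', "
    "MASTER_PASSWORD='****', "
    "MASTER_LOG_FILE='mysql-bin.000042', "
    "MASTER_LOG_POS=120"
)
cur.execute("START SLAVE")
cur.execute("SHOW SLAVE STATUS")  # check Slave_IO_Running / Slave_SQL_Running
print(cur.fetchall())
cur.close()
conn.close()
```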
Confidential, Atlanta
Hadoop Administrator
- Installed and configured HDP 2.5 cluster using Ambari Hortonworks.
- Collecting requirements from clients, analyzing them, and finding a solution for setting up the Hadoop cluster environment.
- Conducting meetings with team members and assigning work to each of them; reporting weekly to the manager on work progress.
- Planning data topology, rack topology, and resource availability; gathering user requirements for migration to production; implementing data migration from the existing staging cluster to the production cluster; and proposing effective Hadoop solutions to meet specific customer requirements.
- Aligning with the systems engineering team to propose and deploy new hardware and software environments required for Hadoop and to expand existing environments.
- Presenting demos to customers on how to use AWS and how it differs from traditional systems.
- Running different jobs daily to test for issues and improve performance.
- Monitoring, managing, and reviewing Hadoop log files; conducting performance tuning, capacity management, and root cause analysis on failed components; and implementing corrective measures.
- Debugging and resolving major issues in Cloudera Manager by interacting with the infrastructure team at Cloudera.
- Setting up rack awareness to improve HDFS availability and increase cluster performance.
- Coordinating with developers and the QA team to complete tasks as scheduled.
- Limiting the creation of HDFS files and folders by setting up HDFS quotas (see the quota sketch at the end of this section).
- Tracking and protecting access to sensitive data: who is accessing what data and what they are doing with it.
Environment: Ambari HortonWorks, AWS, Centos 6, Cloudera Manager, HBase, HDFS, Hive, HUE.
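The HDFS quota bullet above can be sketched as a small script driving the hdfs dfsadmin quota commands; the directory list and limits below are hypothetical, not values from any real cluster.

```python
import subprocess

# Apply name and space quotas to a few user directories (placeholder values).
QUOTAS = {
    "/user/analytics": {"files": "100000", "space": "5t"},
    "/user/etl":       {"files": "50000",  "space": "2t"},
}

for path, q in QUOTAS.items():
    # Cap the number of files and directories that may exist under the path.
    subprocess.run(["hdfs", "dfsadmin", "-setQuota", q["files"], path], check=True)
    # Cap the total disk space consumed under the path (replication counts).
    subprocess.run(["hdfs", "dfsadmin", "-setSpaceQuota", q["space"], path], check=True)
    # Print current usage and quota for verification.
    subprocess.run(["hdfs", "dfs", "-count", "-q", path], check=True)
```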
Confidential
Linux Administrator
- Creating, cloning, and migrating VMs on VMware vSphere 4.0/4.1.
- Installing, configuring, troubleshooting, and maintaining Linux servers and the Apache web server; configuring and maintaining security; scheduling backups; and submitting various types of cron jobs.
- Creating and Authenticating Windows user accounts on Citrix server.
- Implementing backup solution using Dell T120 autoloader and CA Arc Server 7.0.
- Installing software and upgrading it to the latest compatible versions.
- Setting up Linux environments: configuring passwordless SSH, creating file systems, disabling firewalls, tuning swappiness, disabling SELinux, and installing Java.
- Working with Logical Volume Manager, creating volume groups and logical volumes, and performing Red Hat Linux kernel tuning (see the LVM sketch at the end of this section).
- Upgrading Hardware configurations by adding CPU, memory, storage and network resources.
- Managing disk file systems, server performance, user creation, file access permissions, and RAID configurations.
- Maintaining and monitoring all servers' operating system and application patch levels, disk space and memory usage, and user activities on a daily basis.
- Providing permissions to users and groups based on the level of access to files and folders.
- Configuring network services such as HTTP(S), SMB, NFS, web proxy, SMTP, IMAP, SSH, DNS, DHCP, and NTP.
- Providing Level 2/3 support and assisting team members in troubleshooting and fixing AIX software problems.
- Providing 24x7 on-call production and customer support, including troubleshooting problems related to IBM AIX pSeries servers.
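The LVM bullet above (physical volume, volume group, logical volume, filesystem, mount) could be automated roughly as follows; the device name, sizes, and mount point are hypothetical, and these commands are destructive if pointed at the wrong disk.

```python
import subprocess

def run(cmd):
    """Echo and execute a command, failing fast on errors."""
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

# Hypothetical device, volume, and mount-point names.
run(["pvcreate", "/dev/sdb"])                                # initialize the disk as a physical volume
run(["vgcreate", "vg_data", "/dev/sdb"])                     # create a volume group on it
run(["lvcreate", "-L", "100G", "-n", "lv_data", "vg_data"])  # carve out a 100 GB logical volume
run(["mkfs.ext4", "/dev/vg_data/lv_data"])                   # build a filesystem on the LV
run(["mkdir", "-p", "/data"])                                # create the mount point
run(["mount", "/dev/vg_data/lv_data", "/data"])              # mount the new filesystem
```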