
Hadoop Cloudera Admin Resume


San Antonio, TX

SUMMARY:

  • 6+ years of experience in IT, including Hadoop administration and Windows, VMware, and Linux administration in the financial and insurance industries, spanning client-server, Internet technologies, and SOA application integration.
  • Deploying and maintaining Hadoop clusters, adding and removing nodes using tools like Cloudera Manager, configuring NameNode high availability, and keeping track of all running Hadoop jobs.
  • Experience importing real-time data into Hadoop using Kafka and implementing Oozie jobs; experience scheduling recurring Hadoop jobs with Apache Oozie.
  • Worked closely with the database team, network team, BI team and application teams to make sure that all the big data applications are highly available and performing as expected.
  • Experience in understanding the security requirements for Hadoop and integrating with the Kerberos authentication and authorization infrastructure.
  • Hands-on experience with open-source monitoring tools, including Nagios and Ganglia.
  • Good knowledge of NoSQL databases such as Cassandra, HBase, and MongoDB.
  • Installed operating systems and administered the Hadoop stack with the CDH5 Cloudera Distribution (with YARN), including configuration management, monitoring, debugging, and performance tuning; scripted Hadoop package installation and configuration to support fully automated deployments.
  • Experience with Cloudera Navigator and Unravel Data for auditing Hadoop access. Experienced in administrative tasks such as Hadoop installation in pseudo-distributed mode and multi-node clusters, and installation of Apache Ambari on Hortonworks Data Platform (HDP 2.5).
  • Strong experience in writing shell scripts to automate administrative tasks and automating the WebSphere environment with Perl and Python scripts (see the sample script after this list).
  • Starting, stopping, and restarting the Cloudera Manager servers whenever there are changes or errors.
  • Worked on NoSQL databases including HBase, Cassandra, and MongoDB.
  • Installed, Configured and Maintained Apache Hadoop clusters for application development and Hadoop tools like Hive, Pig, HBase, Zookeeper and Sqoop.
  • Hands on experience in installing, configuring Cloudera, MapR, Hortonworks clusters and installing Hadoop ecosystem components like Hadoop Pig, Hive, HBase, Sqoop, Kafka, Oozie, Flume and Zookeeper.
  • Performed daily system monitoring, verifying the integrity and availability of all hardware, server resources, systems, and key processes, reviewing system and application logs, and verifying completion of scheduled jobs such as backups.
  • In-depth understanding of Hadoop architecture and its components such as HDFS, NameNode, JobTracker, DataNode, TaskTracker, and MapReduce concepts.
  • Experienced in supporting production clusters and troubleshooting issues within the maintenance window to avoid delays.
  • Good understanding and hands on experience of Hadoop Cluster capacity planning, performance tuning, cluster monitoring, troubleshooting.
  • Good hands-on experience with Linux administration and troubleshooting network- and OS-level issues.
  • Assisted developers with troubleshooting MapReduce and BI jobs as required.
  • Good working knowledge of Linux concepts and building servers ready for Hadoop cluster setup.
  • Extensive experience monitoring servers with tools like Nagios and Ganglia, covering Hadoop services and OS-level disk, memory, and CPU utilization.
  • Experience in using various Hadoop components such as MapReduce, Pig, Hive, Zookeeper, HBase, Sqoop, YARN 2.0, Scala, Spark, Kafka, Storm, Impala, Oozie, and Flume for data storage and analysis.
  • Experience in troubleshooting errors in HBase Shell/API, Pig, Hive, Sqoop, Flume, Spark and MapReduce.
  • Experience in Migrating the On-Premise Data Center to AWS Cloud Infrastructure.
  • Experience in AWS CloudFront including creating and managing distributions to provide access to S3 bucket or HTTP server running on EC2 instances.
  • Good working knowledge of Vertica DB architecture, column orientation and High Availability.
  • Hadoop ecosystem: Cloudera, Hortonworks, MapR, Hadoop, HDFS, HBase, YARN, Zookeeper, Nagios, Hive, Pig, Ambari, Spark, and Impala.
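
As an illustration of the shell automation mentioned above, below is a minimal sketch of a daily HDFS capacity check. The threshold, alert address, and overall behavior are illustrative assumptions, not a specific production script.

    #!/bin/bash
    # Hypothetical daily check: report HDFS capacity and email an alert when usage
    # crosses a threshold. Assumes the 'hdfs' client is installed and the invoking
    # user is allowed to run dfsadmin reports.

    THRESHOLD=80                      # alert when DFS usage exceeds this percentage
    ALERT_EMAIL="admin@example.com"   # placeholder address

    USED_PCT=$(hdfs dfsadmin -report 2>/dev/null \
                | awk -F': ' '/^DFS Used%/ {gsub(/%/,"",$2); print int($2); exit}')

    if [ -n "$USED_PCT" ] && [ "$USED_PCT" -ge "$THRESHOLD" ]; then
      echo "HDFS usage is ${USED_PCT}% (threshold ${THRESHOLD}%)" \
        | mail -s "HDFS capacity alert" "$ALERT_EMAIL"
    fi

A script like this is typically wired into cron or a Nagios check so capacity problems surface before jobs start failing.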

TECHNICAL SKILLS:

Hadoop ecosystem and automation tools: MapReduce, HDFS, Pig, Hive, HBase, Sqoop, Zookeeper, Oozie, Hue, Storm, Kafka, Solr, Spark, Flume, Ansible.

Hadoop/Big Data Technologies: HDFS, MapReduce, Hive, Pig, Sqoop, Flume, Oozie, Storm, Zookeeper, Kafka, Impala, MapR, HCatalog, Apache Spark, Spark Streaming, Spark SQL, HBase, NiFi, Cassandra, AWS (EMR, EC2), Hortonworks, Cloudera.

Databases: MySQL, Oracle (Oracle Server X6-2), HBase, NoSQL.

Scripting languages: Shell scripting, Bash scripting, Python, HTML.

Web Servers: Apache Tomcat, JBoss; Windows Server 2003, 2008, 2012.

Security Tools: LDAP, Sentry, Ranger, and Kerberos.

Cluster Management Tools: Cloudera Manager, HDP Ambari, Hue

Operating Systems: Sun Solaris 8/9/10, Red Hat Linux 4.0, RHEL 5.4, RHEL 6.4, UNIX, VMware ESX 2.x/3.x, Windows XP, Ubuntu.

Scripting & Programming Languages: Shell & Perl programming

Platforms: Linux (RHEL, Ubuntu), OpenSolaris, AIX.

PROFESSIONAL EXPERIENCE:

Hadoop Cloudera Admin

Confidential - San Antonio, TX

Responsibilities:

  • Working as Hadoop Administrator with Cloudera Distribution of Hadoop (CDH).
  • Installed/Configured/Maintained Apache Hadoop and Cloudera Hadoop clusters for application development and Hadoop tools like HDFS, Hive, HBase, Zookeeper and Map Reduce.
  • Managing and scheduling Jobs on Hadoop Clusters using Apache, Cloudera (CDH5.7.0, CDH5.10.0) distributions.
  • Successfully upgraded the Cloudera Distribution of Hadoop (CDH) stack from 5.7.0 to 5.10.0.
  • Installed and configured a Cloudera Distribution of Hadoop (CDH) manually through command line.
  • Maintained operations, installation, and configuration of a 150-node cluster on the CDH distribution.
  • Monitored multiple Hadoop clusters environments, workload, job performance and capacity planning using Cloudera Manager.
  • Created instances in AWS and migrated data from the data center to AWS using Snowball and AWS migration services.
  • Created graphs for each HBase table in Cloudera based on writes, reads, and file size in their respective dashboards.
  • Exported Cloudera logs and metrics and built dashboards in Grafana using the JMX exporter and Prometheus.
  • Installed and configured Solr in Cloudera to query HBase data.
  • Worked on setting up High availability for major Hadoop Components like Name Node, Resource Manager, Hive and Cloudera Manager.
  • Created new users, principals, and keytabs in different Kerberized clusters (see the example commands after this list).
  • Participated in the 30-day patching cycle on Hadoop clusters with the operations team.
  • Installed and configured Hadoop cluster across various environments through Cloudera Manager.
  • Managing, monitoring, and troubleshooting the Hadoop cluster using Cloudera Manager.
  • Enabled TLS between Cloudera Manager and its agents.
  • Tuned HBase and HDFS configurations to withstand heavy read and write workloads.
  • Installed and Configured Phoenix to query HBase data in Cloudera Environment.
  • Configured CDH Dynamic Resource Pools to schedule and allocate resources to YARN applications.
  • Involved in the end-to-end Hadoop cluster setup process, including configuring and monitoring the cluster.
  • Installed NameNode, Secondary NameNode, YARN (ResourceManager, NodeManager, ApplicationMaster), and DataNodes.
  • Handling and generating tickets via the BMC Remedy ticketing tool.
  • Commissioning and decommissioning Hadoop cluster nodes, including rebalancing HDFS block data.
  • Monitoring performance and tuning configuration of services in Hadoop Cluster.
  • Experienced in managing and reviewing Hadoop log files.
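
For the Kerberos work referenced above, a minimal sketch of creating a service principal and keytab with the MIT Kerberos admin tools follows; the realm, principal name, and keytab path are hypothetical placeholders.

    # Run on the KDC/admin host; all names below are examples only.
    kadmin.local -q "addprinc -randkey svc_etl/gateway01.example.com@EXAMPLE.COM"
    kadmin.local -q "xst -k /etc/security/keytabs/svc_etl.keytab svc_etl/gateway01.example.com@EXAMPLE.COM"

    # Verify the keytab and obtain a ticket with it
    klist -kt /etc/security/keytabs/svc_etl.keytab
    kinit -kt /etc/security/keytabs/svc_etl.keytab svc_etl/gateway01.example.com@EXAMPLE.COM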

Environment: Linux, Shell Scripting, Teradata, SQL Server, Cloudera CDH 5.7/5.8/5.9, Hadoop, Flume, Sqoop, Pig, Hive, Zookeeper and HBase.

Hadoop/Kafka Admin

Confidential - St. Louis, MO

Responsibilities:

  • Responsible for maintaining 24x7 production CDH Hadoop clusters running Spark, HBase, Hive, and MapReduce with multiple petabytes of data storage on a daily basis.
  • Successfully secured the Kafka cluster with Kerberos.
  • Configured Capacity Scheduler on the Resource Manager to provide a way to share large cluster resources.
  • Deployed NameNode high availability for the major production cluster.
  • Experienced in writing automated scripts for monitoring the file systems and key MapR services. Configured Oozie for workflow automation and coordination.
  • Creating event processing data pipelines and handling messaging services using Apache Kafka.
  • Troubleshot production-level issues in the cluster and its functionality.
  • Backed up data on a regular basis to a remote cluster using DistCp (see the example after this list).
  • Set up the cluster and installed all ecosystem components through MapR and manually through the command line in the lab cluster.
  • Implemented high availability and automatic failover infrastructure, using Zookeeper services, to overcome the single point of failure for the NameNode.
  • Used Sqoop to connect to Oracle, MySQL, and Teradata and move data into Hive/HBase tables.
  • Worked on Hadoop operations on the ETL infrastructure with other BI teams such as TD and Tableau.
  • Involved in installing and configuring Confluent Kafka in the R&D line, and validated the installation with the HDFS and Hive connectors.
  • Performed disk space management for the users and groups in the cluster.
  • Created POC for implementing streaming use case with Kafka and HBase services.
  • Used Storm and Kafka Services to push data to HBase and Hive tables.
  • Documented slides and presentations on the Confluence page.
  • Added nodes to the cluster and decommissioned nodes from the cluster whenever required.
  • Used the Sqoop and DistCp utilities for data copying and data migration.
  • Worked on end-to-end data flow management from sources to a NoSQL database (MongoDB) using Oozie.
  • Installed Kafka cluster with separate nodes for brokers.
  • Worked with the Continuous Integration team to set up GitHub for scheduling automatic deployments of new and existing code in production.
  • Monitored multiple Hadoop cluster environments using Nagios; monitored workload, job performance, and capacity planning using MapR Control System.
  • Effectively worked in an Agile methodology and provided production on-call support.
  • Regular ad-hoc execution of Hive and Pig queries depending on the use case.
  • Regular Commissioning and Decommissioning of nodes depending upon the amount of data.
  • Monitor Hadoop cluster connectivity and security.
  • Manage and review Hadoop log files.
  • File system management and monitoring.
  • Monitored Hadoop jobs and reviewed logs of failed jobs to debug issues based on the errors.
  • Diagnosed and resolved performance issues and scheduled jobs using Cron and Control-M.
  • Used the Avro SerDe packaged with Hive for serialization and deserialization to parse the contents of streamed log data.
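
As a sketch of the DistCp backup mentioned above, the command below copies a directory tree from the production cluster to a remote cluster, updating changed files. The NameNode addresses, paths, and cron entry are illustrative assumptions.

    # Hypothetical backup from the production cluster to a remote DR cluster.
    hadoop distcp -update -delete \
      hdfs://prod-nn01:8020/data/warehouse \
      hdfs://dr-nn01:8020/backups/warehouse

    # Illustrative cron entry to run the copy nightly at 01:30:
    # 30 1 * * * /opt/scripts/distcp_backup.sh >> /var/log/distcp_backup.log 2>&1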

Environment: Kafka, Kafka Broker, CDH 5.8.3, HBase, Hive, Pig, Sqoop, Yarn, Apache Oozie workflow scheduler, Flume, Zookeeper.

Hadoop Admin

Confidential - Englewood, CO

Responsibilities:

  • Responsible for maintaining 24x7 production CDH Hadoop clusters running Spark, HBase, Hive, and MapReduce with multiple petabytes of data storage on a daily basis.
  • Successfully secured the Kafka cluster with Kerberos.
  • Experienced as an admin on the Hortonworks (HDP 2.2.4.2) distribution for clusters ranging from POC to PROD.
  • Configured Capacity Scheduler on the Resource Manager to provide a way to share large cluster resources.
  • Deployed NameNode high availability for the major production cluster.
  • Created instances and deployed clusters on AWS in standalone and fully distributed modes.
  • Experienced in writing automated scripts for monitoring the file systems and key MapR services.
  • Configured Oozie for workflow automation and coordination.
  • Creating event processing data pipelines and handling messaging services using Apache Kafka.
  • Troubleshot production-level issues in the cluster and its functionality.
  • Backed up data on a regular basis to a remote cluster using DistCp.
  • Installed and configured Hortonworks HDP 2.2 using Ambari and manually through the command line.
  • Set up the cluster and installed all ecosystem components through MapR and manually through the command line in the lab cluster.
  • Implemented high availability and automatic failover infrastructure, using Zookeeper services, to overcome the single point of failure for the NameNode.
  • Used Sqoop to connect to Oracle, MySQL, and Teradata and move data into Hive/HBase tables (see the example import after this list).
  • Worked on Hadoop operations on the ETL infrastructure with other BI teams such as TD and Tableau.
  • Involved in installing and configuring Confluent Kafka in the R&D line, and validated the installation with the HDFS and Hive connectors.
  • Performed disk space management for the users and groups in the cluster.
  • Created POC for implementing streaming use case with Kafka and HBase services.
  • Used Storm and Kafka Services to push data to HBase and Hive tables.
  • Documented slides and presentations on the Confluence page.
  • Added nodes to the cluster and decommissioned nodes from the cluster whenever required.
  • Used the Sqoop and DistCp utilities for data copying and data migration.
  • Worked on end-to-end data flow management from sources to a NoSQL database (MongoDB) using Oozie.
  • Installed Kafka cluster with separate nodes for brokers.
  • Worked with the Continuous Integration team to set up GitHub for scheduling automatic deployments of new and existing code in production.
  • Monitored multiple Hadoop cluster environments using Nagios; monitored workload, job performance, and capacity planning using MapR Control System.
  • Effectively worked in an Agile methodology and provided production on-call support.
  • Regular ad-hoc execution of Hive and Pig queries depending on the use case.
  • Regular Commissioning and Decommissioning of nodes depending upon the amount of data.
  • Monitor Hadoop cluster connectivity and security.
  • Manage and review Hadoop log files.
  • File system management and monitoring.
  • Monitored Hadoop jobs and reviewed logs of failed jobs to debug issues based on the errors.
  • Diagnosed and resolved performance issues and scheduled jobs using Cron and Control-M.
  • Used the Avro SerDe packaged with Hive for serialization and deserialization to parse the contents of streamed log data.
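
As an illustration of the Sqoop imports mentioned above, a minimal sketch of pulling an Oracle table into Hive follows; the JDBC URL, credentials file, and table names are hypothetical placeholders.

    # Hypothetical import from an Oracle source into a Hive staging table.
    sqoop import \
      --connect jdbc:oracle:thin:@//oradb01.example.com:1521/ORCL \
      --username etl_user \
      --password-file hdfs:///user/etl/.ora_pwd \
      --table CUSTOMERS \
      --hive-import --hive-table staging.customers \
      --num-mappers 4

Keeping the password in an HDFS file rather than on the command line is the usual choice, so credentials do not show up in process listings or shell history.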

Environment: CDH 5.8.3, HBase, Hive, Pig, Sqoop, Yarn, Apache Oozie workflow scheduler, Kafka, Flume, Zookeeper.

Hadoop/ Linux Admin

Confidential - Sunnyvale, CA

Responsibilities:

  • Involved in designing, planning, administering, installing, configuring, updating, troubleshooting, performance monitoring, and fine-tuning of the Hadoop cluster.
  • Collected requirements from the clients, analyzed them, and found a solution to set up the Hadoop cluster environment.
  • Conducted meetings with team members and assigned work to each of them; reported work progress to the manager on a weekly basis.
  • Planned data topology, rack topology, and resource availability; gathered user requirements for migrating users to production; implemented data migration from the existing staging cluster to the production cluster; and proposed effective Hadoop solutions to meet specific customer requirements.
  • Aligning with the systems engineering team to propose and deploy new hardware and software environments required for Hadoop and to expand existing environments.
  • Presented demos to customers on how to use AWS and how it differs from traditional systems.
  • Ran different jobs on a daily basis to test issues and improve performance.
  • Monitoring, managing & reviewing Hadoop log files and conducting performance tuning, capacity management and root cause analysis on failed components & implement corrective measures.
  • Set up cluster size and memory size based on requirements, defined queues to run jobs based on capacities and node labels, and enabled them for the job queues to run.
  • Configured job runs with a combination of default, per-site, per-node, and per-job configurations.
  • Debugged and resolved major issues with Cloudera Manager by interacting with the infrastructure team from Cloudera.
  • Scheduled Oozie workflow engine to run multiple Hive and Pig jobs, which independently run with time and data availability.
  • Performing minor and major upgrades, commissioning and decommissioning of nodes on Hadoop cluster.
  • Work with network and Linux system engineers to define optimum network configurations, server hardware and operating system.
  • Used Sqoop to import and export data from HDFS to RDBMS and vice-versa.
  • Created HBase tables to store various data formats of data coming from different portfolios.
  • Monitor critical services and provide on call support to the production team on various issues.
  • Assisted in the installation and configuration of Hive, Pig, Sqoop, Flume, Oozie, and HBase on the Hadoop cluster with the latest patches.
  • Involved in performance tuning of various Hadoop ecosystem components like YARN and MRv2.
  • Implemented Kerberos security on the CDH cluster at both the user and service levels to provide strong security for the cluster.
  • Troubleshooting, diagnosing, tuning, and solving Hadoop issues.
  • Continuous monitoring and managing the Hadoop cluster using Cloudera Manager.
  • Set up rack awareness to improve HDFS availability and increase cluster performance.
  • Coordinate with developers and QA team in completing the tasks as scheduled.
  • Limited the creation and addition of HDFS files and folders by setting up HDFS quotas (see the example after this list).
  • Tracked and protected sensitive data access: who is accessing what data and what they are doing with it.
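
A minimal sketch of the HDFS quota setup mentioned above; the directory name and limits are illustrative assumptions, not actual project values.

    # Limit a project directory to 1,000,000 names (files plus directories)
    hdfs dfsadmin -setQuota 1000000 /user/projectx

    # Limit the same directory to 10 TB of raw space (replication counts against it)
    hdfs dfsadmin -setSpaceQuota 10t /user/projectx

    # Review current quota usage for the directory
    hdfs dfs -count -q -h /user/projectx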

Environment: Hadoop, Map Reduce, Yarn, Hive, HDFS, PIG, Sqoop, Solr, Oozie, Impala, Spark, Hortonworks, Flume, HBase, Zookeeper and Unix/Linux, Hue (Beeswax), AWS.

Linux System Admin

Confidential

Responsibilities:

  • Worked on Administration, maintenance and support of Red Hat Enterprise Linux (RHEL) servers.
  • Executed user administration and maintenance tasks including creating users & groups.
  • Upgraded packages and patched systems to maintain the production environment using rpm and yum.
  • Upgraded and maintained software packages on servers using RHEL Satellite and Repository servers.
  • Resized volumes to meet customer requirements and dealt with Volume Manager performance issues.
  • Wrote Bash scripts for backup and automation (see the sample script after this list).
  • Install and configure DHCP, DNS (BIND, MS), web (Apache, IIS) and file servers on Linux servers.
  • Periodic checks of production and development systems; CPU utilization, memory profiles, disk utilization, network connectivity, system log files, etc.
  • Managed log files for troubleshooting and identifying probable errors.
  • Responsible for reviewing all open tickets and resolving and closing existing tickets.
  • Performed all System Administration tasks like installing packages, and patches.
  • Troubleshot various VM-related problems in initializing, replacing, mirroring, encapsulating, and removing disk devices on various production boxes.
  • Effectively handled issues related to NFS, NIS, and LVM, as well as configuration and maintenance of RAID levels.
  • Managing and troubleshooting user's login problems.
  • Manage and monitor Active Directory services and group policies.
  • Involved in development, user acceptance, and performance testing, as well as production and disaster recovery servers.
  • Add swap space on disks as needed using Linux utilities.
  • Experience in installing third-party tools using packages on Linux and Ubuntu.
  • Configured resources such as packages, services, files, directories, notifying users and groups, setting up Cron Scheduler.
  • Performed all necessary day-to-day Git support, implemented, maintained the Branching, Build/Release strategies utilizing Git repositories.
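
A minimal sketch of the kind of backup script referenced above; the paths, retention period, and cron entry are illustrative assumptions.

    #!/bin/bash
    # Hypothetical nightly configuration backup (example paths only).
    SRC="/etc /var/www"
    DEST="/backup/$(hostname -s)"
    STAMP=$(date +%F)

    mkdir -p "$DEST"
    tar -czf "$DEST/config-backup-$STAMP.tar.gz" $SRC 2>>/var/log/backup-errors.log

    # Keep only roughly two weeks of archives
    find "$DEST" -name 'config-backup-*.tar.gz' -mtime +14 -delete

    # Illustrative cron entry to run the script at 02:00 every day:
    # 0 2 * * * /usr/local/sbin/nightly_backup.sh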

Environment: RHEL 4.0, 5.0, UNIX, Git, VMware, Cron, Bash, Apache, LVM, NFS.
