Hadoop Cloudera Admin Resume
Houston, TX
SUMMARY:
- 7 years of IT experience, including Hadoop administration and Windows, VMware, and Linux administration, in the financial and insurance industries, covering client-server, internet technologies, and SOA application integration.
- Deployed and maintained Hadoop clusters: added and removed nodes using tools like Cloudera Manager, configured NameNode high availability, and kept track of all running Hadoop jobs.
- Experience importing real-time data into Hadoop using Kafka, implementing Oozie jobs, and scheduling recurring Hadoop jobs with Apache Oozie.
- Worked closely with the database team, network team, BI team and application teams to make sure that all the big data applications are highly available and performing as expected.
- Experience in understanding Hadoop security requirements and integrating clusters with Kerberos authentication and authorization infrastructure.
- Hands-on experience with open source monitoring tools, including Nagios and Ganglia.
- Good knowledge of NoSQL databases such as Cassandra, HBase, and MongoDB.
- Installed operating systems and administered the Hadoop stack on the Cloudera CDH5 distribution (with YARN), including configuration management, monitoring, debugging, and performance tuning; scripted Hadoop package installation and configuration to support fully automated deployments.
- Experience with Cloudera Navigator and Unravel Data for auditing Hadoop access. Experienced in administrative tasks such as Hadoop installation in pseudo-distributed mode, multi-node clusters, and installation of Apache Ambari on Hortonworks Data Platform (HDP 2.5).
- Strong experience in writing shell scripts to automate administrative tasks and automating the WebSphere environment with Perl and Python scripts.
- Started, stopped, and restarted Cloudera Manager servers whenever changes or errors required it.
- Worked on NoSQL databases including HBase, Cassandra and MongoDB.
- Familiarity with a NoSQL database such as MongoDB.
- Installed, configured, and maintained Apache Hadoop clusters for application development, along with Hadoop tools like Hive, Pig, HBase, Zookeeper, and Sqoop.
- Developed data pipelines using Flume, Sqoop, Pig, and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
- Hands-on experience installing and configuring Cloudera, MapR, and Hortonworks clusters and installing Hadoop ecosystem components like Pig, Hive, HBase, Sqoop, Kafka, Oozie, Flume, and Zookeeper.
- Performed daily system monitoring, verifying the integrity and availability of all hardware, server resources, systems, and key processes, reviewing system and application logs, and verifying completion of scheduled jobs such as backups.
- In-depth understanding of Hadoop architecture and its components, such as HDFS, NameNode, DataNode, JobTracker, TaskTracker, and MapReduce concepts.
- Experienced in supporting production clusters and troubleshooting issues within the maintenance window to avoid delays.
- Good understanding of and hands-on experience with Hadoop cluster capacity planning, performance tuning, cluster monitoring, and troubleshooting.
- Good hands-on experience with Linux administration and troubleshooting network- and OS-level issues.
- Assisted developers with troubleshooting MapReduce and BI jobs as required.
- Good working knowledge of Linux concepts and building servers ready for Hadoop cluster setup.
- Extensive experience monitoring servers with tools like Nagios and Ganglia, covering Hadoop services and OS-level disk, memory, and CPU utilization.
- Experience using various Hadoop ecosystem components such as MapReduce, Pig, Hive, Zookeeper, HBase, Sqoop, YARN 2.0, Scala, Spark, Kafka, Storm, Impala, Oozie, and Flume for data storage and analysis.
- Experience in troubleshooting errors in HBase Shell/API, Pig, Hive, Sqoop, Flume, Spark and MapReduce.
- Experience in migrating an on-premises data center to AWS cloud infrastructure.
- Experience with AWS CloudFront, including creating and managing distributions to provide access to S3 buckets or HTTP servers running on EC2 instances.
- Good working knowledge of Vertica DB architecture, column orientation and High Availability.
- Experience in importing and exporting data with Sqoop between HDFS and relational database systems/mainframes (a hedged example follows this summary).
- Experience in installing and configuring Hive, its services, and its metastore. Exposure to Hive Query Language and to table operations such as importing data, altering tables, and dropping tables.
- Hadoop ecosystem: Cloudera, Hortonworks, MapR, HDFS, HBase, YARN, Zookeeper, Nagios, Hive, Pig, Ambari, Spark, and Impala.
- Managed and reviewed Hadoop log files as part of administration for troubleshooting purposes; communicated and escalated issues appropriately.
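The following is a minimal, hypothetical sketch of the Sqoop transfers summarized above; the JDBC URL, credentials file, and table names are placeholders, not details from any engagement.

    # Import a relational table into HDFS (placeholder host/schema/table names).
    sqoop import \
      --connect jdbc:mysql://db.example.com/sales \
      --username etl_user --password-file /user/etl/.db_pass \
      --table transactions \
      --target-dir /data/raw/transactions \
      --num-mappers 4

    # Export aggregated results from HDFS back to the relational database.
    sqoop export \
      --connect jdbc:mysql://db.example.com/sales \
      --username etl_user --password-file /user/etl/.db_pass \
      --table transaction_summary \
      --export-dir /data/curated/transaction_summary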
TECHNICAL SKILLS:
Hadoop ecosystem and automation tools: MapReduce, HDFS, Pig, Hive, HBase, Sqoop, Zookeeper, Oozie, Hue, Storm, Kafka, Solr, Spark, Flume, Ansible.
Hadoop/Big Data Technologies: HDFS, MapReduce, Hive, Pig, Sqoop, Flume, Oozie, Storm, Zookeeper, Kafka, Impala, MapR, HCatalog, Apache Spark, Spark Streaming, Spark SQL, HBase, NiFi, Cassandra, AWS (EMR, EC2), Hortonworks, Cloudera.
Programming Languages: Core Java, C, C++, HTML.
Databases: MySQL, Oracle … Oracle Server X6-2, HBase, NoSQL.
Scripting languages: Shell Scripting, Bash Scripting, HTML scripting, Python.
Web Servers: Apache Tomcat, JBoss; Windows Server 2003, 2008, 2012.
Security Tools: LDAP, Sentry, Ranger, and Kerberos.
Cluster Management Tools: Cloudera Manager, HDP Ambari, Hue
Operating Systems: Sun Solaris 8/9/10, Red Hat Linux 4.0, RHEL 5.4, RHEL 6.4, IBM AIX, HP-UX 11.0, HP-UX 11i, UNIX, VMware ESX 2.x/3.x, Windows XP, Windows Server …, Ubuntu.
Scripting & Programming Languages: Shell & Perl programming
Platforms: Linux (RHEL, Ubuntu), OpenSolaris, AIX.
PROFESSIONAL EXPERIENCE:
Hadoop Cloudera Admin
Confidential, Houston, TX
Responsibilities:
- Hadoop installation and configuration on multiple nodes using the Cloudera platform.
- Installed and configured Hortonworks HDP 2.2 using Ambari and manually through the command line. Performed cluster maintenance, including the creation and removal of nodes, using tools like Ambari and Cloudera Manager Enterprise.
- Handling the installation and configuration of a Hadoop cluster.
- Building and maintaining scalable data pipelines using the Hadoop ecosystem and other open source components like Hive and HBase.
- Worked on building comprehensive data flows into Azure Cosmos DB through its MongoDB API and DocumentDB API using Storm.
- Involved in installing and configuring Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Created instances in AWS and migrated data from the data center to AWS using Snowball and the AWS migration service.
- Importing and exporting data into HDFS and Hive using Sqoop.
- Involved in cluster-level security: perimeter security (authentication via Cloudera Manager, Active Directory, and Kerberos), access (authorization and permissions via Sentry), visibility (audit and lineage via Navigator), and data (encryption at rest).
- Populated HDFS and Cassandra with huge amounts of data using Apache Kafka.
- Handling the data exchange between HDFS and different web sources using Flume and Sqoop.
- Monitored data streaming between web sources and HDFS and verified its functioning through monitoring tools.
- Closely monitored and analyzed MapReduce job execution on the cluster at the task level.
- Provided input to development on efficient utilization of resources such as memory and CPU, based on running statistics of map and reduce tasks.
- Installed operating systems and administered the Hadoop stack on the Cloudera CDH5 distribution (with YARN), including configuration management, monitoring, debugging, and performance tuning; scripted Hadoop package installation and configuration to support fully automated deployments.
- Day-to-day operational support of our Cloudera Hadoop clusters in lab and production, at multi-petabyte scale.
- Worked on end-to-end data flow management from sources to a NoSQL database (MongoDB) using Oozie.
- Currently working as a Hadoop administrator on the MapR Hadoop distribution for 5 clusters, ranging from POC clusters to production clusters, comprising more than 1,000 nodes.
- Changed cluster configuration properties based on the volume of data being processed by the cluster.
- Involved in creating a Spark cluster in HDInsight by creating Azure compute resources with Spark installed and configured.
- Set up automated processes to analyze system and Hadoop log files for predefined errors and send alerts to the appropriate groups (a hedged script sketch follows this list). Excellent working knowledge of SQL and databases.
- Commissioned and decommissioned data nodes from the cluster in case of problems (a decommissioning sketch also follows this list).
- Set up automated processes to archive/clean unwanted data on the cluster, in particular on the NameNode and Secondary NameNode.
- Set up and managed NameNode HA to avoid single points of failure in large clusters.
- Optimized the Cassandra cluster by making changes in Cassandra properties and Linux (Red Hat) OS configurations.
- Held regular discussions with other technical teams regarding upgrades, process changes, special processing, and feedback.
- Experience configuring, installing, and managing MapR, Hortonworks, and Cloudera distributions. Extensive experience in understanding clients' big data business requirements and translating them into Hadoop-centric solutions.
- Experienced in writing automated scripts for monitoring file systems and key MapR services.
- Involved in analyzing system failures, identifying root causes, and recommending courses of action. Documented system processes and procedures for future reference.
- Implemented several scheduled Spark, Hive, and MapReduce jobs on the MapR Hadoop distribution.
- Worked with systems engineering team to plan and deploy new Hadoop environments and expand existing Hadoop clusters.
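A minimal, hypothetical sketch of the DataNode decommissioning flow mentioned above; the hostname and excludes path are placeholders and assume dfs.hosts.exclude already points at the excludes file in hdfs-site.xml.

    # Add the host to the excludes file, then ask the NameNode to re-read it.
    echo "datanode07.example.com" >> /etc/hadoop/conf/dfs.exclude
    hdfs dfsadmin -refreshNodes
    # Confirm the node moves from "Decommission in progress" to "Decommissioned".
    hdfs dfsadmin -report | grep -A 3 "datanode07.example.com"

And a hypothetical version of the log-scanning alert process; the log directory, error pattern, and mail recipient are placeholders.

    #!/bin/bash
    # Scan recent HDFS logs for predefined error patterns and mail a summary.
    LOG_DIR=/var/log/hadoop-hdfs
    PATTERN='ERROR|FATAL|Exception'
    HITS=$(grep -E "$PATTERN" "$LOG_DIR"/*.log 2>/dev/null | tail -n 20)
    if [ -n "$HITS" ]; then
      echo "$HITS" | mail -s "Hadoop log errors on $(hostname)" hadoop-ops@example.com
    fi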
Environment: Linux, Shell Scripting, Teradata, SQL server, Cloudera Hadoop, Flume, Sqoop, Pig, Hive, Zookeeper and HBase.
Hadoop Kafka Admin
Confidential, Chicago, IL
Responsibilities:
- Deployed a Hadoop cluster of the Cloudera Distribution and installed ecosystem components: HDFS, YARN, Zookeeper, HBase, Hive, MapReduce, Pig, Kafka, Confluent Kafka, Storm, and Spark on Linux servers.
- Responsible for maintaining 24x7 production CDH Hadoop clusters running Spark, HBase, Hive, and MapReduce, with multiple petabytes of data storage, on a daily basis.
- Successfully secured the Kafka cluster with Kerberos.
- Configured Capacity Scheduler on the Resource Manager to provide a way to share large cluster resources.
- Deployed Name Node high availability for major production cluster.
- Experienced in writing automated scripts for monitoring file systems and key MapR services.
- Configured Oozie for workflow automation and coordination.
- Creating event processing data pipelines and handling messaging services using Apache Kafka.
- Troubleshot production-level issues in the cluster and its functionality.
- Backed up data on a regular basis to a remote cluster using DistCp (a DistCp sketch follows this list).
- Set up a lab cluster and installed all ecosystem components through MapR and manually through the command line.
- Implemented High Availability and automatic failover infrastructure to overcome single point of failure for Name node utilizing Zookeeper services.
- Used Sqoop to connect to Oracle, MySQL, and Teradata and move data into Hive/HBase tables.
- Worked on Hadoop Operations on the ETL infrastructure with other BI teams like TD and Tableau.
- Involved in installing and configuring Confluent Kafka on the R&D line and validated the installation with the HDFS and Hive connectors.
- Performed Disk Space management to the users and groups in the cluster.
- Created POC for implementing streaming use case with Kafka and HBase services.
- Used Storm and Kafka Services to push data to HBase and Hive tables.
- Documented slides and presentations on the Confluence page.
- Added nodes to the cluster and decommissioned nodes from the cluster whenever required.
- Used Sqoop and DistCp utilities for data copying and migration.
- Worked on end-to-end data flow management from sources to a NoSQL database (MongoDB) using Oozie.
- Installed a Kafka cluster with separate nodes for brokers (a topic-creation sketch follows this list).
- Worked with the Continuous Integration team to set up GitHub for scheduling automatic deployments of new and existing code in production.
- Monitored multiple Hadoop cluster environments using Nagios. Monitored workload, job performance, and capacity planning using MapR Control System.
- Worked effectively in an Agile methodology and provided production on-call support.
- Regular Ad-Hoc execution of Hive and Pig queries depending upon the use cases.
- Regular Commissioning and Decommissioning of nodes depending upon the amount of data.
- Monitor Hadoop cluster connectivity and security.
- Manage and review Hadoop log files.
- File system management and monitoring.
- Monitored Hadoop Jobs and Reviewed Logs of the failed jobs to debug the issues based on the errors.
- Diagnosed and resolved performance issues and scheduled jobs using cron and Control-M.
- Used the Avro SerDe packaged with Hive for serialization and deserialization to parse the contents of streamed log data.
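A hypothetical sketch of the Kafka topic setup on the broker cluster described above; the Zookeeper quorum and topic name are placeholders.

    # Create a replicated, partitioned topic and verify its configuration.
    kafka-topics --create \
      --zookeeper zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181 \
      --replication-factor 3 --partitions 6 \
      --topic clickstream-events
    kafka-topics --describe --zookeeper zk1.example.com:2181 --topic clickstream-events

And a hypothetical DistCp backup to a remote cluster, as referenced above; the NameNode addresses and paths are placeholders.

    # -update copies only changed files; -pugp preserves user, group, and permissions.
    hadoop distcp -update -pugp \
      hdfs://prod-nn.example.com:8020/data/warehouse \
      hdfs://dr-nn.example.com:8020/backup/warehouse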
Environment: CDH 5.8.3, HBase, Hive, Pig, Sqoop, Yarn, Apache Oozie workflow scheduler, Kafka, Flume, Zookeeper.
Hadoop Hortonworks Admin
Confidential, Sunnyvale, CA
Responsibilities:
- Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Worked on Installing and configuring the HDP Hortonworks 2.x Clusters in Dev and Production Environments.
- Worked on Capacity planning for the Production Cluster.
- Installed the Hue browser.
- Involved in loading data from the UNIX file system into HDFS, creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Experience with MapR, Cloudera, and EMR Hadoop distributions.
- Worked on installing Hortonworks 2.1 on AWS Linux servers and configuring Oozie jobs.
- Created a complete processing engine based on the Hortonworks distribution, tuned for performance.
- Performed cluster upgrades from HDP 2.1 to HDP 2.3.
- Configured queues in the Capacity Scheduler and took snapshot backups of HBase tables (a snapshot sketch follows this list).
- Worked on fixing cluster issues and configuring high availability for the NameNode in HDP 2.1.
- Involved in Cluster Monitoring backup, restore and troubleshooting activities.
- Involved in MapR to Hortonworks migration.
- Worked as a Hadoop administrator on the MapR Hadoop distribution for 5 clusters, ranging from POC clusters to production clusters, comprising more than 1,000 nodes.
- Implemented manifest files in puppet for automated orchestration of Hadoop and Cassandra clusters.
- Worked on installing cluster, commissioning & decommissioning of Data Nodes, Name Node recovery, capacity planning, Cassandra and slots configuration.
- Responsible for implementation and ongoing administration of Hadoop infrastructure
- Managed and reviewed Hadoop log files.
- Administration of HBase, Hive, Sqoop, HDFS, and MapR.
- Imported and exported data between relational databases like MySQL and HDFS/HBase using Sqoop.
- Worked on configuring Kerberos authentication in the cluster.
- Experience using the MapR File System, Ambari, and Cloudera Manager for installation and management of Hadoop clusters.
- Very good experience with the Hadoop ecosystem in UNIX environments.
- Experience with UNIX administration.
- Worked on installing and configuring Solr 5.2.1 in Hadoop cluster.
- Hands on experience in installation, configuration, management and development of big data solutions using Hortonworks distributions.
- Worked on indexing HBase tables and indexing JSON and nested data.
- Hands-on experience installing and configuring Spark and Impala.
- Successfully installed and configured queues in the Capacity Scheduler and the Oozie scheduler.
- Worked on configuring queues and optimizing Hive query performance, tuning at the cluster level, and adding users to the clusters.
- Responsible for cluster maintenance, monitoring, commissioning and decommissioning data nodes, troubleshooting, managing and reviewing data backups, and managing and reviewing log files.
- Day-to-day responsibilities included solving developer issues, moving code deployments from one environment to another, providing access to new users, providing prompt solutions to reduce impact, documenting them, and preventing recurrence of issues.
- Added and installed new components and removed them through Ambari.
- Collaborating with application teams to install operating system and Hadoop updates, patches, version upgrades.
- Monitored workload, job performance, and capacity planning.
- Involved in analyzing system failures, identifying root causes, and recommending courses of action.
- Created and deployed a corresponding SolrCloud collection.
- Created collections and configurations, and registered a Lily HBase Indexer configuration with the Lily HBase Indexer Service.
- Created and managed cron jobs (a crontab sketch follows this list).
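A hypothetical sketch of the HBase snapshot backups mentioned above; the table name, snapshot name, and backup NameNode are placeholders.

    # Take a snapshot from the HBase shell, then export it to a backup cluster.
    echo "snapshot 'customer_profile', 'customer_profile_snap'" | hbase shell
    hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot \
      -snapshot customer_profile_snap \
      -copy-to hdfs://backup-nn.example.com:8020/hbase

And hypothetical crontab entries for the recurring administrative jobs; the script paths are placeholders.

    # m h dom mon dow  command
    0 1 * * *     /opt/hadoop-admin/run_hdfs_balancer.sh >> /var/log/hadoop-admin/balancer.log 2>&1
    */15 * * * *  /opt/hadoop-admin/check_services.sh    >> /var/log/hadoop-admin/health.log 2>&1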
Environment: Hadoop, Map Reduce, Yarn, Hive, HDFS, PIG, Sqoop, Solr, Oozie, Impala, Spark, Hortonworks, Flume, HBase, Zookeeper and Unix/Linux, Hue (Beeswax), AWS.
Hadoop/Linux Admin
Confidential, Englewood, CO
Responsibilities:
- Provided technical designs and architecture, supported automation, installation, and configuration tasks and upgrades, and planned system upgrades of the Hadoop cluster.
- Designed the development and architecture of the Hadoop cluster, MapReduce processes, and the HBase system.
- Designed and developed a process framework and supported data migration in the Hadoop system.
- Involved in installation and configuration of Kerberos security setup on CDH5.5 cluster.
- Involved in upgrading Hadoop Cluster from HDP 1.3 to HDP 2.0.
- Implemented secondary sorting to sort reducer output globally in MapReduce.
- Involved in installation and configuration of an LDAP server and integrated it with Kerberos on the cluster.
- Worked with Sentry configuration to provide centralized security for Hadoop services.
- Scheduled the Oozie workflow engine to run multiple Hive and Pig jobs, which run independently based on time and data availability (an Oozie submission sketch follows this list).
- Work with network and Linux system engineers to define optimum network configurations, server hardware and operating system.
- Used Sqoop to import and export data from HDFS to RDBMS and vice-versa.
- Created HBase tables to store various data formats of data coming from different portfolios.
- Monitor critical services and provide on call support to the production team on various issues.
- Assisted in installing and configuring Hive, Pig, Sqoop, Flume, Oozie, and HBase on the Hadoop cluster with the latest patches.
- Involved in performance tuning of various hadoop ecosystem components like YARN, MRv2.
- Implemented Kerberos security on the CDH cluster at both the user and service level to provide strong security for the cluster (a kinit sketch follows this list).
- Troubleshooting, diagnosing, tuning, and solving Hadoop issues.
- Continuous monitoring and managing the Hadoop cluster using Cloudera Manager.
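A hypothetical check of a service keytab after the Kerberos setup described above; the principal, realm, and keytab path are placeholders.

    # Obtain a ticket from the service keytab and confirm HDFS access works.
    kinit -kt /etc/security/keytabs/hdfs.service.keytab hdfs/namenode01.example.com@EXAMPLE.COM
    klist
    hdfs dfs -ls /

And a hypothetical Oozie coordinator submission for the time- and data-triggered Hive and Pig jobs; the Oozie URL and job.properties path are placeholders.

    oozie job -oozie http://oozie.example.com:11000/oozie \
      -config /home/etl/coordinators/daily_hive_pig/job.properties -run
    oozie job -oozie http://oozie.example.com:11000/oozie -info <coordinator-job-id>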
Environment: Hortonworks (HDP 2.2), Ambari, MapReduce 2.0 (YARN), HDFS, Hive, HBase, Pig, Oozie, Sqoop, Spark, Flume, Kerberos, Zookeeper, DB2, SQL Server 2014, CentOS, Linux, RHEL 6.x.