Hadoop Administrator Resume
Baltimore, MD
SUMMARY:
- 12+ years of professional Information Technology experience in Hadoop and Linux administration, including installation, configuration, and maintenance of systems and clusters.
- Extensive experience in Linux administration and Big Data technologies as a Hadoop Administrator.
- Hands-on experience with Hadoop clusters on Hortonworks (HDP), Cloudera (CDH3, CDH4), Oracle Big Data, and YARN-based distribution platforms.
- Skilled in Apache Hadoop, MapReduce, Pig, Impala, Hive, HBase, Zookeeper, Sqoop, Flume, Oozie, Kafka, Storm, Spark, JavaScript, and J2EE.
- Experience deploying and managing multi-node development and production Hadoop clusters with various Hadoop components (Hive, Pig, Sqoop, Oozie, Flume, HCatalog, HBase, Zookeeper) using Hortonworks Ambari.
- Good experience in creating various database objects like tables, stored procedures, functions, and triggers using SQL, PL/SQL and DB2.
- Used Apache Falcon to support data retention policies for Hive/HDFS.
- Experience configuring NameNode High Availability and NameNode Federation, with in-depth knowledge of Zookeeper for cluster coordination services.
- Experience designing, configuring, and managing backup and disaster recovery for Hadoop data.
- Experience administering Tableau and Greenplum database instances in various environments.
- Experience in administration of Kafka and Flume streaming using Cloudera Distribution.
- Hands-on experience analyzing log files for Hadoop and ecosystem services and finding root causes.
- Extensive knowledge of Tableau in enterprise environments, with Tableau administration experience including technical support, troubleshooting, reporting, and monitoring of system usage.
- Experience commissioning, decommissioning, balancing, and managing nodes, and tuning servers for optimal cluster performance.
- Experience importing and exporting data with Sqoop between HDFS and relational database systems/mainframes.
- Worked on NoSQL databases including HBase, Cassandra, and MongoDB.
- Designed and implemented security for Hadoop clusters with Kerberos authentication.
- Hands-on experience with Nagios and Ganglia for cluster monitoring.
- Experience scheduling Hadoop, Hive, Sqoop, and HBase jobs using Oozie.
- Knowledge of Data Warehousing concepts, Cognos 8 BI Suite, and Business Objects.
- Experience in HDFS data storage and support for running MapReduce jobs.
- Experience installing firmware upgrades and kernel patches, and in system configuration and performance tuning on Unix/Linux systems.
- Expert in Linux Performance monitoring, kernel tuning, Load balancing, health checks and maintaining compliance with specifications.
- Hands-on experience with Zookeeper and ZKFC for managing and configuring NameNode failure scenarios.
- Team player with good communication and interpersonal skills and a goal-oriented approach to problem solving.
- Authorized to work in the United States for any employer.
PROFESSIONAL EXPERIENCE:
Hadoop Administrator
Confidential
Responsibilities:
- Providing hardware architectural guidance, planning and estimating cluster capacity, and creating roadmaps for Hadoop cluster deployment.
- Working on Hortonworks and Cloudera cluster setup.
- Automating repetitive tasks using Python.
- Knowledge of shell and Python programming.
- Installing, Configuring, Maintaining, and Troubleshooting Standalone systems
- Configuring and updating server parameters using Python.
- Analyzing shell/Python script code during migrations.
- Troubleshooting issues such as job failures and performance problems.
- Hadoop user administration using Sentry.
- Upgrading CDH from 5.2 to 5.3.
- Responsible for managing data coming from different sources.
- Knowledge of snapshots.
- User administration via LDAP and Kerberos.
- Involved in Hadoop cluster administration, including adding and removing cluster nodes, capacity planning, performance tuning, monitoring, and troubleshooting.
- Adding new nodes to an existing cluster and recovering from a NameNode failure.
- Maintaining backups for the NameNode.
- Knowledge of NameNode recovery from previous backups.
- Importing and exporting data into HDFS using Sqoop
- In-depth conceptual and functional understanding of MapReduce and Hadoop ecosystem infrastructure (both MRv1 and MRv2).
- Deploying and scaling out multi-node Hadoop cluster components including MapReduce, Pig, Hive, and HBase.
- Designing and implementing workflow and coordinator jobs using Oozie.
- Functional knowledge of Flume and Sqoop.
- Experience in troubleshooting, optimization, and performance tuning.
- Experience in change management.
- Following functional spec analysis to develop ETL pipelines.
- Developing MapReduce/Pig applications to transform data available in HDFS.
- Generating reporting data using Pig/Hive to serve the business team's ad hoc requests.
- Importing data into Hive from RDBMS sources, processing it, and writing the resulting data to HDFS.
- Managing and troubleshooting clusters and sharing best practices with the team.
- Scheduling jobs using Oozie workflows.
- Built a Hadoop cluster from scratch using a "start small and scale quickly" approach.
- Well versed in security topics such as quotas, RBAC, ACLs, setuid, and the sticky bit.
- Using Kerberos, LDAP, and Ranger for identity and access management.
Environment: Hortonworks, Ambari, Core Java, HDFS, YARN, Spark, Pig, Oozie, Hive, Sqoop, Flume, Zookeeper, Kerberos, Linux, AWS, UNIX
Hadoop Administrator
Confidential - Baltimore, MD.
Responsibilities:
- Installed and managed Hortonworks HDP DL and DW components.
- Worked on the Hortonworks distribution (HDP 2.6.0.2.2), managing services (HDFS, MapReduce2, Tez, Hive, Pig, HBase, Sqoop, Flume, Spark, Ambari Metrics, Zookeeper, Falcon, Oozie, etc.) across 4 clusters ranging from LAB, DEV, and QA to PROD, totaling 350+ nodes with 7 PB of data.
- Monitored Hadoop cluster connectivity and security using the Ambari monitoring system.
- Led the installation, configuration, and deployment of product software on new edge nodes that connect to the Hadoop cluster for data acquisition.
- Rendered L3/L4 support services for BI users, Developers and Tableau team through Jira ticketing system.
- One of the key engineers in Aetna's HDP web engineering team, Confidential engineering ISE.
- Managed and reviewed log files as part of administration for troubleshooting purposes. Communicated and escalated issues appropriately for tickets raised by users in the JIRA ticketing system.
- Worked closely with System Administrators, BI analysts, developers, and key business leaders to establish SLAs and acceptable performance metrics for the Hadoop as a service offering.
- Performance Tuning and ETL, Agile Software Deployment, Team Building & Leadership, Engineering Management.
- Used Hortonworks Ambari and Apache Hadoop on Red Hat and CentOS as data storage, retrieval, and processing systems.
- Set up Kerberos principals on the KDC server, tested HDFS, Hive, Pig, and MapReduce access for new users, and created keytabs for service IDs using keytab scripts.
- Performed a major upgrade in the production environment from HDP 2.3 to HDP 2.6. As an admin, followed standard backup policies to ensure high availability of the cluster.
- Monitored multiple Hadoop cluster environments using Ganglia and Nagios. Monitored workload and job performance and performed capacity planning using Ambari.
- Installed the OLAP software AtScale on its designated edge node server.
- Implemented a dual data center setup for all Cassandra clusters. Performed complex system analysis to improve ETL performance and identified highly critical batch jobs to prioritize.
- Conducted cluster sizing, tuning, and performance benchmarking on a multi-tenant OpenStack platform to achieve desired performance metrics.
- Provided solutions to users who encountered Java exceptions and errors while running data models in SAS scripts and Rscript. Good understanding of forest data models.
- Worked on data ingestion using Sqoop to pull data from traditional RDBMS platforms such as Oracle, MySQL, and Teradata into the Hadoop cluster with automated ingestion scripts, and stored data in NoSQL databases such as HBase and Cassandra.
- Provided security and authentication with Kerberos, which issues tickets to users.
- Good troubleshooting skills across all Hadoop stack components, ETL services, and Hue and RStudio, which provide a GUI for developers and business users in their day-to-day activities.
- Created queues and allocated cluster resources to prioritize Hive jobs.
- Implemented SFTP and SCP transfers for projects to move data from external servers. Experienced in managing and reviewing log files. Scheduled the Oozie workflow engine to run multiple Hive, Sqoop, and Pig jobs.
Environment: CDH 5.4.3 and 4.x, Cloudera Manager CM 5.1.1, HDFS, MapReduce, YARN, Hive, Pig, Sqoop, Oozie, Flume, Zookeeper, Chef, Red Hat/CentOS 6.5, Control-M.
Hadoop Administrator
Confidential - Groveport, OH
Responsibilities:
- Providing Administration, management and support for large scale Big Data platforms on Hadoop eco-system.
- Built lab, dev, and prod environments from scratch; worked with the network, AD, and Unix teams to prepare the cluster for building the project's Hadoop platform.
- Involved in cluster capacity planning, deployment, and management of Hadoop for our data platform operations with a group of Hadoop architects and stakeholders.
- Worked on various POCs while working on the production deployment with Cloudera.
- Expert knowledge of Unix administration in the context of Hadoop ecosystems.
- Implemented Kerberos using MIT KDC (POC) and Kerberos using Active Directory as the KDC, integrated with Centrify (first implemented in POC and later in DEV/UAT and PROD environments).
- Upgraded Cloudera Manager and CDH packages on the DEV and UAT clusters from 5.7.1 to 5.7.2 with Kerberos.
- Working with off-shore and on-shore application teams for creating resource pools and resource allocation.
- Set up monitoring for the Cloudera cluster, including configuring alert thresholds and SNMP settings to push alert notifications to service mailboxes.
- Set up NameNode HA using QJM and ResourceManager HA for the Cloudera cluster.
- Monitored cluster job performance and killed long-running jobs.
- Experience setting up and deploying a Hadoop cluster in a DMZ (demilitarized zone), an isolated network zone with custom data access policies for the sources and clients accessing the Hadoop cluster. Experience designing data flow diagrams and user access groups.
- Developed Ansible scripts to check service status and PID status and to push files to target servers; currently developing Ansible scripts for managing the cluster through the Cloudera REST API. Developed and deployed an automated cluster using Cloudera Director.
- Tuned the YARN cluster and configured Hive on Spark.
- Scheduled cron jobs for taking cold backups of the back-end databases used in the cluster deployment (MySQL).
- Experience setting up a Kafka cluster for publishing topics; familiar with lambda architecture.
- Used Cloudera's BDR to implement the backup and recovery strategy.
- Configured and set up user and service accounts and ACLs, created Sentry policies, and ran HiveQL DDL for creating tables in production.
- Experience creating MOP (Method of Procedure) documents before making any changes to the prod environment.
- Benchmarked the cluster using the DFSIO, TeraSort, and TeraGen utilities provided by Cloudera.
- Setting up local repositories for CDH and Cloudera Manager upgrades.
- Enabled Rack-configuration for the cluster and deployed services according to the rack configuration.
- Added new nodes to the cluster, scheduled cluster rebalancing afterward, and decommissioned nodes from the cluster for maintenance.
- Configured the cluster to tolerate multiple data disk failures and raised tickets with the data center team as necessary for hardware failures and replacements.
- Experience in following ITIL procedures for change, incident and task management.
- Creating dynamic resource pools for managing the cluster resources among various application teams.
- Setting up file hierarchy in HDFS File structure for storing application jars, staging data and final data. Discussing data and log retention policies for the cluster.
Environment: CDH 5.7.X, Cassandra, Kafka, Cloudera Manager, Bash, Python, Oozie, Hue, Splunk, HDFS, Zookeeper, YARN, Appworx, Hive, Spark, Dell Hardware, Upgrades and patching, Rack configuration, Centrify, Attunity CDC.
Hadoop Administrator
Confidential - Arlington, TX
Responsibilities:
- Monitored workload job performance and capacity planning using Cloudera Manager.
- Involved in analyzing system failures, identifying root causes, and recommending courses of action.
- Imported logs from web servers with Flume to ingest data into HDFS.
- Exported data from HDFS into relational databases with Sqoop. Parsed, cleansed, and mined meaningful data in HDFS using MapReduce for further analysis.
- Partitioned and queried the data in Hive for further analysis by the BI team.
- Involved in extracting the data from various sources into Hadoop HDFS for processing.
- Effectively used Sqoop to transfer data between databases and HDFS.
- Worked on streaming data into HDFS from web servers using Flume.
- Implemented custom interceptors for Flume to filter data and defined channel selectors to multiplex the data into different sinks.
- Involved in MapReduce programs to cleanse data in HDFS obtained from heterogeneous data sources, making it suitable for ingestion into Hive schemas for analysis.
- Worked on custom Pig loaders and storage classes to handle a variety of data formats such as JSON and XML.
- Used Hive data warehouse tool to analyze the unified historic data in HDFS to identify issues and behavioral patterns.
- Automated workflows using shell scripts to pull data from various databases into Hadoop.
- Deployed Hadoop Cluster in Fully Distributed and Pseudo-distributed modes.
- Supported setup of the QA environment and updated configurations for implementing scripts with Pig and Sqoop.
- Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs such as Java MapReduce, Hive, and Sqoop, as well as system-specific jobs.
- Experienced with different compression techniques such as LZO, GZIP, and Snappy.
- Worked with application teams to install operating system and Hadoop updates, patches, and version upgrades as required.
- Provisioned, built, and supported Linux servers, both physical and virtual (using VMware), for production, QA, and development environments.
Environment: Cloudera Manager, Hadoop HDFS, MapReduce, HBase, Hive, Pig, Flume, Oozie, Sqoop, Shell Scripting, Hue, MySQL, MongoDB, AWS EC2, ETL, Bash, ServiceNow, JIRA.
Hadoop Administrator
Confidential - Richardson, TX
Responsibilities:
- Working as a Hadoop administrator on the Hortonworks distribution for 3 clusters, ranging from POC clusters to PROD clusters.
- Responsible for cluster maintenance, commissioning and decommissioning DataNodes, cluster monitoring, troubleshooting, managing and reviewing data backups, and managing and reviewing Hadoop log files.
- Monitoring systems and services; architecture design and implementation of Hadoop deployment, configuration management, backup, and disaster recovery systems and procedures.
- Working knowledge of Hive and its components, and troubleshooting of Hive issues.
- Responsible for cluster availability; experienced in on-call support.
- Experienced in setting up projects and volumes for new Hadoop projects.
- Involved in snapshots and HDFS data backups to maintain local and remote backups of cluster data.
- Implemented SFTP for projects to transfer data from external servers to Hadoop servers.
- Installed various Hadoop ecosystem components and Hadoop daemons.
- Experienced in managing and reviewing Hadoop log files.
- Good knowledge of creating and maintaining MySQL databases, setting up users, and maintaining database backups.
- Extensive knowledge of HDFS and YARN implementation in clusters for better performance.
- Experienced in production support, resolving user incidents ranging from sev1 to sev5.
- Involved in analyzing system failures, identifying root causes, and recommending courses of action. Documented system processes and procedures for future reference.
- Worked with the systems engineering team to plan and deploy new Hadoop environments and expand existing Hadoop clusters.
- Monitored multiple Hadoop cluster environments using Nagios. Monitored workload, job.
Environment: Hortonworks 2.1.7, Hive, Pig, Sqoop, Flume, Zookeeper, HBase, MySQL, Shell Scripting, Red Hat Linux
Linux Administrator
Confidential
Responsibilities:
- Administered RHEL 5/6, including installation, testing, tuning, upgrading, patching, and troubleshooting server issues.
- Configured and automated the deployment of Linux and VMware infrastructure through our existing Kickstart infrastructure.
- Configure Linux guests in a VMware ESX environment.
- Understand server virtualization technology such as VMware.
- Worked on Cisco UCS, virtual infrastructure on VMware, storage migration, and installations.
- Installed, configured, and custom-built Oracle 10g and prepared servers for database installation, including adding kernel parameters, installing software, and setting permissions.
- Implemented multi-tier application provisioning in the OpenStack cloud, integrating it with Puppet. Involved in integrating the vSphere hypervisor with OpenStack.
- Configured and maintained FTP, DNS, NFS, and DHCP servers. Configured, maintained, and troubleshot local development servers.
- Performed configuration of standard Linux and network protocols such as SMTP, DHCP, DNS, LDAP, NFS, HTTP, SNMP, and others.
- Managed cron jobs, batch processing, and job scheduling.
- Worked on planning for the recovery of critical IT systems and services in a fallback situation following a disaster that overwhelms the resilience arrangements.
- Monitored system activity such as CPU, memory, disk, and swap space usage to avoid performance issues.
- Tuned kernel parameters for better performance of applications such as Oracle.
- Provided 24x7 on-call production and customer support, including troubleshooting.
Environment: Linux, FTP, Shell, UNIX, VMware, NFS, TCP/IP, Oracle, Red Hat Linux.
TECHNICAL SKILLS:
Big Data Technologies: HDFS, YARN, MapReduce, Pig, Hive, Sqoop, Oozie, Zookeeper, Flume, MongoDB, Cassandra
Big Data Distributions: Cloudera, Hortonworks, Apache Hadoop
Installation: Ansible, GitLab
Batch Scheduling Tool: Control-M
Scripting Languages: Shell, Bash
Monitoring Tools: Grafana, Ganglia, Nagios, Ambari, Jenkins, Navigator, Netcool
Reporting Tools: ServiceNow, Tableau, Jaspersoft
Programming Languages: SQL, PL/SQL, Java, Chef, Puppet
Application Servers: Apache Tomcat, WebLogic Server, WebSphere, JBoss
Databases: Oracle 9.x, 10g, 11g, MySQL Server, DB2, HBase
