- Highly skilled Hadoop Administrator with 12+ years of overall experience in the IT industry, including 4+ years in Hadoop administration and big data technologies, with expertise in Linux/system administration.
- Hands-on experience installing, configuring, supporting, and managing Hadoop clusters using the Cloudera (CDH 5.x) and Hortonworks (HDP 2.x) distributions, both on bare metal and in the cloud (AWS and Rackspace).
- Supported and maintained Hadoop ecosystem components such as HDFS, YARN, MapReduce, HBase, Oozie, Hive, Sqoop, Pig, Flume, KTS, KMS, Sentry, SmartSense, Storm, Kafka, Ranger, Falcon, and Knox
- Experienced in Hadoop cluster capacity planning, performance tuning, monitoring, and troubleshooting
- Experienced in setting up High Availability (HA) for diverse services in the ecosystem
- Experienced in setting up backup and recovery policies to ensure high availability of clusters
- Strong understanding of Hadoop security concepts and implementations - Authentication (Kerberos/AD/LDAP/KDC), Authorization (Sentry/Ranger) and Encryption (KTS/KMS)
- Monitored workload and job performance and performed capacity planning using Cloudera Manager and Ambari
- Involved in enforcing standards and guidelines for the big data platform during production move
- Responsible for onboarding new tools/technologies onto the platform
- Experience in designing and developing data ingestion using Apache NiFi, Apache Camel, Spark Streaming, Kafka, Flume, Sqoop, and shell scripts
- Experienced in analyzing log files for Hadoop ecosystem services and identifying root causes using Splunk
- Experienced with tools such as Clarify, JIRA, ServiceNow, New Relic, Splunk, and Serena TeamTrack, and with test management tools such as the Rational suite of products, MS Test Manager, and HP ALM
- Experience in monitoring and troubleshooting issues with Linux memory, CPU, OS, storage and network.
- Proficient in OS upgrades and patching as required.
- Expert in setting up SSH, SCP and VSFTP connectivity between UNIX hosts.
- Work closely with other teams such as application, networking, Linux system administration, and enterprise service monitoring teams
- Excellent communication, analytical, interpersonal, presentation, and leadership skills
- Excellent problem-resolution and root cause analysis skills
- Good team player with strong client-interfacing skills
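The Splunk-based log analysis mentioned above can be illustrated with a small parser. This is a hedged sketch: the Log4j-style line layout shown is typical of Hadoop daemon logs, but the exact format and the sample line are hypothetical, not taken from any actual cluster.

```python
import re

# Hadoop daemon logs commonly follow a Log4j layout:
#   date time,millis LEVEL class: message
# Illustrative pattern only; real layouts vary per log4j.properties.
LOG_PATTERN = re.compile(
    r"(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>[\d:,]+) "
    r"(?P<level>[A-Z]+) (?P<class>\S+): (?P<message>.*)"
)

def parse_line(line):
    """Return a dict of fields for a matching log line, else None."""
    m = LOG_PATTERN.match(line)
    return m.groupdict() if m else None

# Hypothetical sample line for demonstration.
sample = ("2017-03-01 10:15:32,123 WARN "
          "org.apache.hadoop.hdfs.server.datanode.DataNode: "
          "Slow BlockReceiver write packet to mirror")
print(parse_line(sample)["level"])  # WARN
```

Filtering parsed lines by level (WARN/ERROR/FATAL) is the usual first step before escalating to tools like Splunk.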
Big Data Ecosystem: HDFS, MapReduce, Spark, Pig, Hive, HBase, Sqoop, ZooKeeper, Sentry, Ranger, Storm, Kafka, Oozie, Flume, Hue, Knox, NiFi
Big Data Security: Kerberos, AD, LDAP, KTS, KMS, Redaction, Sentry, Ranger, Navencrypt, SSL/TLS, Cloudera Navigator
Cloud: AWS, EC2, S3, ELB, VPC, EMR
Other Tools: GitHub, Maven, Puppet, Chef, Clarify, JIRA, Quality Center, Rational Suite of Products, MS Test Manager, TFS, Jenkins, Confluence, Splunk, New Relic
Operating Systems: Windows, Linux, Unix, Mac
Databases: Oracle (9i, 10g), MS SQL Server, MySQL
Tools: MS Office, Cygwin, SVN, Eclipse
Confidential, Westborough, MA
- Maintained and supported Cloudera Hadoop clusters in production, development & sandbox environments
- Experienced in installing, upgrading, and managing Cloudera distribution Hadoop clusters
- Managed and reviewed Hadoop and other ecosystem log files using Splunk
- Provisioning, installing, configuring, monitoring, and maintaining HDFS, YARN, HBase, Flume, Sqoop, Sentry, Oozie, Pig, Hive, Kafka
- Installed and configured multiple Spark versions (1.6 and 2.0) used by different use cases
- Worked with developers to tune Spark jobs for optimal configuration
- Involved in resolving and troubleshooting cluster related issues
- Installed new services and tools using Cloudera packages as well as parcels
- Implemented authentication using Kerberos with AD/LDAP on Cloudera Hadoop Cluster
- Implemented authorization and enforced security policies using Apache Sentry
- Implemented encryption for data at rest using KTS/KMS and Navencrypt
- Implemented log level masking of sensitive data using redaction
- Developed a POC to evaluate whether Apache NiFi fits the data-import needs of the QFRM project
- Involved in upgrades of Cloudera Manager and CDH
- Resolved node failure issues and troubleshooting of common Hadoop cluster issues
- Derived operational insights across the cluster to identify anomalies and perform health checks
- Configured alert mechanism using SNMP traps for Cloudera Hadoop distribution
- Involved in onboarding new users and use cases onto the CDH platform
- Involved in deploying use cases onto production environment
- Worked with development teams to triage issues and implement fixes on Hadoop environment and associated applications
- Continuous monitoring and managing the Hadoop cluster through Cloudera Manager
- Governed data on the CDH cluster using Navigator for continuous optimization, audit, metadata management and policy enforcement
- Provisioned various HDInsight clusters for differentiated workloads (Spark, H2O, R, HBase) on the Azure portal
- Provisioned an HDInsight + H2O cluster for data science prediction modeling.
- Built a customized data science HDInsight cluster with all requisite Python libraries
- Created HDInsight clusters with different storage options - WASB (Blob Store) and ADLS (Azure Data Lake Store)
- Explored template-based HDInsight cluster deployments for commissioning and decommissioning.
- Explored runbook-based HDInsight cluster deployments for automated commissioning and decommissioning.
- Implemented encryption on Azure WASB storage (data at rest)
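The Kerberos/AD authentication work in this section typically centers on a krb5.conf along the lines sketched below. This is an illustrative fragment only: the realm name and KDC hostnames are hypothetical placeholders, not values from any actual cluster.

```ini
# /etc/krb5.conf - illustrative sketch; realm and hosts are hypothetical
[libdefaults]
  default_realm = EXAMPLE.COM
  dns_lookup_kdc = false
  ticket_lifetime = 24h
  renew_lifetime = 7d

[realms]
  EXAMPLE.COM = {
    kdc = ad-dc1.example.com           ; AD domain controller acting as KDC
    admin_server = ad-dc1.example.com
  }

[domain_realm]
  .example.com = EXAMPLE.COM
```

With a config like this in place, Cloudera Manager's Kerberos wizard generates the service principals and keytabs for the cluster services.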
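The Spark job tuning done with developers usually comes down to sizing executors, memory, and shuffle settings on spark-submit. The command below is a hedged sketch: every value, and the job class and jar path, are hypothetical starting points rather than recommended settings.

```shell
# Illustrative spark-submit tuning on YARN; all values are hypothetical.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --num-executors 10 \
  --executor-cores 4 \
  --executor-memory 8g \
  --driver-memory 4g \
  --conf spark.yarn.executor.memoryOverhead=1024 \
  --conf spark.sql.shuffle.partitions=200 \
  --class com.example.Job /opt/jobs/job.jar
```

Iterating on these values against the YARN container sizes is the usual tuning loop for a given use case.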
Environment: CDH 5.x, Azure HDInsights 3.5/3.6, Azure H2O, Spark, Map Reduce, Hive, Pig, Zookeeper, Nifi, HBase, Flume, Sqoop, Kerberos, Sentry, Cent OS
Confidential, Beaverton, OR
- Responsible for support, implementation and ongoing administration of Hadoop infrastructure on HDP platform
- Coordinated with use case teams on new tools and services for deployment on the HDP platform
- Involved in setting up Kerberos security for authentication
- Involved in setting up and configuring Apache Ranger for authorization
- Cluster maintenance as well as creation and removal of nodes using Ambari
- Performance tuning of Hadoop clusters and jobs related to Hive and Spark
- Monitor Hadoop cluster connectivity and security
- Collaborated with application teams such as LSA and with networking teams to install operating system and Hadoop updates, patches, and version upgrades when required
- Involved in upgrading Ambari and HDP
- Involved in onboarding of new users and use cases to the HDP Platform
- Involved in enabling HA on various services on HDP
- Planned requirements for migrating users to production in advance to avoid last-minute access issues
- Planning and implementation of data migration from existing development to production cluster
- Installed and configured Hadoop ecosystem components like PySpark, Hive, Sqoop, ZooKeeper, Oozie, Ranger, Falcon
- Prepared multi-cluster test harness to exercise the system for performance, failover and upgrades
- Involved in migrating data from production to development clusters using distcp
- Configured Ganglia, including installing the gmond and gmetad daemons, which collect all metrics running on the distributed cluster and present them in real time
- Continuously monitored and managed the Hadoop cluster through Ganglia and Nagios.
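The distcp-based data migration between clusters mentioned above can be sketched as a single command. Cluster hostnames, paths, and the map-task count below are hypothetical.

```shell
# Illustrative only - NameNode hosts and paths are hypothetical.
# Copies a dataset between clusters, preserving block size and replication,
# updating only changed files, with 20 parallel map tasks.
hadoop distcp \
  -pbr \
  -update \
  -m 20 \
  hdfs://prod-nn01:8020/data/projects/sales \
  hdfs://dev-nn01:8020/data/projects/sales
```

Running the copy as a scheduled job and validating file counts afterwards is the typical pattern for prod-to-dev refreshes.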
Environment: HDP 2.5, HDFS, MapReduce, YARN, Spark, Hive, Pig, Flume, Oozie, Sqoop, Ambari.
Confidential, Dallas, TX
- Currently working as administrator on the Hortonworks (HDP) distribution for 4 clusters ranging from POC to PROD.
- Responsible for Cluster maintenance, Monitoring, commissioning and decommissioning Data nodes, Troubleshooting, Manage and review data backups, Manage & review log files.
- Experienced in adding/installing new components and removing them through Ambari.
- Monitoring systems and services through Ambari dashboard to make the clusters available for the business.
- Architecture design and implementation of deployment, configuration management, backup, and disaster recovery systems and procedures.
- Hands-on experience with cluster upgrades and patch upgrades without any data loss and with proper backup plans.
- Changed configurations based on user requirements to improve job performance.
- Experienced in configuring Ambari alerts for various components and managing the alerts.
- Provided security and authorization with Ranger, where Ranger Admin provides administration and User Sync adds new users to the cluster.
- Good troubleshooting skills in Hue, which provides a GUI for developers/business users for day-to-day activities.
- Experience in Ranger, Knox configuration to provide the security for Hadoop services.
- Developed MapReduce programs to cleanse data in HDFS obtained from heterogeneous data sources and make it suitable for ingestion into the Hive schema for analysis.
- Implemented NameNode HA in all environments to provide high availability of clusters.
- Created queues and allocated cluster resources to set job priorities.
- Experienced in setting up projects and volumes for new projects.
- Used snapshots and mirroring to maintain backups of cluster data, including remote backups.
- Implemented SFTP for projects to transfer data from external servers to cluster servers.
- Experienced in managing and reviewing log files.
- Working experience maintaining MySQL databases: creating databases, setting up users, and maintaining backups of cluster metadata databases with cron jobs.
- Set up MySQL master and slave replication and helped business applications maintain their data in MySQL servers.
- Managed and reviewed log files as part of administration for troubleshooting purposes; communicated and escalated issues appropriately.
- Involved in analyzing system failures, identifying root causes, and recommending courses of action.
- Monitored multiple cluster environments using Ambari alerts, metrics, and Nagios.
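The NameNode HA setup mentioned above is driven by hdfs-site.xml properties along these lines. The nameservice ID and hostnames are illustrative placeholders, and a full setup also configures the JournalNode shared edits directory and fencing.

```xml
<!-- Illustrative hdfs-site.xml fragment; names and hosts are hypothetical -->
<property>
  <name>dfs.nameservices</name>
  <value>mycluster</value>
</property>
<property>
  <name>dfs.ha.namenodes.mycluster</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn1</name>
  <value>nn-host1.example.com:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn2</name>
  <value>nn-host2.example.com:8020</value>
</property>
<property>
  <name>dfs.ha.automatic-failover.enabled</name>
  <value>true</value>
</property>
```

Automatic failover additionally relies on ZooKeeper and the ZKFailoverController running alongside each NameNode.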
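The MySQL master-slave replication work can be sketched with my.cnf fragments like these. Server IDs, log names, and the replicated database are hypothetical examples.

```ini
# Illustrative my.cnf fragments; values are hypothetical.

# --- master ---
[mysqld]
server-id    = 1
log_bin      = mysql-bin
binlog_do_db = hive_metastore   ; replicate only the metastore DB (example)

# --- slave ---
[mysqld]
server-id    = 2
relay_log    = mysql-relay-bin
read_only    = 1
```

On the slave, replication is then pointed at the master with CHANGE MASTER TO and started with START SLAVE; SHOW SLAVE STATUS confirms the threads are running.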
Environment: Hadoop HDFS, MapReduce, Hive, Scala, Pig, Flume, Nagios, Ranger, Knox, Oozie, Sqoop, Eclipse, Hortonworks, Ambari
- Installed (racked) servers, loaded the Red Hat Enterprise Linux operating system (Kickstart), and maintained HP, IBM, and Dell servers.
- Helped run long term Hadoop MapReduce workload batches.
- Helped automate Hadoop batches with shell scripts and scheduled them using the cron utility.
- Changing permissions, ownership and groups of file/folders.
- Managed disk space using Logical Volume Management (LVM) and monitored system performance: virtual memory, swap space, disk utilization, and CPU utilization. Monitored system performance using Nagios.
- Generating Monthly Performance Reports and updating of procedural & process documents.
- Experience in system builds, server builds, installs, upgrades, patches, migration, troubleshooting, security, backup, disaster recovery, performance monitoring, and fine-tuning systems.
- Experience in deploying virtual machines using templates and cloning, and taking backups with snapshots; moved VMs and datastores using vMotion and Storage vMotion in a VMware environment.
- Setup, configure, and maintain UNIX and Linux servers, RAID subsystems, and desktop/laptop machines including installation/maintenance of all operating system and application software.
- Install, configure, and maintain Ethernet hubs/switches/cables, new machines, hard drive, memory, and network interface cards.
- Manage software licenses, monitor network performance and application usage, and make software purchases
- Provided user support including troubleshooting, repairs, and documentation, and developed a website for support.
- Configured and upgraded large disk volumes (SAS/SATA)
- Plan and execute network security and emergency contingency programs.
- Responsible for meeting with client and gathering business requirements for projects.
- Created, configured, and managed standard virtual switches.
- Created and configured vSwitch port groups and NIC teaming.
- Knowledge of resource handling, memory management techniques, Fault Tolerance, and Update Manager.
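The cron-driven Hadoop batch automation in this section can be sketched as a wrapper script plus a crontab line. All paths, the job jar, the main class, and the schedule below are hypothetical.

```shell
#!/bin/bash
# Illustrative wrapper for a scheduled Hadoop batch job: logs output and
# uses a lock file to prevent overlapping runs. Paths are hypothetical.
LOCK=/var/run/nightly_batch.lock
LOG=/var/log/hadoop/nightly_batch_$(date +%F).log

if [ -e "$LOCK" ]; then
    echo "previous run still active, exiting" >> "$LOG"
    exit 1
fi
trap 'rm -f "$LOCK"' EXIT
touch "$LOCK"

hadoop jar /opt/jobs/nightly-batch.jar com.example.NightlyBatch >> "$LOG" 2>&1

# crontab entry to run the wrapper daily at 02:00:
# 0 2 * * * /opt/scripts/nightly_batch.sh
```

The lock/trap pattern keeps a long-running batch from being started twice if a previous run overruns its window.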
Environment: Red Hat Linux 5.x, HP & Dell servers, Oracle, Hadoop open source, VMware ESX 4.x, VMware vSphere, ESX, Bash, shell scripting, Nagios.
- Administered RHEL, including installation, testing, tuning, upgrading, and loading patches, and troubleshooting both physical and virtual server issues.
- Creating, cloning Linux Virtual Machines.
- Installed Red Hat Linux using Kickstart and applied security policies for hardening the servers based on company policies.
- RPM and YUM package installations, patch and other server management.
- Managed routine system backups; scheduled jobs such as disabling and enabling cron jobs; enabled system logging and network logging of servers for maintenance, performance tuning, and testing.
- Performed tech and non-tech refreshes of Linux servers, including new hardware, OS upgrades, application installation, and testing.
- Set up user and group login IDs, printing parameters, network configuration, and passwords; resolved permissions issues; and managed user and group quotas.
- Installed MySQL on Linux and customized MySQL DB parameters.
- Worked with the ServiceNow incident tool.
- Creating physical volumes, volume groups and logical volumes.
- Samba Server configuration with Samba Clients.
- Knowledge of iptables and SELinux.
- Migrated existing Linux file systems to standard ext3.
- Configured and administered NFS, FTP, Samba, and NIS.
- Maintained DNS, DHCP, and Apache services on Linux machines.
- Installed, configured, and supported Apache on Linux production servers.
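The LVM work in this section (creating physical volumes, volume groups, and logical volumes) follows a standard workflow, sketched below. Device names, sizes, and mount points are hypothetical.

```shell
# Illustrative LVM workflow; device and volume names are hypothetical.
pvcreate /dev/sdb1                  # initialize a physical volume
vgcreate datavg /dev/sdb1           # create a volume group on it
lvcreate -L 50G -n applv datavg     # carve out a 50 GB logical volume
mkfs.ext3 /dev/datavg/applv         # build an ext3 file system
mount /dev/datavg/applv /app        # mount it
lvextend -L +10G /dev/datavg/applv  # later: grow the logical volume
resize2fs /dev/datavg/applv         # resize the file system to match
```

Growing a volume with lvextend plus resize2fs is the usual way to handle disk-space pressure without repartitioning.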
Environment: Red Hat Enterprise Linux servers (HP ProLiant DL 585, BL/ML series), SAN (NetApp), VERITAS Cluster Server 5.0, Windows 2003 Server, shell programming.