- Around 8 years of experience in Information Technology, along with 5 years of experience in Hortonworks/Cloudera administration: cluster builds and upgrades, configuration management, POCs, and installation and maintenance of Hadoop ecosystem components including Cloudera Manager, Ambari, HDFS, YARN, Spark, Hive, HBase (NoSQL), Hue, Impala, Kafka, MapReduce, ZooKeeper, Oozie, Solr, Sqoop, Flume, Pig, Chef, Puppet, Knox and Cloudera Navigator. Also experienced in metadata (MySQL) backup and recovery, job scheduling and maintenance, code and data migration, debugging, troubleshooting (connectivity, alerts, system and data issues), performance tuning, backup and recovery (BDR), monitoring Hadoop systems with Nagios and Ganglia, machine learning, Python scripting, and security setup and configuration with Kerberos, Sentry and LDAP.
- Good working experience with Amazon Web Services (AWS) provisioning and services, in-depth knowledge of application deployment and data migration on AWS, and expertise in monitoring, logging and cost management tools that integrate with AWS. Developed CloudFormation scripts for AWS orchestration, and worked with Chef and Puppet.
- Experience in successfully implementing ETL solutions for data extraction, transformation and load using Sqoop, Hive, Pig, Spark and HBase (NoSQL database).
- Designed and Documented Big Data Best Practices and Standards, EDL (Enterprise Data Lake) Overview, Step by Step Instructions on Cluster setup/upgrade/Adding/Decommission Nodes, Onboarding Process, Security Design Model, Failure Scenarios and Remedy, PROD to COB Discrepancies, and EDL Migration.
- As a Hadoop/Linux administrator, experienced in working with several databases such as Oracle and MySQL and NoSQL stores such as HBase, MongoDB and Cassandra.
- In-depth understanding of Hadoop architecture and its components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode and MapReduce concepts.
- Experience in installation and configuration of Hadoop ecosystem components such as HDFS, Hive, YARN, HBase, Sqoop, Flume, Oozie, Pig and Spark.
- Involved in the deployment, upgrade and configuration of Apache Storm and Spark clusters using Ansible playbooks.
- Experienced in installing, configuring, supporting and monitoring 200+ node Hadoop clusters using Cloudera Manager and Hortonworks distributions.
- Experience in adding and removing nodes in Hadoop clusters.
- As an admin, involved in cluster maintenance, troubleshooting and monitoring, and followed proper backup and recovery strategies.
- Hands-on experience in configuration and management of security for Hadoop cluster using Kerberos.
- Experience in setting up and managing high availability to avoid a single point of failure on large Hadoop clusters.
- Knowledge of Spark and Scala, mainly from framework exploration for the transition from Hadoop MapReduce to Spark.
- Worked with application teams to install operating system and Hadoop updates, patches and version upgrades as required.
- Good knowledge of AWS services such as EMR, S3 and EC2, which provide fast and efficient processing for Hadoop workloads.
- Experience in writing shell scripts for purposes such as file validation, automation and job scheduling with cron.
Big Data Technologies: Hortonworks, Cloudera, HDFS, MapReduce, Hive, Pig, HCatalog, Phoenix, Falcon, Apache NiFi, Sqoop, ZooKeeper, Mahout, Flume, Oozie, Avro, HBase, Cassandra, Storm.
Scripting Languages: Shell Scripting, Puppet, Python, Bash, CSH, Ruby, PHP
Databases: Oracle 11g, MySQL, MS SQL Server, HBase, Cassandra, MongoDB
Networks: HTTP, HTTPS, FTP, UDP, TCP/IP, SNMP, SMTP
Monitoring Tools: Cloudera Manager, Solr, Ambari, Nagios, Ganglia
Application Servers: Apache Tomcat, WebLogic Server, WebSphere
Security: Kerberos, Knox.
Reporting Tools: Cognos, Hyperion Analyzer, OBIEE & BI+
Analytic Tools: Elasticsearch-Logstash-Kibana
Hadoop Operations Administrator
Confidential, Boston, MA
- Worked as Hadoop admin responsible for all aspects of clusters totaling 100 nodes, ranging from POC (Proof-of-Concept) to PROD.
- Worked as admin on the Cloudera (CDH 5.5.2) distribution for clusters ranging from POC to PROD.
- Responsible for cluster maintenance, monitoring, commissioning and decommissioning DataNodes, troubleshooting, managing and reviewing data backups, and managing and reviewing log files.
- Deployed Cloudera Platform on AWS using the Ansible playbook.
- Day-to-day responsibilities included resolving developer issues, handling deployments (moving code from one environment to another), providing access to new users, delivering quick fixes to reduce impact, and documenting them to prevent future issues.
- Added and removed components through Cloudera Manager.
- Collaborated with application teams to install operating system and Hadoop updates, patches and version upgrades.
- Monitored and controlled local file system disk space usage and log files; cleaned up log files with automated scripts.
- Served as Level 2/3 SME for the current Big Data clusters at the client site and set up standard troubleshooting techniques.
- Created and maintained user accounts, profiles, security and rights; monitored disk space and processes.
- Implemented and configured a high-availability Hadoop cluster (quorum-based) for HDFS, Impala and Solr.
- Worked extensively with Impala, comparing its processing time against Apache Hive for batch applications in order to adopt Impala in the project. Used Impala extensively to read, write and query Hadoop data in HDFS.
- Prepared ad hoc Phoenix queries on HBase.
- Created secondary index tables using Phoenix on HBase tables.
- Monitored workload and job performance and performed capacity planning using Cloudera Manager.
- Analyzed system failures, identified root causes and recommended courses of action.
- Interacted with Cloudera support, logged issues in the Cloudera portal and fixed them per the recommendations.
- Performed disk space management for users and groups in the cluster.
- Imported logs from web servers with Flume to ingest the data into HDFS.
- Used Flume with a spooling directory source to load data from the local file system into HDFS.
- Experience with Chef, Puppet and related tools for configuration management.
- Exported data from HDFS into relational databases with Sqoop.
- Parsed, cleansed and mined meaningful data in HDFS using MapReduce for further analysis.
- Fine-tuned Hive jobs for optimized performance.
- Implemented custom interceptors for Flume to filter data and defined channel selectors to multiplex the data into different sinks.
- Partitioned and queried the data in Hive for further analysis by the BI team.
- Extended the functionality of Hive and Pig with custom UDFs and UDAFs.
- Involved in extracting data from various sources into HDFS for processing.
- Worked on analyzing the Hadoop cluster with different big data analytic tools, including Pig, HBase and Sqoop.
- Performed regular disk management such as adding or replacing hard drives on existing servers and workstations, partitioning according to requirements, creating new file systems or growing existing ones, and managing file systems.
- Created collections and configurations, and registered Lily HBase Indexer configurations with the Lily HBase Indexer Service.
- Worked with Phoenix, a SQL layer on top of HBase, to provide a SQL interface to the NoSQL database.
- Created and truncated HBase tables in Hue and took backups of submitter ID(s).
- Configured and managed user permissions in Hue.
- Troubleshot, debugged and fixed Talend-specific issues while maintaining the health and performance of the ETL environment.
- Analyzed large data sets to determine the optimal way to aggregate and report on them.
- Supported setting up the QA environment and updating configurations for implementing scripts with Pig and Sqoop.
Environment: HDFS, MapReduce, Hive 1.1.0, Hue 3.9.0, Pig, Flume, Oozie, Sqoop, CDH5, Apache Hadoop 2.6, Spark, Solr, Storm, Knox, Cloudera Manager, Python, Red Hat, MySQL and Oracle.
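As an illustration of the automated log cleanup mentioned in this role, the following is a minimal Python sketch, not the scripts actually used on the client site. The function name `purge_old_logs`, the `.log` suffix and the retention period are assumptions chosen for the example.

```python
import time
from pathlib import Path


def purge_old_logs(log_dir, max_age_days, suffix=".log", now=None):
    """Delete files under log_dir that end in `suffix` and are older than max_age_days.

    Returns the removed paths so the caller can log or audit them.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # age threshold as a Unix timestamp
    removed = []
    # Materialize the listing first so deletions do not disturb the directory walk.
    for path in list(Path(log_dir).rglob(f"*{suffix}")):
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path)
    return removed
```

A call such as `purge_old_logs("/var/log/hadoop-hdfs", max_age_days=14)` (hypothetical directory and retention) could be scheduled from cron; the returned list makes it easy to record what was deleted.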
Confidential, Charlotte, NC
- Responsible for cluster maintenance, monitoring, commissioning and decommissioning DataNodes, troubleshooting, managing and reviewing backups, and managing and reviewing log files.
- Involved in cluster capacity planning, hardware planning, installation and performance tuning of the Hadoop cluster.
- Worked with the technical architect in upgrading and increasing the size of Hadoop cluster.
- Coordinated with different teams on user issues and resolved them.
- Monitored the Hadoop cluster with the Ambari GUI to ensure the health of Hadoop services.
- Worked with the Hortonworks support team to resolve issues and obtain recommendations.
- Day-to-day responsibilities included resolving developer issues, handling deployments (moving code from one environment to another), providing access to new users, delivering quick fixes to reduce impact, and documenting them to prevent future issues.
- Planned and prepared use cases for new Hadoop services and tested them on a sandbox by installing them through Ambari.
- Working experience in designing and implementing complete end-to-end Hadoop infrastructure covering the full Hadoop ecosystem.
- Upgraded HDP from 2.2 to 2.4.2.
- Experience in importing and exporting terabytes of data using Sqoop from Relational Database Systems to HDFS.
- Experience in supporting data analysts in running Pig and Hive queries.
- Configured the Capacity Scheduler with various queues and priorities for Hadoop.
- Set up and managed high availability for NameNode, ResourceManager, Hive Metastore and Oozie to avoid single points of failure in large clusters.
- Designed and allocated HDFS quotas for multiple groups.
- Created Hive databases and granted appropriate permissions through Ranger policies.
- Introduced SmartSense to obtain optimization recommendations from the vendor and to assist with troubleshooting issues.
- Moved data from Teradata into HDFS using Sqoop and imported flat files of various formats into HDFS.
- Used the Oozie scheduler to automate the pipeline workflow and orchestrate the Sqoop, Hive and Pig jobs that extract the data in a timely manner.
- Wrote complex Hive and SQL queries for data analysis to meet business requirements.
- Exported analyzed data to downstream systems using Sqoop for generating end-user reports, Business Analysis reports and payment reports.
- Performed DevOps tasks using Git and Puppet: configured Puppet modules, uploaded them to the master server and applied them on client servers.
Environment: HDFS, Map Reduce, Hortonworks, Hive, Pig, Flume, Oozie, Sqoop, Ambari, and Linux.
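To illustrate the Capacity Scheduler queue setup mentioned in this role, here is a minimal Python sketch that builds the property pairs for top-level queues and checks that sibling capacities add up to 100. The helper name `capacity_properties` and the queue names in the usage note are hypothetical; only the `yarn.scheduler.capacity.root.*` property keys come from YARN itself, and this is not the configuration used at the client.

```python
def capacity_properties(queues):
    """Build capacity-scheduler.xml property pairs for queues directly under root.

    `queues` maps queue name -> capacity percentage. Capacities of sibling
    queues must sum to 100, so the function rejects anything else.
    """
    total = sum(queues.values())
    if abs(total - 100) > 1e-9:
        raise ValueError(f"queue capacities sum to {total}, expected 100")
    # Comma-separated list of child queues under root.
    props = {"yarn.scheduler.capacity.root.queues": ",".join(queues)}
    for name, pct in queues.items():
        props[f"yarn.scheduler.capacity.root.{name}.capacity"] = str(pct)
    return props
```

For example, `capacity_properties({"etl": 60, "adhoc": 30, "default": 10})` returns a dictionary that can be rendered into `capacity-scheduler.xml` or pasted into Ambari, while a set of queues that does not total 100 raises `ValueError` before anything is deployed.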
Confidential, Los Angeles, CA
- Installed and configured Hadoop and Ecosystem components in Cloudera and Hortonworks environments.
- Installed and configured Hadoop, Hive and Pig on Amazon EC2 servers.
- Upgraded the cluster from CDH4 to CDH5; the upgrade was first performed on the staging platform before being applied to the production cluster.
- Enabled Kerberos and AD security on the Cloudera cluster running CDH 5.4.4.
- Implemented Sentry for the Dev Cluster
- Configured MySQL Database to store Hive metadata.
- Involved in managing and reviewing Hadoop log files.
- Involved in running Hadoop streaming jobs to process terabytes of text data.
- Worked with Linux systems and MySQL database on a regular basis.
- Supported MapReduce programs running on the cluster.
- Involved in loading data from UNIX file system to HDFS.
- Involved in creating Hive tables, loading them with data and writing Hive queries, which run internally as MapReduce jobs.
- As an admin, followed standard backup policies to ensure high availability of the cluster.
- Analyzed system failures, identified root causes and recommended courses of action. Documented system processes and procedures for future reference.
- Worked with systems engineering team to plan and deploy new Hadoop environments and expand existing Hadoop clusters.
- Installed and configured Hive, Pig, Sqoop and Oozie on the HDP 2.2 cluster.
- Managed backups for key data stores
- Supported configuring, sizing, tuning and monitoring analytic clusters
- Implemented security and regulatory compliance measures
- Streamlined cluster scaling and configuration
- Monitored cluster job performance and was involved in capacity planning.
- Worked with application teams to install operating system and Hadoop updates, patches and version upgrades as required.
- Documented technical designs and procedures
Environment: HDFS, Hive, Pig, Sentry, Kerberos, LDAP, YARN, Cloudera Manager, and Ambari.
Linux Hadoop Administrator
- Installed, Configured and Maintained the Hadoop cluster for application development and Hadoop ecosystem components like Hive, Pig, HBase, Zookeeper and Sqoop.
- In-depth understanding of Hadoop architecture and its components such as HDFS, NameNode, DataNode, ResourceManager, NodeManager and the YARN/MapReduce programming paradigm.
- Monitored the Hadoop cluster through Cloudera Manager and implemented alerts based on error messages. Provided reports to management on cluster usage metrics and charged back customers based on their usage.
- Extensively worked on commissioning and decommissioning of cluster nodes, replacing failed disks, file system integrity checks and maintaining cluster data replication.
- Very good understanding of configuring the number of mappers and reducers for MapReduce jobs on the cluster.
- Set up HDFS quotas to enforce a fair share of storage resources.
- Strong knowledge of configuring and maintaining YARN schedulers (Fair and Capacity).
- Wrote shell scripts to check the health of Hadoop daemon services and respond to any warning or failure conditions.
- Experience in setting up HBase cluster which includes master and region server configuration, High availability configuration, performance tuning and administration.
- Created user accounts and gave users access to the Hadoop cluster.
- Involved in loading data from UNIX file system to HDFS.
- Worked on ETL processes: imported data from various data sources and performed transformations.
- Coordinated with the QA team during the testing phase.
- Provided application support to the production support team.
Environment: Cloudera, HDFS, MapReduce, Hive, Pig, HBase, Sqoop, Flume, ZooKeeper, UNIX, Linux, Java, Shell Scripting.
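The daemon health checks described in this role were written as shell scripts; as an illustration only, the same idea can be sketched in Python by parsing the output of the JDK's `jps` tool. The function names `missing_daemons` and `check_host` and the list of expected daemons are assumptions for the example, not the original scripts.

```python
import subprocess


def missing_daemons(jps_output, expected):
    """Return the expected daemon names that do not appear in `jps` output.

    Each line of `jps` output has the form "<pid> <main class name>".
    """
    running = set()
    for line in jps_output.splitlines():
        parts = line.split(maxsplit=1)
        if len(parts) == 2:
            running.add(parts[1].strip())
    return sorted(set(expected) - running)


def check_host(expected):
    """Run `jps` on the local host and report which expected daemons are down."""
    result = subprocess.run(["jps"], capture_output=True, text=True, check=True)
    return missing_daemons(result.stdout, expected)
```

On a worker node, `check_host(["DataNode", "NodeManager"])` would return an empty list when both daemons are up; a non-empty result could trigger an alert or a restart, depending on site policy.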
Linux System Administrator
- Installed Red Hat Enterprise Linux (RHEL 6) on production servers.
- Provided Support to Production Servers.
- Updated firmware on Servers, Installed patches and packages for security vulnerabilities for Linux.
- Monitored system resources such as network, logs and disk usage.
- Created and maintained user accounts, both local and centralized (LDAP, Sun Identity Manager).
- Performed system administration duties such as troubleshooting, providing sudo access, modifying DNS entries, managing NFS, and backup and recovery (scripts).
- Set up passwordless login using SSH public/private keys.
- Set up cron jobs for application owners to deploy scripts on production servers.
- Performed sanity checks of file systems and volume groups.
- Developed scripts for internal use for automation of some regular jobs using shell scripting.
- Completed work requests raised by customers and teams and followed up with them.
- Worked on change requests raised by customers and teams and followed up on them.
- Performed root cause analysis on problem tickets and frequently occurring incidents.
- Raised cases with vendors when software or hardware needed to be updated, replaced or repaired.
- Raised cases with Red Hat and followed up on them as required.
- Engaged members of other teams when a ticket required multi-team support.
- Monitored SDM/Remedy queues closely to prevent SLA breaches.
- Worked in a 24x7 on-call rotation to support critical production environments.
Environment: Red Hat Linux 5.x and 6.x, SUSE Linux 10.1 and 11, OpenBSD, TCP Wrappers, SSH, SCP, rsync, Service Desk Manager, BMC Remedy, Hostinfo, Apache Web Server, Samba Server, iptables, FTP, DHCP, DNS, NFS, RPM, YUM, LDAP, AutoFS, LAN, WAN, KVM, Red Hat Enterprise Virtualization, Xen, VMware.