- 8 years of IT experience, including 3 years in Hadoop Administration and the Big Data ecosystem and 5+ years in Linux Administration.
- Deployed Hadoop clusters in Standalone, Pseudo-Distributed, and Fully Distributed modes.
- Expertise in installing and updating Hadoop and its related components in multi-node cluster environments.
- Experience in installing and configuring Hadoop ecosystem components such as HDFS, Hive, YARN, HBase, Sqoop, Flume, Oozie, Pig, Impala, Spark, and Kafka on clusters, and in monitoring and troubleshooting them using Cloudera Manager and Ambari.
- Expertise in managing, monitoring, and administering multi-hundred-node Hadoop clusters on distributions such as Cloudera CDH and Hortonworks HDP.
- Experience in deploying and managing multi-node development, testing, and production Hadoop clusters with different Hadoop components (Hive, Pig, Sqoop, Oozie, Flume, HBase, ZooKeeper) using Cloudera Manager and Ambari.
- Experience administering Linux systems to deploy Hadoop clusters and monitoring the clusters using Nagios and Ganglia.
- Experience configuring ZooKeeper to provide High Availability and cluster service coordination.
- Troubleshooting cluster issues, performing backup and disaster recovery for Hadoop, conducting root cause analysis, and preparing runbooks.
- Experience importing and exporting data between relational databases such as MySQL, Oracle, and Teradata and HDFS using Sqoop.
- Worked with Flume to collect logs from log collectors into HDFS.
- Worked with Puppet for automated deployments.
- Experience in benchmarking, performing backup and disaster recovery of Name Node metadata and important sensitive data residing on cluster.
- Experience in performing minor and major upgrades, commissioning and decommissioning of data nodes on Hadoop cluster.
- Supported technical team members for automation, installation and configuration tasks.
- Experience in understanding the security requirements for Hadoop and integrating with Kerberos authentication infrastructure: KDC server setup, creating realms/domains, managing principals, generating a keytab file for each service, and managing keytabs with keytab tools.
- Experience in handling multiple relational databases: MySQL, SQL Server, Teradata.
- Experience with NoSQL databases like HBase, Cassandra and MongoDB.
- Familiar with Agile Methodology (SCRUM) and Software Testing.
- Assisted in loading data from various data sources into Hadoop HDFS/Hive tables.
- Expertise in installing, configuring, and managing RedHat Linux 4 and 5 and CentOS Linux 6.5.
- Expert in setting up SSH, SCP, and SFTP connectivity between UNIX hosts.
- Experience in supporting users to debug their job failures.
- Experience in supporting systems with 24X7 availability and monitoring.
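The Kerberos integration described above (KDC setup, principals, per-service keytabs) typically follows the standard kadmin workflow. A hedged sketch, in which the realm, hostname, and keytab path are illustrative assumptions and which requires a live KDC to actually run:

```shell
# Hypothetical: provision a principal and keytab for a NameNode service.
# Realm EXAMPLE.COM, host namenode01.example.com, and paths are assumptions.
kadmin.local -q "addprinc -randkey nn/namenode01.example.com@EXAMPLE.COM"
kadmin.local -q "xst -k /etc/security/keytabs/nn.service.keytab nn/namenode01.example.com@EXAMPLE.COM"
# Verify the keytab contents, then lock down ownership and permissions:
klist -kt /etc/security/keytabs/nn.service.keytab
chown hdfs:hadoop /etc/security/keytabs/nn.service.keytab
chmod 400 /etc/security/keytabs/nn.service.keytab
```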
Hadoop Ecosystem: HDFS, MapReduce, YARN, Hive, Pig, Sqoop, Oozie, Flume, ZooKeeper, High Availability.
Hadoop Paradigms: YARN, MapReduce, High Availability.
Scripting Languages: Shell, Python, Java, HTML.
Databases: MySQL, SQL Server, Oracle and Teradata.
NoSQL Databases: MongoDB, Cassandra and HBase.
Monitoring Tools: Ganglia, Nagios.
Cluster Management: Cloudera Manager, Ambari.
Configuration Management Tools: Puppet, Chef.
Other Relevant Tools: Tableau, JIRA, QC, MS Office Suite.
Operating Systems: Linux (RHEL 5.x/6.x, Ubuntu, CentOS), Windows (XP/7/8), macOS.
Confidential, St. Louis, MO
- Installed and configured various components of Hadoop ecosystem and maintained their integrity.
- Designed, configured and managed the backup and disaster recovery for HDFS data.
- Commissioned Data Nodes when data grew and decommissioned when the hardware degraded.
- Migrated data across clusters using DistCp.
- Experience in collecting metrics for Hadoop clusters using Ganglia.
- Experience in creating shell scripts for detecting and alerting on system problems.
- Worked with the systems engineering team to plan and deploy new Hadoop environments and expand existing Hadoop clusters.
- Monitored multiple Hadoop cluster environments using Ganglia and Nagios.
- Monitored workload, job performance and capacity planning.
- Worked with application teams to install Hadoop updates, patches, and version upgrades as required.
- Installed and configured Hive, Pig, Sqoop, and Oozie on the Hadoop 2.0 cluster.
- Involved in implementing High Availability and automatic-failover infrastructure, utilizing ZooKeeper services, to overcome the NameNode single point of failure.
- Implemented HDFS snapshot feature.
- Worked with big data developers, designers and scientists in troubleshooting map reduce job failures and issues with Hive, Pig and Flume.
- Handled importing data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop.
- Analyzed data by writing Hive queries (HiveQL) and Pig scripts (Pig Latin) to study customer behavior.
- Used Impala to read, write, and query Hadoop data in HDFS and HBase.
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
- Executed tasks for upgrading cluster on the staging platform before doing it on production cluster.
- Worked with Linux server admin team in administering the server hardware and operating system.
- Performed maintenance, monitoring, deployments, and upgrades across the infrastructure supporting all Hadoop clusters.
- Provided ad-hoc queries and data metrics to the Business Users using Hive, Pig.
- Managed Hadoop clusters: setup, installation, monitoring, and maintenance.
- Debugged and troubleshot issues in development and test environments.
- Monitored cluster stability and used tools to gather statistics and improve performance.
- Helped plan future upgrades and improvements to both processes and infrastructure.
Environment: MapR, Sqoop, Flume, Hive, HQL, Pig, RHEL, CentOS, Oracle, MS SQL, ZooKeeper, Oozie, MapReduce, PostgreSQL, Nagios.
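The detect-and-alert shell scripting mentioned above can be sketched as a small self-contained script. The log location, sample entries, and match patterns are invented for illustration; a real deployment would page or mail an operator instead of printing:

```shell
#!/bin/sh
# Hypothetical sketch: scan a service log for ERROR/FATAL entries and emit alerts.
# The sample log is generated inline so the sketch runs standalone.
LOG=${1:-/tmp/sample-namenode.log}
cat > "$LOG" <<'EOF'
2016-01-01 00:00:01 INFO  NameNode started
2016-01-01 00:05:12 ERROR Unable to place replica
2016-01-01 00:07:44 FATAL Exiting with status 1
EOF
# Flag every ERROR or FATAL line; in production this would feed mail/Nagios.
grep -E 'ERROR|FATAL' "$LOG" | while read -r line; do
  echo "ALERT: $line"
done
```

A cron entry (e.g. every 5 minutes) would drive the real version against the live log.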
Confidential, San Jose, CA
- Installed, configured, and maintained Apache/Cloudera Hadoop clusters for application development, along with Hadoop tools such as MapReduce, Hive, Pig, HBase, ZooKeeper, and Sqoop.
- Responsible for building scalable distributed data solutions using Hadoop.
- Wrote the shell scripts to monitor the health check of Hadoop daemon services and respond accordingly to any warning or failure conditions.
- Managing and Scheduling Jobs on a Hadoop cluster.
- Deployed Hadoop Cluster in Standalone, Pseudo-distributed and Fully Distributed modes.
- Implemented NameNode metadata backup using NFS for high availability.
- Involved in taking up the Backup, Recovery and Maintenance.
- Worked with the MapR Distribution for Apache Hadoop, which speeds up MapReduce jobs with an optimized shuffle algorithm, direct disk access, built-in compression, and code written in Java.
- Worked on importing and exporting data from Oracle into HDFS and HIVE using Sqoop.
- Developed Pig Latin scripts to extract data from web server output files and load it into HDFS.
- Created Hive external tables, loaded data into them, and queried the data using HQL.
- Wrote shell scripts to automate rolling day-to-day processes.
- Collected log data from web servers and integrated it into HDFS using Flume.
- Implemented the Fair Scheduler on the JobTracker to share cluster resources among users' MapReduce jobs.
- Involved in commissioning and decommissioning at the time of node failure.
- Implemented Partitioning and Bucketing concepts using Hive.
- Involved in upgrading clusters to Cloudera Distributed Clusters and deployed into CDH4.
- Handled ad-hoc requests to copy non-PII (Personally Identifiable Information) data from the production cluster to a non-production cluster to test various scenarios.
Environment: Hadoop, MapReduce, Hive, Pig, HBase, Sqoop, Flume, ZooKeeper, Cloudera Distributed Hadoop (CDH4), Cloudera Manager.
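A hedged sketch of the Oracle-to-Hive flow these bullets describe. The JDBC connection string, table name, and HDFS locations are assumptions, and the commands require a live cluster:

```shell
# Hypothetical: import an Oracle table into HDFS with Sqoop (4 parallel mappers).
sqoop import \
  --connect jdbc:oracle:thin:@//oradb.example.com:1521/ORCL \
  --username etl_user -P \
  --table SALES \
  --target-dir /data/raw/sales \
  -m 4

# Hypothetical: define a partitioned Hive external table over the imported files.
hive -e "
CREATE EXTERNAL TABLE IF NOT EXISTS sales (id INT, amount DOUBLE)
PARTITIONED BY (load_dt STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/data/raw/sales';
"
```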
Confidential, New York City -NY
- Responsible for cluster maintenance, monitoring, commissioning and decommissioning data nodes, troubleshooting, and managing and reviewing data backups and log files.
- Day-to-day responsibilities included solving developer issues, deploying and moving code between environments, providing access to new users, providing prompt solutions to reduce impact, and documenting issues to prevent recurrence.
- Experienced in adding/installing new components and removing them through Ambari.
- Implemented and configured a quorum-based High Availability Hadoop cluster.
- Installed and Configured Hadoop monitoring and Administrating tools: Nagios and Ganglia.
- Backed up data from the active cluster to a backup cluster using DistCp.
- Periodically reviewed Hadoop-related logs, fixed errors, and prevented issues by analyzing warnings.
- Hands-on experience with Hadoop ecosystem components such as Hadoop MapReduce, HDFS, ZooKeeper, Oozie, Hive, Sqoop, Pig, and Flume.
- Experience configuring ZooKeeper to coordinate servers in clusters and maintain data consistency.
- Experience using Flume to stream data into HDFS from various sources. Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs, such as Java MapReduce, Hive, and Sqoop, as well as system-specific jobs.
- Installed Oozie workflow engine to run multiple Hive and Pig jobs.
- Worked on analyzing data with Hive and Pig.
- Helped in setting up Rack topology in the cluster.
- Upgraded the Hadoop cluster from CDH3 to CDH4.
- Deployed a Hadoop cluster using CDH3 integrated with Nagios and Ganglia.
- Implemented the Fair Scheduler on the JobTracker to allocate a fair share of resources to small jobs.
- Implemented automatic failover using ZooKeeper and the ZooKeeper Failover Controller (ZKFC).
- Deployed a network file system (NFS) mount for NameNode metadata backup.
- Performed cluster backups using DistCp, Cloudera Manager BDR, and parallel ingestion.
- Performed both major and minor upgrades of the existing cluster, as well as rollbacks to the previous version.
- Designed the cluster so that only one Secondary NameNode daemon could run at any given time.
- Implemented commissioning and decommissioning of data nodes, killing unresponsive TaskTrackers, and dealing with blacklisted TaskTrackers.
- Moved data between HDFS and a MySQL database in both directions using Sqoop.
Environment: Flume, Oozie, Zookeeper, Pig, Hive, Map-Reduce, YARN, Cloudera Manager and Nagios.
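The snapshot-plus-DistCp backup pattern described above can be sketched as follows; the cluster hostnames and paths are assumptions, and the commands require two live HDFS clusters:

```shell
# Hypothetical: take a consistent HDFS snapshot, then replicate it off-cluster.
# Allow snapshots on the source directory (one-time) and create one:
hdfs dfsadmin -allowSnapshot /data/warehouse
hdfs dfs -createSnapshot /data/warehouse nightly
# Copy the frozen snapshot view to the backup cluster with DistCp:
hadoop distcp -update \
  hdfs://prod-nn.example.com:8020/data/warehouse/.snapshot/nightly \
  hdfs://backup-nn.example.com:8020/data/warehouse
```

Copying from the `.snapshot` path rather than the live directory avoids replicating files that change mid-copy.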
- Installed and configured Apache and supported it on Linux production servers.
- Administered RHEL 4.x and 5.x, including installation, testing, tuning, upgrading, and loading patches, and troubleshot both physical and virtual server issues.
- Created and cloned Linux virtual machines and templates using VMware Virtual Client 3.5 and migrated servers between ESX hosts and Xen servers.
- Installed RedHat Linux using Kickstart and applied security policies to harden the servers based on company policy.
- Installed and verified that all AIX/Linux patches were applied to the servers.
- Installed and administered RedHat using Xen- and KVM-based hypervisors.
- Performed maintenance and installation of RPM and YUM packages and other server upkeep.
- Experience managing and scheduling cron jobs, such as enabling system and network logging on servers for maintenance, performance tuning, and testing.
- Set up user and group login IDs, printing parameters, network configuration, and passwords; resolved permission issues; managed user and group quotas.
- Improved the performance of various applications through performance tuning and analysis.
- Configured multipath, added SAN storage, and created physical volumes, volume groups, and logical volumes.
- Performed various configurations including networking and iptables, resolving hostnames, and SSH key-based (password-less) login.
Environment: Redhat Linux, AIX/Linux, VMware, TCP/IP.
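A sketch of the SAN/LVM provisioning workflow referenced above. The multipath device, volume group name, size, and mount point are illustrative assumptions, and the commands need root on a host with the SAN LUN attached:

```shell
# Hypothetical: carve a filesystem out of a multipathed SAN LUN with LVM.
pvcreate /dev/mapper/mpatha            # initialize the LUN as a physical volume
vgcreate datavg /dev/mapper/mpatha     # create a volume group on it
lvcreate -L 50G -n applv datavg        # carve out a 50 GB logical volume
mkfs.ext4 /dev/datavg/applv            # build a filesystem on the LV
mkdir -p /app
mount /dev/datavg/applv /app
# Persist the mount across reboots:
echo '/dev/datavg/applv /app ext4 defaults 0 2' >> /etc/fstab
```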
- Installation and configuration of Linux for new build environment.
- Created virtual servers on a Citrix XenServer-based host and installed operating systems on guest servers.
- Set up Preboot Execution Environment (PXE) boot and the Kickstart method on multiple servers for remote installation of Linux.
- Deep understanding of monitoring and troubleshooting mission critical Linux machines.
- Experience with Linux internals, virtual machines, and open source tools/platforms.
- Monitoring the System activity, Performance, Resource utilization.
- Extensive use of LVM, creating Volume Groups, Logical volumes.
- Performed RPM and YUM package installations, patch and other server management.
- Performed scheduled backup and necessary restoration.
- Configured Domain Name System (DNS) for hostname to IP resolution.
- Troubleshot and fixed issues at the user, system, and network levels using various tools and utilities. Scheduled backup jobs by implementing a cron schedule during non-business hours.
- Developed an automation script for replication failover: if the database fails during replication, the script brings the system up to date within five minutes without manual intervention.
- Implementing file sharing on the network by configuring NFS on the system to share essential resources.
- Performed reorganization of disk partitions, file systems, hard disk addition, and memory upgrade.
Environment: Linux, TCP/IP, Telnet, Ubuntu.
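The NFS file sharing described above might look like the following on RHEL-era systems; the export path, client subnet, and hostname are assumptions, and the commands require root on the server and client:

```shell
# Hypothetical NFS setup: share /shared with the local subnet.
# On the server, add an export and reload the export table:
echo '/shared 192.168.1.0/24(rw,sync,no_subtree_check)' >> /etc/exports
exportfs -ra           # re-read /etc/exports
service nfs start      # start the NFS service (RHEL 5/6 style init)
# On a client, mount the share:
mkdir -p /mnt/shared
mount -t nfs fileserver:/shared /mnt/shared
```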
- Installing and updating packages using YUM.
- Installing and maintaining the Linux servers.
- Created volume groups, logical volumes, and partitions on the Linux servers and mounted file systems.
- Deep understanding of monitoring and troubleshooting mission critical Linux machines.
- Improve system performance by working with the development team to analyze, identify and resolve issues quickly.
- Ensured data recovery by implementing system and application level backups.
- Performed various configurations including networking and iptables, resolving hostnames, and SSH key-based login.
- Managing Disk File Systems, Server Performance, Users Creation and Granting file access Permissions and RAID configurations.
- Automated administration tasks through scripting and job scheduling using cron.
- Monitored system metrics and logs for any problems.
- Ran crontab jobs to back up data.
- Added, removed, and updated user account information, reset passwords, etc.
- Used Java JDBC to load data into MySQL.
- Maintained the MySQL server and granted database access to the required users.
- Supported pre-production and production support teams in the analysis of critical services and assisted with maintenance operations.
Environment: Redhat Linux, VMware, TCP/IP, Linux, Ubuntu.
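A minimal, self-contained sketch of the cron-driven backup-and-verify routine described above; the source and destination paths are illustrative:

```shell
#!/bin/sh
# Hypothetical nightly backup: archive a source tree into a dated tarball,
# then verify it by listing the archive contents (the start of any restore).
SRC=/tmp/backup_demo_src
DEST=/tmp/backup_demo_dest
mkdir -p "$SRC" "$DEST"
echo "sample data" > "$SRC/data.txt"   # stand-in for real user data
STAMP=$(date +%Y%m%d)
tar -czf "$DEST/home-$STAMP.tar.gz" -C "$SRC" .
tar -tzf "$DEST/home-$STAMP.tar.gz"    # verification: list archive members
# Crontab entry to run the real script nightly at 02:00 (non-business hours):
#   0 2 * * * /usr/local/bin/nightly_backup.sh
```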