- Around 6 years of IT experience in Administering, Installing, Configuring and Maintaining Hadoop and Linux clusters
- Around 4 yrs. of experience working on Hadoop eco - system
- Experience in configuring, installing and managing both Cloudera and Hortonworks Hadoop Distributions.
- Experience in loading data into Oracle Database 12c using Orcale BDA.
- Extensive Experience in understanding the client’s Big Data business requirements and transform it into Hadoop centric technologies
- Analyzing the clients existing Hadoop infrastructure and understanding the performance Bottlenecks and provide the performance tuning accordingly
- Experience with Installing Hadoop in new servers and rebuild existing servers
- Experience in Installing Hadoop patches, major and minor version upgrades of Cloudera distributions Platform
- Experience in using Automation tools like Puppet for installing, configuring and maintaining Hadoop clusters
- Experience in using Cloudera Manager for installation and management of Hadoop Cluster
- Experience in setting up Backup Disaster recovery ( BDR) in Cloudera distribution.
- Expertise in writing Shell scripts and Perl scripts and debugging existing scripts
- Experience in setting up automated 24x7 on monitoring and escalation infrastructure for Hadoop Cluster using Nagios and Ganglia
- Experience in Performance Management of Hadoop Cluster
- Experience in using Flume to load log files into HDFS
- Expertise in using Oozie for configuring job flows
- Managing the configuration of the cluster to meet the needs of data analysis whether I/O bound or CPU bound
- Developed Hive Queries and automated those queries for analyzing on Hourly, Daily and Weekly Basis
- Data loading to hadoop and hive using sqoop from MySQL, oracle and db2
- Experience in importing and exporting the preprocessed data into the commercial
- Analytic database, e.g. RDBMS
- Providing training to users to make Hadoop usability simple and updating them for best practices.
- Implemented both Fair scheduling and capacity scheduling on clusters with MapReduce and Yarn
- Hands on experience in PIG Latin scripts to transform, cleanse and parse data in HDFS
- Experience in deploying Hadoop cluster on Public and Private Cloud Environment like Amazon AWS
- Performed benchmarking and analysis using Test DFSIO and Terasort.
Hadoop Ecosystem Components: HDFS, MapReduce, YARN, Pig, Hive, Oozie, HBase, Sqoop, Flume, Zookeeper, Spark, Solr, Impala, Hue
Languages and Technologies: Core Java, C, C++.
Operating Systems: Windows, Linux & Unix
Scripting Languages: Bash, Perl, Python and Shell scripting
Other: Puppet, Chef, Nagios, Ganglia, MySql
Sr Hadoop Administrator
Confidential, El Segundo, CA
Environment: MapReduce, YARN, Pig, Hive, Oozie, HBase, Sqoop, Flume, Kafka, NIfi, Zookeeper, Spark, Solr, Impala, Hue, Orcale BDA, Cloudera Manager and Hortonworks
- Managed multiple Hadoop clusters with the highest capacity of 1500TB (1.4) PB with Kerberos Enabled.
- Responsible for Cluster maintenance, Adding and removing cluster nodes, Cluster Monitoring and Troubleshooting, Manage and review data backups, Manage and review Hadoop log files.
- Responsible for architecting Hadoop clusters with Cloudera CDH4, CDH5 and Hortonworks distribution platform HDP 2.2.8 and HDP 2.5.3.
- Installed Cloudera distribution of hadoop on both physical and as well cloud based servers(AWS cloud based servers).
- Installed Oracle BDA Cloudera 4.0 from scratch for use case that involves unstructured data.
- Responsible for Performance tuning of Oracle BDA cluster and setting up Oracle Nosql Databases for the same.
- Experience in Installation and configuration Cloudera CDH4, Name Node, Secondary Name Node, Job Tracker, Task Trackers and DataNodes.
- Upgraded Production Hadoop cluster CDH 4.x to CDH 5.x alongside of OS upgrade from RHEL 6.2 to RHEL 6.4 with security (Kerberos) Enabled
- Responsible for on-boarding new users to the hadoop cluster (adding user a home directory and providing access to the datasets).
- Played responsible role for deciding the hardware configurations for the cluster along with other teams in the company.
- Experience in writing the automatic scripts for monitoring the file systems.
- Responsible for giving presentations about new ecosystems to be implemented in the cluster with the teams and managers.
- Applying patches for cluster (eg. HIVE-1975).
- Adding new Data Nodes when needed and running balancer.
- Responsible for building scalable distributed data solutions using Hadoop.
- Continuous monitoring and managing the Hadoop cluster through Ganglia and Nagios.
- Installed Oozie workflow engine to run multiple Hive and Pig jobs, which run independently with time and data availability.
- Done major and minor upgrades to the hadoop cluster.
- Upgraded the Cloudera hadoop ecosystems in the cluster using Cloudera distribution packages.
- Done stress and performance testing, benchmark for the cluster.
- Commissioned and decommissioned the Data Nodes in the cluster in case of the problems.
- Debug and solve the major issues with the help of Cloudera Support team.
- Supported 100+ business users to use hadoop platform and resolving tickets and issues they run into and helping them to use best ways to achieve their results
- Installed several projects on hadoop servers and configured each project to run jobs and scripts successfully.
- Configured file-cleanup utilities for cluster maintanence.
- Setting up Flume agents in order to handle multiple streaming data ingestion via setup box
- Enabled HA for Namenode using quorum to avoid single point of failure
- Developed scripts to monitor automation jobs and processes required for hadoop and setup mail service in case of failure.
- Developed scripts and automated data management from end to end and updating b/w all the clusters
- Installed and configured HBase on top of Hadoop
- Implemented dynamic resource pooling for allocating resources according to the users requirement.
- Build Kafka cluster for specific use cases and also worked on TSL/SSL configurations, topic creations and ACL definitions.
Confidential, Hillsboro OR
Environment: Hive, pig, MapReduce, and Cloudera Manager
- Installed Hadoop on clustered environment
- Upgraded the cluster from CDH2 to CDH3
- The tasks were first performed on the staging platform, Before doing it on production cluster
- Implemented Ganglia to Monitor the tool
- Experience in file system Management
- Provided day to day production support of our Hadoop infrastructure including new hardware
- Infrastructure and application installation.
- Managed backups for key data stores
- Supported configuring, sizing, tuning and monitoring analytic clusters
- Implemented security and regulatory compliance measures
- Streamlined cluster scaling and configuration
- Monitoring cluster job performance and involved capacity planning
- Works with application teams to install operating system and Hadoop updates, patches, Version upgrades as required.
- Documented technical designs and procedures
Confidential, St Louis, MO
- Installing and maintaining the Linux servers.
- Installed Cent OS using Pre-Execution environment boot and Kick start method on multiple servers
- Verify the peripherals are working properly
- Quickly arrange repair for hardware in occasion of hardware failure
- Update system as soon as new version of OS and application software comes out
- Setup securities for users and groups and firewall intrusion detection systems
- Creating new users, Resetting user passwords, Lock/Unlock user accounts
- Monitoring System Metrics and logs for any problems
- Running cron-tab to back up data.
- Involved in Adding, removing, or updating user account information, resetting passwords etc
- Maintaining the RDBMS server and Authentication to required users for databases
- Provided technical support for Level I-III issues via helpdesk and the telephone
- Took Backup Confidential regular intervals and planned with a good disaster recovery plan
- Documented and maintain server, network, and support documentation.
- Installation and administration of RHEL 4.0/5.0 and SUSE 10.x.
- Configured kickstart server and updating/applying patches to the servers using Red hat Satellite server.
- Remote system administration using tools like SSH, Telnet, and Rlogin.
- Planning and implementing system upgrades including hardware, operating system and periodical patch upgrades. Automation of jobs through crontab and AutoSys.
- Installation of packages, patch management, volume management on Suse servers using YaST.
- Applied appropriate support packages/patches to maintain system integrity.
- Performed capacity analysis, monitored and controlled disk space usage on systems.
- Monitored system activities and fine-tuned system parameters and configurations to optimize performance and
- Ensure security of systems.
- Adding servers to domain and managing the groups and user in AD, installing and configuring send mail.
- Responsible for maintenance of development tools and utilities and to maintain shell, Perl automation Scripts.
- Worked with project manager and auditing teams to implement PCI compliance.
- Integrating Web logic 10.x and Apache 2.x and deploying EAR, WAR files in Web logic Application servers.
- Designing Firewall rules for new servers to enable communication with application, Oracle 10g servers.
- Fine tuning of Servers and configuring networks for optimum performance.
- Setting up network environments using TCP/IP, NIS, NFS, DNS, SNMP agents, DHCP and Proxy.