- 14+ years of professional IT experience as a Linux/Hadoop Administrator and AWS Cloud Engineer, providing production support for applications on Red Hat Enterprise Linux and on the Cloudera, Hortonworks, and MapR distributions of Hadoop.
- 5+ years of experience in configuring, installing, benchmarking and managing Apache Hadoop in distributions such as Cloudera and Hortonworks.
- Experience in deploying Hadoop clusters in public and private cloud environments using Cloudera, Hortonworks, and AWS.
- Experience in Amazon Web Services (AWS) provisioning and good knowledge of AWS services like EC2, ELB, Elastic Container Service, S3, DMS, VPC, Route53, CloudWatch, and IAM.
- Experience in installing, patching, upgrading and configuring Linux based operating systems like CentOS, Ubuntu, RHEL on a large set of clusters.
- Excellent understanding and knowledge of Hadoop architecture and its components, such as HDFS, NameNode, DataNode, JobTracker, TaskTracker, NodeManager, ResourceManager and ApplicationMaster.
- Hands-on experience in installing and configuring Hadoop ecosystem components such as MapReduce, Spark, HDFS, HBase, Oozie, Hive, Pig, Impala, ZooKeeper, YARN, Kafka and Sqoop.
- Experience in improving the Hadoop cluster performance by considering the OS kernel, storage, Networking, Hadoop HDFS and MapReduce by setting appropriate configuration parameters.
- Experience in Installation and configuration, Hadoop Cluster Maintenance, Cluster Monitoring and Troubleshooting.
- Experience in tuning and monitoring the performance of the Hadoop ecosystem.
- Experience in understanding the security requirements for Hadoop and integrating with Kerberos authentication infrastructure: KDC server setup, and creating and managing the realm.
- Experience in importing and exporting data between relational databases/mainframes and HDFS using Sqoop, and in collecting logs using Flume.
- Experience in configuring Zookeeper to provide high availability and Cluster service co-ordination.
- Strong knowledge of NameNode High Availability and recovery of NameNode metadata and data residing in the cluster.
- Strong knowledge of and experience in benchmarking and performing backup/disaster recovery of NameNode metadata and other important data on the cluster.
- Expert knowledge and hands-on experience in importing and exporting data from different sources using services like Sqoop and Flume.
- Experience in configuration and management of security for Hadoop cluster using Kerberos and integration with LDAP/AD at an Enterprise level.
- Experience with scripting languages like Shell, Python and JavaScript.
- Creating and maintaining user accounts, profiles, security, disk space and process monitoring.
- Experience in Logical Volume Manager (LVM): creating new file systems, and mounting and unmounting file systems.
- Experience in securing Hadoop clusters using Sentry.
- Expert in installing, configuring and maintaining Apache/Tomcat, Samba, Sendmail, and WebSphere Application Servers.
- Experience in Networking Concepts, DNS, NIS, NFS and DHCP, troubleshooting network problems such as TCP/IP, providing support for users in solving their problems.
- Experience in using automation tools like Puppet and Chef.
- Excellent interpersonal and communication skills, creative, research-minded, technically competent and result-oriented with problem solving and leadership skills.
Cloud: AWS - EC2, EMR, S3, Lambda, VPC, Subnets, Route53, CloudFormation templates, troposphere, IAM, Security Groups
Operating Systems: RedHat Linux 5.X, 6.X, Windows 95, 98, NT, 2000, Windows Vista, 7, CentOS, Ubuntu.
Hadoop Eco Systems: Hive, Pig, Flume, Oozie, Sqoop, Spark, Kafka, Impala, ZooKeeper, HCatalog and HBase.
Monitoring Tools: Ganglia, Nagios, CloudWatch.
Database: Oracle 10g (SQL), MySQL, Teradata, PostgreSQL.
Configuration Management Tools: Puppet, Chef
Hadoop Configuration: Cloudera Manager, Ambari Hortonworks, MapR
Confidential, Irving, TX
- Installed and configured Hadoop clusters and ecosystem components like Spark, Hive, Scala, YARN, MapReduce and HBase.
- Responsible for creating Hive tables, loading the structured data resulting from MapReduce jobs into the tables, and writing Hive queries to further analyze the logs to identify issues and behavioral patterns.
- Worked on cluster installation, commissioning and decommissioning of DataNodes, NameNode recovery, capacity planning and slots configuration.
- Troubleshot many cloud-related issues such as DataNode outages, network failures and missing data blocks.
- Worked with various AWS Components such as EC2, S3, IAM, VPC, RDS, Route 53, SQS, SNS.
- Experienced in running Hadoop streaming jobs to process terabytes of XML format data.
- Expertise in tuning Spark applications, including setting the right batch interval.
- Created EC2 instances and managed the volumes using EBS.
- Designed, developed and maintained data integration programs in a Hadoop and RDBMS environment with both traditional and non-traditional source systems as well as RDBMS and NoSQL data stores for data access and analysis.
- Updated CloudFormation templates with IAM roles for S3 bucket access, security groups, subnet IDs, EC2 instance types, ports and AWS tags.
- Worked with Bitbucket, Git and Bamboo to deploy EMR clusters.
- Created monitors, alarms and notifications for EC2 hosts using CloudWatch.
- Reviewed and updated firewall settings (security groups) on AWS.
- Worked on multiple instances, managing the Elastic Load Balancer, Auto Scaling, and security groups to design a fault-tolerant and highly available system.
- Loading data from large data files into Hive tables and HBase NoSQL databases.
- Created S3 bucket policies and IAM role-based policies, and customized the JSON templates.
- Wrote shell scripts and successfully migrated data from on-premises systems to AWS EMR (S3).
- Involved in creating Hive tables and loading and analyzing data using Hive queries.
- Implemented schema extraction for Parquet and Avro file formats in Hive.
- Developed automated scripts to install Hadoop clusters.
- Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, and managing and reviewing data backups and Hadoop log files.
- Extensively worked with continuous Integration of application using Jenkins.
- Responsible for building scalable distributed data solutions using Hadoop.
- Involved in building extraction jobs to import data into the Hadoop file system from traditional Oracle systems using Sqoop import tasks.
- Used Reporting tools like Tableau to connect with Hive for generating daily reports of data.
- Collaborated with the infrastructure, network, database, application and BI teams to ensure data quality and availability.
Environment: Cloudera Manager 5.10, CDH 5.10, AWS, Yarn, Spark, HDFS, Python, Hive, HBase, Oozie, Tableau, Oracle 12c, Linux.
Confidential, New Brunswick, NJ
- Installed and configured various components of Hadoop ecosystem and maintained their integrity on Cloudera.
- Extensively involved in Cluster Capacity planning, Hardware planning, Installation, Performance Tuning of the Hadoop Cluster.
- Actively involved in a proof of concept for a Hadoop cluster in AWS, using EC2 instances, EBS volumes and S3 to configure the cluster.
- Involved in migrating on-premises data to AWS.
- Worked with big data developers, designers and scientists in troubleshooting job failures and issues.
- Exported the analyzed data to the relational databases using Sqoop.
- Monitored systems and services through the Cloudera Manager dashboard to keep the clusters available for the business.
- Involved in upgrading clusters to newer Cloudera distribution versions and deploying them on CDH5.
- Extensively working on Spark using Python for testing and development environments.
- Executed tasks for upgrading cluster on the staging platform before doing it on production cluster.
- Changing the configurations based on the requirements of the users for the better performance of the jobs.
- Designed, configured and managed the backup and disaster recovery for HDFS data.
- Extensively involved in configuring Storm to load data from MySQL to HBase.
- Used Impala to write sample queries to test connectivity and workflows.
- Monitored multiple clusters using Cloudera Manager Alerts and Ganglia.
- Set up Flume for different sources to bring log messages from external systems into HDFS.
- Automated data movement between clusters using DistCp.
Environment: Cloudera 5.8, CDH 5.x, AWS, Spark, HDFS, MapReduce, YARN, Pig, Hive, HBase, Flume, Sqoop, Oozie, Oracle 10g, MySQL, Impala, Ganglia, Sentry.
Confidential, Portland, OR
- Expertise on Cluster Planning, Performance tuning, Monitoring and Troubleshooting the Hadoop Cluster.
- Responsible for building a Hortonworks cluster from scratch on HDP 2.x and deploying a Hadoop cluster integrated with Nagios and Ganglia.
- Expertise on cluster audit findings and tuning configuration parameters.
- Expertise in configuring MySQL to store the hive metadata.
- Built high availability for a major production cluster and designed automatic failover control using ZooKeeper Failover Controller and Quorum Journal Nodes.
- Extensively involved in commissioning and decommissioning nodes from time to time.
- Installed and configured Hadoop monitoring and administration tools such as Nagios and Ganglia.
- Implemented Kerberos for authenticating all the services in Hadoop Cluster.
- Responsible for installing new services and removing them through Ambari.
- Configured various views in Ambari such as Hive view, Tez view, and Yarn Queue manager.
- Worked with data delivery teams to set up new Hadoop users. This included setting up Linux users, setting up Kerberos principals and testing HDFS, Hive, Pig and MapReduce access for the new users.
- Periodically reviewed Hadoop-related logs, fixing errors and preventing new ones by analyzing the warnings.
- Responsible for setting log retention policies and configuring the trash interval.
- Monitoring the data streaming between web sources and HDFS.
- Involved in configuring Zookeeper to coordinate the servers in clusters to maintain the data consistency.
- Installed Oozie workflow engine to run multiple Hive and pig jobs.
- Backed up data from the active cluster to a backup cluster using DistCp.
- Worked on analyzing Data with Hive and PIG.
- Worked on Hadoop ecosystem components like MapReduce, HDFS, ZooKeeper, Oozie, Hive, Sqoop, Pig and Flume.
- Deployed a Network File System (NFS) for NameNode metadata backup.
- Monitor Hadoop cluster connectivity and security.
- Install operating system and Hadoop updates, patches, version upgrades when required.
- Exported data from HDFS to MySQL databases and vice versa using Sqoop.
Environment: Hortonworks HDP 2.2, Ambari 2.x, HDFS, ZooKeeper, Unix/Linux, MapReduce, YARN, Pig, Hive, HBase, Flume, Sqoop, Shell Scripting, Kerberos, Nagios & Ganglia.
Confidential, Santa Clara, CA
- Used the Hortonworks distribution of Hadoop to store and process large volumes of data generated by different enterprises.
- Experience in installing, configuring, monitoring HDP stacks.
- Good experience on cluster audit findings and tuning configuration parameters.
- Implemented Kerberos security in all environments.
- Expertise in cluster planning, performance tuning, Monitoring, and troubleshooting the Hadoop cluster.
- Responsible for MapReduce cluster maintenance tasks, including commissioning and decommissioning TaskTrackers and managing MapReduce jobs.
- Implemented Capacity Scheduler to share the resources of the cluster for the map reduce jobs given by the users.
- Involved in configuring MySQL to store the hive metadata.
- Demonstrated an understanding of the concepts, best practices and functions needed to implement a Big Data solution in a corporate environment.
- Worked with data delivery teams to set up new Hadoop users. This included setting up Linux users, setting up Kerberos principals and testing MFS and Hive.
- Helped design scalable Big Data clusters and solutions.
- Monitoring and controlling local file system disk space usage, log files, cleaning log files with automated scripts.
- As a Hadoop admin, monitoring cluster health status on daily basis, tuning system performance related configuration parameters, backing up configuration xml files.
- Worked with Hadoop developers and designers to troubleshoot MapReduce job failures and issues.
- Worked with network and Linux system engineers/admin to define optimum network configurations, server hardware and operating system.
- Evaluate and propose new tools and technologies to meet the needs of the organization.
- Production support responsibilities included cluster maintenance.
Environment: Hortonworks HDP 2.1, HDFS, Hive, Pig, MapReduce, Sqoop, HBase, Kerberos, MySQL, Shell Scripting, RedHat Linux.
Confidential, Pleasanton, CA
- Involved in Installing, Configuring Hadoop Eco System and Cloudera Manager using CDH4.
- Good understanding and related experience with Hadoop stack - internals, Hive, Pig and MapReduce, involved in defining job flows.
- Involved in creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs.
- Involved in managing and reviewing Hadoop log files.
- Involved in running Hadoop streaming jobs to process terabytes of text data.
- Load large sets of structured, semi structured and unstructured data.
- Responsible for managing data coming from different sources.
- Supported MapReduce programs running on the cluster.
- Involved in loading data from UNIX file system to HDFS.
- Installed and configured Hadoop MapReduce, HDFS, developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Wrote MapReduce jobs to discover trends in data usage by users.
- Implemented HBase coprocessors to notify the support team when data is inserted into HBase tables.
- Solved the small-files problem using SequenceFile processing in MapReduce.
- Monitor System health and logs and respond accordingly to any warning or failure conditions.
- Performed cluster co-ordination through Zookeeper.
- Involved in support and monitoring production Linux Systems.
- Expertise in archiving logs and monitoring jobs.
- Monitoring Linux daily jobs and monitoring log management system.
- Expertise in troubleshooting and able to work with a team to fix large production issues.
- Expertise in creating and managing DB tables, Index and Views.
- User creation and managing user accounts and permissions on Linux level and DB level.
- Extracted large data sets from sources in different formats, including relational databases, XML and flat files, using ETL processing.
Environment: Cloudera Distribution CDH 4.1, Apache Hadoop, Hive, MapReduce, HDFS, PIG, ETL, HBase, Zookeeper
- Installation, configuration and administration of Linux (RHEL 4/5) and Solaris 8/9/10 servers.
- Maintained and supported mission-critical front-end and back-end production environments.
- Configured RedHat Kickstart server for installing multiple production servers.
- Provided Tier 2 support to issues escalated from Technical Support team and interfaced with development teams, software vendors, or other Tier 2 teams to resolve issues.
- Installing and partitioning disk drives. Creating, mounting and maintaining file systems to ensure access to system, application and user data.
- Installation and maintenance of RPM and YUM packages and other server software.
- Creating users, assigning groups and home directories, setting quota and permissions; administering file systems and recognizing file access problems.
- Experience in Managing and Scheduling Cron jobs such as enabling system logging, network logging of servers for maintenance, performance tuning and testing.
- Maintaining appropriate file and system security, monitoring and controlling system access, changing permission, ownership of files and directories, maintaining passwords, assigning special privileges to selected users and controlling file access.
- Extensive use of LVM, creating Volume Groups, Logical volumes.
- Performed various configurations, including networking and iptables, hostname resolution, and passwordless (key-based) SSH login.
- Build, configure, deploy, support, and maintain enterprise class servers and operating systems.
- Building CentOS 5/6, Oracle Enterprise Linux 5 and RHEL (4/5) servers from scratch.
- Organized the patch depots and acted as the point of contact (POC) for patch-related issues.
- Configuration and installation of Red Hat Linux 5/6, CentOS 5 and Oracle Enterprise Linux 5 using Kickstart to reduce installation issues.
- Attended team meetings, change control meetings to update installation progress and for upcoming changes in environment.
- Handled patch upgrades and firmware upgrades on RHEL and Oracle Enterprise Linux servers.
- User and Group administration on RHEL Systems.
- Creation of various user profiles and environment variables to ensure security.
- Server hardening and security configurations as per the client specifications.
- A solid understanding of networking/distributed computing environment concepts, including principles of routing, bridging and switching, client/server programming, and the design of consistent network-wide file system layouts.
- Strong understanding of Network Infrastructure Environment.
Environment: Red Hat Linux, AIX, RHEL, Oracle 9i/10g, Samba, NT/2000 Server, VMware 2.x, Tomcat 5.x, Apache Server.