- 8+ years of IT experience, including extensive experience in the administration, modification, installation, and maintenance of Hadoop on Linux (RHEL) and of Tableau.
- Over 8 years of Information Technology experience, with expertise in administration and operations for Big Data and Cloud Computing technologies.
- Expertise in setting up fully distributed multi-node Hadoop clusters with Apache and Cloudera Hadoop.
- Expertise in AWS services such as EC2, Simple Storage Service (S3), Auto Scaling, EBS, Glacier, VPC, ELB, RDS, IAM, CloudWatch, and Redshift.
- Expertise in MIT Kerberos, High Availability, and integration of Hadoop clusters.
- Experience in upgrading Hadoop clusters.
- Strong knowledge of installing, configuring, and using ecosystem components such as Hadoop MapReduce, Oozie, Hive, Sqoop, Pig, Flume, ZooKeeper, and Kafka, as well as NameNode recovery and HDFS High Availability.
- Experience with Hadoop shell commands and with verifying, managing, and reviewing Hadoop log files.
- Designed and implemented CI/CD pipelines achieving end-to-end automation.
- Supported server/VM provisioning, middleware installation, and deployment activities via Puppet.
- Wrote Puppet manifests to provision several pre-prod environments.
- Wrote Puppet modules to automate the build/deployment process and improve manual processes overall.
- Designed, installed, and implemented Puppet; good knowledge of automation using Puppet.
- Implementing AWS architectures for web applications
- Experience with EC2, S3, ELB, IAM, CloudWatch, and VPC in AWS.
- Experience in understanding the security requirements for Hadoop and integrating with Kerberos authentication and authorization infrastructure.
- Extensive experience on performing administration, configuration management, monitoring, debugging, and performance tuning in Hadoop Clusters.
- Performed AWS EC2 instance mirroring, WebLogic domain creation, and several proprietary middleware installations.
- Worked on agile projects delivering end-to-end continuous integration/continuous delivery pipelines by integrating tools such as Jenkins, Puppet, and AWS for VM provisioning.
- Evaluated the performance of EC2 instances (CPU and memory usage) and set up EC2 Security Groups and VPCs.
- Configured and managed Jenkins in various environments on Linux and Windows.
- Administered the Git version control system, creating daily backups and checkpoint files.
- Created various branches in Git, merged from the development branch to the release branch, and created tags for releases.
- Experience creating, managing, and performing container-based deployments using Docker images containing middleware and applications together.
- Enabled and disabled passive and active checks for hosts and services in Nagios.
- Good knowledge of installing, configuring, and maintaining Chef server and workstation.
- Expertise in provisioning clusters and building manifest files in Puppet for any service.
- Excellent knowledge of importing and exporting structured and unstructured data from various sources such as RDBMS, event logs, and message queues into HDFS, using tools such as Sqoop and Flume.
- Expertise in converting non-Kerberized Hadoop clusters to Kerberized clusters.
- Administration and operations experience with Big Data and Cloud Computing technologies.
- Experience setting up fully distributed multi-node Hadoop clusters with Apache Hadoop and AWS EC2 instances.
- Experience with AWS services such as EC2, Simple Storage Service (S3), Auto Scaling, EBS, ELB, RDS, IAM, and CloudWatch.
- Performing administration, configuration management, monitoring, debugging, and performance tuning in Hadoop Clusters.
- Experience in installing firmware upgrades and kernel patches, system configuration, and performance tuning on Unix/Linux systems.
- Expert in Linux performance monitoring, kernel tuning, load balancing, health checks, and maintaining compliance with specifications.
- Hands-on experience configuring and managing ZooKeeper and ZKFC for NameNode failure scenarios.
- Team player with good communication and interpersonal skills and a goal-oriented approach to problem solving.
Sr. Hadoop Administrator
Confidential, Phoenix, AZ
- Providing hardware architectural guidance, planning and estimating cluster capacity, and creating roadmaps for Hadoop cluster deployment.
- Working on Hortonworks & Cloudera cluster setup
- Automating repeated tasks using Python.
- Knowledge of shell and Python programming languages.
- Installing, configuring, maintaining, and troubleshooting standalone systems.
- Configuring and updating server parameters using Python.
- Analyzing shell/Python script code during migrations.
- Troubleshooting issues such as job failures and performance problems.
- Hadoop user administration using Sentry
- Upgraded CDH from 5.2 to 5.
- Responsible to manage data coming from different sources.
- Knowledge on snapshots
- Involved in Hadoop Cluster environment administration that includes adding and removing cluster nodes, cluster capacity planning, performance tuning, cluster Monitoring, Troubleshooting
- Adding new nodes to an existing cluster, recovering from a Name Node failure
- Decommissioning and commissioning nodes on a running cluster.
- Supported MapReduce programs running on the cluster.
- Configured Fair Scheduler to provide service-level agreements for multiple users of a cluster.
- Managing Hadoop cluster node connectivity and security.
- Experienced in managing and reviewing Hadoop log files
- Maintaining backups of the NameNode.
- Knowledge of NameNode recovery from previous backups.
- Importing and exporting data into HDFS using Sqoop
- In-depth conceptual and functional understanding of MapReduce and Hadoop ecosystem infrastructure (both MRv1 and MRv2).
- Deploy and scale out multi-node Hadoop clusters with components including MapReduce, Pig, Hive, and HBase.
- Design and implement workflow and coordinator jobs using Oozie.
- Functional knowledge of Flume and Sqoop.
- Experience in troubleshooting, optimization, and performance tuning.
- Experience in change management.
- Follow functional spec analysis and develop ETL pipelines.
- Develop MapReduce/Pig applications to transform the data available in HDFS.
- Generate reporting data using Pig/Hive to serve the business team's ad hoc requests.
- Import data into Hive from RDBMS sources, process it, and write the resulting data to HDFS.
- Cluster management, troubleshooting, and sharing best practices with the team.
- Schedule jobs using Oozie workflows.
- Built a Hadoop cluster from scratch using a "start small and scale quickly" approach.
- Well versed in security topics such as quotas, RBAC, ACLs, setuid, and the sticky bit.
- Using Kerberos, LDAP, and Ranger for identity and access management.
Confidential - Memphis, TN
- Installed and managed Hortonworks HDP DL and DW components.
- Worked on the Hortonworks (HDP 220.127.116.11.2) distribution, managing services (HDFS, MapReduce2, Tez, Hive, Pig, HBase, Sqoop, Flume, Spark, Ambari Metrics, ZooKeeper, Falcon, Oozie, etc.) for 4 clusters ranging from LAB, DEV, and QA to PROD, containing 350+ nodes with 7 PB of data.
- Monitored Hadoop cluster connectivity and security using the Ambari monitoring system.
- Led the installation, configuration, and deployment of software products on new edge nodes that connect to the Hadoop cluster for data acquisition.
- Rendered L3/L4 support services for BI users, Developers and Tableau team through Jira ticketing system.
- One of the key engineers in Aetna's HDP web engineering team, Integrated Systems Engineering (ISE).
- Managed and reviewed log files as part of administration for troubleshooting purposes. Communicated and escalated issues appropriately for tickets raised by users in the JIRA ticketing system.
- Worked closely with System Administrators, BI analysts, developers, and key business leaders to establish SLAs and acceptable performance metrics for the Hadoop as a service offering.
- Performance Tuning and ETL, Agile Software Deployment, Team Building & Leadership, Engineering Management.
- Used Hortonworks Ambari and Apache Hadoop on Red Hat and CentOS as data storage, retrieval, and processing systems.
- Set up Kerberos principals in the KDC server, tested HDFS, Hive, Pig, and MapReduce access for new users, and created keytabs for service IDs using keytab scripts.
- Performed a major upgrade in the production environment from HDP 2.3 to HDP 2.6. As an admin, followed standard backup policies to ensure high availability of the cluster.
- Monitored multiple Hadoop cluster environments using Ganglia and Nagios. Monitored workload, job performance, and capacity planning using Ambari.
- Installed the OLAP software AtScale on its designated edge node server.
- Implemented a dual data center setup for all Cassandra clusters. Performed complex system analysis to improve ETL performance and identified highly critical batch jobs to prioritize.
- Conducted cluster sizing, tuning, and performance benchmarking on a multi-tenant OpenStack platform to achieve desired performance metrics.
- Good knowledge of providing solutions to users who encountered Java exceptions and errors while running data models in SAS and R scripts. Good understanding of forecast data models.
- Worked on data ingestion, using Sqoop and automated ingestion scripts to pull data from traditional RDBMS platforms such as Oracle, MySQL, and Teradata into the Hadoop cluster, and stored data in NoSQL databases such as HBase and Cassandra.
- Provided security and authentication with Kerberos, which works by issuing Kerberos tickets to users.
- Good troubleshooting skills across all Hadoop stack components, ETL services, and Hue and RStudio, which provide GUIs for developers and business users for day-to-day activities.
- Created queues and allocated cluster resources to prioritize Hive jobs.
- Implemented SFTP for projects to transfer data via SCP from external servers to Hadoop servers. Experienced in managing and reviewing log files. Involved in scheduling the Oozie workflow engine to run multiple Hive, Sqoop, and Pig jobs.
- Environment: CDH 5.4.3 and 4.x, Cloudera Manager CM 5.1.1, HDFS, MapReduce, Yarn, Hive, Pig, Sqoop, Oozie, Flume, Zookeeper, Chef, Redhat/Centos 6.5, Control-M.
Confidential, Sacramento, CA
- Providing Administration, management and support for large scale Big Data platforms on Hadoop eco-system.
- Built lab, dev, and prod environments from scratch, working with the network, AD, and UNIX teams to prepare the cluster for building the project's Hadoop platform.
- Involved in cluster capacity planning, deployment, and management of Hadoop for our data platform operations with a group of Hadoop architects and stakeholders.
- Worked on various POCs during the production deployment with Cloudera.
- Expert knowledge of UNIX administration from the perspective of Hadoop ecosystems.
- Implemented Kerberos using MIT KDC (POC) and Kerberos using Active Directory as the KDC, integrated with Centrify (first implemented in POC and later for DEV/UAT and PROD environments).
- Upgraded Cloudera Manager and CDH packages on the Kerberized DEV and UAT clusters from 5.7.1 to 5.7.2.
- Working with off-shore and on-shore application teams for creating resource pools and resource allocation.
- Set up monitoring for the Cloudera cluster, including configuring alert thresholds and SNMP settings to push alert notifications to service mailboxes.
- Set up NameNode HA using QJM and ResourceManager HA for the Cloudera cluster.
- Monitored cluster job performance and killed long-running jobs.
- Experience in setting up and deploying a Hadoop cluster in a DMZ (demilitarized zone), an isolated network zone with custom data access policies for the sources/clients trying to access the Hadoop cluster. Experience in designing Data Flow Diagrams and user access groups.
- Developed Ansible scripts to check service status and PID status and to push files to target servers; presently developing Ansible scripts for managing the cluster through the Cloudera REST API. Developed and deployed an automated cluster using Cloudera Director.
- Tuned the YARN cluster and configured Hive on Spark.
- Scheduled cron jobs for taking cold-backups of back-end databases used in the cluster deployment (MySQL)
- Experience in setting up a Kafka cluster for publishing topics; familiar with lambda architecture.
- Used Cloudera BDR to implement the backup and recovery strategy.
- Configured and set up user and service accounts and ACLs, created Sentry policies, and ran HiveQL DDL to create tables in production.
- Experience in creating MOP (Method of Procedure) documents before making any changes to the prod environment.
- Benchmarked the cluster using the TestDFSIO, TeraSort, and TeraGen utilities provided by Cloudera.
- Setting up local repositories for CDH and Cloudera Manager Upgrades.
- Enabled rack configuration for the cluster and deployed services according to the rack configuration.
- Added new nodes to the cluster, scheduled cluster rebalancing afterward, and decommissioned nodes from the cluster for maintenance.
- Configured the cluster to tolerate multiple data disk failures and raised tickets with the data center team as necessary for hardware failures and replacements.
- Experience in following ITIL procedures for change, incident and task management.
- Creating dynamic resource pools for managing the cluster resources among various application teams
- Setting up file hierarchy in HDFS File structure for storing application jars, staging data and final data. Discussing data and log retention policies for the cluster.
- Environment: CDH 5.7.X, Cassandra, Kafka, Cloudera Manager, Bash, Python, Oozie, Hue, Splunk, HDFS, Zookeeper, YARN, Appworx, Hive, Spark, Dell Hardware, Upgrades and patching, Rack configuration, Centrify, Attunity CDC.
- Monitored workload job performance and capacity planning using Cloudera Manager.
- Involved in analyzing system failures, identifying root causes, and recommending courses of action.
- Imported logs from web servers with Flume to ingest data into HDFS.
- Retrieved data from HDFS into relational databases with Sqoop. Parsed, cleansed, and mined useful and meaningful data in HDFS using MapReduce for further analysis.
- Fine-tuned Hive jobs for optimized performance. Installed, configured, and deployed a Hadoop cluster for development, production, and testing.
- As an admin, followed standard backup policies to ensure high availability of the cluster.
- Experience in benchmarking, performing backup and disaster recovery of Name Node metadata and important sensitive data residing on cluster.
- Partitioned and queried the data in Hive for further analysis by the BI team.
- Involved in extracting the data from various sources into Hadoop HDFS for processing.
- Effectively used Sqoop to transfer data between databases and HDFS.
- Worked on streaming data into HDFS from web servers using Flume.
- Implemented custom interceptors for Flume to filter data and defined channel selectors to multiplex the data into different sinks.
- Involved in Map-Reduce programs to cleanse the data in HDFS obtained from heterogeneous data sources to make it suitable for ingestion into Hive schema for analysis
- Worked on custom Pig loaders and storage classes to handle a variety of data formats such as JSON and XML.
- Used Hive data warehouse tool to analyze the unified historic data in HDFS to identify issues and behavioral patterns.
- Automated workflows using shell scripts to pull data from various databases into Hadoop.
- Deployed Hadoop Cluster in Fully Distributed and Pseudo-distributed modes
- Supported in setting up QA environment and updating configurations for implementing scripts with Pig and Sqoop.
- Used Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs such as Java Map-Reduce, Hive and Sqoop as well as system specific jobs.
- Experienced with different kinds of compression techniques such as LZO, GZIP, and Snappy.
- Worked with application teams to install operating system, Hadoop updates, patches, version upgrades as required.
- Provisioned, built, and supported both physical and virtual (VMware) Linux servers for production, QA, and developer environments.
- Environment: Cloudera Manager, Hadoop HDFS, MapReduce, HBase, Hive, Pig, Flume, Oozie, Sqoop, Shell Scripting, Hue, MySQL, MongoDB, AWS EC2, ETL, Bash, ServiceNow, JIRA.
- Currently working as a Hadoop administrator on the Hortonworks distribution for 3 clusters ranging from POC to PROD.
- Responsible for cluster maintenance, commissioning and decommissioning DataNodes, cluster monitoring, troubleshooting, managing and reviewing data backups, and managing and reviewing Hadoop log files.
- Monitoring systems and services, architecture design and implementation of Hadoop deployment, configuration management, backup, and disaster recovery systems and procedures.
- Working knowledge of Hive and its components, and of troubleshooting Hive issues.
- Responsible for cluster availability and experienced in on-call support.
- Experienced in setting up projects and volumes for new Hadoop projects.
- Involved in snapshots and HDFS data backups to maintain local and remote backups of cluster data.
- Implemented SFTP for projects to transfer data from external servers to Hadoop servers.
- Installation of various Hadoop ecosystem components and Hadoop daemons.
- Experienced in managing and reviewing Hadoop log files.
- Good knowledge of creating and maintaining MySQL databases, setting up users, and maintaining database backups.
- Extensive knowledge of HDFS and YARN implementation in clusters for better performance.
- Experienced in production support, which involves resolving user incidents ranging from sev1 to sev5.
- Involved in analyzing system failures, identifying root causes, and recommending courses of action. Documented system processes and procedures for future reference.
- Worked with the systems engineering team to plan and deploy new Hadoop environments and expand existing Hadoop clusters.
- Monitored multiple Hadoop cluster environments using Nagios. Monitored workload, job.
- Environment: Hortonworks 2.1.7, Hive, Pig, Sqoop, Flume, ZooKeeper, HBase, MySQL, Shell Scripting, Red Hat Linux.
Windows Server 2003/2008/2008 R2:
- Installing, configuring and administering Windows 2003/2008/2008 R2 servers
- Manage new hire account creation, user terminations, privileges and all other user account modifications
Active Directory:
- Troubleshooting user account lockouts using AD Lockout tool and Microsoft Operations Manager.
- Managed user accounts using Active Directory 2003/2008 to create new domain accounts, provide controlled access to existing network shares as well as creating new ones
- Identified the folders using the most space on various NAS servers with TreeSize Professional to free up disk space.
- Adding user contact information in AD.
- Providing RSA (secure key) to users so they can work remotely.
- Troubleshooting issues while logi
- Rights Management service
- Creating and modifying users, groups and OUs.
Exchange:
- Creating Mailbox for new users to receive mails
- User contact Management to add them into the organisation servers
- Giving access to the mailbox for the requested user
- Increasing Mailbox size
- Addition and deletion of users from DL
- Adding Primary and Secondary Email Address
- Adding Email to safe sender list and block listing.
- Troubleshooting of Mail delivery failure issues and OWA issues
Big Data Technologies: HDFS, YARN, MapReduce, Pig, Hive, Sqoop, Oozie, ZooKeeper, Flume, MongoDB, Cassandra
Big Data Distributions: Cloudera, Hortonworks, Apache Hadoop
Installation: Ansible, GitLab
Batch scheduling tool: Control-M
Scripting Languages: Shell, Bash
Monitoring tools: Grafana, Ganglia, Nagios, Ambari, Jenkins, Navigator, Netcool
Reporting Tools: ServiceNow, Tableau, Jaspersoft
Programming Languages: SQL, PL/SQL, Java, Chef, Puppet
Application Servers: Apache Tomcat, WebLogic Server, WebSphere, JBoss
Databases: Oracle 9.x, 10g, 11g, MySQL Server, DB2, HBase
Networking & Protocols: TCP/IP, Telnet, HTTP, HTTPS, FTP, SNMP, DNS.
Operating Systems: Linux, UNIX, Mac, Windows NT/98/2000/XP/Vista, Windows 7, Windows 8