Sr. Big Data Administrator Resume
Franklin, TN
SUMMARY:
- Around 8 years of IT experience, including 3+ years in Big Data technologies.
- Extensive experience in designing, installing, configuring, and tuning Hadoop core and ecosystem components.
- Well versed in Hadoop MapReduce, HDFS, Pig, Hive, HBase, Sqoop, Flume, YARN, ZooKeeper, Spark, and Oozie.
- In-depth understanding of Hadoop architecture and its components, such as HDFS, NameNode, DataNode, JobTracker, TaskTracker, and MapReduce concepts.
- Experience in installation, configuration, support, and management of Hadoop clusters.
- Experience in task automation using Oozie, cluster coordination through Pentaho, and MapReduce job scheduling using the Fair Scheduler.
- Worked on both major Hadoop distributions: Cloudera and Hortonworks.
- Experience in performing minor and major upgrades and applying patches to Ambari and Cloudera clusters.
- Strong knowledge of Hadoop cluster capacity planning, performance tuning, and cluster monitoring.
- Able to configure scalable infrastructures for high availability (HA) and disaster recovery.
- Experience in analyzing data using HiveQL and Pig Latin.
- Experience in developing and scheduling ETL workflows, data scrubbing, and data processing in Hadoop using Oozie.
- Experience in balancing the cluster after adding/removing nodes or after major data cleanup.
- Experience in setting up data ingestion tools such as Flume, Sqoop, and NDM.
- General Linux system administration, including design, configuration, installation, and automation.
- Experience with Oracle, Hadoop, MongoDB, AWS Cloud, and Greenplum.
- Experience in configuring ZooKeeper to coordinate servers in clusters.
- Strong knowledge of using NFS (Network File System) for backing up NameNode metadata.
- Experience in setting up NameNode high availability for major production clusters.
- Experience in designing automatic failover using ZooKeeper and Quorum Journal Nodes.
- Experience in creating, building, and managing public and private cloud infrastructure.
- Experience in working with different file formats and compression techniques in Hadoop.
- Experience in analyzing existing Hadoop clusters, understanding performance bottlenecks, and providing performance tuning solutions accordingly.
- Extensive experience in installation, configuration, maintenance, design, development, implementation, and support on Linux.
- Experience with Ansible and related configuration management tools.
- Experience in working in large environments and leading infrastructure support and operations.
- Migrating applications from existing systems such as MySQL, Oracle, DB2, and Teradata to Hadoop.
- Benchmarking Hadoop clusters to validate hardware before and after installation, and tweaking configurations to obtain better performance.
- Experience in administering the Linux systems underlying Hadoop clusters and monitoring the clusters.
- Experience in commissioning, decommissioning, balancing, and managing nodes, and in tuning servers for optimal cluster performance, as sketched below.
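A minimal sketch of the decommissioning and rebalancing workflow referenced above, assuming an HDFS exclude file at /etc/hadoop/conf/dfs.exclude (hostname, paths, and threshold are illustrative):

    # Add the node being removed to the HDFS exclude file
    echo "worker-node-07.example.com" >> /etc/hadoop/conf/dfs.exclude

    # Tell the NameNode to re-read its include/exclude lists; the node enters
    # "Decommission in progress" while its blocks are re-replicated elsewhere
    hdfs dfsadmin -refreshNodes

    # Watch until the node reports "Decommissioned"
    hdfs dfsadmin -report | grep -A 2 "worker-node-07"

    # After adding or removing nodes, rebalance so that no DataNode's utilization
    # deviates from the cluster average by more than 10%
    hdfs balancer -threshold 10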
TECHNICAL SKILLS:
Big Data Technologies: HDFS, MapReduce, Hive, Pig, HBase, Cassandra, HCatalog, Phoenix, Falcon, Sqoop, Flume, ZooKeeper, Mahout, Oozie, Avro, Storm, HDP 2.4/2.5.
Monitoring Tools: Ambari, Cloudera Manager, Ganglia, Nagios, CloudWatch.
Scripting Languages: Shell scripting, Puppet, Python, Bash, CSH, Ruby, PHP.
Programming Languages: C, Java, SQL, and PL/SQL.
Front End Technologies: HTML, XHTML, XML.
Application Servers: Apache Tomcat, WebLogic Server, WebSphere.
Databases: Oracle 11g, MySQL, MS SQL Server, IBM DB2.
NoSQL Databases: HBase, Cassandra, MongoDB.
Operating Systems: Linux, UNIX, Mac OS, Windows NT/98/2000/XP/Vista, Windows 7, Windows 8.
Networks: HTTP, HTTPS, FTP, UDP, TCP/IP, SNMP, SMTP.
Security: Kerberos, Ranger, Ranger KMS, Knox.
WORK EXPERIENCE:
Sr. Big Data Administrator
Confidential
Responsibilities:
- Installed, configured, upgraded, and applied patches and bug fixes for Prod, Test and Dev Servers.
- Installed/Configured/Maintained Hadoop clusters in dev/test/UAT/Prod environments.
- Installed, configured, and administered HDFS, Hive, Ranger, Pig, HBase, Oozie, Sqoop, Spark, and YARN.
- Worked on the installation and configuration of Hadoop HA Cluster.
- Involved in capacity planning and design of Hadoop clusters.
- Set up alerts in Ambari for monitoring the Hadoop clusters.
- Set up security authentication using Kerberos.
- Created and dropped users, and granted and revoked permissions and Ranger policies as required.
- Commissioned and decommissioned data nodes from the cluster.
- Wrote and modified UNIX shell scripts to manage HDP environments.
- Installed and configured Flume, Hive, Sqoop and Oozie on the Hadoop cluster.
- Administered, configured, and performance-tuned Spark applications.
- Created directories and set up appropriate permissions for different applications.
- Backed up HBase tables to HDFS directories using the HBase Export utility, as sketched after this list.
- Involved in planning and implementing Hadoop cluster upgrades.
- Installation, configuration, and administration of HDP on Red Hat Enterprise Linux 6.6.
- Used Sqoop to import data from an Oracle database into HDFS, as sketched after this list.
- Detailed analysis of system and application architecture components per functional requirements.
- Reviewed and monitored system and instance resources to ensure continuous operations (i.e., database storage, memory, CPU, network usage, and I/O contention).
- Provided 24x7 on-call support for production job failures and resolved issues in a timely manner.
- Developed UNIX scripts for scheduling delta loads and master loads using the AutoSys scheduler.
- Deep and thorough understanding of ETL tools and how they can be applied in a Big Data environment.
- Troubleshot problems with databases, applications, and development tools.
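A minimal sketch of the HBase table backup via the built-in Export utility mentioned above (table name and HDFS directory are illustrative):

    # Export an HBase table to an HDFS directory (runs as a MapReduce job)
    hbase org.apache.hadoop.hbase.mapreduce.Export customer_profile /backups/hbase/customer_profile

    # Restore later with the matching Import utility into an existing table
    hbase org.apache.hadoop.hbase.mapreduce.Import customer_profile /backups/hbase/customer_profile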
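And a sketch of the Sqoop import from Oracle, with hypothetical connection details (host, service name, table, and target directory are placeholders):

    sqoop import \
      --connect jdbc:oracle:thin:@//oradb.example.com:1521/ORCL \
      --username etl_user -P \
      --table SALES.TRANSACTIONS \
      --target-dir /data/raw/transactions \
      --num-mappers 4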
Technical Environment: Hortonworks 2.4/2.5, Ambari, HDP, Ranger, HUE, Sqoop, Kerberos, Hive, Informatica 9.6.1, Oracle 11g/10g, DB2, LINUX, AWS, UNIX - AIX, Autosys.
Big Data Administrator
Confidential, Franklin, TN
Responsibilities:
- Responsible for cluster maintenance: adding and removing cluster nodes, cluster monitoring and troubleshooting, managing and reviewing data backups, and managing and reviewing Hadoop log files.
- Played a key role, along with other teams in the company, in deciding hardware configurations for the cluster.
- Resolved tickets and P1 issues submitted by users, troubleshooting, documenting, and resolving errors.
- Adding new Data Nodes when needed and running balancer.
- Responsible for building scalable distributed data solutions using Hadoop.
- Continuously monitored and managed the Hadoop cluster through Ganglia and Nagios.
- Performed major and minor upgrades to the Hadoop cluster.
- Performed stress testing, performance testing, and benchmarking for the cluster, as sketched after this list.
- Working closely with both internal and external cyber security customers.
- Research effort to tightly integrate Hadoop and HPC systems.
- Compared Hadoop to commercial big-data appliances from Netezza, XtremeData, and LexisNexis; published and presented the results.
- Deployed and administered a 300+ node Hadoop cluster, and administered two larger clusters.
- Worked on developing Linux scripts for Job Automation.
- Developed machine-learning capabilities using Apache Mahout.
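A minimal sketch of the TeraGen/TeraSort benchmarking flow used for the stress and performance testing above (data size and paths are illustrative; the examples jar location varies by distribution):

    EXAMPLES_JAR=/usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-examples.jar

    # Generate ~100 GB of input (1,000,000,000 rows x 100 bytes)
    hadoop jar "$EXAMPLES_JAR" teragen 1000000000 /benchmarks/terasort-input

    # Sort it; elapsed time is the headline benchmark number
    hadoop jar "$EXAMPLES_JAR" terasort /benchmarks/terasort-input /benchmarks/terasort-output

    # Validate that the output is globally sorted
    hadoop jar "$EXAMPLES_JAR" teravalidate /benchmarks/terasort-output /benchmarks/terasort-validate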
Technical Environment: Hadoop, MapReduce, HDFS, Hive, Oracle 11g, Java, Struts, Servlets, HTML, XML, SQL, J2EE, Tomcat 6.
Hadoop Administrator
Confidential, Palo Alto, California
Responsibilities:
- Involved in a Hadoop implementation project covering (but not limited to) Hadoop cluster management, writing MapReduce programs and Hive queries (HQL), and using Flume to analyze log files.
- Installation, configuration, and management of Big Data clusters (Hortonworks Data Platform 1.3/2.2).
- Involved and played a key role in the development of an ingestion framework based on Oozie and another framework using Java.
- Developed Data Quality checks to match the ingested data with source in RDBMS using Hive.
- Did a POC for data ingestion using ETL tools including Talend, DataStage, and Toad Data Point.
- Created some custom component detection and monitoring using Zookeeper APIs.
- Supported a Hadoop 1.x cluster (HDP 1.3), resolving issues related to jobs and cluster utilization.
- Deep understanding and related experience with Hadoop, HBase, Hive, and YARN/MapReduce.
- Enabled High-Availability for NameNode and setup fencing mechanism for split-brain scenario.
- Enabled High-Availability for Resource Manager and several ecosystem components including Hiveserver2, Hive Metastore, and HBase.
- Configured YARN queues based on the Capacity Scheduler for resource management.
- Configured YARN Node Labels to isolate resources at the node level, separating nodes dedicated to YARN applications from those dedicated to HBase.
- Configured cgroups to collect CPU utilization stats.
- Set up BucketCache on HBase-specific slave nodes for improved performance.
- Set up rack awareness for the cluster, including a rack-topology script for improved fault tolerance, as sketched after this list.
- Set up HDFS ACLs to restrict or enable access to HDFS data, also sketched after this list.
- Performance evaluation for Hadoop/YARN using TestDFSIO and TeraSort.
- Performance evaluation of Hive 0.14 with Tez using Hive TestBench.
- Configured Talend, DataStage, and Toad Data Point for ETL activities on Hadoop/Hive databases.
- Backing up HBase data and HDFS using HDFS Snapshots and evaluated the performance overhead.
- Created several recipes for automation of configuration parameters/scripts using Chef.
- Managed and configured the retention period of log files for all services across the cluster.
- Involved in development of ETL processes with Hadoop, YARN and Hive.
- Developed Hadoop monitoring processes (capacity, performance, consistency) to assure processing issues are identified and resolved swiftly.
- Coordinate with Operation/L2 team for knowledge transfer.
- Set up HDFS quotas and replication factors for user/group directories to keep disk usage under control, as sketched after this list.
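A minimal sketch of the rack-topology script mentioned above, assuming a flat host-to-rack mapping file (paths and rack names are illustrative); Hadoop invokes the script via the net.topology.script.file.name property in core-site.xml:

    #!/bin/bash
    # rack-topology.sh: print a rack for each host/IP argument; default if unknown
    MAP_FILE=/etc/hadoop/conf/rack-map.txt    # lines like: 10.0.1.21 /dc1/rack1
    DEFAULT_RACK=/dc1/default-rack

    for host in "$@"; do
      rack=$(awk -v h="$host" '$1 == h { print $2 }' "$MAP_FILE")
      echo "${rack:-$DEFAULT_RACK}"
    done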
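And a sketch of the HDFS ACL and quota commands (users, groups, paths, and limits are illustrative; ACLs require dfs.namenode.acls.enabled=true):

    # Grant the analytics group read/execute on a data directory via an ACL
    hdfs dfs -setfacl -m group:analytics:r-x /data/warehouse
    hdfs dfs -getfacl /data/warehouse

    # Cap a user directory at 1,000,000 names and 2 TB of space
    hdfs dfsadmin -setQuota 1000000 /user/jdoe
    hdfs dfsadmin -setSpaceQuota 2t /user/jdoe

    # Lower the replication factor on scratch data to keep disk usage down
    hdfs dfs -setrep -R 2 /user/jdoe/tmp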
Technical Environment: Hadoop 1.2.1, MapReduce, Hive 0.10.0, Pig 0.11.1, Oozie 3.3.0, HBase 0.94.11, Sqoop 1.4.4, Flume 1.4.0, Java, SQL, PL/SQL, Oracle 10g, Eclipse, HTTP, Jama.
Hadoop Administrator
Confidential, Santa Clara, CA
Responsibilities:
- End-to-end migration of Hadoop 1.x to Hadoop 2.x.
- Performed the installation and configuration of a Hadoop cluster using Ambari 2.0 (Hortonworks HDP 2.2).
- Monitored the health of the Hadoop ecosystem (HDFS, YARN, Hive, HBase, Sqoop, Hue, and Slider).
- Monitored disk, memory, heap, and CPU utilization on all master and slave machines and took the necessary measures to keep the cluster up and running on a 24/7 basis.
- Configured the Capacity Scheduler to provide service-level agreements for multiple users of the cluster, as sketched after this list.
- Performed benchmark tests for Hive, HBase, and HDFS, including NameNode and MapReduce benchmarks.
- Created Hive internal and external tables with appropriate static and dynamic partitions, designed for efficiency.
- Worked on installing cluster, commissioning & decommissioning of data nodes.
- Involved in implementing High Availability and automatic failover infrastructure to overcome single point of failure for NameNode, Resource Manager, HiveServer2, and HBase Master.
- Implemented Kerberos for Hadoop Security.
- Created and deployed Kerberos keytab files, and created principals and realms, as sketched after this list.
- Imported logs from web servers with Flume to ingest the data into HDFS.
- Created and managed HBase clusters dynamically using Slider.
- Upgraded Apache Ambari from version 1.7 to 2.0.
- Involved in collecting metrics for Hadoop clusters using Ganglia.
- Interacted with developers on deploying new jobs, jobs throwing exceptions, and data-related issues.
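A minimal capacity-scheduler.xml sketch of the queue setup described above (queue names and percentages are illustrative):

    <!-- Two queues splitting cluster capacity 70/30 -->
    <property>
      <name>yarn.scheduler.capacity.root.queues</name>
      <value>etl,adhoc</value>
    </property>
    <property>
      <name>yarn.scheduler.capacity.root.etl.capacity</name>
      <value>70</value>
    </property>
    <property>
      <name>yarn.scheduler.capacity.root.adhoc.capacity</name>
      <value>30</value>
    </property>
    <property>
      <!-- Let adhoc burst above its guarantee when the cluster is idle -->
      <name>yarn.scheduler.capacity.root.adhoc.maximum-capacity</name>
      <value>60</value>
    </property>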
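And a sketch of the Kerberos principal/keytab workflow (realm, principal, and paths are illustrative):

    # On the KDC: create a service principal for a DataNode host
    kadmin.local -q "addprinc -randkey dn/worker01.example.com@EXAMPLE.COM"

    # Export its key to a keytab file
    kadmin.local -q "ktadd -k /etc/security/keytabs/dn.service.keytab dn/worker01.example.com@EXAMPLE.COM"

    # Lock down ownership on the target host and verify the keytab contents
    chown hdfs:hadoop /etc/security/keytabs/dn.service.keytab
    chmod 400 /etc/security/keytabs/dn.service.keytab
    klist -kt /etc/security/keytabs/dn.service.keytab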
Technical Environment: MapReduce, HDFS, Pig, Hive, HBase, Flume, Slider, Sqoop, Oozie, Nagios, ZooKeeper, Ganglia, and Hortonworks Ambari.
Linux system administrator
Confidential
Responsibilities:
- Designed and configured Red Hat Enterprise Linux 5/4/3; handled user administration, management, and archiving.
- Expertise in performance monitoring and performance tuning using top, prstat, sar, vmstat, ps, iostat, etc.
- Expertise in package management, including creating, installing, and configuring packages using Red Hat RPM.
- Used RPM on several Linux distributions, including Red Hat Enterprise Linux, SUSE Linux Enterprise, and Fedora.
- Extensive experience in building servers using the JumpStart and Kickstart processes.
- Configured Red Hat Satellite Server; managed, configured, and maintained customer entitlements, including upgrading and patching of Linux servers.
- Used the RHN Satellite exporter command to channel packages and deploy RPM packages.
- Configured NFS, NIS, DNS, automount, and disk space management on Sun servers.
- Experience in configuring and managing SAN disks, disk mirrors, and RAID levels 0, 1, and 5.
- Maintained and modified hardware and software components, content and documentation.
- Provided guidance for equipment checks and supported processing of security requests.
- Created and expanded file systems in Linux (volume groups and logical volumes) and Solaris using volume managers, as sketched after this list.
- Experience in backup and restore operations with Linux Inbuilt utilities.
- Installed and configured Red Hat Linux 6.0 and troubleshot issues.
- Handled user and security administration, backup, recovery, and maintenance activities.
- Advanced experience in researching OS issues, applying patches, and opening vendor tickets.
- Maintained and documented all errors and logs that were new to the environment and shared them with the team.
- Installed and monitored VMware virtual environments with ESX 4.x/ESX 3.x, ESXi servers, and VirtualCenter 2.x.
- Sendmail configuration and administration, including testing of the mail servers.
- Experience in adding and configuring devices such as hard disks and backup devices.
- Monitored daily NetBackup activity and reports to proactively avoid issues.
- Improved automation and productivity of backups through scripting enhancements.
- Modified and optimized backup schedules.
- Used VMware for testing various applications on different operating systems.
- Provided 24x7 support on a pager rotation basis.
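A minimal sketch of the LVM create-and-extend workflow mentioned above (device names, volume names, and sizes are illustrative):

    # Create a physical volume, a volume group, and a 50 GB logical volume
    pvcreate /dev/sdb1
    vgcreate appvg /dev/sdb1
    lvcreate -L 50G -n applv appvg
    mkfs.ext3 /dev/appvg/applv
    mount /dev/appvg/applv /app

    # Later: grow the logical volume by 20 GB and resize the filesystem
    lvextend -L +20G /dev/appvg/applv
    resize2fs /dev/appvg/applv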
Environment: RHEL 5/4/3, SUSE, LVM, VMware, RAID