Hadoop Administrator Resume
Texas
SUMMARY
- 8+ years of professional experience, including around 5 years as a Linux Administrator and 3+ years in Big Data analytics as a Hadoop/Big Data Administrator.
- Experience in architecting, designing, installing, configuring and managing Apache Hadoop clusters on the MapR, Hortonworks and Cloudera distributions.
- Experience in configuring and maintaining HA for HDFS, YARN (Yet Another Resource Negotiator) ResourceManager, MapReduce, Hive, HBase and Kafka.
- Practical knowledge of the function of every Hadoop daemon, the interactions between them, resource utilization and dynamic tuning to keep the cluster available and efficient.
- Experience in managing Hadoop infrastructure like commissioning, decommissioning, log rotation, rack topology implementation.
- Experience in understanding and managing Hadoop Log Files.
- Experience in configuring ZooKeeper to coordinate the servers in clusters and maintain data consistency.
- Understanding of the multiple data processing engines that YARN supports on a single platform, such as interactive SQL, real-time streaming, data science and batch processing.
- Experience in importing and exporting data between HDFS and relational database systems using Sqoop.
- Experience in collecting the logs from log collector into HDFS using Flume.
- Experience in Kafka multi node cluster setup.
- Experience in setting up and managing the batch scheduler Oozie.
- Extending Hive functionalities by writing custom UDFs.
- Experience in integrating AD/LDAP users with Ambari and Ranger.
- Good experience in implementing Kerberos & Ranger in Hadoop Ecosystem.
- Experience in configuring policies in Ranger to secure Hadoop services (Hive, HBase, HDFS, etc.).
- Good Understanding of Rack Awareness in the Hadoop cluster.
- Experience in using Monitoring tools like Cloudera manager and Ambari.
- Experienced in installing, configuring and removing services through Ambari.
- Experienced in Ambari-alerts configuration for various components and managing the alerts.
- Involved in migration of cluster to AWS.
- Good understanding of Lambda functions.
- Actively worked on enabling SSL for Hadoop services in EMR.
- Analyzed and tuned performance for Spark jobs in EMR, matching instance types to the type and size of the input being processed.
- Good Understanding of data ingestion pipelines.
- Set up disks for MapR, handled disk failures, configured storage pools and worked with Logical Volume Manager.
- Managed Data with Volumes and worked with Snapshots, Mirror Volumes, Data protection and Scheduling in MapR.
- Experience with UNIX commands and shell scripting.
- Excellent interpersonal, communication, documentation and presentation skills.
TECHNICAL SKILLS
Hadoop/Big Data Technologies: HDFS, MapReduce, YARN, HBase, Hive, Tez, Sqoop, Flume, Zookeeper, Spark, Storm, MongoDB, Pig, Hue, Ranger, Impala, Kafka, Oozie and Kerberos
Programming Languages: Shell Scripting, Java, Python
Monitoring Tools: Cloudera Manager, Ambari, Nagios, Ganglia
Databases: SQL Server, MySQL, Cassandra
Web Technologies: HTML, XML, JSON, JavaScript
Operating Systems: Linux, Unix, Windows, Mac, CentOS
ETL Tools: Informatica Power Center 10.1/9.6.1/9.1/8.x, Power Exchange/Power Connect, Data Analyst, Metadata Manager, IDQ, Informatica MDM HUB 9.6.1/9.7, Business Glossary, B2B DT (Data Transformation), DX, MFT.
Other Concepts: OOPS, Data Structures, Algorithms, Software Engineering
Protocols: TCP/IP, FTP, SSH, Telnet, SCP, RSH, ARP and RARP
Configuration Tools: Puppet, IBM-TEM tool
PROFESSIONAL EXPERIENCE
Confidential, Texas
Hadoop Administrator
Responsibilities:
- Performed cluster maintenance, monitoring and troubleshooting, and managed and reviewed data backups and log files on Hortonworks and MapR.
- Implemented and configured a high-availability Hadoop cluster using the Hortonworks distribution and MapR.
- Experience working on Hadoop components like HDFS, YARN, Tez, Hive, HBase, Pig, Sqoop, Oozie, Zookeeper, Storm, Flume, Ambari Infra, Ambari Metrics, Kafka.
- Experience in configuring Zookeeper to coordinate the servers in clusters to maintain the data consistency.
- Experience in using Flume to stream data into HDFS from various sources.
- Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs, such as Java MapReduce, Hive and Sqoop, as well as system-specific jobs.
- Deployed a Network File System (NFS) for NameNode metadata backup.
- Transferred data between HDFS and MySQL databases using Sqoop.
- Backed up data from the active cluster to a backup cluster using DistCp.
- Closely monitored and analyzed MapReduce job executions on the cluster at the task level.
- Implemented the Fair Scheduler on the JobTracker to allocate a fair share of resources to small jobs.
- Implemented an instance of Zookeeper for Kafka Brokers.
- Implemented automatic failover using ZooKeeper and the ZooKeeper Failover Controller.
- Worked with Kerberos, Active Directory/LDAP and Unix-based file systems.
- Implemented Kerberos for authenticating all the services in Hadoop Cluster.
- Commissioned and decommissioned DataNodes, killed unresponsive TaskTrackers and dealt with blacklisted TaskTrackers.
- Tuned performance for slow YARN jobs, slow Tez jobs and slow data loads.
- Managed alerts in Ambari and took corrective and preventive actions.
- Managed HDFS disk space and generated HDFS disk utilization reports for capacity planning.
- Managed user access by setting up new users and assigning them name quotas and space quotas.
Environment: HDFS, YARN, Tez, Hive, HBase, Pig, Sqoop, Oozie, Zookeeper, Storm, Flume, Ambari Infra, Ambari Metrics, Kafka, Ranger, Kerberos
Confidential, New York
Hadoop Administrator
Responsibilities:
- Installed and configured Hadoop monitoring and administration tools such as Cloudera Manager, Nagios and Ganglia.
- Participated in setting up Rack topology in the cluster.
- Implemented and configured a high-availability (quorum-based) Hadoop cluster using the Cloudera distribution of Hadoop.
- Backed up data from the active cluster to a backup cluster using DistCp.
- Periodically reviewed Hadoop-related logs, fixed errors and prevented further errors by analyzing warnings.
- Implemented multiple high-performance MongoDB replica sets on EC2 with robust reliability.
- Decommissioned nodes that needed maintenance or were malfunctioning, and commissioned new nodes.
- Hands-on experience working on Hadoop ecosystem components such as YARN, MapReduce, HDFS, ZooKeeper, Oozie, Hive, Sqoop, Pig and Flume.
- Experience in configuring Zookeeper to coordinate the servers in clusters to maintain the data consistency.
- Experience in using Flume to stream data into HDFS from various sources.
- Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs, such as Java MapReduce, Hive and Sqoop, as well as system-specific jobs.
- Monitored services through Zookeeper.
- Installed the Oozie workflow engine to run multiple Hive and Pig jobs.
- Worked on analyzing data with Hive and Pig.
- Implemented Kerberos for authenticating all the services in Hadoop Cluster.
- Deployed a Network File System (NFS) for NameNode metadata backup.
- Performed cluster backups using DistCp, Cloudera Manager BDR and parallel ingestion.
- Designed and implemented scalable, secure cloud architecture based on Amazon Web Services.
- Leveraged AWS cloud services such as EC2, Auto Scaling and VPC (Virtual Private Cloud) to build secure, highly scalable and flexible systems that handled expected and unexpected load bursts and could evolve quickly during development iterations.
Environment: Hadoop Quorum Based, Oozie, Hive, Pig, Sqoop, MapReduce, HDFS, Cloudera, ZooKeeper, Nagios, Ganglia, Metadata, Flume, Yarn, Amazon Web Services, EC2, Hortonworks.
Confidential, Columbus, Ohio
Hadoop Administrator
Responsibilities:
- Installed and configured Hadoop monitoring and administration tools such as Cloudera Manager, Nagios and Ganglia.
- Performed cluster maintenance, monitoring and troubleshooting, and managed and reviewed data backups and log files on Hortonworks and MapR.
- Implemented and configured a high-availability Hadoop cluster using the Hortonworks distribution and MapR.
- Experience working on Hadoop components like HDFS, YARN, Tez, Hive, HBase, Pig, Sqoop, Oozie, Zookeeper, Storm, Flume, Ambari Infra, Ambari Metrics, Kafka.
- Experience in configuring Zookeeper to coordinate the servers in clusters to maintain the data consistency.
- Experience in using Flume to stream data into HDFS from various sources.
- Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs, such as Java MapReduce, Hive and Sqoop, as well as system-specific jobs.
- Deployed a Network File System (NFS) for NameNode metadata backup.
- Transferred data between HDFS and MySQL databases using Sqoop.
- Backed up data from the active cluster to a backup cluster using DistCp.
- Periodically reviewed Hadoop-related logs, fixed errors and prevented further errors by analyzing warnings.
- Closely monitored and analyzed MapReduce job executions on the cluster at the task level.
- Implemented the Fair Scheduler on the JobTracker to allocate a fair share of resources to small jobs.
- Implemented an instance of Zookeeper for Kafka Brokers.
- Implemented automatic failover using ZooKeeper and the ZooKeeper Failover Controller.
- Worked with Kerberos, Active Directory/LDAP and Unix-based file systems.
- Implemented Kerberos for authenticating all the services in Hadoop Cluster.
- Performed major and minor upgrades on the existing cluster, as well as rollbacks to the previous version.
- Commissioned and decommissioned DataNodes, killed unresponsive TaskTrackers and dealt with blacklisted TaskTrackers.
- Tuned performance for slow YARN jobs, slow Tez jobs and slow data loads.
- Managed alerts in Ambari and took corrective and preventive actions.
- Managed HDFS disk space and generated HDFS disk utilization reports for capacity planning.
- Managed user access by setting up new users and assigning them name quotas and space quotas.
Environment: HDFS, YARN, Tez, Hive, HBase, Pig, Sqoop, Oozie, Zookeeper, Storm, Flume, Ambari Infra, Ambari Metrics, Kafka, Ranger, Kerberos
Confidential, Irvine CA
Hadoop/Big Data Administrator
Responsibilities:
- Handled the installation and configuration of a Hadoop cluster using the Hortonworks distribution.
- Built and maintained scalable data pipelines using the Hadoop ecosystem and other open source components such as Hive and HBase.
- Handled data exchange between HDFS and various web applications and databases using Flume and Sqoop.
- Monitored data streaming between web sources and HDFS.
- Worked with Kerberos and its integration with Hadoop and LDAP.
- Closely monitored and analyzed MapReduce job executions on the cluster at the task level.
- Provided input to the development team on efficient use of resources such as memory and CPU, based on the running statistics of Map and Reduce tasks.
- Worked with Kerberos, Active Directory/LDAP and Unix-based file systems.
- Managed data in Amazon S3, implemented s3cmd to move data from clusters to S3.
- Experience in Continuous Integration and expertise in Jenkins and Hudson tools.
- Changed cluster configuration properties based on the volume of data being processed and the performance of the cluster.
- Set up automated processes to analyze the System and Hadoop log files for predefined errors and send alerts to appropriate groups.
- Experience in architecting, designing, installing, configuring and managing Apache Hadoop on the Hortonworks distribution.
- Worked with Unix commands and shell scripting.
- Worked with Java, HTTP, XML and JSON.
- Worked on Spark, a fast, general-purpose cluster computing system.
- Worked on Storm, a distributed real-time computation system.
- Commissioned and decommissioned DataNodes from the cluster in case of problems.
- Handled Massively Parallel Processing (MPP) databases such as Microsoft PDW.
- Experience with GitHub, a web-based Git repository hosting service that offers the distributed revision control and source code management (SCM) functionality of Git along with its own features.
- Experience in Hortonworks Distribution Platform (HDP) cluster installation and configuration.
- Worked on statistics collection and table maintenance on MPP platforms.
- Worked on Cloudera to analyze data present on top of HDFS.
- Worked on large sets of structured, semi-structured and unstructured data.
- Used Sqoop to import and export data between HDFS and RDBMS.
- Created Hive tables, loaded them with data and wrote Hive queries, which run internally as MapReduce jobs.
Environment: Hadoop, MapReduce, Hive, HDFS, Pig, Sqoop, Oozie, Cloudera, Hortonworks, Flume, HBase, ZooKeeper, CDH3, MongoDB, Oracle, NoSQL and Unix/Linux.
Confidential
Linux Administrator
Responsibilities:
- Installing and upgrading OE & Red Hat Linux and Solaris 8/ & SPARC on servers such as HP DL 380 G3, G4 and G5 and Dell PowerEdge servers.
- Experience in LDOMs and in creating sparse root and whole root zones; administered the zones for web, application and database servers and worked on SMF on Solaris 10.
- Experience working with HP LVM and Red Hat LVM.
- Experience in implementing P2P and P2V migrations.
- Involved in installing and configuring CentOS and SUSE 11 & 12 on HP x86 servers.
- Implemented HA using Red Hat Cluster and VERITAS Cluster Server 5.0 for the WebLogic agent.
- Managing and troubleshooting DNS and NIS servers.
- Troubleshooting application issues on Apache web servers and database servers running on Linux and Solaris.
- Experience in migrating Oracle and MySQL data using Double-Take products.
- Used Sun Volume Manager for Solaris and LVM on Linux & Solaris to create volumes with layouts like RAID 1, 5, 10, 51.
- Performed performance analysis using tools such as prstat, mpstat, iostat, sar, vmstat, truss and DTrace.
- Experience working on LDAP user accounts and configuring LDAP on client machines.
- Upgraded ClearCase from 4.2 to 6.x running on Linux (CentOS & Red Hat).
- Worked on patch management tools like Sun Update Manager.
- Experience supporting middleware servers running Apache, Tomcat and Java applications.
- Worked on day-to-day administration tasks and resolved tickets using Remedy.
- Used HP Service Center and a change management system for ticketing.
- Worked on the administration of WebLogic 9 and JBoss 4.2.2 servers, including installation and deployments.
- Worked on F5 load balancers to load balance and reverse proxy WebLogic servers.
Environment: Solaris 8/9/10, Veritas Volume Manager, web servers, LDAP directory, Active Directory, BEA WebLogic servers, SAN Switches, Apache, Tomcat servers, WebSphere application server.
Confidential
Linux/Systems Administrator
Responsibilities:
- Installing, configuring and updating Solaris 7, 8, Red Hat 7.x, 8, 9 and Windows NT/2000 systems using media, JumpStart and Kickstart.
- Installing and configuring Windows 2000 Active Directory servers and Citrix servers.
- Published and administered applications via Citrix MetaFrame.
- Creating System Disk Partition, mirroring root disk drive, configuring device groups in UNIX and Linux environment.
- Working with VERITAS Volume Manager 3.5 and Logical Volume Manager for file system management, data backup and recovery.
- Implementing a backup solution using a Dell T120 autoloader and CA ARCserve 7.0.
- Installed and configured SSH Gate for secure remote connections.
- Configuration of DHCP, DNS, NFS and automounter.
- Creating, troubleshooting and mounting NFS File systems on different OS platforms.
- Installing, configuring and troubleshooting various software such as Windd, Citrix - Clarify, Rave, VPN, SSH Gate, Visio 2000, Star Application, Lotus Notes, mail clients, Business Objects, Oracle and Microsoft Project.
- Working 24/7 on call for application and system support.
- Experience working with and supporting the SIBES database running on Linux servers.
Environment: HP ProLiant servers, SUN Servers (6500, 4500, 420, Ultra 2 Servers), Solaris 7/8, Veritas NetBackup, Veritas Volume Manager, Samba, NFS, NIS, LVM, Linux, Shell Programming.
