
Hadoop Operations Administrator Resume


Boston, MA

SUMMARY

  • Over 8 years of IT experience, including 4+ years in Big Data technologies.
  • Well versed with Hadoop MapReduce, HDFS, Pig, Hive, HBase, Sqoop, Flume, YARN, ZooKeeper, Spark, and Oozie.
  • In-depth understanding of Hadoop architecture and components such as HDFS, NameNode, JobTracker, DataNode, TaskTracker, and the MapReduce model.
  • Experience in analyzing data using HiveQL and Pig Latin.
  • Experience with Ansible and related configuration-management tools.
  • Experience in task automation using Oozie, cluster coordination through Pentaho, and MapReduce job scheduling using the Fair Scheduler. Worked on both the Cloudera and Hortonworks Hadoop distributions.
  • Experience in performing minor and major upgrades and applying patches to Ambari- and Cloudera-managed clusters.
  • Extensive experience in installation, configuration, maintenance, design, implementation, and support on Linux.
  • Strong Knowledge in Hadoop Cluster Capacity Planning, Performance Tuning, Cluster Monitoring.
  • Strong knowledge of YARN concepts and High-Availability Hadoop clusters.
  • Capability to configure scalable infrastructures for HA (High Availability) and disaster recovery.
  • Experience in developing and scheduling ETL workflows, data scrubbing and processing data in Hadoop using Oozie.
  • Experienced with machine learning algorithms such as logistic regression, KNN, SVM, random forest, neural networks, linear regression, lasso regression, and k-means.
  • Experience in setting up data ingestion tools like Flume, Sqoop, and NDM.
  • Experience in rebalancing the cluster after adding/removing nodes or after major data cleanup.
  • General Linux system administration including design, configuration, installs, automation.
  • Strong knowledge of using NFS (Network File System) for backing up NameNode metadata.
  • Experience in setting up NameNode high availability for major production clusters.
  • Experience in designing automatic failover control using ZooKeeper and Quorum Journal Nodes (a brief command sketch follows this list).
  • Experience in creating, building and managing public and private cloud Infrastructure.
  • Experience in working with different file formats and compression techniques in Hadoop
  • Experience in analyzing existing Hadoop clusters, understanding performance bottlenecks, and providing performance-tuning solutions accordingly.
  • Experience with Oracle, MongoDB, AWS Cloud, and Greenplum.
  • Experience in working large environments and leading the infrastructure support and operations.
  • Benchmarked Hadoop clusters to validate hardware before and after installation and to tweak configurations for better performance.
  • Experience in configuring ZooKeeper to coordinate the servers in clusters.
  • Experience in administering the Linux systems to deploy Hadoop cluster and monitoring the cluster.
  • Experienced in on-call support for production clusters, troubleshooting issues within the maintenance window to avoid delays.
  • Storage/installation, LVM, Linux Kickstart, Solaris Volume Manager, and Sun RAID Manager.
  • Expertise in virtualization system administration of VMware ESX/ESXi, VMware Server, VMware Lab Manager, vCloud, and Amazon EC2 & S3 web services.
  • Excellent knowledge of NoSQL databases like HBase and Cassandra. Experience in monitoring and troubleshooting issues with Linux memory, CPU, OS, storage, and network.
  • Experience in commissioning, decommissioning, balancing, and managing nodes, and tuning servers for optimal cluster performance.
  • Involved in 24X7 Production support, Build and Migration Assignments.
  • Good Working Knowledge on Linux concepts and building servers ready for Hadoop Cluster setup.
  • Extensive experience monitoring servers with tools like Nagios and Ganglia, covering Hadoop services and OS-level disk/memory/CPU utilization.
  • Closely worked with Developers and Analysts to address project requirements. Ability to effectively manage time and prioritize multiple projects.
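
A minimal sketch of the NameNode HA and automatic-failover administration referenced in the summary, assuming an HDFS HA pair with illustrative NameNode IDs nn1 and nn2:

    # Initialize the HA state in ZooKeeper (run once before enabling automatic failover)
    hdfs zkfc -formatZK

    # Check which NameNode is currently active and which is standby
    hdfs haadmin -getServiceState nn1
    hdfs haadmin -getServiceState nn2

    # Trigger a manual failover from nn1 to nn2 when needed
    hdfs haadmin -failover nn1 nn2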

TECHNICAL SKILLS

Big Data Technologies: HDFS, Hive, MapReduce, Cassandra, Pig, HCatalog, Phoenix, Falcon, Sqoop, Flume, ZooKeeper, Mahout, Oozie, Avro, HBase, Storm, HDP 2.4/2.6, CDH 5.x

Devops Tools: Jenkins, Git

Monitoring Tools: Cloudera Manager, Ambari, Ganglia, Nagios

Scripting Languages: Shell Scripting, CSH.

Programming Languages: C, Java, Python, SQL, and PL/SQL.

Front End Technologies: HTML, XHTML, XML.

Application Servers: Apache Tomcat, WebLogic Server, WebSphere

Databases: Oracle 11g, MySQL, MS SQL Server, IBM DB2.

NoSQL Databases: HBase, Cassandra, MongoDB.

Operating Systems: Linux, UNIX, Mac OS X 10.9.5, Windows NT / 98 /2000/ XP / Vista, Windows 7, Windows 8.

Networks: HTTP, HTTPS, FTP, UDP, TCP/IP, SNMP, SMTP.

Security: Kerberos, LDAP

PROFESSIONAL EXPERIENCE

Hadoop Operations Administrator

Confidential, Boston MA

Responsibilities:

  • Working on the Hadoop Hortonworks (HDP 2.6) distribution, which manages services such as HDFS, MapReduce2, Tez, Hive, Pig, HBase, Sqoop, Flume, Spark, Ambari Metrics, ZooKeeper, Falcon, and Oozie, across four clusters (LAB, DEV, QA, and PROD) totaling 350+ nodes and 7 PB of data. Led the installation, configuration, and deployment of product software on new edge nodes that connect to the Hadoop cluster for data acquisition.
  • Rendered L3/L4 support services for BI users, developers, and the Tableau team through the Jira ticketing system.
  • Improved performance and optimized existing Hadoop algorithms using Spark, working with SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
  • One of the key engineers in Aetna's HDP web engineering team, Integrated Systems Engineering (ISE).
  • Diligently teaming with the infrastructure, network, database, application and business intelligence teams to guarantee high data quality and availability. Worked on installing Kafka on Virtual Machine.
  • Configured Kafka to efficiently collect, aggregate, and move large amounts of clickstream data from many different sources to MapR-FS.
  • Managed Amazon Web Services (AWS) infrastructure with automation and configuration-management tools such as Jenkins and Git, and designed cloud-hosted solutions across the AWS product suite.
  • Optimized existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, and pair RDDs to read/write JSON files.
  • Optimizing MapReduce Jobs to use HDFS efficiently by using various compression mechanisms.
  • Installed OAS (Oracle Application Server) on Solaris 9 and configured it with the Oracle database.
  • Managed Rackspace and MS Azure accounts utilized for Big Data Platform.
  • Developed Shell and Python scripts to automate the jobs.
  • Analyzed escalated incidents within Azure SQL Database and implemented supporting test scripts.
  • Installed Apache NiFi to make data ingestion fast, easy, and secure from the Internet of Anything with Hortonworks DataFlow. Designed and developed a data migration strategy between AWS and Azure environments.
  • Responsible for cluster maintenance, monitoring, commissioning and decommissioning data nodes, troubleshooting, managing and reviewing data backups, and managing and reviewing log files.
  • Worked on Python scripts to load data from CSV, JSON, MySQL, and Hive sources into the Neo4j graph database.
  • Worked closely with System Administrators, BI analysts, developers, and key business leaders to establish SLAs and acceptable performance metrics for the Hadoop as a service offering.
  • Worked with NoSQL databases like HBase and MongoDB for POC purpose.
  • Configured, installed, and monitored MapR Hadoop on 10 AWS EC2 instances, and configured MapR on Amazon EMR with Amazon S3 as the default file system for the cluster.
  • Contributed to performance tuning and ETL, Agile software deployment, team building and leadership, and engineering management.
  • Integrated CDAP with Ambari for easy operations monitoring and management.
  • Performed a major upgrade of the production environment from HDP 2.5 to HDP 2.6. As an admin, followed standard backup policies to ensure high availability of the cluster.
  • Monitored multiple Hadoop clusters environments using Ganglia and Nagios.
  • Installed the OLAP software AtScale on its designated edge-node server.
  • Implemented a dual-data-center setup for all Cassandra clusters. Performed complex system analysis to improve ETL performance and identified highly critical batch jobs to prioritize.
  • Conducted cluster sizing, tuning, and performance benchmarking on a multi-tenant OpenStack platform to achieve desired performance metrics.
  • Provided solutions to users who encountered Java exceptions and errors while running data models in SAS and R scripts. Good understanding of random-forest data models.
  • Worked on data ingestion systems to pull data with Sqoop from traditional RDBMS platforms such as Oracle, MySQL, and Teradata into the Hadoop cluster using automated ingestion scripts, and stored data in NoSQL databases such as HBase and Cassandra.
  • Provided security and authentication with Kerberos, which works by issuing Kerberos tickets to users.
  • Good troubleshooting skills across the Hadoop stack components and ETL services, as well as Hue and RStudio, which provide GUIs for developers and business users for day-to-day activities.
  • Implemented encryption on Azure WASB storage (Data at Rest).
  • Created queues and allocated cluster resources to prioritize Hive jobs.
  • Implemented SFTP/SCP for projects to transfer data from external servers to cluster servers. Experienced in managing and reviewing log files.
  • Working on scheduling the Oozie workflow engine to run multiple Hive, Sqoop, and Pig jobs (see the command sketch after this list).
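
A minimal sketch of the Oozie scheduling described above, assuming an illustrative Oozie server URL and a coordinator definition packaged with a job.properties file:

    # Submit and start the coordinator that chains the Hive, Sqoop, and Pig actions
    oozie job -oozie http://oozie-host:11000/oozie \
              -config coordinator/job.properties -run

    # Check the status of a running job using the ID returned by the previous command
    oozie job -oozie http://oozie-host:11000/oozie -info <job-id>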

Environment: HDP 2.6, Azure, H2O, Jenkins, Git, Spark, MapReduce, Hive, Pig, ZooKeeper, NiFi, Kafka, HBase, Flume, Sqoop, Kerberos, Sentry, CentOS.

Hadoop Operations Administrator

Confidential, Charlotte NC

Responsibilities:

  • Worked as a Hadoop admin responsible for all aspects of clusters totaling 150 nodes, ranging from POC (proof-of-concept) to PROD.
  • Involved in the requirements review meetings and collaborated with business analysts to clarify any specific scenario.
  • Worked on Hortonworks Distribution which is a major contributor to Apache Hadoop.
  • Experience in installation, configuration, deployment, maintenance, monitoring, and troubleshooting of Hadoop clusters in different environments, such as development, test, and production clusters, using the Ambari front-end tool and scripts.
  • Enabled HA for the ResourceManager, NameNode, and Hive Metastore.
  • Experience with implementing High Availability for HDFS, Yarn, Hive and HBase.
  • Created databases in MySQL for Hive, Ranger, Oozie, Dr. Elephant and Ambari.
  • Hands on experience in installation, configuration, supporting and managing Hadoop Clusters.
  • Worked with Sqoop, importing and exporting data between databases such as MySQL and Oracle and HDFS/Hive.
  • Installed and configured Ambari metrics, Grafana, Knox, Kafka brokers on Admin Nodes.
  • Integrated Apache Storm with Kafka to perform web analytics and to move clickstream data from Kafka to HDFS.
  • Commissioned and decommissioned nodes from time to time (see the decommissioning sketch after this list).
  • Implemented NameNode automatic failover using the ZKFailoverController (ZKFC).
  • As a Hadoop admin, monitored cluster health status on a daily basis, tuned performance-related configuration parameters, and backed up configuration XML files.
  • Introduced SmartSense and obtained optimal recommendations from the vendor, including for troubleshooting issues.
  • Good experience with Hadoop Ecosystem components such as Hive, HBase, Pig and Sqoop.
  • Configured Kerberos and installed the MIT Kerberos ticketing system.
  • Secured the Hadoop cluster from unauthorized access by Kerberos, LDAP integration and TLS for data transfer among the cluster nodes.
  • Installing and configuring CDAP, an ETL tool in the development and Production clusters.
  • Integrated CDAP with Ambari for easy operations monitoring and management.
  • Used CDAP to monitor the datasets and workflows to ensure smooth data flow.
  • Monitor Hadoop cluster and proactively optimize and tune cluster for performance.
  • Experienced in defining job flows.
  • Experienced in managing and reviewing Hadoop log files.
  • Connected to the HDFS using the third-party tools like Teradata SQL assistant using ODBC driver.
  • Installed Grafana for metrics analytics & visualization suite.
  • Monitored local file system disk-space and CPU usage using Ambari.
  • Installed various services like Hive, HBase, Pig, Oozie, and Kafka.
  • Production support responsibilities include cluster maintenance.
  • Collaborated with application teams to install operating system and Hadoop updates, patches, version upgrades when required.
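
A minimal sketch of the decommissioning flow mentioned in the list above; the hostname and exclude-file paths are illustrative and depend on the hdfs-site.xml and yarn-site.xml settings:

    # Add the host to the HDFS and YARN exclude files
    echo "worker07.example.com" >> /etc/hadoop/conf/dfs.exclude
    echo "worker07.example.com" >> /etc/hadoop/conf/yarn.exclude

    # Have the NameNode and ResourceManager re-read the exclude lists;
    # HDFS re-replicates the node's blocks before marking it decommissioned
    hdfs dfsadmin -refreshNodes
    yarn rmadmin -refreshNodes

    # Watch progress until the node reports Decommissioned
    hdfs dfsadmin -report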

Environment: Hortonworks HDP 2.4, Ambari 2.5.2, HDFS, MapReduce, YARN, Hive, Pig, ZooKeeper, Tez, Oozie, MySQL, Puppet, and RHEL

Linux / Hadoop Administrator

Confidential

Responsibilities:

  • Managed UNIX infrastructure, involving day-to-day maintenance of servers and troubleshooting.
  • Provisioning Red Hat Enterprise Linux Server using PXE Boot according to requirements.
  • Performed Red Hat Linux Kickstart installations on Red Hat 4.x/5.x, Red Hat Linux kernel tuning, and memory upgrades.
  • Worked with Logical Volume Manager, creating volume groups and logical volumes.
  • Checked and cleaned file systems whenever they filled up. Used Logwatch 7.3, which reports server information on a schedule.
  • Had hands on experience in installation, configuration, maintenance, monitoring, performance and tuning, and troubleshooting Hadoop clusters in different environments such as Development Cluster, Test Cluster and Production.
  • Configured the JobTracker to assign MapReduce tasks to TaskTrackers in a cluster of nodes.
  • Implemented Kerberos security in all environments.
  • Defined file system layout and data set permissions.
  • Implemented the Capacity Scheduler to share cluster resources among users' MapReduce jobs.
  • Worked on importing data from Oracle databases into the Hadoop cluster (see the Sqoop sketch after this list).
  • Managed and reviewed data backups and log files and worked on deploying Java applications on cluster.
  • Commissioned and decommissioned nodes from time to time.
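
A minimal sketch of the Oracle-to-Hadoop import mentioned above; the connection string, credentials, table, and target directory are illustrative:

    # Pull an Oracle table into HDFS with four parallel map tasks
    sqoop import \
      --connect jdbc:oracle:thin:@//oracle-host:1521/ORCL \
      --username etl_user -P \
      --table SALES.ORDERS \
      --target-dir /data/raw/orders \
      --num-mappers 4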

Environment: Red Hat Linux (RHEL 3/4/5), Solaris, Logical Volume Manager, Sun & Veritas Cluster Server, Global File System, Red Hat Cluster Servers.

Linux System Administrator

Confidential

Responsibilities:

  • Installation and configuration of Solaris 9/10 and Red Hat Enterprise Linux 5/6 systems.
  • Involved in building servers using jumpstart and kickstart in Solaris and RHEL respectively.
  • Installation and configuration of RedHat virtual servers using ESXi 4/5 and Solaris servers (LDOMS) using scripts and Ops Center.
  • Performed package and patches management, firmware upgrades and debugging.
  • Addition and configuration of SAN disks for LVM on Linux, and Veritas Volume Manager and ZFS on Solaris LDOMs.
  • Configuration and troubleshooting of NAS mounts on Solaris and Linux Servers.
  • Configuration and administration of ASM disks for Oracle RAC servers.
  • Analyzed and reviewed system performance tuning and network configurations.
  • Managed logical volumes and volume groups using Logical Volume Manager (see the LVM sketch after this list).
  • Troubleshot and analyzed hardware failures on various Solaris servers (core dump and log file analysis).
  • Performed configuration and troubleshooting of services like NFS, FTP, LDAP and Web servers.
  • Installation and configuration of VxVM, Veritas file system (VxFS).
  • Management of Veritas Volume Manager (VxVM), the Zettabyte File System (ZFS), and Logical Volume Manager.
  • Involved in patching Solaris and RedHat servers.
  • Worked with NAS and SAN concepts and technology.
  • Configured and maintained Network Multipathing in Solaris and Linux.
  • Configuration of Multipath, EMC power path on Linux, Solaris Servers.
  • Provided production support, including 24/7 support on a rotation basis.
  • Performed a POC on Tableau, which included running load tests and checking system performance with large amounts of data.
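
A minimal sketch of the LVM administration mentioned above; the device name, volume group, logical volume, and size are illustrative:

    # Create a physical volume on a newly presented SAN disk
    pvcreate /dev/sdc

    # Extend the volume group and grow a logical volume by 20 GB
    vgextend datavg /dev/sdc
    lvextend -L +20G /dev/datavg/app_lv

    # Grow the ext filesystem online to use the new space
    resize2fs /dev/datavg/app_lv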

Environment: Solaris 9/10/11, RedHat Linux 4/5/6, AIX, Sun Enterprise Servers E5500/E4500, Sun Fire V1280/480/440, Sun SPARC 1000, HP 9000 K-, L-, and N-class servers, HP & Dell blade servers, IBM RS/6000, VMware ESX Server.
