
Sr. Hadoop Administrator Resume


Atlanta, GA

SUMMARY:

  • Result-driven IT professional with 8+ years of experience in software analysis, design, development and maintenance across client-server, distributed and embedded applications.
  • Hands-on experience with the Hadoop stack - HDFS, MapReduce, YARN, Sqoop, Flume, Hive/Beeline, Impala, Tez, Pig, ZooKeeper, Oozie, Solr, Sentry, Kerberos, Centrify DC, Falcon, Hue, Kafka and Storm.
  • Expertise in installing, configuring and optimizing Cloudera Hadoop versions CDH 3, CDH 4.x and CDH 5.x in a multi-cluster environment.
  • Skilled in commissioning and decommissioning cluster nodes and in data migration; set up a DR cluster with BDR replication and implemented wire encryption for data at rest.
  • Ability to plan and manage HDFS storage capacity and disk utilization.
  • Expert in setting up Hortonworks clusters with and without Ambari.
  • Good experience in setting up Cloudera clusters with Cloudera Manager using both packages and parcels.
  • In-depth understanding of Hadoop architecture and its components, such as HDFS, YARN, JobTracker, TaskTracker, NameNode, DataNode and MapReduce concepts.
  • Solid understanding of all phases of development using multiple methodologies, i.e., Agile with JIRA and Kanban boards, along with the ticketing tools Remedy and ServiceNow.
  • Strong experience in system administration, installation, upgrades, patching, migration, configuration, troubleshooting, security, backup, disaster recovery, performance monitoring and fine-tuning on Linux (RHEL) systems.

TECHNICAL SKILLS:

Big Data Technologies: Hadoop, HDFS, MapReduce, YARN, Pig, Hive, HBase, SmartSense, ZooKeeper, Oozie, Ambari, Kerberos, Knox, Ranger, Sentry, Spark, Tez, Accumulo, Impala, Hue, Storm, Kafka, Flume, Sqoop, Solr

Hardware: IBM pSeries, PureFlex, RS/6000, IBM blade servers, HP ProLiant DL360, DL380, HP blade servers C6000, C7000

SAN: EMC CLARiiON, EMC DMX, IBM XIV

Operating Systems: Linux, AIX, CentOS, Solaris & Windows

Networking: DNS, DHCP, NFS, FTP, NIS, Samba, LDAP, OpenLDAP, SSH, Apache, NIM

Tools & Utilities: ServiceNow, Remedy, Maximo, Nagios, Chipre & SharePoint

Databases: Oracle 10g/11g/12c, DB2, MySQL, HBase, Cassandra, MongoDB

Backups: Veritas NetBackup & TSM Backup

Virtualization: VMware vSphere, VIO

Cluster Technologies: HACMP 5.3/5.4, PowerHA 7.1, Veritas Cluster Server 4.1

Web/Application Servers: Tomcat, WebSphere Application Server 5.0/6.0/7.0, Message Broker, MQ Series, WebLogic Server, IBM HTTP Server

Cloud Knowledge: AWS

Scripting & Programming Languages: Shell & Perl Programming

PROFESSIONAL EXPERIENCE:

Confidential, Atlanta, GA

Sr. Hadoop Administrator

Responsibilities:

  • Deployed a Hadoop cluster of the Cloudera Distribution and installed ecosystem components: HDFS, YARN, ZooKeeper, HBase, Hive, MapReduce, Pig, Kafka, Confluent Kafka, Storm and Spark on Linux servers.
  • Responsible for maintaining 24x7 production CDH Hadoop clusters running Spark, HBase, Hive and MapReduce with multiple petabytes of data storage.
  • Configured the Capacity Scheduler on the ResourceManager to provide a way to share large cluster resources (a configuration sketch follows this list).
  • Deployed NameNode high availability for the major production cluster.
  • Configured Oozie for workflow automation and coordination.
  • Troubleshot production-level issues in the cluster and its functionality.
  • Backed up data on a regular basis to a remote cluster using DistCp (see the data-movement sketch after this list).
  • Implemented high availability and automatic failover infrastructure, utilizing ZooKeeper services, to remove the NameNode as a single point of failure.
  • Used Sqoop to connect to Oracle, MySQL and Teradata and move data into Hive/HBase tables.
  • Worked on Hadoop operations on the ETL infrastructure with other BI teams such as Teradata and Tableau.
  • Involved in installing and configuring Confluent Kafka in the R&D line and validated the installation with the HDFS and Hive connectors.
  • Performed disk space management for users and groups in the cluster.
  • Used Storm and Kafka services to push data to HBase and Hive tables.
  • Documented slides and presentations on the Confluence page.
  • Added nodes to and decommissioned nodes from the cluster whenever required.
  • Used the Sqoop and DistCp utilities for data copying and migration.
  • Worked on end-to-end data flow management from sources to a NoSQL database (MongoDB) using Oozie.
  • Installed a Kafka cluster with separate nodes for brokers.
  • Worked with the Continuous Integration team to set up GitHub for scheduling automatic deployments of new and existing code in production.
  • Worked effectively in an Agile methodology and provided production on-call support.
  • Performed regular ad-hoc execution of Hive and Pig queries depending on the use case.
  • Regularly commissioned and decommissioned nodes depending on the amount of data.
  • Monitored Hadoop cluster connectivity and security.
  • Managed and reviewed Hadoop log files.
  • Performed file system management and monitoring.
  • Monitored Hadoop jobs and reviewed logs of failed jobs to debug issues based on the errors.
  • Diagnosed and resolved performance issues and scheduled jobs using cron and Control-M.
  • Used the Avro SerDe packaged with Hive for serialization and deserialization to parse the contents of streamed log data.
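
A minimal sketch of the kind of Capacity Scheduler change referenced above; the queue names (prod, adhoc), the capacity split and the fragment file name are illustrative assumptions, not the actual production configuration.

# Illustrative queue properties; in practice these go inside the
# <configuration> element of capacity-scheduler.xml on the ResourceManager.
cat > capacity-scheduler-fragment.xml <<'EOF'
<property>
  <name>yarn.scheduler.capacity.root.queues</name>
  <value>prod,adhoc</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.prod.capacity</name>
  <value>70</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.adhoc.capacity</name>
  <value>30</value>
</property>
EOF

# Reload queue definitions on the ResourceManager without a restart.
yarn rmadmin -refreshQueues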
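
A minimal sketch of the Sqoop and DistCp commands behind the data-movement bullets above; the connection string, credentials, table names, cluster host names and paths are all placeholders, not actual values.

# Import an Oracle table into a Hive table (placeholders throughout).
sqoop import \
  --connect jdbc:oracle:thin:@//oradb.example.com:1521/ORCL \
  --username etl_user -P \
  --table SALES.ORDERS \
  --hive-import --hive-table sales.orders \
  --num-mappers 4

# Copy one day's data to the remote DR cluster with DistCp, updating
# only changed files and preserving file attributes.
hadoop distcp -update -p \
  hdfs://prod-nn.example.com:8020/data/orders/2017-01-01 \
  hdfs://dr-nn.example.com:8020/data/orders/2017-01-01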

Environment: CDH 5.8.3, HBase, Hive, Pig, Sqoop, YARN, Apache Oozie workflow scheduler, Flume, ZooKeeper

Confidential, Austin TX

Sr. Hadoop Engineer

Responsibilities:

  • Identified the existing enterprise data warehouse setup and provided design and architecture suggestions for converting it to the Hadoop ecosystem.
  • Deployed a Hadoop cluster of the Hortonworks Distribution and installed ecosystem components: HDFS, YARN, ZooKeeper, HBase, Hive, MapReduce, Pig, Kafka, Storm and Spark on Linux servers using Ambari.
  • Set up automated 24x7x365 monitoring and escalation infrastructure for Hadoop cluster using Nagios Core and Ambari.
  • Designed and implemented Disaster Recovery Plan for Hadoop Clusters.
  • Implemented high availability and automatic failover infrastructure, utilizing ZooKeeper services, to remove the NameNode as a single point of failure.
  • Integrated the Hadoop cluster with AD and enabled Kerberos for authentication (a validation sketch follows this list).
  • Implemented the Capacity Scheduler on the YARN ResourceManager to share cluster resources among the MapReduce jobs submitted by users.
  • Used NiFi to pull data from different sources and push it to HBase and Hive.
  • Set up Linux users and tested HDFS, Hive, Pig and MapReduce access for the new users.
  • Monitored Hadoop Jobs and Reviewed Logs of the failed jobs to debug the issues based on the errors.
  • Used Sqoop to connect to Oracle, MySQL, SQL Server and Teradata and move the pivoted data into Hive or HBase tables.
  • Worked with the Linux server admin team in administering the server hardware and operating system.
  • Interacted with networking team to improve bandwidth.
  • Provided User, Platform and Application support on Hadoop Infrastructure.
  • Applied Patches and Bug Fixes on Hadoop Cluster.
  • Used Knox for perimeter security and Ranger for granular access in the cluster.
  • Proactively involved in ongoing Maintenance, Support and Improvements in Hadoop clusters.
  • Conducted Root Cause Analysis and resolved production problems and data issues.
  • Performed disk space management for users and groups in the cluster.
  • Used Storm and Kafka services to push data to HBase and Hive tables.
  • Added nodes to and decommissioned nodes from the cluster whenever required.
  • Used the Sqoop and DistCp utilities for data copying and migration.
  • Worked on end-to-end data flow management from sources to a NoSQL database (MongoDB) using Oozie.
  • Integrated Oozie with the rest of the Hadoop stack, supporting several types of Hadoop jobs such as MapReduce, Pig, Hive and Sqoop as well as system-specific jobs such as Java programs and shell scripts (see the Oozie submission sketch after this list).
  • Installed a Kafka cluster with separate nodes for brokers.
  • Monitored cluster stability, used tools to gather statistics and improved performance.
  • Used Apache Tez, an extensible framework for building high-performance batch and interactive data processing applications, for Pig and Hive jobs.
  • Identified disk space bottlenecks, installed Nagios Log Server and integrated it with the production cluster to aggregate service logs from multiple nodes, and created dashboards for important service logs for better analysis of historical log data.
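
A minimal sketch of how Kerberos access can be validated after enabling it on the cluster, as referenced above; the principal, realm and keytab path are placeholder assumptions.

# Obtain a ticket with a service keytab, confirm it, then verify that
# HDFS access works only with a valid ticket.
kinit -kt /etc/security/keytabs/hdfs.headless.keytab hdfs@EXAMPLE.COM
klist
hdfs dfs -ls /user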
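
A minimal sketch of submitting an Oozie workflow from the command line, as referenced in the Oozie bullet above; the host names, ports, workflow path and job id are placeholders.

# job.properties for the workflow run (values are placeholders).
cat > job.properties <<'EOF'
nameNode=hdfs://prod-nn.example.com:8020
jobTracker=prod-rm.example.com:8050
queueName=default
oozie.wf.application.path=${nameNode}/user/oozie/workflows/etl-wf
EOF

# Submit and start the workflow, then check on it later by job id.
oozie job -oozie http://oozie-host.example.com:11000/oozie -config job.properties -run
oozie job -oozie http://oozie-host.example.com:11000/oozie -info 0000001-170101000000000-oozie-oozi-W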

Environment: Hue, Oozie, Eclipse, HBase, Flume, Splunk, Linux, Java, Hibernate, Java JDK, Kickstart, Puppet, PDSH, Chef, GCC 4.2, Git, Cassandra, AWS, NoSQL, RedHat, CDH 4.x, Impala, MySQL, MongoDB, Nagios

Confidential, Farmington Hills, MI

Hadoop Engineer

Responsibilities:

  • Provided administration, management and support for large-scale big data platforms on the Hadoop ecosystem.
  • Involved in cluster capacity planning, deployment and management of Hadoop for our data platform operations with a group of Hadoop architects and stakeholders.
  • Deployed Hadoop cluster of Hortonworks Distribution and installed ecosystem components.
  • Developed backup policies for Hadoop systems and action plans for network failure.
  • Involved in the User/Group Management in Hadoop with AD/LDAP integration.
  • Managed resources and load using capacity scheduling and applied changes according to requirements.
  • Implemented a strategy to upgrade the OS on all cluster nodes from RHEL 5 to RHEL 6 while ensuring the cluster remained up and running.
  • Worked on Hadoop cluster tasks such as commissioning and decommissioning nodes for maintenance and backup without any effect on running jobs or data.
  • Monitored multiple Hadoop clusters environments using Nagios and Ganglia.
  • Worked on Hadoop operations on the ETL infrastructure with other BI teams such as Teradata and Tableau.
  • Implemented High Availability of Hadoop NameNode using the Quorum Journal Manager.
  • Worked on Partitioning concepts and different file formats supported in Hive and Pig.
  • Developed scripts in shell and Python to automate many day-to-day admin activities.
  • Implemented HCatalog to make partitions available to Pig and Java MapReduce, and established a remote Hive metastore using MySQL.
  • Loaded log data into HDFS using Flume and Kafka and performed ETL integrations (a Flume agent sketch follows this list).
  • Implemented a POC by spinning up an 8-node IBM BigInsights (BigInsights for Apache Hadoop edition) cluster per management requirements.
  • Moved data from production into the IBM BigInsights cluster for testing and performance evaluation.
  • Created Hive tables and set user permissions.
  • Played a key role in deciding the hardware configuration for the cluster along with other teams in the company.
  • Wrote automated scripts for monitoring the file systems.
  • Responsible for giving presentations to teams and managers about new ecosystem components to be implemented in the cluster.
  • Applied patches to the cluster.
  • Added new DataNodes when needed and rebalanced the cluster (see the commissioning/decommissioning sketch after this list).
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Continuously monitored and managed the Hadoop cluster through Ganglia and Nagios.
  • Installed Oozie workflow engine to run multiple Hive and Pig jobs, which run independently with time and data availability.
  • Dealt with major and minor upgrades to the Hadoop cluster without decreasing cluster performance.
  • Upgraded the Cloudera Hadoop ecosystems in the cluster using Cloudera distribution packages.
  • Performed stress and performance testing and benchmarked the cluster using TestDFSIO and TeraSort (a benchmark sketch follows this list).
  • Commissioned and decommissioned DataNodes in the cluster when problems arose.
  • Debugged and resolved key issues with the help of the Cloudera support team.
  • Supported 300+ business users on the Hadoop platform, resolving the tickets and issues they ran into and helping them find the best ways to achieve their results.
  • Continuously integrated new services into the Hadoop cluster.
  • Installed several projects on Hadoop servers and configured each project to run jobs and scripts successfully.
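
A minimal sketch of the kind of Flume agent used to land log data in HDFS, as referenced above; the agent name, source log path and HDFS target directory are placeholder assumptions.

# Agent configuration: tail an application log and write it to dated
# HDFS directories (names and paths are placeholders).
cat > /etc/flume/conf/logagent.conf <<'EOF'
a1.sources = r1
a1.channels = c1
a1.sinks = k1

a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /var/log/app/app.log
a1.sources.r1.channels = c1

a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000

a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1
a1.sinks.k1.hdfs.path = /data/raw/applogs/%Y-%m-%d
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.useLocalTimeStamp = true
EOF

# Start the agent with this configuration.
flume-ng agent --conf /etc/flume/conf --conf-file /etc/flume/conf/logagent.conf --name a1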
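
A minimal sketch of decommissioning a DataNode and rebalancing after adding new nodes, as referenced above; the hostname and the excludes file path (whatever dfs.hosts.exclude points to in hdfs-site.xml) are placeholders.

# Add the node to the excludes file and tell the NameNode to re-read it.
echo "datanode07.example.com" >> /etc/hadoop/conf/dfs.exclude
hdfs dfsadmin -refreshNodes

# Watch decommissioning progress for that node.
hdfs dfsadmin -report | grep -A 3 datanode07

# After commissioning new DataNodes, spread existing blocks onto them.
hdfs balancer -threshold 10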
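
A minimal sketch of the TestDFSIO and TeraSort runs behind the benchmarking bullet above; the file counts, sizes and jar locations (which vary by distribution) are example assumptions.

# HDFS I/O throughput: write then read 10 files of 1000 MB each.
hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-tests.jar \
  TestDFSIO -write -nrFiles 10 -fileSize 1000
hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-tests.jar \
  TestDFSIO -read -nrFiles 10 -fileSize 1000

# MapReduce benchmark: generate roughly 100 GB with TeraGen, then sort it.
hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar \
  teragen 1000000000 /benchmarks/terainput
hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar \
  terasort /benchmarks/terainput /benchmarks/teraoutput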

Environment: CDH 3.x, 4.x, 5.x, Cloudera Manager 4 & 5, Nagios, Ganglia, Tableau, Shell Scripting, Oozie, Pig, Hive, Flume, Bash Scripting, Teradata, Ab Initio, Kafka, Impala, Sentry, CentOS

Confidential

Hadoop Engineer

Responsibilities:

  • Involved in architectural design of the cluster infrastructure, resource mobilization, risk analysis and reporting.
  • Installed and configured the BigInsights cluster with the help of IBM engineers.
  • Commissioned and decommissioned DataNodes and was involved in NameNode maintenance.
  • Installed security using Kerberos on the cluster for AAA (authentication, authorization and auditing).
  • Performed regular backups and cleared logs from HDFS space to keep DataNodes optimally utilized; wrote shell scripts for time-bound command execution (a cleanup sketch follows this list).
  • Edited and configured HDFS and tracker parameters.
  • Scripted requirements using Big SQL and provided time statistics of running jobs.
  • Involved in code review of simple to complex MapReduce jobs using Hive and Pig.
  • Monitored the cluster using the BigInsights ionosphere tool.
  • Imported data from various data sources and parsed it into structured data, region-wise and date-wise.
  • Analyzed the data by performing Hive queries and running Pig scripts to study customer behavior.
  • Worked with application teams to install operating system, Hadoop updates, patches, version upgrades as required.
  • Installed Oozie workflow engine to run multiple Hive and Pig jobs.
  • Configured Oozie workflows to run multiple Hive and Pig jobs that run independently based on time and data availability.
  • Optimized MapReduce code and Pig scripts, and performed performance tuning and analysis.
  • Implemented a POC with Spark SQL to interpret JSON records.
  • Created table definitions and made the contents available as a schema-backed RDD.
  • Implemented advanced procedures like text analytics and processing, using the in-memory computing capabilities of Spark.
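
A minimal sketch of the sort of time-bound HDFS log-cleanup script mentioned above; the log directory, retention window and per-command timeout are placeholder assumptions.

#!/bin/bash
# Remove HDFS log directories older than the retention cutoff, with
# timeout(1) enforcing a hard time bound on each delete.
LOG_DIR=/app-logs
CUTOFF=$(date -d "-30 days" +%Y-%m-%d)

# 'hdfs dfs -ls' prints the modification date in column 6 and the path
# in column 8; pick entries older than the cutoff and delete them.
hdfs dfs -ls "$LOG_DIR" 2>/dev/null |
  awk -v cutoff="$CUTOFF" '$6 != "" && $6 < cutoff {print $8}' |
  while read -r path; do
    timeout 10m hdfs dfs -rm -r -skipTrash "$path"
  done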

Environment: Linux, Hadoop, BigInsights, Hive, Puppet, Java, C++

Confidential

Linux Administrator

Responsibilities:

  • Provided system support for 100+ Red Hat Linux servers, including routine maintenance, patching, system backups and restores, and software and hardware upgrades.
  • Worked on building new SUSE and Red Hat Linux servers, supporting lease replacements and implementing system patches using the HP Server Automation tool.
  • Configured, implemented and administered clustered servers in a SUSE Linux environment.
  • System administration, system planning, coordination, and group-level and user-level management.
  • Created and managed logical volumes in Linux.
  • Worked on backup and recovery software such as NetBackup in a Linux environment.
  • Set up Nagios monitoring software (a sample disk-space check is sketched after this list).
  • Resolved security access requests via Peregrine Service Center to provide the requested user access.
  • Handled and generated tickets via the BMC Remedy ticketing tool.
  • Performance monitoring and tuning using top, prstat, sar, vmstat, netstat, jps, iostat, etc.
  • Created new file systems and managed and checked file system data consistency.
  • Documented the procedure for handling obsolete servers, considerably reducing the time and mistakes involved and streamlining the existing process.
  • Communicated and coordinated with internal and external customers to resolve issues and reduce downtime.
  • Disaster recovery and planning.
  • Problem determination, security and shell scripting.
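
A minimal sketch of a Nagios-style disk-usage check like those used for file system monitoring above; the mount point and the warning/critical thresholds are placeholder assumptions.

#!/bin/bash
# Nagios plugin convention: exit 0 = OK, 1 = WARNING, 2 = CRITICAL.
MOUNT=${1:-/}
USED=$(df -P "$MOUNT" | awk 'NR==2 {gsub(/%/,""); print $5}')

if [ "$USED" -ge 90 ]; then
    echo "CRITICAL - ${MOUNT} is ${USED}% full"; exit 2
elif [ "$USED" -ge 80 ]; then
    echo "WARNING - ${MOUNT} is ${USED}% full"; exit 1
else
    echo "OK - ${MOUNT} is ${USED}% full"; exit 0
fi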

Environment: Red Hat Linux 5, 6, HP Gen8 blade and rack-mount servers, Oracle VirtualBox

Confidential

Linux/Database Administrator

Responsibilities:

  • Installed and maintained Linux servers.
  • Installed CentOS on multiple servers using PXE (Preboot Execution Environment) boot and the Kickstart method.
  • Monitored system metrics and logs for any problems.
  • Added, removed and updated user account information and reset passwords.
  • Created and managed logical volumes; used Java JDBC to load data into MySQL.
  • Maintained the MySQL server and set up authentication for the required database users.
  • Installed and updated packages using YUM.
  • Installed and updated patches on servers.
  • Virtualization on RHEL servers (through Xen and KVM).
  • Resized LVM disk volumes as needed (an LVM sketch follows this list); administered VMware virtual Linux servers.
  • Installed and configured Linux for new build environments.
  • Created virtual servers on Citrix XenServer-based hosts and installed operating systems on guest servers.
  • Set up PXE boot and Kickstart on multiple servers for remote installation of Linux.
  • Monitored system activity, performance and resource utilization.
  • Updated the YUM repository and Red Hat Package Manager (RPM) packages.
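
A minimal sketch of creating and later growing an LVM logical volume, as referenced in the LVM bullets above; the device, volume group and logical volume names, sizes and mount point are placeholder assumptions.

# Create a volume group and a 50 GB logical volume, then format and mount it.
pvcreate /dev/sdb
vgcreate datavg /dev/sdb
lvcreate -L 50G -n datalv datavg
mkfs.ext4 /dev/datavg/datalv
mkdir -p /data
mount /dev/datavg/datalv /data

# Later, grow the volume by 20 GB and resize the ext4 file system online.
lvextend -L +20G /dev/datavg/datalv
resize2fs /dev/datavg/datalv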

Environment: Linux, Centos, Ubuntu, FTP, NTP, MySQL
