
Lead Hadoop System Administrator Resume


CAREER OBJECTIVE:

Over 8 years of information technology experience, including 6 years as an Oracle database administrator and 2 years in Hadoop administration with Cloudera Manager and Ambari.

PROFESSIONAL SUMMARY:

  • 2 years' experience in installation and configuration of Hadoop ecosystem components such as HDFS, Hive, Impala, YARN, HBase, Sqoop, Flume, Oozie, Spark, and Kafka
  • Well-versed with the Cloudera CDH and Hortonworks HDP Hadoop distributions
  • Knowledge of NameNode High Availability, cluster planning, the MapReduce framework, configuring schedulers as needed, and setting up local CDH and HDP repositories
  • Experience in benchmarking, commissioning, and decommissioning of data nodes on Hadoop clusters
  • Hands-on experience importing and exporting data between RDBMSs such as Oracle and MySQL and the HDFS/Hive warehouse using Sqoop (a minimal sketch follows this list), as well as with other ingestion tools such as Flume and Kafka
  • Hands-on backup and recovery using HDFS snapshots, distcp, and Cloudera BDR
  • Hands-on experience configuring and managing Hadoop cluster security using Kerberos, Sentry (Cloudera), Ranger (Hortonworks), HDFS ACLs, and data encryption as recommended by both Cloudera and Hortonworks
  • Knowledge of shell scripting in Bash
  • Experience supporting L1 and L2 issues on Hadoop production clusters
  • Experience in supporting systems with 24X7 availability and monitoring.
  • Experience working in cross-functional, multi-location teams.
  • 5+ years of Oracle DBA experience providing daily support of corporate management systems, including database management and administration, analysis, design, and architecture in client/server and web-based Oracle environments
  • Experience with Oracle 11g/10g/9i DBA duties in high-availability production environments
  • Strong experience with installing, configuring, creating, supporting, and managing Oracle database systems
  • Knowledge of 11gR2 Real Application Clusters (RAC)
  • Experience with installing and managing RAC configurations
  • Experience with installing and configuring Oracle Grid Infrastructure on Linux
  • Experience in managing RAC databases using CRSCTL, SRVCTL, and Grid Control, and in using Data Guard Broker to manage primary and standby databases
  • 4+ years' experience writing SQL, batch, and shell scripts for backups, SQL*Loader, and database export/import
  • Adept in providing custom Oracle server and client installation
  • Knowledge of PL/SQL
  • Strong experience with performance monitoring, proactive and reactive tuning, and troubleshooting of Oracle systems running on UNIX, Linux, and Windows NT/2000/2003 servers
  • System resource management, including bottleneck detection and contention tuning of SGA, CPU, memory, and I/O
  • Ability to prioritize and meet operational deadlines in a fast-paced environment, with good stress management skills
  • Perform daily monitoring of Oracle instances, running Statspack and AWR reports to monitor tablespaces, memory structures, undo segments, logs, and alerts
  • Expert in systems analysis and architecture, capacity planning, backup/recovery, installation, configuration, patching, troubleshooting, and performance monitoring and tuning
  • Strong experience with design, development, and testing of backup and recovery to ensure complete recoverability; experience with Data Guard
  • Experience in RMAN Backup and Recovery
  • Solid experience in disaster recovery, routine maintenance for Oracle databases
  • Security management for database, network, and operating systems
  • 2 solid years of managing databases with RAC
  • Maintain and administer Greenplum databases and servers in an AWS environment
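As a concrete illustration of the Sqoop ingestion described above, here is a minimal sketch; the host, service name, credentials file, and table names (dbhost, ORCL, SALES, analytics.sales) are placeholders rather than details from any actual engagement.

#!/usr/bin/env bash
# Pull an Oracle table into HDFS as delimited files (hypothetical host/service/table).
sqoop import \
  --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
  --username etl_user \
  --password-file hdfs:///user/etl/.ora_pass \
  --table SALES \
  --target-dir /data/raw/sales \
  --num-mappers 4

# Load the same table directly into a Hive warehouse table instead of a raw directory.
sqoop import \
  --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
  --username etl_user \
  --password-file hdfs:///user/etl/.ora_pass \
  --table SALES \
  --hive-import \
  --hive-table analytics.sales \
  --num-mappers 4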

TECHNICAL SKILLS:

Cloudera Hadoop Ecosystem: HDFS, Hive, Impala, YARN, Spark, HBase, Sqoop, Flume, Oozie, Pig, Hue, Sentry

Hortonworks Hadoop Ecosystem: HDFS, Hive, YARN, HBase, Sqoop, Ranger, Kafka

Distribution platforms: Cloudera Manager (5.9, 5.12, 5.14), Hortonworks Ambari (2.5, 2.6), Oracle BDA (4.7, 4.9)

Databases: Oracle, PostgreSQL, MySQL

Network: TCP/IP, HTTP/HTTPS, SSH, FTP

OS: Linux (RHEL, CentOS 6 & 7)

Monitoring tools: Cloudera Manager, Ambari, Ganglia, Oracle ILOM & OEM Cloud Control

Security: Knox, Ranger, Sentry, HDFS ACLs & Kerberos

Cloud System: Google Cloud & AWS Cloud

Automation & continuous integration tools: Jenkins, Ansible & GitHub

WORK EXPERIENCE:

Confidential

Lead Hadoop System Administrator

Responsibilities:

  • Deployed Hadoop version 2 using the Cloudera distribution (CDH) on pre-production and QA clusters, along with Hadoop services such as HDFS, YARN, MapReduce, Sqoop, Flume, Pig, Hive, ZooKeeper, Oozie, Kafka, Storm, HBase, and Spark.
  • Enabled NameNode High Availability on the pre-production cluster.
  • Benchmarked the cluster using TestDFSIO, TeraSort, and TeraGen to measure HDFS read/write performance and MapReduce sort times (see the benchmarking sketch after this list).
  • Set up production Hadoop clusters with optimum configurations.
  • Installed and set up Kerberos.
  • Worked with data delivery teams to set up new Hadoop users, create Kerberos principals, and test HDFS, Hive, Pig, and MapReduce access for the new users.
  • Provided support for data analysts, Pig and Hive developers.
  • Troubleshot issues by analyzing log files and raised tickets with Cloudera support when needed.
  • Automated various tasks using shell scripts.
  • Installed and configured a CDH 5.13.0 cluster using Cloudera Manager.
  • Implemented automatic NameNode failover using ZooKeeper and the ZooKeeper Failover Controller (ZKFC).
  • Developed scripts for benchmarking with TeraGen, TeraSort, and TeraValidate.
  • Monitored multiple Hadoop cluster environments using Ganglia and Nagios.
  • Monitored workload and job performance, and performed capacity planning.
  • Managed and reviewed Hadoop log files and debugged failed jobs.
  • Supported cluster maintenance, backup, and recovery for the production cluster.
  • Backed up data on a regular basis to a remote cluster using distcp.
  • Fine-tuned Hive jobs for better performance.
  • Collected and aggregated large amounts of streaming data into HDFS using Flume and defined channel selectors to multiplex data into different sinks.
  • Implemented the Fair Scheduler and Capacity Scheduler to allocate a fair share of resources to small jobs.
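A sketch of the benchmarking runs referenced above; the jar paths are the usual CDH parcel locations and may differ per install, and the data sizes are deliberately small for illustration.

#!/usr/bin/env bash
# TeraGen -> TeraSort -> TeraValidate measures end-to-end MapReduce sort time;
# TestDFSIO measures raw HDFS read/write throughput. Paths assume CDH parcels.
EXAMPLES=/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar
TESTS=/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-tests.jar

hadoop jar "$EXAMPLES" teragen 10000000 /bench/tera-in        # 10M rows (~1 GB)
hadoop jar "$EXAMPLES" terasort /bench/tera-in /bench/tera-out
hadoop jar "$EXAMPLES" teravalidate /bench/tera-out /bench/tera-report

# The size flag is -size on Hadoop 2.x; some older releases use -fileSize <MB>.
hadoop jar "$TESTS" TestDFSIO -write -nrFiles 10 -size 1GB
hadoop jar "$TESTS" TestDFSIO -read -nrFiles 10 -size 1GB
hadoop jar "$TESTS" TestDFSIO -clean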

Environment: HDFS 2.6.0, MapReduce 2.6.0, Hive 1.1.0, YARN 2.6.0, HBase 1.2.0, Sqoop 1.4.6, Oozie 4.1.0, ZooKeeper 3.4.5, Kerberos 0.8

Confidential, Dallas, TX

Lead Hadoop Administrator

Responsibilities:

  • Deployed Hadoop version 2 using the Cloudera distribution (CDH) on pre-production and QA clusters, along with Hadoop services such as HDFS, YARN, MapReduce, Sqoop, Flume, Pig, Hive, ZooKeeper, Oozie, Kafka, Storm, HBase, and Spark.
  • Enabled NameNode High Availability on the pre-production cluster.
  • Benchmarked the cluster using TestDFSIO, TeraSort, and TeraGen to measure HDFS read/write performance and MapReduce sort times.
  • Set up production Hadoop clusters with optimum configurations.
  • Installed and set up Kerberos.
  • Worked with data delivery teams to set up new Hadoop users, create Kerberos principals, and test HDFS, Hive, Pig, and MapReduce access for the new users.
  • Provided support for data analysts, Pig and Hive developers.
  • Troubleshot issues by analyzing log files and raised tickets with Cloudera support when needed.
  • Automated various tasks using shell scripts.
  • Installed and configured a CDH 5.13.0 cluster using Cloudera Manager.
  • Implemented automatic NameNode failover using ZooKeeper and the ZooKeeper Failover Controller (ZKFC).
  • Developed scripts for benchmarking with TeraGen, TeraSort, and TeraValidate.
  • Monitored multiple Hadoop cluster environments using Ganglia and Nagios.
  • Monitored workload and job performance, and performed capacity planning.
  • Managed and reviewed Hadoop log files and debugged failed jobs.
  • Supported cluster maintenance, backup, and recovery for the production cluster.
  • Backed up data on a regular basis to a remote cluster using distcp (see the backup sketch after this list).
  • Fine-tuned Hive jobs for better performance.
  • Collected and aggregated large amounts of streaming data into HDFS using Flume and defined channel selectors to multiplex data into different sinks.
  • Implemented the Fair Scheduler and Capacity Scheduler to allocate a fair share of resources to small jobs.
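A sketch of the snapshot-plus-distcp backup flow mentioned above; the NameNode hosts and paths (nn-prod, nn-dr, /data) are placeholders.

#!/usr/bin/env bash
# Nightly mirror of /data to a remote DR cluster. A snapshot gives distcp a
# consistent, read-only source while writers keep working on /data.
SRC=hdfs://nn-prod:8020
DST=hdfs://nn-dr:8020
SNAP="backup-$(date +%F)"

hdfs dfsadmin -allowSnapshot /data          # one-time: mark the dir snapshottable
hdfs dfs -createSnapshot /data "$SNAP"

# -update copies only changed files; -delete mirrors removals on the target.
hadoop distcp -update -delete \
  "$SRC/data/.snapshot/$SNAP" "$DST/backups/data"

hdfs dfs -deleteSnapshot /data "$SNAP"      # clean up once the copy lands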

Environment: HDFS 2.6.0, MapReduce 2.6.0, Hive 1.1.0, YARN 2.6.0, HBase 1.2.0, Sqoop 1.4.6, Oozie 4.1.0, ZooKeeper 3.4.5, Kerberos 0.8, Oracle BDA 4.7/4.9

Confidential

Hadoop Administrator

Responsibilities:

  • Deployed Hadoop version 2 using the Cloudera distribution (CDH) on pre-production and QA clusters, along with Hadoop services such as HDFS, YARN, MapReduce, Sqoop, Flume, Pig, Hive, ZooKeeper, Oozie, Kafka, Storm, HBase, and Spark.
  • Enabled NameNode High Availability on the pre-production cluster.
  • Benchmarked the cluster using TestDFSIO, TeraSort, and TeraGen to measure HDFS read/write performance and MapReduce sort times.
  • Set up production Hadoop clusters with optimum configurations.
  • Installed and set up Kerberos.
  • Worked with data delivery teams to set up new Hadoop users, create Kerberos principals, and test HDFS, Hive, Pig, and MapReduce access for the new users (see the onboarding sketch after this list).
  • Provided support for data analysts, Pig and Hive developers.
  • Troubleshot issues by analyzing log files and raised tickets with Cloudera support when needed.
  • Automated various tasks using shell scripts.
  • Installed and configured a CDH 5.13.0 cluster using Cloudera Manager.
  • Implemented automatic NameNode failover using ZooKeeper and the ZooKeeper Failover Controller (ZKFC).
  • Developed scripts for benchmarking with TeraGen, TeraSort, and TeraValidate.
  • Monitored multiple Hadoop cluster environments using Ganglia and Nagios.
  • Monitored workload and job performance, and performed capacity planning.
  • Managed and reviewed Hadoop log files and debugged failed jobs.
  • Supported cluster maintenance, backup, and recovery for the production cluster.
  • Backed up data on a regular basis to a remote cluster using distcp.
  • Fine-tuned Hive jobs for better performance.
  • Collected and aggregated large amounts of streaming data into HDFS using Flume and defined channel selectors to multiplex data into different sinks.
  • Implemented the Fair Scheduler and Capacity Scheduler to allocate a fair share of resources to small jobs.
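A sketch of the user-onboarding steps described above; the realm, admin principal, user name, and paths are placeholders, and the kadmin steps assume access to the KDC.

#!/usr/bin/env bash
# Create a principal and keytab for a new user, then smoke-test cluster access.
USER_NAME=jdoe
REALM=EXAMPLE.COM
KEYTAB=/etc/security/keytabs/${USER_NAME}.keytab
EXAMPLES=/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar

kadmin -p admin/admin -q "addprinc -randkey ${USER_NAME}@${REALM}"
kadmin -p admin/admin -q "xst -k ${KEYTAB} ${USER_NAME}@${REALM}"

# The home directory is created by the HDFS superuser (here via sudo, if permitted).
sudo -u hdfs hdfs dfs -mkdir -p /user/${USER_NAME}
sudo -u hdfs hdfs dfs -chown ${USER_NAME} /user/${USER_NAME}

# Authenticate as the new principal and verify HDFS and MapReduce access.
kinit -kt "${KEYTAB}" "${USER_NAME}@${REALM}"
hdfs dfs -ls /user/${USER_NAME}
hadoop jar "$EXAMPLES" pi 2 10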

Environment: HDFS 2.6.0, MapReduce 2.6.0, Hive 1.1.0, YARN 2.6.0, HBase 1.2.0, Sqoop 1.4.6, Oozie 4.1.0, ZooKeeper 3.4.5, Kerberos 0.8

Confidential

Oracle DBA

Responsibilities:

  • Deployed Hadoop version 2 using the Cloudera distribution (CDH) on pre-production and QA clusters, along with Hadoop services such as HDFS, YARN, MapReduce, Sqoop, Flume, Pig, Hive, ZooKeeper, Oozie, Kafka, Storm, HBase, and Spark.
  • Enabled NameNode High Availability on the pre-production cluster.
  • Benchmarked the cluster using TestDFSIO, TeraSort, and TeraGen to measure HDFS read/write performance and MapReduce sort times.
  • Set up production Hadoop clusters with optimum configurations.
  • Installed and set up Kerberos.
  • Worked with data delivery teams to set up new Hadoop users, create Kerberos principals, and test HDFS, Hive, Pig, and MapReduce access for the new users.
  • Provided support for data analysts, Pig and Hive developers.
  • Troubleshot issues by analyzing log files and raised tickets with Cloudera support when needed.
  • Automated various tasks using shell scripts.
  • Installed and configured a CDH 5.10.0 cluster using Cloudera Manager.
  • Implemented automatic NameNode failover using ZooKeeper and the ZooKeeper Failover Controller (ZKFC).
  • Developed scripts for benchmarking with TeraSort and TeraGen.
  • Monitored multiple Hadoop cluster environments using Ganglia and Nagios.
  • Monitored workload and job performance, and performed capacity planning.
  • Managed and reviewed Hadoop log files and debugged failed jobs.
  • Supported cluster maintenance, backup, and recovery for the production cluster.
  • Backed up data on a regular basis to a remote cluster using distcp.
  • Fine-tuned Hive jobs for better performance.
  • Collected and aggregated large amounts of streaming data into HDFS using Flume, defining channel selectors to multiplex data into different sinks (see the Flume sketch after this list).
  • Implemented the Fair Scheduler and Capacity Scheduler to allocate a fair share of resources to small jobs.
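A sketch of the Flume multiplexing setup mentioned above; the agent name, header, port, and HDFS paths are placeholders, and upstream senders are assumed to set a "region" header on each event.

#!/usr/bin/env bash
# Write a Flume agent config whose multiplexing selector routes events to
# different channels/sinks based on the "region" header, then start the agent.
cat > /etc/flume-ng/conf/multiplex.conf <<'EOF'
a1.sources = r1
a1.channels = c1 c2
a1.sinks = k1 k2

# Avro source; upstream agents are assumed to set a "region" header.
a1.sources.r1.type = avro
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 41414
a1.sources.r1.channels = c1 c2
a1.sources.r1.selector.type = multiplexing
a1.sources.r1.selector.header = region
a1.sources.r1.selector.mapping.us = c1
a1.sources.r1.selector.default = c2

a1.channels.c1.type = memory
a1.channels.c2.type = memory

# Each channel drains to its own HDFS directory.
a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1
a1.sinks.k1.hdfs.path = /flume/events/us
a1.sinks.k2.type = hdfs
a1.sinks.k2.channel = c2
a1.sinks.k2.hdfs.path = /flume/events/other
EOF

flume-ng agent --conf /etc/flume-ng/conf \
  --conf-file /etc/flume-ng/conf/multiplex.conf --name a1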

Environment: HDFS 2.6.0, MapReduce 2.6.0, Hive 1.1.0, YARN 2.6.0, HBase 1.2.0, Sqoop 1.4.6, Oozie 4.1.0, ZooKeeper 3.4.5, Kerberos 0.8

Confidential

Production Oracle DBA

Responsibilities:

  • Migration of databases from 11.2.0.2/10.2.0.2 to 11.2.0.3
  • Migration of databases from standalone server to Exadata database Machine
  • Refresh databases using RMAN and Data Pump export/import (see the sketch after this list)
  • Work with developers to configure and install Informatica hubs
  • Generate scripts to automate processes
  • Create RAC database using DBCA
  • Manage database Using TOAD, OEM 12c and SQL*Developer
  • Configure and set up email notifications in OEM 12c
  • Create metric templates and assign them to targets in OEM 12c
  • Discover and promote data target in OEM 12c
  • Create blackout for database using OEM 12c to cover maintenance period
  • Monitor database performance and resolve performance issues
  • Run AWR/ADDM reports to find the cause of issues and recommendations
  • Troubleshoot and resolve user errors and handle requests from developers
  • Convert single-instance databases to RAC with ASM
  • Clone database and upgrade to 11.2.0.3
  • Monitor database activity and file usage, and ensure necessary resources are available so databases function properly by removing old or obsolete files
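A sketch of the RMAN backup and Data Pump refresh flow above; the SIDs, schema, and directory object (PROD, QA, APPUSER, DP_DIR) are placeholders, DP_DIR must already exist as a directory object on both databases, and expdp/impdp prompt for credentials rather than taking them on the command line.

#!/usr/bin/env bash
# Full RMAN backup on the source, then a Data Pump schema export/import to
# refresh a lower environment.
export ORACLE_SID=PROD
rman target / <<< 'BACKUP DATABASE PLUS ARCHIVELOG; DELETE NOPROMPT OBSOLETE;'

# Export the application schema from the source database.
expdp system schemas=APPUSER directory=DP_DIR \
      dumpfile=appuser_%U.dmp logfile=appuser_exp.log

# Import into the target, replacing any tables that already exist.
export ORACLE_SID=QA
impdp system directory=DP_DIR dumpfile=appuser_%U.dmp \
      logfile=appuser_imp.log table_exists_action=replace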
