Hadoop Administrator Resume

NYC, NY

SUMMARY

  • 9 years of hands-on experience managing and supporting a wide array of software applications, from a three-instance PostgreSQL cluster to a Hadoop cluster of 400+ machines.
  • Certified CDH Hadoop and HBase Administrator, currently involved in the administration, management, and support of Big Data applications.
  • Extensive experience with database administration, maintenance, and schema design for PostgreSQL and MS SQL Server.
  • Excellent understanding of Hadoop architecture and its components, such as HDFS, MapReduce, NameNode, DataNode, ResourceManager, NodeManager, JobTracker, and TaskTracker, as well as the Hadoop ecosystem (Hive, Impala, Sqoop, Flume, Oozie, ZooKeeper, Kafka, Spark).
  • Experience with Cloudera Manager administration, including installing and updating Hadoop and its related components in single-node and multi-node cluster environments using Apache and Cloudera distributions.
  • Experience in database administration, performance tuning, backup and recovery, and troubleshooting in large-scale, customer-facing environments.
  • Experience in managing and reviewing Hadoop log files.
  • Experience in analyzing data using HiveQL, Impala, and custom MapReduce programs in Java.
  • Extending Hive and Pig core functionality by writing custom UDFs.
  • Experience in data management and implementation of Big Data applications using Hadoop frameworks.
  • Experience in importing and exporting data between HDFS and relational database systems using Sqoop (see the Sqoop sketch at the end of this list).
  • Experience with Hadoop ecosystem components, including Pig and Hive for data analysis, Sqoop for data ingestion, Oozie for scheduling, and HBase as a NoSQL data store.
  • Experienced in deploying Hadoop clusters using Ambari and Cloudera Manager.
  • Experience with Hadoop shell commands, writing MapReduce programs, and verifying, managing, and reviewing Hadoop log files.
  • Proficient in configuring ZooKeeper, Flume, and Sqoop on an existing Hadoop cluster.
  • Good knowledge of Apache Flume, Sqoop, Hive, HCatalog, Impala, ZooKeeper, Oozie, Ambari, and Chef.
  • Expertise in deploying Hadoop, YARN, Spark, and Storm, and integrating them with Cassandra, Ignite, RabbitMQ, and Kafka.
  • Experience maintaining load balancing and high availability for servers.
  • Very good knowledge of YARN (Hadoop 2.x) terminology and high-availability Hadoop clusters.
  • Experience analyzing log files for Hadoop and ecosystem services and finding root causes.
  • Performed thread dump analysis for stuck threads and heap dump analysis for leaked memory manually with the Memory Analyzer tool.
  • Very good experience with high-volume transactional systems running on Unix/Linux and Windows.
  • Involved in all phases of Software Development Life Cycle (SDLC) in large-scale enterprise software using Object Oriented Analysis and Design.
  • Provided 24/7 on-call support for production.
  • Able to coordinate multiple tight schedules and efficient in meeting deadlines.
  • Self-starter, fast learner, and team player with strong communication and interpersonal skills.
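
A minimal sketch of the Sqoop transfer pattern referenced above; the connection string, credentials, table names, and HDFS paths are illustrative only.

    # Import a relational table into HDFS (hostname, database,
    # table, and target path are hypothetical).
    sqoop import \
      --connect jdbc:postgresql://db.example.com:5432/sales \
      --username etl_user -P \
      --table orders \
      --target-dir /data/raw/orders \
      --num-mappers 4

    # Export processed results back to the database.
    sqoop export \
      --connect jdbc:postgresql://db.example.com:5432/sales \
      --username etl_user -P \
      --table orders_summary \
      --export-dir /data/out/orders_summary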

TECHNICAL SKILLS

Operating Systems: Sun Solaris, HP-UX, Linux (CentOS 5/6, Red Hat 4/5/6, Ubuntu), Windows

Databases: PostgreSQL 8.4-9.6, Oracle 9i/10g

NoSQL: Redis 2.2-3.2, Memcached 1.2/1.4

Big Data: Hadoop 0.20/2.4/2.5/2.6, CDH4/5, MRv1/v2, Hive, Impala, HBase, Sqoop, Flume, Oozie, ZooKeeper, Kafka

Tools: Liquibase

Virtualization: KVM, VirtualBox, Docker

Clouds: AWS

Provisioning: Ansible, Chef, Puppet, Fabric

Development: Git, Subversion, Maven, JDK 1.6/1.7/1.8

Web: Apache HTTP 2.2/2.4, Apache Tomcat 5.0/6.0/7.0, Nginx 1.x, Citrix NetScaler 10.x

Others: Jenkins, Nexus, Artifactory, Sonar

Programming Languages: Python, Bash, awk, sed, Java, C

PROFESSIONAL EXPERIENCE

Confidential, NYC, NY

Hadoop Administrator

Responsibilities:

  • Responsible for cluster maintenance, commissioning and decommissioning DataNodes, cluster monitoring and troubleshooting, and managing and reviewing data backups and Hadoop log files.
  • Performed Cloudera Hadoop upgrades and patches, and installed ecosystem products through Cloudera Manager along with Cloudera Manager upgrades.
  • Capacity planning, hardware recommendations, performance tuning and benchmarking.
  • Cluster balancing and performance tuning of Hadoop components such as HDFS, Hive, Impala, and MRv2.
  • Added and decommissioned Hadoop cluster nodes, including balancing HDFS block data (see the decommissioning sketch at the end of this list).
  • Implemented the Fair Scheduler on the ResourceManager to share cluster resources among users' MRv2 jobs (see the scheduler sketch at the end of this list).
  • Configured LDAP for the Hadoop cluster.
  • Worked with the systems engineering team to propose and deploy new hardware and software environments required for Hadoop and to expand existing environments.
  • Worked with data delivery teams to set up new Hadoop users, including creating Linux accounts and testing HDFS, Hive, Pig, and MRv2 access for the new users.
  • Set up data ingestion tools such as Flume and Sqoop.
  • Installed and set up HBase, Hive, and Impala.
  • Set up quotas on HDFS and implemented rack topology scripts (see the quota and rack-awareness sketch at the end of this list).
  • Investigated and performed the migration from MRv1 to MRv2.
  • Worked with Big Data analysts, designers, and scientists to troubleshoot MRv1/MRv2 job failures and issues with Hive, Pig, Flume, and Apache Spark.
  • Utilized Apache Spark for interactive data mining and data processing.
  • Used Apache Kafka, a fast, scalable, fault-tolerant system, to buffer incoming load before the data is analyzed.
  • Configured Sqoop to import and export data between HDFS and relational databases.
  • Handled data exchange between HDFS, web applications, and databases using Flume and Sqoop.
  • Used Hive and created the Hive tables involved in data loading.
  • Expertise in the Hadoop stack: MRv2, Sqoop, Flume, Pig, Hive, HBase, Kafka, Spark.
  • Used Hive to analyze partitioned and bucketed data and compute various metrics for reporting (see the Hive sketch at the end of this list).
  • Set up automated processes to analyze system and Hadoop log files for predefined errors and send alerts to the appropriate groups (see the log-alert sketch at the end of this list).
  • Used the Luigi framework to build various ETL pipelines.
  • Set up automated processes to archive and clean unwanted data on the cluster, the NameNode, and the standby node.
  • Involved in analyzing system failures, identifying root causes, and recommending courses of action; documented system processes and procedures for future reference.
  • Worked with the systems engineering team to plan and deploy new Hadoop environments and expand existing Hadoop clusters.
  • Supported technical team members in management and review of Hadoop log files and data backups.
  • Participated in development and execution of system and disaster recovery processes.
  • Experience with AWS services such as EC2, ELB, RDS, ElastiCache, Route 53, and EMR.
  • Set up provisioning for EC2 instances with Bash, Chef, Puppet, and Ansible.
  • Worked with the ETL team to develop scripts and applications that enhance existing ETL pipelines and create new ones.
  • Hands-on experience with Infobright.
  • Developed and open-sourced an Impala/Hive Liquibase plugin for schema migration in CI/CD pipelines.
  • Hands-on experience with container technologies such as Docker; embedded containers in existing CI/CD pipelines.
  • Set up an independent testing lifecycle for CI/CD scripts with Vagrant and VirtualBox.
  • Involved in various automation activities.
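
A minimal sketch of the DataNode decommissioning and rebalancing flow referenced above; the hostname and file path are illustrative.

    # Add the node to the exclude file referenced by dfs.hosts.exclude
    # in hdfs-site.xml (path is hypothetical).
    echo "dn42.example.com" >> /etc/hadoop/conf/dfs.exclude

    # Ask the NameNode to re-read the include/exclude lists; the node
    # then drains its blocks to the rest of the cluster.
    hdfs dfsadmin -refreshNodes

    # Watch decommissioning progress.
    hdfs dfsadmin -report | grep -A 3 "dn42.example.com"

    # After adding new nodes, spread existing blocks around; the
    # threshold is the allowed deviation in disk usage, in percent.
    hdfs balancer -threshold 10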
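
A sketch of the kind of Fair Scheduler allocation file this involves; queue names, weights, and resource figures are illustrative, and the scheduler itself is assumed to be enabled through yarn.resourcemanager.scheduler.class in yarn-site.xml.

    <?xml version="1.0"?>
    <!-- /etc/hadoop/conf/fair-scheduler.xml, the file that
         yarn.scheduler.fair.allocation.file points at in yarn-site.xml;
         YARN reloads it periodically, so edits need no restart. -->
    <allocations>
      <queue name="etl">
        <weight>2.0</weight>
        <minResources>10240 mb,8 vcores</minResources>
      </queue>
      <queue name="adhoc">
        <weight>1.0</weight>
      </queue>
    </allocations>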
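
A sketch of the HDFS quota commands and a rack topology script of the kind mentioned above; paths, limits, and the rack-numbering convention are hypothetical.

    # Cap a project directory at one million names and 10 TB of raw space.
    hdfs dfsadmin -setQuota 1000000 /user/projectx
    hdfs dfsadmin -setSpaceQuota 10t /user/projectx
    hadoop fs -count -q /user/projectx    # verify usage against the quotas

    ## /etc/hadoop/conf/rack-topology.sh -- referenced by
    ## net.topology.script.file.name in core-site.xml
    #!/bin/bash
    # Hypothetical convention: the third octet of the IP encodes the rack.
    for host in "$@"; do
      octet=$(getent hosts "$host" | awk '{print $1}' | cut -d. -f3)
      echo "/rack-${octet:-default}"
    done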
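
A sketch of the partitioned, bucketed Hive layout and a typical reporting metric; the schema, bucket count, and date filter are illustrative.

    # Create the table, then compute a simple daily metric from it.
    hive -e "
    CREATE TABLE IF NOT EXISTS events (
      user_id BIGINT,
      action STRING,
      latency_ms INT
    )
    PARTITIONED BY (dt STRING)
    CLUSTERED BY (user_id) INTO 32 BUCKETS
    STORED AS ORC;

    SELECT dt, action, COUNT(*) AS cnt, AVG(latency_ms) AS avg_latency
    FROM events
    WHERE dt >= '2016-01-01'
    GROUP BY dt, action;
    "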
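
A sketch of the cron-driven log-scan alerting described above; the log path, error patterns, and recipient address are hypothetical.

    #!/bin/bash
    # Scan only the lines added since the last run and mail any matches.
    LOG=/var/log/hadoop-hdfs/hadoop-hdfs-namenode-$(hostname).log
    STATE=/var/tmp/namenode-log.offset
    PATTERNS='ERROR|Out of Memory|Corrupt block'

    lines=$(wc -l < "$LOG")
    last=$(cat "$STATE" 2>/dev/null || echo 0)
    [ "$lines" -lt "$last" ] && last=0    # the log was rotated

    matches=$(tail -n +"$((last + 1))" "$LOG" | grep -E "$PATTERNS")
    if [ -n "$matches" ]; then
      echo "$matches" | mail -s "NameNode errors on $(hostname)" ops@example.com
    fi
    echo "$lines" > "$STATE"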

Environment: Hadoop, MapReduce2, Hive, HDFS, Sqoop, Oozie, CDH, Flume, Kafka, Spark, HBase, ZooKeeper, Impala, LDAP, NoSQL, MySQL, Infobright, Linux, AWS, Ansible, Puppet, Chef.

Confidential

Hadoop/PostgreSQL Administrator

Responsibilities:

  • Responsible for cluster maintenance, commissioning and decommissioning DataNodes, cluster monitoring and troubleshooting, and managing and reviewing data backups and Hadoop log files.
  • Administered and managed vanilla Hadoop without any GUI tools.
  • Developed and enhanced all LSB init scripts for vanilla Hadoop and its ecosystem.
  • Capacity planning, hardware recommendations, performance tuning and benchmarking.
  • Cluster balancing and performance tuning of Hadoop components such as HDFS, Hive, Impala, and MapReduce.
  • Collaborated with software engineers to set up a user experience metrics service based on Flume and Hadoop.
  • Took backups of NameNode metadata and tested backup consistency (see the metadata backup sketch at the end of this list).
  • Set up HA for the Hadoop CDH4 cluster and the Apache Tomcat web server using open-source tools such as Corosync, Pacemaker, and DRBD.
  • Configured an HA multi-region PostgreSQL cluster using built-in PostgreSQL capabilities such as log shipping and hot standby (see the replication sketch at the end of this list).
  • Researched multi-master replication in PostgreSQL based on Bucardo.
  • Organized and implemented load balancing solutions for the PostgreSQL cluster, including Pgpool-II and PgBouncer.
  • Performed daily performance audits of the PostgreSQL RDBMS, including SQL query profiling and collection of log and database health metrics.
  • Managed and administered Citrix NetScaler.
  • Developed and open-sourced a NetScaler Manager app that uses the NITRO API to communicate with Citrix NetScaler.
  • Set up a Puppet master and agents to provision bare-metal and virtual infrastructure.
  • Managed Hadoop/PostgreSQL cluster configuration with Puppet.
  • Implemented an HA cluster of Redis instances using Corosync and Pacemaker.
  • Added and decommissioned Hadoop cluster nodes, including balancing HDFS block data.
  • Managed Hadoop in a multi-tenant environment, including proper scheduler configuration for MRv1.
  • Configured LDAP for the Hadoop cluster.
  • Managed an HA LDAP cluster on Linux.
  • Configured replication for the HA LDAP cluster on Linux.
  • Deployed new hardware and software environments required for PostgreSQL/Hadoop and expanded the existing environment.
  • Worked with data delivery teams to set up new Hadoop users, including creating Linux accounts and testing HDFS, Hive, Impala, and MapReduce access for the new users through the Hue web UI.
  • Set up data ingestion tools such as Flume.
  • Installed and set up HBase, Hive, and Impala.
  • Set up and managed the Apache ZooKeeper ensemble.
  • Worked with business analysts and software developers to troubleshoot MRv1 job failures and issues with Hive, Impala, and Flume.
  • Set up various ETL jobs, such as a job converting raw text to the optimized ORC and Parquet formats (see the format conversion sketch at the end of this list).
  • Managed Apache ActiveMQ brokers and tuned them for performance.
  • Set up a backup and recovery job for the PostgreSQL database, with subsequent validation on a regular basis (see the backup validation sketch at the end of this list).
  • Handled data exchange between Tomcat instances, Flume, and HDFS.
  • Used Hive and created the Hive tables involved in data loading.
  • Expertise in the Hadoop stack: MRv1, Impala, Hive, and HBase.
  • Used a cron-scheduled MRv1 job to analyze user experience metrics and send them to Graphite.
  • Configured Splunk to collect logs from all servers for further analysis.
  • Set up and configured various Zabbix triggers to monitor the Hadoop, PostgreSQL, and Redis clusters.
  • Involved in the migration from vanilla Hadoop to CDH4.
  • Involved in analyzing system failures, identifying root causes, and recommending courses of action; documented system processes and procedures for future reference.
  • Participated in development and execution of system and disaster recovery processes.
  • Used the Python Fabric framework to build various CI/CD pipelines.
  • Monitored Java applications using JMX.
  • Performed performance tuning, client/server connectivity checks, and database consistency checks using various utilities.
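
A sketch of the NameNode metadata backup and consistency test mentioned above, assuming a Hadoop 2.x cluster; paths are illustrative.

    # Pull the latest fsimage from the NameNode and keep a dated copy.
    BACKUP=/backup/nn/$(date +%F)
    mkdir -p "$BACKUP"
    hdfs dfsadmin -fetchImage "$BACKUP"

    # Consistency test: the offline image viewer must be able to walk
    # the image and dump it without errors.
    hdfs oiv -i "$BACKUP"/fsimage_* -o "$BACKUP/fsimage.xml" -p XML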
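
A sketch of the log-shipping / hot-standby setup referenced above, using PostgreSQL 9.x-era syntax; hostnames, users, and paths are hypothetical.

    ## Primary, postgresql.conf: stream and archive WAL.
    wal_level = hot_standby
    max_wal_senders = 3
    archive_mode = on
    archive_command = 'rsync -a %p standby.example.com:/wal_archive/%f'

    ## Standby: clone the primary, then follow it read-only.
    pg_basebackup -h primary.example.com -U replicator -D /var/lib/pgsql/data

    ## Standby, recovery.conf:
    standby_mode = 'on'
    primary_conninfo = 'host=primary.example.com user=replicator'
    restore_command = 'cp /wal_archive/%f %p'

    ## Standby, postgresql.conf:
    hot_standby = on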
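
A sketch of the text-to-ORC conversion job mentioned above; the schema and locations are illustrative.

    # Stage the raw delimited text, then rewrite it in the optimized
    # columnar format for downstream queries.
    hive -e "
    CREATE EXTERNAL TABLE IF NOT EXISTS raw_events (
      ts STRING, user_id BIGINT, payload STRING
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    LOCATION '/data/raw/events';

    CREATE TABLE IF NOT EXISTS events_orc STORED AS ORC
    AS SELECT * FROM raw_events;
    "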
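
A sketch of the PostgreSQL backup job with a restore-based validation pass; hosts, ports, and paths are hypothetical.

    #!/bin/bash
    set -e
    DEST=/backup/pg/$(date +%F)
    pg_basebackup -h db1.example.com -U backup -D "$DEST" -F tar -z -X fetch

    # Validate by actually restoring: unpack into a scratch directory,
    # start a throwaway instance on a side port, run a trivial query.
    SCRATCH=$(mktemp -d)
    tar -xzf "$DEST/base.tar.gz" -C "$SCRATCH"
    pg_ctl -D "$SCRATCH" -o "-p 5499" -w start
    psql -p 5499 -d postgres -c "SELECT count(*) FROM pg_class;"
    pg_ctl -D "$SCRATCH" -w stop
    rm -rf "$SCRATCH"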

Environment: PostgreSQL, Hadoop, MapReduce, Hive, HDFS, CDH, Flume, HBase, ZooKeeper, Impala, Splunk, LDAP, NoSQL, Linux, Puppet.

Confidential

Oracle/PostgreSQL Database Administrator

Responsibilities:

  • Set up Oracle 10g databases on different platforms: Red Hat and Solaris.
  • Provisioned and set up new hardware boxes.
  • Performed database upgrades as part of new releases.
  • Migrated databases between machines.
  • Installed patch sets and upgraded the Oracle RDBMS.
  • Set up a monitoring solution for the Oracle RDBMS using Nagios.
  • Supported various telco services based on the Oracle RDBMS: IVR, voicemail, videomail, USSD, and SMS.
  • Proposed a solution for migrating some services (112, the analogue of 911 in the US) from Oracle to PostgreSQL.
  • Capacity planning, hardware recommendations, performance tuning and benchmarking.
  • Handled all hardware issues.
  • Configured and installed PostgreSQL databases on Red Hat and Solaris systems.
  • Organized and implemented load balancing solutions for the PostgreSQL cluster, including Pgpool-II and PgBouncer.
  • Performed daily performance audits of the PostgreSQL RDBMS, including SQL query profiling and collection of log and database health metrics.
  • Set up a backup procedure for PostgreSQL using PITR, with subsequent validation.
  • Developed a set of housekeeping Perl scripts for managing the PostgreSQL and Oracle RDBMS.
  • Managed tablespaces and other database objects (see the tablespace sketch at the end of this list).
  • Created custom schemas and maintained archived log files.
  • Coordinated and interfaced with vendor support for upgrades, software and hardware issues, and TARs.
  • User account management: creating users, disabling users, and managing responsibilities.
  • Set up NRPE-based checks for the PostgreSQL and Oracle RDBMS (see the NRPE sketch at the end of this list).
  • Developed a script to automatically populate the PostgreSQL database for the 112 service with updates from emergency services.
  • Worked with vendors' software engineers to troubleshoot services.
  • Monitored Java applications using JMX.
  • Set up and worked with a range of web servers, such as Apache HTTP Server, Apache Tomcat, and Nginx.
  • Reconfigured hardware and software in the event of unforeseen situations that could lead to malfunctions.
  • Resolved emergencies.
  • Developed and maintained working documents.
  • Maintained and debugged existing applications that interface with a database back end.
  • Reported on incidents and followed them through to resolution.
  • Identified offending processes in various systems and prevented future occurrences.
  • Created technical procedures to prevent unscheduled outages.
  • Performed performance tuning, client/server connectivity checks, and database consistency checks using various utilities.
  • Experience with enterprise hardware from HP and Sun (Oracle).
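
A sketch of routine tablespace work scripted through SQL*Plus; the tablespace name, sizes, and file path are illustrative.

    -- tablespace_maint.sql -- run as: sqlplus -s / as sysdba @tablespace_maint.sql
    -- Add space to a growing tablespace.
    ALTER TABLESPACE users ADD DATAFILE '/u02/oradata/orcl/users02.dbf'
      SIZE 2G AUTOEXTEND ON NEXT 256M MAXSIZE 8G;

    -- Report usage so tablespaces nearing capacity stand out.
    SELECT tablespace_name, ROUND(used_percent, 1) AS pct_used
    FROM   dba_tablespace_usage_metrics
    ORDER  BY used_percent DESC;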
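
A sketch of the NRPE plumbing for the database checks mentioned above; plugin paths, thresholds, and hostnames are hypothetical.

    ## On the database host, appended to /etc/nagios/nrpe.cfg:
    command[check_pgsql]=/usr/lib/nagios/plugins/check_pgsql -H localhost -w 2 -c 5
    command[check_oracle_tns]=/usr/lib/nagios/plugins/check_oracle --tns ORCL

    # Restart the agent, then verify the check answers from the Nagios server.
    service nrpe restart
    /usr/lib/nagios/plugins/check_nrpe -H db1.example.com -c check_pgsql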

Environment: Oracle, PostgreSQL, Apache Tomcat, Apache HTTP, Nginx, Linux, Unix, JDK 1.6, Solaris 9/10, RHEL 4/5, SQL.
