Hadoop Consultant Resume Segundo, CA - Hire IT People

SUMMARY:

Over 3 years of experience in Hadoop, Sqoop, Flume, Hive, PIG, MapReduce and HBASE with exposure to projects from different business verticals.
5 years of database and Linux systems administration experience.
Expertise on Hadoop architecture, YARN, MapReduce and Hadoop Eco - System components.
Experience in planning, deploying and managing multi-node development, testing and production Hadoop clusters with eco-system components like HIVE, PIG, SQOOP, OOZIE, FLUME, HCATALOG and ZOOKEEPER. Cluster management using Cloudera manager and Hortonworks Ambari.
Experience in analyzing the security requirements for Hadoop and integrating with Kerberos authentication infrastructure- KDC server setup, creating realm /domain, managing principles, generating key tab file for each and every service and managing using key tab tools.
Experience on setting up Name Node high availability for major production clusters and designed automatic failover control using Zookeeper and Quorum journal nodes.
Experience in importing and exporting data from different databases like MySQL and Oracle DB into HDFS using Sqoop.
Worked on streaming the data into HDFS from web servers using flume.
Setting up automated 24x7 monitoring and escalation infrastructure for Hadoop clusters using Nagios and Ganglia.
Experience in benchmarking, backup and disaster recovery of Name Node metadata.
Performed major and minor upgrades, commissioning and decommissioning of data nodes on Hadoop cluster.
Python and Shell scripting experience.
Knowledge of Java Virtual Machine and multi-thread processing. Hands on Java programming knowledge.
An expert in analyzing client’s existing Hadoop infrastructure and provide performance tuning accordingly.
Team player and self-starter with excellent communication skills and proven abilities to finish tasks on time.

TECHNICAL SKILLS:

Hadoop Ecosystem: HDFS, MapReduce, YARN, Hive, Pig, Sqoop, Oozie, Flume, Zoo Keeper, Cloudera Manager, Ambari

Security: Kerberos

Scripting: Python, Shell scripting, Puppet

Programming languages: Java, C, C++

Databases: MySQL, MsSQL, MongoDB, Cassandra

Monitoring Tools: Nagios, Ganglia, Cloudera Manager, Ambari

Operating Systems: Linux(RHEL/Ubuntu/CentOS), Windows(XP/7/8)

PROFESSIONAL EXPERIENCE:

Confidential, Segundo, CA

Hadoop Consultant

Responsibilities:

Worked with the Linux administration team to prepare and configure the systems to support Hadoop deployment.
Performed various configurations which Includes, networking and IP tables, resolving hostnames, user accounts and file permissions, http, ftp, SSH key less login.
Implemented authentication and authorization service using Kerberos authentication protocol.
Performed benchmarking on the Hadoop cluster using different bench marking mechanisms.
Tuned the cluster by Commissioning and decommissioning the DataNodes.
Implemented Fair scheduler on the job tracker to allocate the fair amount of resources to small jobs.
Upgraded the Hadoop clusters from CDH3 to CDH4.
Deployed high availability on the Hadoop cluster (quorum journal nodes).
Implemented automatic failover zookeeper and zookeeper failover controller.
Configured Ganglia which include installing gmond and gmetad daemons which collects all the metrics running on the distributed cluster and presents them in real-time dynamic web pages which would further help in debugging and maintenance.
Implemented Kerberos for authenticating all the services in Hadoop Cluster.
Deployed Network file system for NameNode Meta data backup.
Performed a POC on cluster back using distcp, Cloudera manager BDR and parallel ingestion.
Configured and deployed hive metastore using MySQL and thrift server.
Development of Pig scripts for handling the raw data for analysis.
Maintained, audited and built new clusters for testing purposes using the Cloudera manager.
Deployed and configured flume agents to stream log events into HDFS for analysis.
Configured Oozie for workflow automation and coordination.
Custom monitoring scripts for Nagios to monitor the daemons and the cluster status.
Custom shell scripts for automating redundant tasks on the cluster.
Worked with BI teams in generating the reports and designing ETL workflows on Pentaho.
Involved in loading data from UNIX file system to HDFS.
Defined Oozie workflow based on time to copy the data upon availability from different Sources to Hive.

Environment: Map Reduce, HDFS, Hive, Pig, Flume, Sqoop, Python scripting, UNIX Shell Scripting, Nagios, Kerberos

Confidential, Pleasanton, CA

Hadoop Consultant

Responsibilities:

Performed both Major and Minor upgrades to the existing cluster and also rolling back to the previous version.
Implemented Commissioning and Decommissioning of data nodes, killing the unresponsive task tracker and dealing with blacklisted task trackers.
Dumped the data from HDFS to MYSQL database and vice-versa using SQOOP.
Implemented Map Reduce jobs in HIVE by querying the available data.
Used Ganglia and Nagios to monitor the cluster around the clock.
Implemented NFS, NAS and HTTP servers on Linux servers.
Created a local YUM repository for installing and updating packages.
Dumped the data from one cluster to other cluster by using DISTCP, and automated the dumping procedure using shell scripts.
Designed the shell script for backing up of important metadata.
HA implementation of Name Node to avoid single point of failure.
Designed the cluster so that only one Secondary name node daemon could be run at any given time.
Implemented Name node backup using NFS. This was done for High availability.
Supported Data Analysts in running Map Reduce Programs.
Worked on analyzing data with Hive and Pig.
Running cron-tab to back up data.
Configured Ganglia which include installing gmond and gmetad daemons which collects all the metrics running on the distributed cluster and presents them in real-time dynamic web pages which would further help in debugging and maintenance.
Implemented Kerberos for authenticating all the services in Hadoop Cluster.
Deployed Sqoop server to perform imports from heterogeneous data sources to HDFS.
Designed and allocated HDFS quotas for multiple groups.
Configured IPTABLES rules to allow the connection of application servers to the cluster and also setup NFS exports list and blocked unwanted ports.
Configured Flume for efficiently collecting, aggregating and moving large amounts of log Data from Many different sources to the HDFS.
Worked with application teams to install operating system, Hadoop updates, patches, version upgrades as required.
Responsible to manage data coming from different sources.

Environment: RHEL, CCH3, HDFS, Hive, Pig, Sqoop, Flume, Ganglia, Nagios, Kerberos

Confidential

Linux/MySQL Administrator

Responsibilities:

Installation of MYSQL (5.34/5.5/6) databases on Red hat Linux.
Created Mysql Database Backups and tested restore process on Test Environment
Created Database users and setup permissions on Dev and Test Servers
Setting up Replication master-slave, master-master and cascade replication for backup and reporting.
Created Mysql databases on production and development servers.
Verify free space host availability on (backup/archive) directories.
Monitored the database size and increased the size when required, analyzed the database tables and indexes and then rebuilt the indexes if were fragmentations in indexes.
Check completion results for any scheduled jobs, corns or data processing including refreshes.
Review daily invalid object reports for all databases.
Created users, allocation of appropriate table space quotas with necessary privileges and roles for MYSQL databases
Installation and configuration of Linux for new build environment.
Created Virtual server on Citrix Xen Server based host and installed operating system on Guest Servers.
Installed Pre-Execution environment boot and Kick start method on multiple servers, remote installation of Linux using PXE boot.
Deep understanding of monitoring and troubleshooting mission critical Linux machines.
Experience with Linux internals, virtual machines, and open source tools/platforms.
Improve system performance by working with the development team to analyze, identify and resolve issues quickly.
Ensured data recoverability by implementing system and application level backups.
Performed various configurations which include networking and IP Tables, resolving hostnames, SSH key less login.
Performed scheduled backup and necessary restoration.

Environment: MySQL 5.6/5.5/5.1, NAGIOS, MySQL FABRIC, MySQL Workbench, LINUX 5.0, 5.1

We provide IT Staff Augmentation Services!

Hadoop Consultant Resume

Segundo, CA

We'd love your feedback!

Resume Categories

Client Services

Job Seekers

Visa Sponsorship