Hadoop Consultant Resume

Segundo, CA

SUMMARY:

  • Over 3 years of experience in Hadoop, Sqoop, Flume, Hive, Pig, MapReduce and HBase with exposure to projects from different business verticals.
  • 5 years of database and Linux systems administration experience.
  • Expertise in Hadoop architecture, YARN, MapReduce and Hadoop ecosystem components.
  • Experience in planning, deploying and managing multi-node development, testing and production Hadoop clusters with ecosystem components such as Hive, Pig, Sqoop, Oozie, Flume, HCatalog and ZooKeeper. Managed clusters using Cloudera Manager and Hortonworks Ambari.
  • Experience in analyzing security requirements for Hadoop and integrating with Kerberos authentication infrastructure: KDC server setup, creating realms/domains, managing principals, generating keytab files for each service, and managing them with keytab tools.
  • Experience setting up NameNode high availability for major production clusters and designing automatic failover control using ZooKeeper and quorum journal nodes.
  • Experience in importing and exporting data from different databases like MySQL and Oracle DB into HDFS using Sqoop.
  • Streamed data into HDFS from web servers using Flume.
  • Set up automated 24x7 monitoring and escalation infrastructure for Hadoop clusters using Nagios and Ganglia.
  • Experience in benchmarking, backup and disaster recovery of NameNode metadata.
  • Performed major and minor upgrades, commissioning and decommissioning of data nodes on Hadoop cluster.
  • Python and Shell scripting experience.
  • Knowledge of the Java Virtual Machine and multi-threaded processing. Hands-on Java programming knowledge.
  • Expert in analyzing clients' existing Hadoop infrastructure and tuning performance accordingly.
  • Team player and self-starter with excellent communication skills and proven abilities to finish tasks on time.

TECHNICAL SKILLS:

Hadoop Ecosystem: HDFS, MapReduce, YARN, Hive, Pig, Sqoop, Oozie, Flume, ZooKeeper, Cloudera Manager, Ambari

Security: Kerberos

Scripting: Python, Shell scripting, Puppet

Programming languages: Java, C, C++

Databases: MySQL, MsSQL, MongoDB, Cassandra

Monitoring Tools: Nagios, Ganglia, Cloudera Manager, Ambari

Operating Systems: Linux (RHEL/Ubuntu/CentOS), Windows (XP/7/8)

PROFESSIONAL EXPERIENCE:

Confidential, Segundo, CA

Hadoop Consultant

Responsibilities:

  • Worked with the Linux administration team to prepare and configure the systems to support Hadoop deployment.
  • Performed various configurations, including networking and iptables, hostname resolution, user accounts and file permissions, HTTP, FTP, and SSH keyless login.
  • Implemented authentication and authorization service using Kerberos authentication protocol.
  • Benchmarked the Hadoop cluster using different benchmarking mechanisms.
  • Tuned the cluster by commissioning and decommissioning DataNodes.
  • Implemented the Fair Scheduler on the JobTracker to allocate a fair share of resources to small jobs.
  • Upgraded the Hadoop clusters from CDH3 to CDH4.
  • Deployed high availability on the Hadoop cluster (quorum journal nodes).
  • Implemented automatic failover using ZooKeeper and the ZooKeeper Failover Controller.
  • Configured Ganglia, including installing the gmond and gmetad daemons, which collect metrics from across the distributed cluster and present them in real-time dynamic web pages to aid debugging and maintenance.
  • Implemented Kerberos for authenticating all the services in Hadoop Cluster.
  • Deployed a Network File System (NFS) for NameNode metadata backup.
  • Performed a POC on cluster backup using DistCp, Cloudera Manager BDR and parallel ingestion.
  • Configured and deployed the Hive metastore using MySQL and the Thrift server.
  • Developed Pig scripts to process raw data for analysis.
  • Maintained, audited and built new clusters for testing purposes using Cloudera Manager.
  • Deployed and configured Flume agents to stream log events into HDFS for analysis.
  • Configured Oozie for workflow automation and coordination.
  • Wrote custom monitoring scripts for Nagios to monitor the daemons and the cluster status.
  • Wrote custom shell scripts to automate repetitive tasks on the cluster.
  • Worked with BI teams in generating the reports and designing ETL workflows on Pentaho.
  • Involved in loading data from UNIX file system to HDFS.
  • Defined time-based Oozie workflows to copy data from different sources to Hive as it became available.

Environment: MapReduce, HDFS, Hive, Pig, Flume, Sqoop, Python scripting, UNIX Shell Scripting, Nagios, Kerberos

Confidential, Pleasanton, CA

Hadoop Consultant

Responsibilities:

  • Performed both major and minor upgrades to the existing cluster, as well as rollbacks to the previous version.
  • Commissioned and decommissioned DataNodes, killed unresponsive TaskTrackers and dealt with blacklisted TaskTrackers.
  • Transferred data from HDFS to MySQL databases and vice versa using Sqoop.
  • Implemented MapReduce jobs in Hive by querying the available data.
  • Used Ganglia and Nagios to monitor the cluster around the clock.
  • Implemented NFS, NAS and HTTP servers on Linux servers.
  • Created a local YUM repository for installing and updating packages.
  • Copied data from one cluster to another using DistCp, and automated the procedure using shell scripts.
  • Designed a shell script to back up important metadata.
  • Implemented NameNode high availability (HA) to avoid a single point of failure.
  • Designed the cluster so that only one Secondary NameNode daemon could run at any given time.
  • Implemented NameNode backup using NFS for high availability.
  • Supported data analysts in running MapReduce programs.
  • Worked on analyzing data with Hive and Pig.
  • Scheduled data backups using crontab.
  • Configured Ganglia, including installing the gmond and gmetad daemons, which collect metrics from across the distributed cluster and present them in real-time dynamic web pages to aid debugging and maintenance.
  • Implemented Kerberos for authenticating all the services in Hadoop Cluster.
  • Deployed Sqoop server to perform imports from heterogeneous data sources to HDFS.
  • Designed and allocated HDFS quotas for multiple groups.
  • Configured iptables rules to allow application servers to connect to the cluster, set up the NFS exports list and blocked unwanted ports.
  • Configured Flume for efficiently collecting, aggregating and moving large amounts of log data from many different sources to HDFS.
  • Worked with application teams to install operating system and Hadoop updates, patches and version upgrades as required.
  • Managed data coming from different sources.

Environment: RHEL, CDH3, HDFS, Hive, Pig, Sqoop, Flume, Ganglia, Nagios, Kerberos

Confidential

Linux/MySQL Administrator

Responsibilities:

  • Installed MySQL (5.34/5.5/6) databases on Red Hat Linux.
  • Created MySQL database backups and tested the restore process in the test environment.
  • Created database users and set up permissions on dev and test servers.
  • Set up master-slave, master-master and cascade replication for backup and reporting.
  • Created MySQL databases on production and development servers.
  • Verified free space and host availability for backup/archive directories.
  • Monitored database size and increased it when required; analyzed database tables and indexes and rebuilt fragmented indexes.
  • Checked completion results for scheduled jobs, cron jobs and data processing, including refreshes.
  • Reviewed daily invalid object reports for all databases.
  • Created users and allocated appropriate tablespace quotas with the necessary privileges and roles for MySQL databases.
  • Installed and configured Linux for new build environments.
  • Created virtual servers on Citrix XenServer-based hosts and installed operating systems on guest servers.
  • Set up Preboot Execution Environment (PXE) boot and Kickstart on multiple servers for remote installation of Linux.
  • Deep understanding of monitoring and troubleshooting mission critical Linux machines.
  • Experience with Linux internals, virtual machines, and open source tools/platforms.
  • Improved system performance by working with the development team to analyze, identify and resolve issues quickly.
  • Ensured data recoverability by implementing system and application level backups.
  • Performed various configurations, including networking and iptables, hostname resolution and SSH keyless login.
  • Performed scheduled backups and necessary restorations.

Environment: MySQL 5.6/5.5/5.1, Nagios, MySQL Fabric, MySQL Workbench, Linux 5.0, 5.1
