Hadoop Consultant Resume
Segundo, CA
SUMMARY:
- Over 3 years of experience in Hadoop, Sqoop, Flume, Hive, Pig, MapReduce and HBase with exposure to projects from different business verticals.
- 5 years of database and Linux systems administration experience.
- Expertise in Hadoop architecture, YARN, MapReduce and Hadoop ecosystem components.
- Experience in planning, deploying and managing multi-node development, testing and production Hadoop clusters with ecosystem components like Hive, Pig, Sqoop, Oozie, Flume, HCatalog and ZooKeeper. Cluster management using Cloudera Manager and Hortonworks Ambari.
- Experience in analyzing the security requirements for Hadoop and integrating with Kerberos authentication infrastructure: KDC server setup, creating realms/domains, managing principals, generating a keytab file for every service and managing them using keytab tools.
- Experience in setting up NameNode high availability for major production clusters and designing automatic failover control using ZooKeeper and quorum journal nodes.
- Experience in importing and exporting data from different databases like MySQL and Oracle DB into HDFS using Sqoop.
- Worked on streaming data into HDFS from web servers using Flume.
- Setting up automated 24x7 monitoring and escalation infrastructure for Hadoop clusters using Nagios and Ganglia.
- Experience in benchmarking, backup and disaster recovery of NameNode metadata.
- Performed major and minor upgrades, and commissioning and decommissioning of DataNodes on Hadoop clusters.
- Python and Shell scripting experience.
- Knowledge of the Java Virtual Machine and multi-threaded processing. Hands-on Java programming knowledge.
- Expert in analyzing clients' existing Hadoop infrastructure and providing performance tuning accordingly.
- Team player and self-starter with excellent communication skills and proven abilities to finish tasks on time.
TECHNICAL SKILLS:
Hadoop Ecosystem: HDFS, MapReduce, YARN, Hive, Pig, Sqoop, Oozie, Flume, ZooKeeper, Cloudera Manager, Ambari
Security: Kerberos
Scripting: Python, Shell scripting, Puppet
Programming languages: Java, C, C++
Databases: MySQL, MsSQL, MongoDB, Cassandra
Monitoring Tools: Nagios, Ganglia, Cloudera Manager, Ambari
Operating Systems: Linux (RHEL/Ubuntu/CentOS), Windows (XP/7/8)
PROFESSIONAL EXPERIENCE:
Confidential, Segundo, CA
Hadoop Consultant
Responsibilities:
- Worked with the Linux administration team to prepare and configure the systems to support Hadoop deployment.
- Performed various configurations, including networking and iptables, hostname resolution, user accounts and file permissions, HTTP, FTP, and SSH keyless login.
- Implemented authentication and authorization service using Kerberos authentication protocol.
- Performed benchmarking on the Hadoop cluster using different benchmarking mechanisms.
- Tuned the cluster by commissioning and decommissioning DataNodes.
- Implemented the Fair Scheduler on the JobTracker to allocate a fair share of resources to small jobs.
- Upgraded the Hadoop clusters from CDH3 to CDH4.
- Deployed high availability on the Hadoop cluster (quorum journal nodes).
- Implemented automatic failover using ZooKeeper and the ZooKeeper Failover Controller (ZKFC).
- Configured Ganglia, including installing the gmond and gmetad daemons, which collect metrics from across the distributed cluster and present them in real-time dynamic web pages to help with debugging and maintenance.
- Implemented Kerberos for authenticating all the services in the Hadoop cluster.
- Deployed a Network File System (NFS) for NameNode metadata backup.
- Performed a POC on cluster backup using DistCp, Cloudera Manager BDR and parallel ingestion.
- Configured and deployed the Hive metastore using MySQL and the Thrift server.
- Developed Pig scripts for handling raw data for analysis.
- Maintained, audited and built new clusters for testing purposes using Cloudera Manager.
- Deployed and configured Flume agents to stream log events into HDFS for analysis.
- Configured Oozie for workflow automation and coordination.
- Wrote custom monitoring scripts for Nagios to monitor the daemons and the cluster status.
- Wrote custom shell scripts to automate repetitive tasks on the cluster.
- Worked with BI teams in generating the reports and designing ETL workflows on Pentaho.
- Involved in loading data from the UNIX file system into HDFS.
- Defined time-based Oozie workflows to copy data from different sources into Hive as it became available.
Environment: MapReduce, HDFS, Hive, Pig, Flume, Sqoop, Python scripting, UNIX Shell Scripting, Nagios, Kerberos
Confidential, Pleasanton, CA
Hadoop Consultant
Responsibilities:
- Performed both major and minor upgrades to the existing cluster, as well as rollbacks to the previous version.
- Commissioned and decommissioned DataNodes, killed unresponsive TaskTrackers, and dealt with blacklisted TaskTrackers.
- Transferred data from HDFS to MySQL databases and vice versa using Sqoop.
- Implemented MapReduce jobs in Hive by querying the available data.
- Used Ganglia and Nagios to monitor the cluster around the clock.
- Implemented NFS, NAS and HTTP servers on Linux servers.
- Created a local YUM repository for installing and updating packages.
- Copied data from one cluster to another using DistCp, and automated the procedure using shell scripts.
- Designed a shell script for backing up important metadata.
- Implemented NameNode HA to avoid a single point of failure.
- Designed the cluster so that only one Secondary NameNode daemon could run at any given time.
- Implemented NameNode backup using NFS for high availability.
- Supported data analysts in running MapReduce programs.
- Worked on analyzing data with Hive and Pig.
- Scheduled data backups using crontab.
- Configured Ganglia, including installing the gmond and gmetad daemons, which collect metrics from across the distributed cluster and present them in real-time dynamic web pages to help with debugging and maintenance.
- Implemented Kerberos for authenticating all the services in the Hadoop cluster.
- Deployed Sqoop server to perform imports from heterogeneous data sources to HDFS.
- Designed and allocated HDFS quotas for multiple groups.
- Configured iptables rules to allow application servers to connect to the cluster, set up the NFS exports list, and blocked unwanted ports.
- Configured Flume for efficiently collecting, aggregating and moving large amounts of log data from many different sources to HDFS.
- Worked with application teams to install operating system and Hadoop updates, patches, and version upgrades as required.
- Responsible for managing data coming from different sources.
Environment: RHEL, CDH3, HDFS, Hive, Pig, Sqoop, Flume, Ganglia, Nagios, Kerberos
Confidential
Linux/MySQL Administrator
Responsibilities:
- Installed MySQL (5.34/5.5/6) databases on Red Hat Linux.
- Created MySQL database backups and tested the restore process in the test environment.
- Created database users and set up permissions on dev and test servers.
- Set up master-slave, master-master and cascade replication for backup and reporting.
- Created MySQL databases on production and development servers.
- Verified free space and host availability on backup/archive directories.
- Monitored database size and increased it when required; analyzed database tables and indexes, and rebuilt indexes when they were fragmented.
- Checked completion results for scheduled jobs, cron jobs and data processing, including refreshes.
- Reviewed daily invalid object reports for all databases.
- Created users and allocated appropriate tablespace quotas with the necessary privileges and roles for MySQL databases.
- Installed and configured Linux for new build environments.
- Created virtual servers on Citrix XenServer-based hosts and installed operating systems on guest servers.
- Set up Preboot Execution Environment (PXE) boot and Kickstart on multiple servers for remote installation of Linux.
- Deep understanding of monitoring and troubleshooting mission critical Linux machines.
- Experience with Linux internals, virtual machines, and open source tools/platforms.
- Improved system performance by working with the development team to analyze, identify and resolve issues quickly.
- Ensured data recoverability by implementing system and application level backups.
- Performed various configurations, including networking and iptables, hostname resolution, and SSH keyless login.
- Performed scheduled backup and necessary restoration.
Environment: MySQL 5.6/5.5/5.1, Nagios, MySQL Fabric, MySQL Workbench, Linux 5.0, 5.1
