Hadoop Engineer Resume
Mountain View, CA
SUMMARY:
- 3 years of experience in Hadoop, Sqoop, Flume, Hive, Pig, MapReduce and HBase, with exposure to projects from different business verticals.
- 4 years of database and Linux systems administration experience.
- Expertise in Hadoop architecture, YARN, MapReduce and Hadoop ecosystem components.
- Experience in planning, deploying and managing multi-node development, testing and production Hadoop clusters with ecosystem components such as Hive, Pig, Sqoop, Oozie, Flume, HCatalog and ZooKeeper; cluster management using Cloudera Manager and Hortonworks Ambari.
- As an administrator, performed cluster maintenance, troubleshooting and monitoring, and followed proper backup and recovery strategies.
- Experience with Amazon AWS cloud services (EC2, EBS, S3) and in managing Hadoop clusters using the Cloudera Manager tool.
- Experience in analyzing security requirements for Hadoop and integrating with Kerberos authentication infrastructure: KDC server setup, creating the realm/domain, managing principals, and generating a keytab file for each service and managing it with keytab tools (see the keytab sketch after this list).
- Experience in setting up NameNode high availability for major production clusters, with automatic failover designed using ZooKeeper and quorum journal nodes.
- Experience in importing and exporting data between HDFS and databases such as MySQL and Oracle using Sqoop (see the Sqoop sketch after this list).
- Worked on streaming data into HDFS from web servers using Flume.
- Set up automated 24x7 monitoring and escalation infrastructure for Hadoop clusters using Nagios and Ganglia.
- Experience in benchmarking, backup and disaster recovery of NameNode metadata.
- Performed major and minor upgrades, commissioning and decommissioning of data nodes on Hadoop cluster.
- Python and Shell scripting experience.
- Knowledge of the Java Virtual Machine and multi-threaded processing; hands-on Java programming knowledge.
- Expert in analyzing a client’s existing Hadoop infrastructure and providing performance tuning accordingly.
- Team player and self-starter with excellent communication skills and proven abilities to finish tasks on time.
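A minimal sketch of the keytab workflow referenced above, assuming an MIT Kerberos KDC with kadmin.local available; the hostname and realm (node1.example.com, EXAMPLE.COM) are placeholders:

    # Create a service principal with a random key (hypothetical host/realm).
    kadmin.local -q "addprinc -randkey hdfs/node1.example.com@EXAMPLE.COM"
    # Export the principal's key into a keytab file for the service to use.
    kadmin.local -q "xst -k hdfs.keytab hdfs/node1.example.com@EXAMPLE.COM"
    # Verify the keytab contents.
    klist -kt hdfs.keytab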
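And a hedged sketch of the Sqoop import flow, assuming a hypothetical MySQL host, database, table and target directory:

    # Import a MySQL table into HDFS with four parallel mappers
    # (connection string, credentials and paths are placeholders).
    sqoop import \
      --connect jdbc:mysql://db.example.com/sales \
      --username etl_user -P \
      --table orders \
      --target-dir /data/raw/orders \
      --num-mappers 4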
TECHNICAL SKILLS:
Hadoop Ecosystem: HDFS, MapReduce, YARN, Hive, Pig, Sqoop, Oozie, Flume, ZooKeeper, Cloudera Manager
Security: Kerberos
Scripting: Python, Shell scripting, Chef
Programming Languages: Java, C, C++
Databases: MySQL, MS SQL Server, MongoDB, Cassandra
Monitoring Tools: Nagios, Ganglia, Cloudera Manager
Operating Systems: Linux (RHEL/Ubuntu/CentOS), Windows (XP/7/8)
PROFESSIONAL EXPERIENCE:
Confidential, Mountain View, CA
Hadoop Engineer
Responsibilities:
- Hands-on experience in installing, configuring, supporting and managing Hadoop clusters.
- Experience with Amazon AWS cloud services (EC2, EBS, S3) and in managing Hadoop clusters using Cloudera Manager.
- Experience in Tableau administration.
- Implemented LTM load balancing over Hive servers to maximize utilization and provide failover.
- Experience in benchmarking the Hadoop cluster to analyze queue usage.
- Involved in setting up the Chef server to push configuration across the cluster.
- Involved in extracting the data from various sources into Hadoop HDFS for processing.
- Used Hive data warehouse tool to analyze the unified historic data in HDFS to identify issues and behavioral patterns.
- Experience monitoring and troubleshooting issues with hosts in the cluster regarding memory, CPU, OS, storage and network.
- Configured rack awareness for the cluster.
- Good experience writing HiveQL queries.
- Experience in loading data from various sources into HDFS using Kafka.
- Experience in analyzing log files for Hadoop and ecosystem services to find root causes.
- Experience in commissioning, decommissioning, balancing and managing nodes, and tuning servers for optimal cluster performance.
- As an administrator, performed cluster maintenance, troubleshooting and monitoring, and followed proper backup and recovery strategies.
- Expertise using Apache Spark, a fast engine for large-scale data processing, and Shark, fast Hive SQL on Spark.
- Worked on a POC evaluating the performance of Apache Spark/Shark against Apache Hive.
- Experience in HDFS data storage and support for running MapReduce jobs.
- Experience in scheduling jobs through Tidal and Oozie.
- Experience in monitoring and scheduling Informatica jobs through Tidal.
- Experience with monitoring tools such as Ganglia and Nagios.
- Involved in setting up the Kerberos authentication.
- Experience with vastool for managing user and dataset permissions.
- Copied data from one cluster to another using DistCp, and automated the copy procedure using shell scripts (see the sketch after this list).
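A minimal sketch of the DistCp automation mentioned in the last bullet; the cluster URIs, path layout and date-partition scheme are hypothetical:

    #!/bin/bash
    # Copy yesterday's partition from the source cluster to the target cluster.
    DAY=$(date -d yesterday +%Y-%m-%d)
    SRC="hdfs://prod-nn:8020/data/events/dt=${DAY}"
    DST="hdfs://dr-nn:8020/data/events/dt=${DAY}"
    # -update copies only changed files; -p preserves file attributes.
    hadoop distcp -update -p "${SRC}" "${DST}" || exit 1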
Environment: MapReduce, HDFS, Hive, Pig, Sqoop, Python scripting, UNIX Shell Scripting, Nagios, Kerberos, Ganglia, Tidal, Tableau, Informatica.
Confidential, Segundo, CA
Hadoop Admin
Responsibilities:
- Worked with the Linux administration team to prepare and configure the systems to support Hadoop deployment.
- Performed various configurations including networking and iptables, hostname resolution, user accounts and file permissions, HTTP, FTP, and passwordless SSH login.
- Implemented authentication and authorization service using Kerberos authentication protocol.
- Performed benchmarking on the Hadoop cluster using different benchmarking mechanisms.
- Tuned the cluster by commissioning and decommissioning DataNodes.
- Implemented the Fair Scheduler on the JobTracker to allocate a fair share of resources to small jobs.
- Upgraded the Hadoop clusters from CDH3 to CDH4.
- Deployed high availability on the Hadoop cluster using quorum journal nodes (see the failover sketch after this list).
- Implemented automatic failover using ZooKeeper and the ZooKeeper Failover Controller (ZKFC).
- Configured Ganglia, installing the gmond and gmetad daemons, which collect metrics from across the distributed cluster and present them in real-time dynamic web pages that aid debugging and maintenance.
- Implemented Kerberos for authenticating all the services in Hadoop Cluster.
- Deployed a Network File System mount for NameNode metadata backup.
- Performed a POC on cluster backup using DistCp, Cloudera Manager BDR and parallel ingestion.
- Configured and deployed the Hive metastore using MySQL and the Thrift server.
- Developed Pig scripts for handling raw data for analysis.
- Maintained, audited and built new clusters for testing purposes using Cloudera Manager.
- Deployed and configured Flume agents to stream log events into HDFS for analysis.
- Configured Oozie for workflow automation and coordination.
- Wrote custom monitoring scripts for Nagios to monitor the daemons and cluster status (see the check sketch after this list).
- Wrote custom shell scripts to automate repetitive tasks on the cluster.
- Worked with BI teams in generating the reports and designing ETL workflows on Pentaho.
- Involved in loading data from UNIX file system to HDFS.
- Defined time-based Oozie workflows to copy data from different sources to Hive as it becomes available.
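A hedged sketch of verifying the quorum-journal HA and ZKFC setup described above, assuming NameNode IDs nn1 and nn2 as configured in hdfs-site.xml:

    # Initialize the failover znode in ZooKeeper (run once, on one NameNode).
    hdfs zkfc -formatZK
    # Check which NameNode is currently active and which is standby.
    hdfs haadmin -getServiceState nn1
    hdfs haadmin -getServiceState nn2
    # Trigger a manual failover to confirm the failover path works.
    hdfs haadmin -failover nn1 nn2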
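And a minimal sketch of a custom Nagios check of the kind mentioned above, assuming jps is on the monitoring user's PATH; exit codes follow the standard Nagios plugin convention:

    #!/bin/bash
    # Nagios check: is the DataNode daemon running on this host?
    if jps 2>/dev/null | grep -q DataNode; then
      echo "OK - DataNode is running"
      exit 0   # Nagios OK
    else
      echo "CRITICAL - DataNode is not running"
      exit 2   # Nagios CRITICAL
    fi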
Environment: MapReduce, HDFS, Hive, Pig, Flume, Sqoop, Python scripting, UNIX Shell Scripting, Nagios, Kerberos
Confidential
Linux/MySQL Administrator
Responsibilities:
- Installed MySQL (5.1/5.5/5.6) databases on Red Hat Linux.
- Created MySQL database backups and tested the restore process in the test environment.
- Created database users and set up permissions on dev and test servers.
- Set up master-slave, master-master and cascading replication for backup and reporting (see the replication sketch after this list).
- Created MySQL databases on production and development servers.
- Verified free space availability on backup/archive directories.
- Monitored database size and increased it when required; analyzed database tables and indexes and rebuilt indexes where fragmentation was found.
- Checked completion results for scheduled jobs, crons and data processing, including refreshes.
- Reviewed daily invalid-object reports for all databases.
- Created users and allocated appropriate tablespace quotas with necessary privileges and roles for MySQL databases.
- Installed and configured Linux for new build environments.
- Created virtual servers on Citrix XenServer hosts and installed operating systems on guest servers.
- Set up Preboot Execution Environment (PXE) boot and Kickstart installation on multiple servers for remote Linux installation.
- Deep understanding of monitoring and troubleshooting mission critical Linux machines.
- Experience with Linux internals, virtual machines, and open source tools/platforms.
- Improved system performance by working with the development team to analyze, identify and resolve issues quickly.
- Ensured data recoverability by implementing system and application level backups.
- Performed various configurations including networking and iptables, hostname resolution, and passwordless SSH login.
- Performed scheduled backup and necessary restoration.
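A minimal sketch of the master-slave replication setup mentioned above, for the MySQL 5.x versions listed; the hostnames, replication user, password and binlog coordinates are placeholders (take the real coordinates from SHOW MASTER STATUS):

    # On the master: create a replication account (placeholder credentials).
    mysql -u root -p -e "GRANT REPLICATION SLAVE ON *.* TO 'repl'@'10.0.0.%' IDENTIFIED BY 'secret';"
    # Note the current binlog file and position for the slave to start from.
    mysql -u root -p -e "SHOW MASTER STATUS;"
    # On the slave: point it at the master and start replication.
    mysql -u root -p -e "CHANGE MASTER TO MASTER_HOST='10.0.0.1', MASTER_USER='repl', MASTER_PASSWORD='secret', MASTER_LOG_FILE='mysql-bin.000001', MASTER_LOG_POS=107; START SLAVE;"
    # Confirm the slave I/O and SQL threads are running.
    mysql -u root -p -e "SHOW SLAVE STATUS\G"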
Environment: MySQL 5.6/5.5/5.1, Nagios, MySQL Fabric, MySQL Workbench, Linux 5.0/5.1