Hadoop Engineer Resume
Mountain View, CA
SUMMARY:
- 3 years of experience in Hadoop, Sqoop, Flume, Hive, Pig, MapReduce and HBase, with exposure to projects from different business verticals.
- 4 years of database and Linux systems administration experience.
- Expertise in Hadoop architecture, YARN, MapReduce and Hadoop ecosystem components.
- Experience in planning, deploying and managing multi-node development, testing and production Hadoop clusters with ecosystem components such as Hive, Pig, Sqoop, Oozie, Flume, HCatalog and ZooKeeper; cluster management using Cloudera Manager and Hortonworks Ambari.
- As an administrator, performed cluster maintenance, troubleshooting and monitoring, and followed proper backup and recovery strategies.
- Experience with Amazon AWS cloud services (EC2, EBS, S3) and in managing Hadoop clusters using the Cloudera Manager tool.
- Experience in analyzing security requirements for Hadoop and integrating with Kerberos authentication infrastructure: KDC server setup, creating the realm/domain, managing principals, and generating a keytab file for each service and managing it with keytab tools (see the keytab sketch after this list).
- Experience in setting up NameNode high availability for major production clusters, with automatic failover designed using ZooKeeper and quorum journal nodes.
- Experience in importing and exporting data between HDFS and databases such as MySQL and Oracle using Sqoop (see the Sqoop sketch after this list).
- Worked on streaming data into HDFS from web servers using Flume.
- Set up automated 24x7 monitoring and escalation infrastructure for Hadoop clusters using Nagios and Ganglia.
- Experience in benchmarking, backup and disaster recovery of NameNode metadata.
- Performed major and minor upgrades, commissioning and decommissioning of data nodes on Hadoop cluster.
- Python and Shell scripting experience.
- Knowledge of the Java Virtual Machine and multi-threaded processing; hands-on Java programming knowledge.
- Expert in analyzing a client’s existing Hadoop infrastructure and providing performance tuning accordingly.
- Team player and self-starter with excellent communication skills and proven abilities to finish tasks on time.
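A minimal sketch of the keytab workflow referenced above, assuming an MIT Kerberos KDC with kadmin.local available; the hostname and realm (node1.example.com, EXAMPLE.COM) are placeholders:

    # Create a service principal with a random key (hypothetical host/realm).
    kadmin.local -q "addprinc -randkey hdfs/node1.example.com@EXAMPLE.COM"
    # Export the principal's key into a keytab file for the service to use.
    kadmin.local -q "xst -k hdfs.keytab hdfs/node1.example.com@EXAMPLE.COM"
    # Verify the keytab contents.
    klist -kt hdfs.keytab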
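And a hedged sketch of the Sqoop import flow, assuming a hypothetical MySQL host, database, table and target directory:

    # Import a MySQL table into HDFS with four parallel mappers
    # (connection string, credentials and paths are placeholders).
    sqoop import \
      --connect jdbc:mysql://db.example.com/sales \
      --username etl_user -P \
      --table orders \
      --target-dir /data/raw/orders \
      --num-mappers 4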
TECHNICAL SKILLS:
Hadoop Ecosystem: HDFS, MapReduce, YARN, Hive, Pig, Sqoop, Oozie, Flume, ZooKeeper, Cloudera Manager
Security: Kerberos
Scripting: Python, Shell scripting, Chef
Programming Languages: Java, C, C++
Databases: MySQL, MS SQL Server, MongoDB, Cassandra
Monitoring Tools: Nagios, Ganglia, Cloudera Manager
Operating Systems: Linux (RHEL/Ubuntu/CentOS), Windows (XP/7/8)
PROFESSIONAL EXPERIENCE:
Confidential, Mountain View, CA
Hadoop Engineer
Responsibilities:
- Hands-on experience in installing, configuring, supporting and managing Hadoop clusters.
- Experience with Amazon AWS cloud services (EC2, EBS, S3) and in managing Hadoop clusters using Cloudera Manager.
- Experience in Tableau administration.
- Implemented LTM load balancing over Hive servers to maximize utilization and provide failover.
- Experience in benchmarking the Hadoop cluster to analyze queue usage.
- Involved in setting up the Chef server to push configuration across the cluster.
- Involved in extracting the data from various sources into Hadoop HDFS for processing.
- Used Hive data warehouse tool to analyze the unified historic data in HDFS to identify issues and behavioral patterns.
- Experience monitoring and troubleshooting issues with hosts in the cluster regarding memory, CPU, OS, storage and network.
- Configured rack awareness for the cluster.
- Good experience writing HiveQL queries.
- Experience in loading data from various sources into HDFS using Kafka.
- Experience in analyzing log files for Hadoop and ecosystem services to find root causes.
- Experience in commissioning, decommissioning, balancing and managing nodes, and tuning servers for optimal cluster performance.
- As an administrator, performed cluster maintenance, troubleshooting and monitoring, and followed proper backup and recovery strategies.
- Expertise using Apache Spark, a fast engine for large-scale data processing, and Shark, fast Hive SQL on Spark.
- Worked on a POC evaluating the performance of Apache Spark/Shark against Apache Hive.
- Experience in HDFS data storage and support for running MapReduce jobs.
- Experience in scheduling jobs through Tidal and Oozie.
- Experience in monitoring and scheduling Informatica jobs through Tidal.
- Experience with monitoring tools such as Ganglia and Nagios.
- Involved in setting up the Kerberos authentication.
- Experience with vastool for managing user and dataset permissions.
- Copied data from one cluster to another using DistCp, and automated the copy procedure using shell scripts (see the sketch after this list).
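A minimal sketch of the DistCp automation mentioned in the last bullet; the cluster URIs, path layout and date-partition scheme are hypothetical:

    #!/bin/bash
    # Copy yesterday's partition from the source cluster to the target cluster.
    DAY=$(date -d yesterday +%Y-%m-%d)
    SRC="hdfs://prod-nn:8020/data/events/dt=${DAY}"
    DST="hdfs://dr-nn:8020/data/events/dt=${DAY}"
    # -update copies only changed files; -p preserves file attributes.
    hadoop distcp -update -p "${SRC}" "${DST}" || exit 1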
Environment: MapReduce, HDFS, Hive, Pig, Sqoop, Python scripting, UNIX Shell Scripting, Nagios, Kerberos, Ganglia, Tidal, Tableau, Informatica.
Confidential, Segundo, CA
Hadoop Admin
Responsibilities:
- Worked with the Linux administration team to prepare and configure the systems to support Hadoop deployment.
- Performed various configurations including networking and iptables, hostname resolution, user accounts and file permissions, HTTP, FTP, and passwordless SSH login.
- Implemented authentication and authorization service using Kerberos authentication protocol.
- Performed benchmarking on the Hadoop cluster using different benchmarking mechanisms.
- Tuned the cluster by commissioning and decommissioning DataNodes.
- Implemented the Fair Scheduler on the JobTracker to allocate a fair share of resources to small jobs.
- Upgraded the Hadoop clusters from CDH3 to CDH4.
- Deployed high availability on the Hadoop cluster using quorum journal nodes (see the failover sketch after this list).
- Implemented automatic failover using ZooKeeper and the ZooKeeper Failover Controller (ZKFC).
- Configured Ganglia, installing the gmond and gmetad daemons, which collect metrics from across the distributed cluster and present them in real-time dynamic web pages that aid debugging and maintenance.
- Implemented Kerberos for authenticating all the services in Hadoop Cluster.
- Deployed a Network File System mount for NameNode metadata backup.
- Performed a POC on cluster backup using DistCp, Cloudera Manager BDR and parallel ingestion.
- Configured and deployed the Hive metastore using MySQL and the Thrift server.
- Developed Pig scripts for handling raw data for analysis.
- Maintained, audited and built new clusters for testing purposes using Cloudera Manager.
- Deployed and configured Flume agents to stream log events into HDFS for analysis.
- Configured Oozie for workflow automation and coordination.
- Wrote custom monitoring scripts for Nagios to monitor the daemons and cluster status (see the check sketch after this list).
- Wrote custom shell scripts to automate repetitive tasks on the cluster.
- Worked with BI teams in generating the reports and designing ETL workflows on Pentaho.
- Involved in loading data from UNIX file system to HDFS.
- Defined time-based Oozie workflows to copy data from different sources to Hive as it becomes available.
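A hedged sketch of verifying the quorum-journal HA and ZKFC setup described above, assuming NameNode IDs nn1 and nn2 as configured in hdfs-site.xml:

    # Initialize the failover znode in ZooKeeper (run once, on one NameNode).
    hdfs zkfc -formatZK
    # Check which NameNode is currently active and which is standby.
    hdfs haadmin -getServiceState nn1
    hdfs haadmin -getServiceState nn2
    # Trigger a manual failover to confirm the failover path works.
    hdfs haadmin -failover nn1 nn2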
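And a minimal sketch of a custom Nagios check of the kind mentioned above, assuming jps is on the monitoring user's PATH; exit codes follow the standard Nagios plugin convention:

    #!/bin/bash
    # Nagios check: is the DataNode daemon running on this host?
    if jps 2>/dev/null | grep -q DataNode; then
      echo "OK - DataNode is running"
      exit 0   # Nagios OK
    else
      echo "CRITICAL - DataNode is not running"
      exit 2   # Nagios CRITICAL
    fi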
Environment: MapReduce, HDFS, Hive, Pig, Flume, Sqoop, Python scripting, UNIX Shell Scripting, Nagios, Kerberos
Confidential
Linux/MySQL Administrator
Responsibilities:
- Installed MySQL (5.1/5.5/5.6) databases on Red Hat Linux.
- Created MySQL database backups and tested the restore process in the test environment.
- Created database users and set up permissions on dev and test servers.
- Set up master-slave, master-master and cascading replication for backup and reporting (see the replication sketch after this list).
- Created MySQL databases on production and development servers.
- Verified free space availability on backup/archive directories.
- Monitored database size and increased it when required; analyzed database tables and indexes and rebuilt indexes where fragmentation was found.
- Checked completion results for scheduled jobs, crons and data processing, including refreshes.
- Reviewed daily invalid-object reports for all databases.
- Created users and allocated appropriate tablespace quotas with necessary privileges and roles for MySQL databases.
- Installed and configured Linux for new build environments.
- Created virtual servers on Citrix XenServer hosts and installed operating systems on guest servers.
- Set up Preboot Execution Environment (PXE) boot and Kickstart installation on multiple servers for remote Linux installation.
- Deep understanding of monitoring and troubleshooting mission critical Linux machines.
- Experience with Linux internals, virtual machines, and open source tools/platforms.
- Improved system performance by working with the development team to analyze, identify and resolve issues quickly.
- Ensured data recoverability by implementing system and application level backups.
- Performed various configurations including networking and iptables, hostname resolution, and passwordless SSH login.
- Performed scheduled backup and necessary restoration.
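A minimal sketch of the master-slave replication setup mentioned above, for the MySQL 5.x versions listed; the hostnames, replication user, password and binlog coordinates are placeholders (take the real coordinates from SHOW MASTER STATUS):

    # On the master: create a replication account (placeholder credentials).
    mysql -u root -p -e "GRANT REPLICATION SLAVE ON *.* TO 'repl'@'10.0.0.%' IDENTIFIED BY 'secret';"
    # Note the current binlog file and position for the slave to start from.
    mysql -u root -p -e "SHOW MASTER STATUS;"
    # On the slave: point it at the master and start replication.
    mysql -u root -p -e "CHANGE MASTER TO MASTER_HOST='10.0.0.1', MASTER_USER='repl', MASTER_PASSWORD='secret', MASTER_LOG_FILE='mysql-bin.000001', MASTER_LOG_POS=107; START SLAVE;"
    # Confirm the slave I/O and SQL threads are running.
    mysql -u root -p -e "SHOW SLAVE STATUS\G"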
Environment: MySQL 5.6/5.5/5.1, Nagios, MySQL Fabric, MySQL Workbench, Linux 5.0/5.1