Hadoop Engineer Resume

Mountain View, CA


  • 3 years of experience in Hadoop, Sqoop, Flume, Hive, Pig, MapReduce and HBase, with exposure to projects from different business verticals.
  • 4 years of database and Linux systems administration experience.
  • Expertise in Hadoop architecture, YARN, MapReduce and Hadoop ecosystem components.
  • Experience in planning, deploying and managing multi-node development, testing and production Hadoop clusters with ecosystem components such as Hive, Pig, Sqoop, Oozie, Flume, HCatalog and ZooKeeper. Cluster management using Cloudera Manager and Hortonworks Ambari.
  • As an admin, involved in cluster maintenance, troubleshooting and monitoring, and followed proper backup and recovery strategies.
  • Experience in Amazon AWS cloud services (EC2, EBS, S3). Experience in managing Hadoop clusters using Cloudera Manager.
  • Experience in analyzing security requirements for Hadoop and integrating with Kerberos authentication infrastructure: KDC server setup, creating realms/domains, managing principals, generating a keytab file for each service, and managing keytabs with keytab tools.
  • Experience in setting up NameNode high availability for major production clusters and designing automatic failover control using ZooKeeper and quorum JournalNodes.
  • Experience in importing and exporting data from different databases like MySQL and Oracle DB into HDFS using Sqoop.
  • Worked on streaming data into HDFS from web servers using Flume.
  • Set up automated 24x7 monitoring and escalation infrastructure for Hadoop clusters using Nagios and Ganglia.
  • Experience in benchmarking, backup and disaster recovery of NameNode metadata.
  • Performed major and minor upgrades, and commissioning and decommissioning of DataNodes on Hadoop clusters.
  • Python and Shell scripting experience.
  • Knowledge of the Java Virtual Machine and multi-threaded processing. Hands-on Java programming knowledge.
  • Expert in analyzing clients' existing Hadoop infrastructure and tuning performance accordingly.
  • Team player and self-starter with excellent communication skills and proven abilities to finish tasks on time.


Hadoop Ecosystem: HDFS, MapReduce, YARN, Hive, Pig, Sqoop, Oozie, Flume, ZooKeeper, Cloudera Manager

Security: Kerberos

Scripting: Python, Shell scripting, Chef

Programming Languages: Java, C, C++

Databases: MySQL, MS SQL Server, MongoDB, Cassandra

Monitoring Tools: Nagios, Ganglia, Cloudera Manager

Operating Systems: Linux (RHEL/Ubuntu/CentOS), Windows (XP/7/8)


Confidential, Mountain View, CA

Hadoop Engineer


  • Hands-on experience in installing, configuring, supporting and managing Hadoop clusters.
  • Experience in Amazon AWS cloud services (EC2, EBS, S3). Experience in managing Hadoop clusters using Cloudera Manager.
  • Experience in Tableau administration.
  • Implemented LTM load balancing in front of the Hive servers to maximize utilization and provide failover.
  • Experience in benchmarking Hadoop cluster for analysis of queue usage.
  • Involved in setting up the Chef server to push configuration across the cluster.
  • Involved in extracting the data from various sources into Hadoop HDFS for processing.
  • Used Hive data warehouse tool to analyze the unified historic data in HDFS to identify issues and behavioral patterns.
  • Experience monitoring and troubleshooting issues with hosts in the cluster regarding memory, CPU, OS, storage and network.
  • Configured rack awareness.
  • Good experience writing SQL queries in Hive.
  • Experience in loading data from various data sources to HDFS using Kafka.
  • Experience in analyzing log files for Hadoop and ecosystem services and finding root cause.
  • Experience in commissioning, decommissioning, balancing and managing nodes, and tuning servers for optimal cluster performance.
  • As an admin, involved in cluster maintenance, troubleshooting and monitoring, and followed proper backup and recovery strategies.
  • Expertise using Apache Spark for large-scale data processing and Shark for Hive-compatible SQL on Spark.
  • Worked on a POC evaluating the performance of Apache Spark/Shark against Apache Hive.
  • Experience in HDFS data storage and support for running MapReduce jobs.
  • Experience in scheduling jobs through Tidal and Oozie.
  • Experience in monitoring and scheduling Informatica jobs through Tidal.
  • Experience with monitoring tools such as Ganglia and Nagios.
  • Commissioned and decommissioned Hadoop nodes to tune the cluster.
  • Involved in setting up the Kerberos authentication.
  • Experience using Vastool to manage user and dataset permissions.
  • Copied data from one cluster to another using DistCp, and automated the copy procedure using shell scripts.
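
A minimal sketch of how the cluster-to-cluster copy automation above might look. The resume describes shell scripts; this illustration uses Python instead, and the function names and the choice of the `-update` option are assumptions made for the example, not details from the original work.

```python
import subprocess


def build_distcp_command(src_uri, dst_uri, update=True):
    """Return the argv list for a `hadoop distcp` copy between two clusters."""
    cmd = ["hadoop", "distcp"]
    if update:
        # -update copies only files that are missing or differ at the destination.
        cmd.append("-update")
    cmd.extend([src_uri, dst_uri])
    return cmd


def run_copy(src_uri, dst_uri):
    """Run the copy and return True when distcp exits with status 0."""
    result = subprocess.run(build_distcp_command(src_uri, dst_uri))
    return result.returncode == 0
```

A scheduled job could call `run_copy` with the source and destination HDFS URIs of the two clusters and alert when it returns False.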

Environment: MapReduce, HDFS, Hive, Pig, Sqoop, Python scripting, UNIX Shell Scripting, Nagios, Kerberos, Ganglia, Tidal, Tableau, Informatica.

Confidential, Segundo, CA

Hadoop Admin


  • Worked with the Linux administration team to prepare and configure the systems to support Hadoop deployment.
  • Performed various configurations, including networking and iptables, hostname resolution, user accounts and file permissions, HTTP, FTP, and passwordless (key-based) SSH login.
  • Implemented authentication and authorization service using Kerberos authentication protocol.
  • Performed benchmarking on the Hadoop cluster using different benchmarking mechanisms.
  • Tuned the cluster by commissioning and decommissioning DataNodes.
  • Implemented the Fair Scheduler on the JobTracker to allocate a fair share of resources to small jobs.
  • Upgraded the Hadoop clusters from CDH3 to CDH4.
  • Deployed high availability on the Hadoop cluster (quorum JournalNodes).
  • Implemented automatic failover using ZooKeeper and the ZooKeeper Failover Controller.
  • Configured Ganglia, including installing the gmond and gmetad daemons, which collect metrics from across the distributed cluster and present them in real-time dynamic web pages to help with debugging and maintenance.
  • Implemented Kerberos for authenticating all the services in Hadoop Cluster.
  • Deployed a Network File System (NFS) mount for NameNode metadata backup.
  • Performed a POC on cluster backup using DistCp, Cloudera Manager BDR and parallel ingestion.
  • Configured and deployed the Hive metastore using MySQL and the Thrift server.
  • Developed Pig scripts to process raw data for analysis.
  • Maintained, audited and built new clusters for testing purposes using Cloudera Manager.
  • Deployed and configured Flume agents to stream log events into HDFS for analysis.
  • Configured Oozie for workflow automation and coordination.
  • Wrote custom monitoring scripts for Nagios to monitor the daemons and the cluster status.
  • Wrote custom shell scripts to automate repetitive tasks on the cluster.
  • Worked with BI teams in generating the reports and designing ETL workflows on Pentaho.
  • Involved in loading data from UNIX file system to HDFS.
  • Defined time-based Oozie workflows to copy data from different sources into Hive as it became available.
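
A minimal sketch of the kind of custom Nagios check mentioned above, assuming the check is keyed on the number of dead DataNodes. The function name and the thresholds are hypothetical; the only fixed convention used is the standard Nagios plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def evaluate_dead_datanodes(dead, warn_at=1, crit_at=3):
    """Map a dead-DataNode count to a (exit_code, status_line) pair.

    warn_at and crit_at are illustrative thresholds, not values from the source.
    """
    if dead is None or dead < 0:
        return UNKNOWN, "UNKNOWN - dead DataNode count unavailable"
    if dead >= crit_at:
        return CRITICAL, f"CRITICAL - {dead} dead DataNodes"
    if dead >= warn_at:
        return WARNING, f"WARNING - {dead} dead DataNodes"
    return OK, f"OK - {dead} dead DataNodes"
```

A wrapper script would obtain the count from the cluster, print the status line, and exit with the returned code so that Nagios can pick up the state.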

Environment: MapReduce, HDFS, Hive, Pig, Flume, Sqoop, Python scripting, UNIX Shell Scripting, Nagios, Kerberos


Linux/MySQL Administrator


  • Installed MySQL (5.34/5.5/6) databases on Red Hat Linux.
  • Created MySQL database backups and tested the restore process in the test environment.
  • Created database users and set up permissions on dev and test servers.
  • Set up master-slave, master-master and cascading replication for backup and reporting.
  • Created MySQL databases on production and development servers.
  • Verified free space and host availability for backup/archive directories.
  • Monitored database size and increased it when required; analyzed database tables and indexes and rebuilt indexes that were fragmented.
  • Checked completion results for scheduled jobs, cron jobs and data processing, including refreshes.
  • Reviewed daily invalid object reports for all databases.
  • Created users and allocated appropriate tablespace quotas with the necessary privileges and roles for MySQL databases.
  • Installed and configured Linux for new build environments.
  • Created virtual servers on Citrix XenServer hosts and installed operating systems on guest servers.
  • Set up Preboot Execution Environment (PXE) boot and Kickstart on multiple servers for remote installation of Linux.
  • Deep understanding of monitoring and troubleshooting mission critical Linux machines.
  • Experience with Linux internals, virtual machines, and open source tools/platforms.
  • Improved system performance by working with the development team to analyze, identify and resolve issues quickly.
  • Ensured data recoverability by implementing system and application level backups.
  • Performed various configurations, including networking and iptables, hostname resolution, and passwordless (key-based) SSH login.
  • Performed scheduled backup and necessary restoration.

Environment: MySQL 5.6/5.5/5.1, Nagios, MySQL Fabric, MySQL Workbench, Linux 5.0, 5.1
