Hadoop Admin Resume

Irvine, CA


  • 8+ years of IT industry experience in administering Linux, managing databases, developing MapReduce applications, and designing, building, and administering large-scale Hadoop production clusters.
  • 2.5 years of experience in big data technologies: Hadoop HDFS, MapReduce, Pig, Hive, Oozie, Flume, Sqoop, ZooKeeper, and the NoSQL stores Cassandra and HBase.
  • Experience in deploying and managing multi-node development, testing, and production Hadoop clusters with different Hadoop components (Hive, Pig, Sqoop, Oozie, Flume, HCatalog, HBase, ZooKeeper) using Hortonworks Ambari.
  • Strong knowledge of Hadoop HDFS architecture and the MapReduce framework.
  • Strong knowledge of the Apache Hive data warehouse, data cubes, Hive server, partitioning, bucketing, and clustering, and of writing UDFs, UDAFs, and UDTFs in Java for Hive.
  • Solid experience in Pig administration and development, including writing Pig UDFs (Eval, Filter, Load, and Store) and macros.
  • Experience in administering Linux systems to deploy Hadoop clusters and in monitoring those clusters using Nagios and Ganglia.
  • Experience in benchmarking and in performing backup and disaster recovery of NameNode metadata and sensitive data residing on the cluster.
  • Experience in performing minor and major upgrades and in commissioning and decommissioning DataNodes on Hadoop clusters.
  • Strong knowledge of configuring NameNode High Availability and NameNode Federation.
  • Familiar with writing Oozie workflows and job controllers for automating shell, Hive, and Sqoop jobs.
  • Familiar with importing and exporting data with Sqoop from RDBMSs (MySQL, Oracle, Teradata), including use of fast loaders and connectors.
  • Experience in using Flume to stream data into HDFS from various sources.
  • Hands-on experience in provisioning and managing multi-tenant Hadoop clusters in a public cloud environment: Amazon Web Services (AWS) EC2.
  • Experience in installing and administering a PXE server with Kickstart, setting up FTP, DHCP, and DNS servers, and Logical Volume Management.
  • Experience in configuring and managing NAS (file-level access, NFS) and SAN (block-level access, iSCSI) storage devices.
  • Experience in storage management, including JBOD, RAID levels 1, 5, 6, and 10, logical volumes, volume groups, and partitioning.
  • Exposure to Maven/Ant and Git, along with shell scripting for the build and deployment process.
  • Experience in understanding the security requirements for Hadoop and integrating with Kerberos authentication infrastructure: KDC server setup, creating realms/domains, managing principals, generating a keytab file for each service, and managing keytabs using keytab tools.
  • Experience in handling multiple relational databases: MySQL, SQL Server.
  • Familiar with Agile methodology (Scrum) and software testing.
  • Effective problem-solving skills and outstanding interpersonal skills. Able to work independently as well as within a team environment. Driven to meet deadlines. Able to learn and use new technologies quickly.



NoSQL Database: HBase, Cassandra

Security: Kerberos

Database: MySQL, SQL Server

Cluster management Tools: Cloudera Manager, Ambari

OS: Linux (CentOS, RHEL), Windows, Mac


Confidential, Irvine, CA

Hadoop Admin


  • Managing 5 Hortonworks clusters totaling 1,500 nodes (Development, R&D, Discovery PROD, MARS HBase and MARS PROD)
  • Designed and architected the R&D cluster with HDP 2.3.2 and Ambari 2.2.0
  • Worked on 4 different versions of HDP (1.3.2, 2.1.5, 2.2.6, and 2.3.2, the latest enterprise release)
  • Upgraded HDP 1.3.2 and 2.1.5 to 2.2.6 using Blueprints, and 2.2.6 to 2.3.2 using a rolling upgrade with no downtime on the PROD cluster
  • Configured Hadoop High Availability for NameNode, HBase, Hive, YARN and Storm (Nimbus)
  • Configured Hadoop security with Kerberos, Ranger and Knox for a secured cluster
  • Configured HDFS data-at-rest encryption using Ranger KMS
  • Configured Storm HA
  • Installed and configured Spark
  • Created Kafka topics, produced and consumed messages
  • Cluster performance tuning
  • Set up 3 ZooKeeper instances dedicated to HBase, Storm and Kafka; the first instance is managed by Ambari and the other two run outside Ambari
  • Configured Apache Ranger centralized security and auditing for HDFS, YARN, Hive, HBase, Storm and Kafka.
  • Installed and configured Informatica 9.6.1 HF1 Big Data Edition for Hadoop ETL
  • Commissioning and decommissioning of datanodes
  • Troubleshot issues reported by Nagios
  • Built and configured log data loading into HDFS using Flume.
  • Wrote a shell script to monitor a few components that run outside Ambari
  • Imported and exported data into and out of HDFS and Hive using Sqoop.
  • Provisioning, installing, configuring, monitoring, and maintaining HDFS, YARN, HBase, Flume, Sqoop, Oozie, Pig, Hive
  • Recovering from node failures and troubleshooting common Hadoop cluster issues
  • Supporting Hadoop developers and assisting in optimization of MapReduce jobs, Pig Latin scripts, Hive scripts, and HBase ingest as required

Confidential, San Francisco, CA

Hadoop Admin


  • Designed and developed data solutions to help business and product teams make data driven decisions
  • Worked closely with data analysts to construct creative solutions for their analysis tasks
  • Led end-to-end efforts to design, develop, and implement data warehousing and business intelligence solutions
  • Worked on performing major upgrade of cluster from CDH3u6 to CDH4.4.0
  • Developed Puppet modules to automate the installation, configuration and deployment of software, operating systems and network infrastructure at a cluster level
  • Implemented NameNode HA and automatic failover infrastructure using ZooKeeper services to eliminate the NameNode as a single point of failure
  • Implemented Cloudera Manager on an existing cluster
  • Optimized our Hadoop infrastructure at both the software and hardware level
  • Ensured our Hadoop clusters are built and tuned in the most optimal way to support the activities of our Big Data teams
  • Developed MapReduce programs to extract and transform the data sets, and exported the results back to the RDBMS using Sqoop
  • Installed, configured and managed Flume infrastructure
  • Was responsible for importing data (mostly log files) from various sources into HDFS using Flume
  • Created tables in Hive and loaded the structured data produced by MapReduce jobs
  • Configured the Hive metastore to use a MySQL database so that all tables created in Hive are available to different users simultaneously.
  • Developed many HiveQL queries to extract the information required by the business.
  • Exported that information to the RDBMS using Sqoop so the BI team could generate reports from the data.

Confidential, Jersey City, NJ

Hadoop Admin


  • Worked with the Linux administration team to prepare and configure the systems to support Hadoop deployment.
  • Performed various configuration tasks, including networking and iptables, hostname resolution, user accounts and file permissions, HTTP, FTP, and passwordless SSH key login.
  • Implemented authentication service using Kerberos authentication protocol.
  • Created volume groups, logical volumes and partitions on the Linux servers and mounted file systems on the created partitions.
  • Configured master node disks with RAID 1+0.
  • Performed benchmarking on the Hadoop cluster using different benchmarking mechanisms.
  • Tuned the cluster by commissioning and decommissioning DataNodes.
  • Upgraded the Hadoop cluster.
  • Implemented the Fair Scheduler on the JobTracker to allocate a fair share of resources to small jobs.
  • Deployed high availability on the Hadoop cluster using quorum journal nodes.
  • Implemented automatic failover using ZooKeeper and the ZooKeeper Failover Controller.
  • Configured Ganglia, including installing the gmond and gmetad daemons, which collect metrics from across the distributed cluster and present them in real-time dynamic web pages to aid debugging and maintenance.
  • Implemented Kerberos for authenticating all the services in the Hadoop cluster.
  • Deployed a Network File System (NFS) for NameNode metadata backup.
  • Performed cluster backups using DistCp, Cloudera Manager BDR and parallel ingestion.
  • Designed and allocated HDFS quotas for multiple groups.
  • Configured and deployed the Hive metastore using MySQL and the Thrift server.
  • Used Hive schemas to create relations in Pig via HCatalog.
  • Developed Pig scripts for handling the raw data for analysis.
  • Deployed Sqoop server to perform imports from heterogeneous data sources to HDFS.
  • Deployed and configured Flume agents to stream log events into HDFS for analysis.
  • Deployed YARN, which allows multiple applications to run on the cluster.
  • Configured Oozie for workflow automation and coordination.
  • Wrote custom monitoring scripts for Nagios to monitor the daemons and the cluster status.
  • Wrote custom shell scripts to automate repetitive tasks on the cluster.
  • Worked with BI teams in generating the reports and designing ETL workflows on Pentaho.



Java Developer


  • Involved in the design and followed Agile Software Development Methodology throughout the software development lifecycle.
  • Designed Use Cases, Class Diagrams, and Sequence Diagrams using Visual Paradigm to model the detail design of the application.
  • Developed the user interface for the presentation layer using JSP standard tags, JavaScript, HTML, and CSS.
  • Used Spring validation for web form validation by implementing the Validator interface.
  • Built the application on the Spring MVC framework with Hibernate as the ORM
  • Used the Spring Core module for dependency injection and integrated the view using Apache Tiles.
  • Consumed Web Services (WSDL, SOAP, UDDI) from third party for authorizing payments to/from customers using CXF Framework
  • Used JMS Queue communication in authorization module.
  • Mapped (one-to-many, one-to-one, many-to-one relations) DTOs to Oracle Database tables and Java data types to SQL data types by creating Hibernate mapping XML files
  • Used an Oracle database and wrote stored procedures for common SQL queries
  • Used Ant to build the enterprise application modules, CVS for version control, and Log4j to monitor the error logs, and performed unit testing using JUnit.
  • Deployed the applications on IBM WebSphere Application Server 5.0.
