Hadoop Engineer Resume

Houston, TX


  • Experience in deploying and managing multi-node Hadoop clusters with different Hadoop components (Hive, Pig, Sqoop, Oozie, Tez, Flume, HCatalog, HBase, ZooKeeper) on Cloudera and Hortonworks distributions.
  • Hands-on experience in provisioning and managing multi-tenant Hadoop clusters on public cloud environments (Amazon Web Services (AWS) EC2, Rackspace) and on private cloud infrastructure (OpenStack cloud platform)
  • Maintained Hadoop clusters for dev/staging/production
  • Created volumes, security group rules, key pairs, floating IPs, images and snapshots, and deployed instances on OpenStack
  • Hands-on experience productionizing Hadoop applications, including administration, configuration management, debugging and performance tuning
  • Worked with application team via scrum to provide operational support, install Hadoop updates, patches and version upgrades as required
  • Experience in performance tuning for MapReduce, Hive and Sqoop
  • Monitor Hadoop cluster using tools like Nagios, Ganglia, Ambari and Cloudera Manager
  • Worked with system engineering team to plan and deploy Hadoop hardware and software environments
  • Experience with cluster maintenance tasks such as commissioning and decommissioning of nodes, cluster monitoring and troubleshooting, and managing and reviewing Hadoop log files
  • Worked on disaster management for Hadoop clusters
  • Experience installing and implementing security for Hadoop clusters using Kerberos and SSL encryption
  • Strong knowledge of configuring NameNode high availability and NameNode federation
  • Designed an ingestion framework using Flume for streaming logs and aggregated data into HDFS. Built a data transformation framework using MapReduce and Pig
  • Designed, delivered and helped manage device data analytics at a very large storage vendor
  • Experience in benchmarking, and in backup and recovery of NameNode metadata and data residing on the cluster
  • Experienced in managing DHCP, PXE with Kickstart, DNS and NFS, using them to build infrastructure in Linux environments, and working with Puppet for application deployment
  • Strong knowledge of work automation using Chef and Puppet
  • Hands-on experience in developing Sqoop jobs to import data from RDBMS sources such as MySQL and PostgreSQL into HDFS, and to export data from HDFS back to those databases.
  • Experience in writing workflows using Apache Oozie with job types such as Hive and Sqoop.
  • Excellent working knowledge of HBase and of data pre-processing using Flume NG.
  • Strong knowledge of NoSQL databases such as HBase, Cassandra and MongoDB
  • Strong knowledge of implementing Kafka.
  • Knowledge of Apache Tez, an alternative to MapReduce as the execution engine for Hive, for faster execution of HiveQL queries.
  • Worked with big data teams to move ETL tasks to Hadoop
  • Hands-on experience with AWS-based deployments using S3, EC2 Auto Scaling, Elasticsearch, etc. in a highly available production environment.
  • Good knowledge of integrating various data sources such as RDBMS, spreadsheets, text files, JSON, web services and XML files.
  • Strong knowledge of newer technologies such as Spark, Shark, Docker and containers, keeping up with industry developments
  • Effective problem solving skills and outstanding interpersonal skills. Ability to work independently as well as within a team environment. Driven to meet deadlines


Languages: C, C++, Java

Big Data Ecosystem: Hadoop, HDFS, MapReduce, Hive, Pig, Sqoop, Tez, Oozie, Flume, ZooKeeper

Security: Kerberos

Hadoop management tools: Cloudera Manager, Ambari, Ganglia, Nagios

Databases: Oracle, MySQL, SQL Server

Scripting language: shell, Python

Software Development Tool: Rational Rose, Eclipse, NetBeans

Web Development: PHP, HTML, AJAX, JavaScript, CSS

Web Designing Tools: Adobe Photoshop, Adobe Dreamweaver

NoSQL Databases: HBase, Cassandra

Operating Systems: Windows, Linux (Red Hat, CentOS)

Build Tools: Maven, ANT


Hadoop Engineer

Confidential, Houston, TX


  • Actively participated with the development team to meet the specific customer requirements and proposed effective Hadoop solutions
  • Worked closely with application developers and end-users for analyzing requirements
  • Responsible for architecting both Hortonworks and Cloudera clusters.
  • Installed, configured and monitored Hortonworks Ambari and Cloudera Manager
  • Configured and deployed cluster monitoring tools Nagios and Ganglia.
  • Responsible for Batch architecture and cluster migration planning
  • Excellent knowledge of VPC security configuration and network access control for AWS
  • Installed Ambari 2.0, CDH 4.7 and CDH 5.1.3 with HA.
  • Implemented high-availability for Management Services using NFS and Heartbeat
  • Provided high availability for Management Services components such as the host monitor, service monitor, activity monitor, event server and reports manager
  • Tested the high-availability architecture with SSL enabled and disabled, with Kerberos enabled and disabled, and with Management Services directories on an NFS mount
  • Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs such as Java MapReduce, Hive, Sqoop and a few system-specific jobs
  • Developed Pig scripts to process raw data for analysis
  • Used Hive data warehouse tool to analyze the unified historic data in HDFS to identify issues and behavioral patterns
  • Worked extensively on writing Sqoop scripts for importing data from and exporting data to RDBMS
  • Responsible for extracting data from various sources into HDFS for processing.
  • Worked on streaming real-time log data from web servers into HDFS using Flume.
  • Implemented custom interceptors for Flume to filter data and defined channel selectors to multiplex the data into different sinks.
  • Created internal and external Hive tables as required, defined with appropriate static and dynamic partitions for efficiency.
  • Implemented Hive queries using indexes and buckets for time efficiency
  • Designed and implemented Pig UDFs for evaluation, filtering, loading and storing of data.
  • Hands-on experience in installing HBase and HRegionServers and creating HBase tables.
  • Integrated Cognos and Informatica to access Hive tables from the cluster.

Environment: JDK1.6, CentOS, Flume, HBase, HDFS, Maven, Spark, MapReduce, Hive, Oozie, Red Hat, ZooKeeper, Sqoop, Pig

Hadoop Admin

Confidential, Santa Clara, CA


  • Responsible for architecting Hadoop clusters with CDH4 on CentOS, managed with Cloudera Manager.
  • Involved in installing Hadoop, Hive, HBase, ZooKeeper, Flume and Oozie.
  • Worked closely with data analysts to construct creative solutions for their analysis tasks
  • Led end-to-end efforts to design, develop, and implement data warehousing and business intelligence solutions.
  • Worked on performing major upgrade of cluster from CDH3u6 to CDH4.4.0
  • Implemented Cloudera Manager on existing cluster
  • Optimized our Hadoop infrastructure at both the software and hardware level
  • Ensured our Hadoop clusters were built and tuned optimally to support the activities of our Big Data teams
  • Developed MapReduce programs to extract and transform data sets, and exported the results back to RDBMS using Sqoop
  • Installed, configured and managed Flume infrastructure
  • Created tables in Hive and loaded structured data produced by MapReduce jobs
  • Configured the Hive metastore to use a MySQL database so that all tables created in Hive were available to different users simultaneously
  • Developed many HiveQL queries to extract the information required by the business
  • Exported the required business information to RDBMS using Sqoop so that the BI team could generate reports from the data
  • Installed and configured a DataStax Cassandra cluster

Environment: JDK1.6, CentOS, Flume, HBase, HDFS, Maven, Redis, Cassandra, MapReduce, Hive, Oozie, ZooKeeper.

Hadoop Consultant

Confidential, Kansas City, MO


  • Installed and configured Cloudera Hadoop on a 24-node cluster.
  • Loaded data from Oracle database into HDFS.
  • Analyzed the Big Data business requirements and translated them into Hadoop-centric solutions.
  • Provided ad hoc queries and data metrics to business users using Hive and Pig.
  • Developed MapReduce pipeline jobs in Apache Crunch to process the data and create the necessary HFiles.
  • Loaded the created HFiles into HBase for faster access to a large customer base without a performance hit.
  • Used Apache Maven for project builds.
  • Performed unit testing of MapReduce jobs on cluster using MRUnit.
  • Used the Oozie scheduler to automate the pipeline workflow.
  • Actively participated in the software development lifecycle (scope, design, implement, deploy, test), including design and code reviews, test development and test automation.
  • Implemented data serialization using Apache Avro.
  • Involved in story-driven Agile development methodology and actively participated in daily scrum meetings.
  • Analyzed Business Requirement Documents written in JIRA and participated in peer code reviews in Crucible.

Environment: Cloudera Hadoop, MapReduce, HDFS, Crunch, HBase, Avro, Oozie, Java (jdk1.6), JIRA, Crucible, GitHub, Maven.

Linux/ Database Administrator



  • Installing and maintaining the Linux servers
  • Installing CentOS on multiple servers using Preboot Execution Environment (PXE) boot and the Kickstart method
  • Monitoring system metrics and logs for any problems.
  • Running cron jobs to back up data
  • Adding, removing, or updating user account information, resetting passwords, etc
  • Creating and managing Logical volumes. Using Java JDBC to load data into MySQL
  • Maintaining the MySQL server and granting database access to required users
  • Installing and updating packages using YUM
