Hadoop Administrator Resume
Round Rock, TX
PROFESSIONAL SUMMARY:
- 3+ years of professional IT experience, including experience with Big Data ecosystem technologies.
- Around 3 years of hands-on experience working with Hadoop, HDFS, the MapReduce framework, and Hadoop ecosystem components such as Hive, HBase, Sqoop, and Oozie.
- Excellent understanding of Hadoop architecture and components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and the MapReduce programming paradigm.
- Hands on experience in installing, configuring, and using Hadoop components like Hadoop MapReduce, HDFS, HBase, Hive, Sqoop, Pig, Zookeeper and Flume.
- Extensive experience writing MapReduce jobs, Hive queries, and Pig scripts, and working with HDFS.
- In-depth understanding of data structures and algorithms.
- Hands-on experience setting up 100+ node Hortonworks, MapR, and PHD clusters.
- Experience in managing and reviewing Hadoop log files.
- Installed and configured a 15-node Apache Solr cluster.
- Experience working on Greenplum Database.
- Knowledge in designing, implementing and managing Secure Authentication mechanism to Hadoop Cluster with Kerberos.
- Implemented Capacity Scheduler in Hortonworks and Cloudera.
- Installed and configured Tomcat, the HTTPD web server, SSL, LDAP, and SSO for Collibra (a data governance tool).
- Worked on cluster installation, commissioning and decommissioning of DataNodes, NameNode recovery, capacity planning, and slots configuration.
- Extensive experience on data lake implementation.
- Excellent understanding and knowledge of NoSQL databases like MongoDB, HBase, Cassandra.
- Used Zookeeper for various types of centralized configurations.
- Involved in setting up standards and processes for Hadoop-based application design and implementation.
- Good knowledge of using Spark for real-time streaming of data into the cluster.
- Experience in importing and exporting data using Sqoop from HDFS to Relational Database Systems and vice-versa.
- Experience in Object Oriented Analysis and Design (OOAD) and development of software using UML Methodology, good knowledge of J2EE design patterns and Core Java design patterns.
- Experience in managing Hadoop clusters using Cloudera Manager tool.
- Very good experience in complete project life cycle (design, development, testing and implementation) of Client Server and Web applications.
- Experience in administering, installing, configuring, troubleshooting, securing, backing up, performance monitoring, and fine-tuning Red Hat Linux.
- Hands on experience as Linux System Admin.
- Performed Linux admin activities like patching, maintenance and software installations.
- Experience managing groups of RHEL/CentOS hosts at a scale of 100+ nodes, including installation and configuration for Hadoop clusters.
- Knowledge of Oracle, PostgreSQL, SQL Server, and MySQL databases.
- Hands-on experience with VPN, PuTTY, WinSCP, VNC Viewer, etc.
- Wrote scripts to deploy monitors and checks and to automate critical system administration functions.
- Hands on experience in application development using Java, RDBMS, and Linux shell scripting.
- Ability to adapt to evolving technology and a strong sense of responsibility.
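The Sqoop import/export work described above can be sketched as a small wrapper script. The JDBC URL, table name, and target directory below are hypothetical placeholders, and the script only assembles and prints the command as a dry run (no live cluster is assumed):

```shell
#!/bin/sh
# Sketch of a Sqoop import wrapper; all names below are hypothetical placeholders.
JDBC_URL="jdbc:oracle:thin:@//dbhost.example.com:1521/ORCL"
TABLE="SALES.ORDERS"
TARGET_DIR="/data/raw/orders"

# Assemble the import command; --num-mappers controls parallel map tasks.
SQOOP_CMD="sqoop import --connect ${JDBC_URL} --table ${TABLE} --target-dir ${TARGET_DIR} --num-mappers 4 --as-textfile"

# Dry run: print the command instead of executing it against a live cluster.
echo "$SQOOP_CMD"
```

The reverse direction (HDFS to RDBMS) follows the same shape with `sqoop export --export-dir` in place of `--target-dir`.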
TECHNICAL SKILLS:
Big Data Ecosystem: HDFS, HBase, Hadoop MapReduce, Zookeeper, Hive, Pig, Sqoop, Flume, Oozie, Kafka, Cassandra, Spark
Hadoop Distributions: Cloudera, MapR, Hortonworks, AWS EMR and PHD
Languages: C, C++, Java, SQL/PLSQL
Methodologies: Agile, Waterfall
Database: Oracle 10g, DB2, MySQL, MongoDB, CouchDB, MS
IDE / Testing Tools: Eclipse, STS.
Operating System: Windows, UNIX, Linux
Scripts: JavaScript, Shell Scripting
PROFESSIONAL EXPERIENCE:
Hadoop Administrator
Confidential, Round Rock, TX
Responsibilities:
- Worked as an engineer on the Big Data team with Hadoop and its ecosystem.
- Installed and configured Hadoop ecosystem components such as HBase, Flume, Pig, Hive, Oozie, and Sqoop.
- Developed Hive queries to do analysis of the data and to generate the end reports to be used by business users.
- Managed 100+ node HDP and Cloudera clusters on a daily basis.
- Worked on shell scripts for automation.
- Implemented YARN ACL Checks, Ranger and Kerberos for HDP and Cloudera Hadoop cluster.
- Scheduled daily DistCp data transfer jobs from the production cluster to the analytics cluster for the analytics team's reporting.
- Wrote Sqoop jobs to migrate data from Oracle and SQL Server to HDFS.
- Investigated and implemented Hortonworks SmartSense recommendations.
- Implemented Bug fixes on Hive and Tez as per Hortonworks recommendations.
- Involved in LDAP configuration for Cloudera Manager, Ambari, and Hue.
- Performed Hadoop cluster tasks such as adding and removing nodes without affecting running jobs or data.
- Managed and reviewed Hadoop log files.
- Extensively used Oozie scheduler, clear understanding of Oozie workflows, coordinators and Bundles.
- Worked extensively with Sqoop for importing metadata from Oracle
- Installed and configured large scale Kafka cluster in Cloudera.
- Responsible for smooth, error-free configuration of the DWH-ETL solution and its integration with Hadoop.
- Designed a data warehouse using Hive
- Used the Control-M scheduling tool to schedule daily jobs.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Configured Splunk to generate alerts over system/service failures.
- Worked as a production code deployment engineer on a sprint basis.
Environment: Hadoop, HDP 2.8, Hortonworks, Cloudera 5.15, MapReduce, HDFS, Hive, HBase, Kafka, Java 7 & 8, MongoDB, Pig, Informatica, Oracle, Informatica BDM, Linux, Eclipse, Zookeeper, Apache Solr, R and RStudio, Control-M, Redis, Tableau, QlikView, DataStax, Spark, Splunk
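Day-to-day log review like the work above often reduces to quick shell pipelines. The NameNode log lines below are fabricated for illustration only; the pipeline counts entries per severity level:

```shell
# Write a tiny sample log (fabricated lines, for illustration only).
cat > /tmp/namenode_sample.log <<'EOF'
2019-03-01 10:00:01,123 INFO  org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Roll Edit Log from client
2019-03-01 10:00:05,456 WARN  org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Low redundancy block
2019-03-01 10:00:09,789 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: RECEIVED SIGNAL 15: SIGTERM
EOF

# Count log entries per severity level (3rd whitespace-separated field).
awk '{print $3}' /tmp/namenode_sample.log | sort | uniq -c | sort -rn
```

Against a real cluster the same pipeline would be pointed at the actual NameNode or DataNode log under the Hadoop log directory instead of a sample file.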
Hadoop Administrator
Confidential
Responsibilities:
- Responsible for cluster maintenance, monitoring, managing, commissioning and decommissioning of DataNodes, troubleshooting, reviewing data backups, and managing and reviewing log files for Cloudera and Hortonworks.
- Added/installed new components and removed them through Cloudera and Hortonworks.
- Monitoring workload, job performance, capacity planning using Cloudera.
- Major and Minor upgrades and patch updates.
- Creating and managing the Cron jobs.
- Installed Hadoop ecosystem components like Pig, Hive, HBase, and Sqoop in a cluster.
- Experience in setting up tools like Nagios for monitoring Hadoop cluster.
- Handling the data movement between HDFS and different web sources using Flume and Sqoop.
- Extracted files from SQL databases such as MSSQL and Oracle through Sqoop and placed them in HDFS for processing.
- Installed Oozie workflow engine to run multiple Hive and Pig jobs.
- Building and maintaining scalable data pipelines using the Hadoop ecosystem and other open source components like Hive and HBase.
- Installed and configured HA of Hue to point Hadoop Cluster in Cloudera Manager.
- Have deep and thorough understanding of ETL tools and how they can be applied in a Big Data environment, supporting and managing Hadoop Clusters.
- Installed and configured MapReduce and HDFS, and developed multiple MapReduce jobs in Hive for data cleaning and pre-processing.
- Used Kafka for building real-time data pipelines between clusters.
- Ran log aggregation, website activity tracking, and commit logs for distributed systems using Apache Kafka.
- Working with applications teams to install operating system, Hadoop updates, patches, version upgrades as required.
- Performed transformations, cleaning, and filtering on imported data using Hive and MapReduce, and loaded the final data into HDFS.
- Experience in Python and shell scripts.
- Commissioned DataNodes when data grew and decommissioned DataNodes when hardware degraded.
- Worked with data delivery teams to set up new Hadoop users and Linux users, setting up Kerberos principals and testing HDFS and Hive.
- Held regular discussions with other technical teams regarding upgrades, process changes, special processing, and feedback.
Environment: Hadoop, Cloudera, Hortonworks, MapReduce, HDFS, Hive, HBase, Java 6 & 7, MongoDB, Pig, Informatica, Oracle, Informatica BDM, Linux, Eclipse, Zookeeper, Apache Solr, R and RStudio, Control-M, Redis, Tableau, Spark, Splunk
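The DataNode decommissioning flow mentioned above can be sketched as follows. The excludes-file path and hostname are hypothetical placeholders, and the cluster-side commands are echoed rather than executed, since no live NameNode is assumed here:

```shell
# Path and hostname below are hypothetical placeholders.
EXCLUDES=/tmp/dfs.exclude
NODE="datanode07.example.com"

# 1. Add the node to the excludes file referenced by dfs.hosts.exclude.
echo "$NODE" >> "$EXCLUDES"

# 2. Ask the NameNode to re-read the excludes file (shown, not executed here).
echo "hdfs dfsadmin -refreshNodes"

# 3. Watch the node drain until it reports Decommissioned (shown, not executed).
echo "hdfs dfsadmin -report"
```

Commissioning is the reverse: remove the host from the excludes file (and add it to the includes file, if one is configured), then run the same `-refreshNodes` step.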