Senior Hadoop Administrator Resume
Hartford, CT
SUMMARY
- Over seven years of information technology experience in Big Data, covering Hadoop, HDFS, and other Hadoop ecosystem components such as Hive, Pig, Sqoop, HBase, Oozie, Flume, and MapReduce.
- Hands-on experience administering Hadoop clusters: setting up and configuring core and customized clusters, loading data from different sources using Sqoop, enabling high availability, commissioning and decommissioning nodes, generating reports on running nodes through various benchmarking operations, scaling, recovering from node failures, and documenting all production scenarios, issues, and resolutions.
TECHNICAL SKILLS
Big Data Ecosystems: Hadoop (HDFS & MapReduce), Pig, Hive, HBase, ZooKeeper, Kerberos, Sqoop, Flume, Kafka, Apache Spark, Impala, Oozie, NiFi.
Databases: Oracle, SQL Server, MySQL, MariaDB, PostgreSQL.
NoSQL Databases: HBase, Cassandra, MongoDB.
Hadoop Distributions: Cloudera, Hortonworks, Apache.
Cloud: AWS, Azure.
Operating Systems: macOS, Linux, UNIX, and Windows.
Languages: Scala, Python, C.
IDE: Eclipse, NetBeans, IntelliJ.
Source Code Control: GitHub, CVS, SVN.
ETL Tools: Talend, Informatica.
Development Methodologies: Agile.
PROFESSIONAL EXPERIENCE
Confidential, Hartford, CT
Senior Hadoop Administrator
Responsibilities:
- Administered more than 200 data nodes across three data clusters, including the production cluster; monitored, commissioned, and decommissioned data nodes while also handling troubleshooting, cluster planning, data management, backups, and the review and administration of log files.
- Installed and configured data management software, including conducting performance tuning and capacity management to improve the usability of Hadoop clusters.
- Evaluated data application performance and communicated findings to developers, recommending and implementing improvements to database performance.
- Worked with development teams to deploy Oozie workflow jobs that run multiple Hive and Pig jobs independently, based on time and data availability (see the Oozie submission sketch after this list).
- Provided operational support services throughout all phases of Hadoop infrastructure and application installation efforts.
- Rebalanced data across the DataNodes, moving blocks from over-utilized to under-utilized nodes (see the balancer sketch after this list).
- Commissioned and decommissioned data nodes when required.
- Analyzed and transformed data by writing Bash, Hive, and Pig scripts in a Hadoop environment.
- Created Spark applications using the Scala programming language.
- Integrated Spark Streaming with Spark SQL to query streaming data in real time.
- Responsible for Kafka tuning, capacity planning, disaster recovery, replication, and troubleshooting.
- Maintained Kafka connectors to move data between systems.
- Collected log data from web servers and ingested it into HDFS using Flume (see the Flume agent sketch after this list).
- Ingested data into NiFi from numerous data sources, creating flow files.
- Visualized data flow at the enterprise level using NiFi.
- Enabled enterprises to link critical data by managing and implementing Master Data Management (MDM) applications.
- Involved in cluster-level security: perimeter security (authentication via Cloudera Manager, Active Directory, and Kerberos), access (authorization and permissions via Sentry), visibility (audit and lineage via Navigator), and data protection (encryption at rest).
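A minimal sketch of submitting and checking the Oozie jobs described above, assuming the Oozie CLI is available; the server URL and properties file path are hypothetical placeholders.

```bash
# Submit a workflow/coordinator job from its properties file
# (the Oozie URL and properties path are placeholders).
oozie job -oozie http://oozie-host:11000/oozie \
  -config /home/hadoop/jobs/etl-coordinator.properties -run

# List running coordinator jobs to confirm the schedule kicked in.
oozie jobs -oozie http://oozie-host:11000/oozie \
  -jobtype coordinator -filter status=RUNNING
```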
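The rebalancing step above is typically driven by the HDFS balancer; a minimal sketch, assuming the hdfs client is on the path, with an illustrative 10% utilization threshold.

```bash
# Move blocks from over-utilized to under-utilized DataNodes until each node
# is within 10% of the cluster's average utilization (threshold is illustrative).
hdfs balancer -threshold 10

# Spot-check per-DataNode utilization before and after the run.
hdfs dfsadmin -report | grep -E 'Name:|DFS Used%'
```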
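A minimal sketch of the kind of Flume pipeline used to land web-server logs in HDFS; the agent name, log path, and HDFS target below are hypothetical.

```bash
# Define a single-agent pipeline: exec source -> memory channel -> HDFS sink.
# Agent name, log path, and HDFS path are placeholders.
cat > /etc/flume/conf/weblogs.conf <<'EOF'
weblogs.sources  = tail-src
weblogs.channels = mem-ch
weblogs.sinks    = hdfs-sink

weblogs.sources.tail-src.type = exec
weblogs.sources.tail-src.command = tail -F /var/log/httpd/access_log
weblogs.sources.tail-src.channels = mem-ch

weblogs.channels.mem-ch.type = memory
weblogs.channels.mem-ch.capacity = 10000

weblogs.sinks.hdfs-sink.type = hdfs
weblogs.sinks.hdfs-sink.hdfs.path = /data/weblogs
weblogs.sinks.hdfs-sink.hdfs.fileType = DataStream
weblogs.sinks.hdfs-sink.channel = mem-ch
EOF

# Start the agent; --name must match the agent prefix used in the config file.
flume-ng agent --conf /etc/flume/conf \
  --conf-file /etc/flume/conf/weblogs.conf --name weblogs
```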
Environment: Cloudera, DataStax DSE distribution, Spark, Scala, MongoDB, Hadoop, AWS, Java, MapReduce, HDFS, Hive, Pig, Impala, Kafka, Cassandra, Cloudera Manager, Sqoop, Flume, Oozie, NiFi, ZooKeeper, Kerberos, MySQL, Eclipse.
Confidential, Arlington, VA
Hadoop Administrator
Responsibilities:
- Designed and implemented non-production multi-node environments.
- Upgraded Ambari from 2.2.1 to 2.4.0.
- Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, and managing and reviewing data backups and Hadoop log files.
- Performed HDP rolling upgrades and used Ranger and Ranger KMS to integrate security features and encrypt HDFS data.
- Designed multi-node clusters for the production environment based on projected data growth; rebalanced DataNodes across the cluster to protect it from overloading.
- Upgraded HDP from 2.3 to 2.4 on the production cluster and to 2.5.3.0 in the POC lab.
- As part of the DTV migration, used Falcon to migrate data from CDH to HDP.
- Automated the data flow from Kafka to HDFS.
- Set up DistCp for inter-cluster data transfer (see the DistCp sketch after this list).
- Performed cluster sizing exercises with stakeholders to understand data ingestion patterns and provided recommendations.
- Commissioned and decommissioned data nodes when required.
- Installed and configured HDP cluster and other Hadoop ecosystem components.
- Wrote shell scripts to monitor the health of Hadoop daemon services and respond to warning or failure conditions (see the health-check sketch after this list).
- Deployed a Kafka cluster with a separate ZooKeeper ensemble to enable real-time data processing with Spark Streaming and storage in HBase.
- Implemented the Capacity Scheduler to share available resources securely among multiple groups.
- Processed massive streams of real-time data using Spark Streaming.
- Provided support for maintenance activities such as OS patching, Hadoop upgrades, and configuration changes.
- Developed automated Unix shell scripts for running the balancer, performing file system health checks, and creating users/groups on HDFS.
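A minimal sketch of the inter-cluster DistCp transfer mentioned above; the NameNode hostnames and dataset path are hypothetical.

```bash
# Copy a dataset between clusters; -update copies only new or changed files,
# -p preserves permissions and timestamps. Hostnames and paths are placeholders.
hadoop distcp -update -p \
  hdfs://source-nn:8020/data/events \
  hdfs://target-nn:8020/data/events
```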
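A minimal sketch of the daemon health-check script, assuming jps and a local mail command are available; the daemon list and alert address are placeholders.

```bash
#!/usr/bin/env bash
# Alert when a Hadoop daemon JVM is missing on this host.
ALERT_TO="hadoop-ops@example.com"   # placeholder address

for daemon in NameNode DataNode ResourceManager NodeManager; do
    # jps lists running JVMs by main class; absence means the daemon is down.
    if ! jps | grep -qw "$daemon"; then
        echo "$(date '+%F %T') WARNING: $daemon not running on $(hostname)" \
            | mail -s "Hadoop daemon alert: $daemon down" "$ALERT_TO"
    fi
done

# Summarize HDFS health, flagging missing or corrupt blocks.
hdfs fsck / | grep -E 'Status:|Missing blocks:|Corrupt blocks:'
```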
Environment: Hortonworks (HDP 2.4), Ambari, MapReduce 2.0 (YARN), HDFS, Hive, HBase, Pig, Oozie, Sqoop, Spark, Flume, Kerberos, ZooKeeper, SmartSense, Airflow, NiFi, Falcon, DB2, SQL Server 2014, RHEL 6.x, Python.
Confidential, Hartford, CT
Hadoop Administrator
Responsibilities:
- Analyzed the Hadoop stack and various big data analytic tools, including Pig, Hive, HBase, and Sqoop.
- In-depth understanding of classic MapReduce and YARN architectures.
- Involved in loading data from the UNIX file system into HDFS (see the sketch after this list).
- Provisioned, installed, configured, monitored, and maintained HDFS, YARN, HBase, Flume, Sqoop, Oozie, Pig, and Hive.
- Analyzed system failures, identified root causes, and recommended courses of action; documented system processes and procedures for future reference.
- Used Ambari on the Azure HDInsight cluster to record and manage NameNode and DataNode logs.
- Developed a framework to load and transform large sets of unstructured data from the UNIX system into Hive tables.
- Monitored Hadoop cluster connectivity and performance.
- Followed standard backup policies as an admin to ensure high availability of the cluster.
- Assisted with data capacity planning and node forecasting.
- Deployed data in a Hadoop cluster on Azure for the data lake.
- Began using Apache NiFi to copy data from the local file system to HDFS.
- Integrated Spark Streaming with data sources, including Kafka and Flume.
- Involved in continuous monitoring of operations using Storm.
- Implemented indexing of Oozie logs into Elasticsearch.
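A minimal sketch of loading files from the local UNIX file system into HDFS, as referenced above; the source and target paths are hypothetical.

```bash
# Create the target directory and push local export files into HDFS
# (paths are placeholders).
hdfs dfs -mkdir -p /data/raw/sales
hdfs dfs -put /opt/exports/sales_*.csv /data/raw/sales/

# Verify the upload.
hdfs dfs -ls /data/raw/sales | tail
```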
Environment: Hortonworks, Hadoop, MapReduce, HDFS, Hive, Pig, Sqoop, Apache Kafka, Azure, Apache Storm, Oozie, SQL, Flume, Spark 1.6.1, HBase, and GitHub.