
Sr. Hadoop Administrator Resume


Pleasanton, CA

SUMMARY:

  • 7+ years of IT experience, with extensive experience in the administration, installation, modification, and maintenance of Hadoop on the Linux (RHEL) operating system.
  • Hands-on experience in deploying and managing multi-node development, testing, and production Hadoop clusters with different Hadoop components (Hive, Pig, Sqoop, Oozie, Flume, HCatalog, ZooKeeper, HBase) using Cloudera Manager and Hortonworks Ambari.
  • Hands-on experience in Big Data technologies/frameworks such as Hadoop, HDFS, YARN, MapReduce, HBase, Hive, Pig, Sqoop, NoSQL, Flume, and Oozie.
  • Proficiency with application servers such as WebSphere, WebLogic, JBoss, and Tomcat.
  • Acted as a liaison between the software and technical teams for automation, installation, and configuration tasks.
  • Hadoop cluster capacity planning, performance tuning, cluster monitoring, and troubleshooting.
  • Experience in commissioning, decommissioning, balancing, and managing nodes, and in tuning servers for optimal cluster performance.
  • As an admin, involved in cluster maintenance, troubleshooting, and monitoring, and followed proper backup & recovery strategies.
  • Used Namespace support to map Phoenix schemas to HBase namespaces.
  • Performed administrative tasks on Hadoop Clusters using Cloudera.
  • Expertise with tools in the Hadoop ecosystem including Pig, Hive, HDFS, MapReduce, Sqoop, Apache NiFi, Spark, Kafka, YARN, Oozie, and ZooKeeper.
  • Installed and monitored Hadoop cluster resources using Ganglia and Nagios.
  • Experience in designing and implementing secure Hadoop clusters using Kerberos (a minimal keytab verification sketch follows this list).
  • Experience with Hadoop architecture and working with Big Data users to implement new Hadoop ecosystem technologies that support a multi-tenancy cluster.
  • Skilled in monitoring servers using Nagios, Datadog, and CloudWatch, and using the EFK stack (Elasticsearch, Fluentd, Kibana).
  • Implemented DB2/LUW replication, federation, and partitioning (DPF).
  • Areas of expertise and accomplishment include database installation/upgrade and backup/recovery.
  • Hands-on experience in installing, configuring, supporting, and managing Hadoop clusters using Apache, Cloudera (CDH3, CDH4), and YARN-based distributions (CDH 5.x).
  • Experience in capacity planning, HDFS management, and YARN resource management.
  • Hands-on experience configuring Hadoop clusters in an enterprise environment, on VMware, and on Amazon Web Services (AWS) using EC2 instances.
  • Expertise in deploying Hadoop, YARN, Spark, and Storm and integrating them with Cassandra, Ignite, RabbitMQ, and Kafka.
  • Installed and configured Hortonworks HDP 2.3.0 using Ambari 2.1.1.
  • Hands on experience in upgrading the cluster from HDP 2.0 to HDP 2.3.
  • Expertise in interactive data visualization and analysis with BI tools like Tableau.
  • Worked with different relational database systems such as Oracle (PL/SQL). Used UNIX shell scripting and Python; experience working on AWS EMR instances.
  • Used NoSQL databases such as Cassandra and MongoDB and designed tables.
  • Worked on setting up NameNode High Availability for a major production cluster and designed automatic failover control using ZooKeeper and Quorum Journal Nodes (see the state-check sketch after this list).
  • Implemented automatic failover with ZooKeeper and the ZooKeeper Failover Controller.
  • Familiar with writing Oozie workflows and Job Controllers for job automation.
  • Extracted data from Teradata into HDFS, databases, and dashboards using Spark Streaming.
  • Good understanding of deploying Hadoop clusters using automated Puppet scripts.
  • Troubleshooting, Security, Backup, Disaster Recovery, Performance Monitoring on Linux systems.
  • Worked with the Linux administration team to prepare and configure the systems to support Hadoop deployment.
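
A minimal sketch of the Kerberos keytab verification referenced above, assuming a hypothetical keytab path and principal, and wrapping the standard kinit/klist commands:

```python
#!/usr/bin/env python3
"""Sketch: verify that a service keytab can obtain a Kerberos ticket.

The keytab path and principal are hypothetical examples; substitute the
values managed by Ambari/Cloudera Manager on the actual cluster.
"""
import subprocess

KEYTAB = "/etc/security/keytabs/nn.service.keytab"   # assumed path
PRINCIPAL = "nn/namenode01.example.com@EXAMPLE.COM"  # assumed principal

def kinit_check():
    # Obtain a ticket non-interactively from the keytab.
    subprocess.run(["kinit", "-kt", KEYTAB, PRINCIPAL], check=True)
    # Show the cached ticket so principal and expiry can be confirmed.
    result = subprocess.run(["klist"], check=True, capture_output=True, text=True)
    print(result.stdout)

if __name__ == "__main__":
    kinit_check()
```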
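A minimal sketch of the NameNode HA state check referenced above; the service IDs "nn1" and "nn2" are assumptions that must match the dfs.ha.namenodes entries in hdfs-site.xml:

```python
#!/usr/bin/env python3
"""Sketch: report the active/standby state of an HA NameNode pair backed by
ZooKeeper and Quorum Journal Nodes, using the hdfs haadmin CLI."""
import subprocess

def namenode_states(service_ids=("nn1", "nn2")):
    states = {}
    for sid in service_ids:
        # "hdfs haadmin -getServiceState <id>" prints "active" or "standby".
        out = subprocess.run(
            ["hdfs", "haadmin", "-getServiceState", sid],
            capture_output=True, text=True, check=True,
        )
        states[sid] = out.stdout.strip()
    return states

if __name__ == "__main__":
    for sid, state in namenode_states().items():
        print(f"{sid}: {state}")
```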

PROFESSIONAL EXPERIENCE:

Sr. Hadoop Administrator

Confidential, Pleasanton, CA

Responsibilities:

  • Responsible for installing, configuring, supporting, and managing Hadoop clusters.
  • Managed and reviewed Hadoop Log files as a part of administration for troubleshooting purposes.
  • Monitoring and support through Nagios and Ganglia.
  • Performed a major upgrade in the production environment from HDP 1.3 to HDP 2.2 and followed standard backup policies to ensure high availability of the cluster.
  • Monitored multiple Hadoop clusters environments using Ganglia and Nagios. Monitored workload, job performance and capacity planning using Ambari.
  • Responsible for troubleshooting issues in the execution of MapReduce jobs by inspecting and reviewing log files.
  • Created MapR DB tables and involved in loading data into those tables.
  • Maintained operations, installation, and configuration of 100+ node clusters with the MapR distribution.
  • Installed and configured Cloudera CDH 5.7.0 on RHEL 5.7/6.2 64-bit operating systems and was responsible for maintaining the cluster.
  • Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
  • Experience with Cloudera Navigator and Unravel Data for auditing Hadoop access.
  • Performed data blending of Cloudera Impala and Teradata ODBC data sources in Tableau.
  • Cloudera Navigator installation and configuration using Cloudera Manager.
  • Involved in implementing security on the Hortonworks Hadoop cluster using Kerberos, working along with the operations team to move the non-secured cluster to a secured cluster.
  • Responsible for upgrading Hortonworks Hadoop HDP 2.2.0 and MapReduce 2.0 with YARN in a multi-clustered node environment. Handled importing of data from various data sources, performed transformations using Hive, MapReduce, and Spark, and loaded data into HDFS. Set up Hadoop security using MIT Kerberos, AD integration (LDAP), and Sentry authorization.
  • Performed Cloudera rack awareness configuration and JDK upgrades using Cloudera Manager.
  • Integrated Attunity and Cassandra with CDH Cluster.
  • Sentry installation and configuration for Hive authorization using Cloudera manager.
  • Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop.
  • Experience in setup, configuration and management of security for Hadoop clusters using Kerberos and integration with LDAP/AD at an Enterprise level.
  • Successfully set up a no-authentication Kafka listener in parallel with the Kerberos (SASL) listener and tested a non-authenticated (anonymous) user alongside a Kerberos user.
  • Used Hive and created Hive tables, loaded data from Local file system to HDFS.
  • Production experience in large environments using configuration management tools like Chef and Puppet supporting Chef Environment with 250+ servers and involved in developing manifests.
  • Created EC2 instances and implemented large multi-node Hadoop clusters in the AWS cloud from scratch using automated scripts such as Terraform.
  • Imported data from AWS S3 into Spark RDDs and performed transformations and actions on the RDDs (a minimal PySpark sketch follows this list).
  • Configured AWS IAM and Security Groups.
  • Hands on experience in provisioning and managing multi-node Hadoop Clusters on public cloud environment Amazon Web Services (AWS) - EC2 and on private cloud infrastructure.
  • Set up a test cluster with new services like Grafana, integrating with Kafka and HBase for intensive monitoring.
  • Ability to work with incomplete or imperfect data and experience with real-time transactional data. Strong collaborator and team player with hands-on experience on Impala.
  • Worked with Spark to improve performance and optimize existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, and pair RDDs.
  • Responsible for copying a 400 TB HDFS snapshot from the production cluster to the DR cluster (see the DistCp sketch after this list).
  • Responsible for copying a 210 TB HBase table from production to the DR cluster.
  • Created SOLR collection and replicas for data indexing.
  • Administered 150+ Hadoop servers requiring Java version updates, the latest security patches, and OS-related upgrades, and took care of hardware-related outages.
  • Upgraded Ambari from 2.2.0 to 2.4.2.0 and updated Solr from 4.10.3 to Ambari Infra (Solr 5.5.2).
  • Implemented Cluster Security using Kerberos and HDFS ACLs.
  • Involved in cluster-level security: perimeter security (authentication via Cloudera Manager, Active Directory, and Kerberos), access (authorization and permissions via Sentry), visibility (audit and lineage via Navigator), and data (encryption at rest).
  • Experience in setting up Test, QA, and Prod environments. Wrote Pig Latin scripts to analyze and process the data.
  • Involved in loading data from the UNIX file system to HDFS. Led root cause analysis (RCA) efforts for high-severity incidents.
  • Investigated the root cause of critical and P1/P2 tickets.
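
A minimal PySpark sketch of the S3-to-RDD flow mentioned above; the bucket, paths, and field layout are hypothetical, and the s3a connector with AWS credentials is assumed to be configured on the cluster:

```python
"""Sketch: read raw data from S3 into an RDD, transform it, and write to HDFS."""
from pyspark import SparkConf, SparkContext

conf = SparkConf().setAppName("s3-rdd-example")
sc = SparkContext(conf=conf)

# Read raw CSV lines from S3 into an RDD (bucket/path are assumptions).
lines = sc.textFile("s3a://example-bucket/landing/events/*.csv")

# Transformations: parse and keep only well-formed three-field records.
records = (lines
           .map(lambda line: line.split(","))
           .filter(lambda fields: len(fields) == 3))

# Action: count records per key and persist the result to HDFS.
counts = records.map(lambda f: (f[0], 1)).reduceByKey(lambda a, b: a + b)
counts.saveAsTextFile("hdfs:///data/processed/event_counts")

sc.stop()
```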
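A minimal sketch of the snapshot-to-DR copy mentioned above, using HDFS snapshots and DistCp; paths, snapshot name, and the DR NameNode address are hypothetical, and the source directory is assumed to already be snapshottable:

```python
#!/usr/bin/env python3
"""Sketch: snapshot a production HDFS directory and copy it to a DR cluster."""
import subprocess

SRC_DIR = "/data/warehouse"                       # assumed snapshottable dir
SNAPSHOT = "dr_sync_2017_06_01"                   # assumed snapshot name
DR_NN = "hdfs://dr-namenode.example.com:8020"     # assumed DR NameNode

def snapshot_and_copy():
    # Take a read-only, point-in-time snapshot on the production cluster.
    subprocess.run(["hdfs", "dfs", "-createSnapshot", SRC_DIR, SNAPSHOT], check=True)
    # Copy the snapshot contents to the DR cluster, preserving attributes (-p)
    # and transferring only changed files (-update).
    subprocess.run([
        "hadoop", "distcp", "-update", "-p",
        f"{SRC_DIR}/.snapshot/{SNAPSHOT}",
        f"{DR_NN}{SRC_DIR}",
    ], check=True)

if __name__ == "__main__":
    snapshot_and_copy()
```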

Environment: Cloudera, Apache Hadoop, HDFS, YARN, Cloudera Manager, Sqoop, Flume, Oozie, ZooKeeper, Kerberos, Sentry, AWS, Pig, Spark, Hive, Docker, HBase, Python, LDAP/AD, NoSQL, Golden Gate, EM Cloud Control, Exadata Machines X2/X3, Toad, MySQL, PostgreSQL, Teradata.

Sr. Big Data/Hadoop Administrator

Confidential - St. Louis, MO

Responsibilities:

  • Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
  • Worked on installing and configuring Hortonworks HDP 2.x and Cloudera (CDH 5.5.1) clusters in Dev and Production environments.
  • Worked on capacity planning for the Production cluster.
  • Installed HUE Browser.
  • Involved in loading data from UNIX file system to HDFS using Sqoop.
  • Involved in creating Hive tables, loading the data, and writing Hive queries that run internally as MapReduce jobs.
  • Worked on installation of Hortonworks 2.1 on Azure Linux servers.
  • Worked on Configuring Oozie Jobs.
  • Extensively used Sqoop to move the data from relational databases to HDFS. Used Flume to move the data from web logs onto HDFS.
  • Worked on Configuring High Availability for Name Node in HDP 2.1.
  • Worked on Configuring Kerberos Authentication in the cluster.
  • Worked on upgrading the Hadoop cluster from HDP 2.1 to HDP 2.3.
  • Worked on configuring queues in the Capacity Scheduler.
  • Worked on installing and configuring Solr 5.2.1 in Hadoop cluster.
  • Worked on taking Snapshot backups for HBase tables.
  • Worked on troubleshooting and fixing Hadoop cluster issues.
  • Involved in Cluster Monitoring backup, restore and troubleshooting activities.
  • Responsible for implementation and ongoing administration of Hadoop infrastructure
  • Managed and reviewed Hadoop log files.
  • Imported and exported data between different relational databases, such as MySQL, and HDFS/HBase using Sqoop (a minimal import sketch follows this list).
  • Successfully generated consumer group lags from Kafka using its API; Kafka was used for building real-time data pipelines between clusters (see the lag-check sketch after this list).
  • Worked on indexing HBase tables using Solr and indexing JSON and nested data.
  • Worked on configuring queues in the Oozie scheduler.
  • Worked on performance optimization for Hive queries.
  • Worked on performance tuning at the cluster level.
  • Worked on adding users to the clusters.
  • Responsible for cluster maintenance, monitoring, commissioning and decommissioning DataNodes, troubleshooting, managing and reviewing data backups, and managing and reviewing log files.
  • Day-to-day responsibilities included solving developer issues, handling deployments (moving code from one environment to another), providing access to new users, providing quick solutions to reduce impact, documenting them, and preventing future issues.
  • Added and removed components through Ambari.
  • Collaborating with application teams to install operating system and Hadoop updates, patches, version upgrades.
  • Monitored workload, job performance and capacity planning
  • Involved in analyzing system failures, identifying root causes, and recommending courses of action.
  • Created and deployed a corresponding SolrCloud collection.
  • Created collections and configurations and registered a Lily HBase Indexer configuration with the Lily HBase Indexer Service.
  • Creating and managing the Cron jobs.
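
A minimal sketch of the Sqoop import from MySQL into HDFS referenced above; the JDBC URL, table, target directory, and password file are hypothetical placeholders:

```python
#!/usr/bin/env python3
"""Sketch: run a Sqoop import from MySQL into HDFS via the sqoop CLI."""
import subprocess

def sqoop_import():
    subprocess.run([
        "sqoop", "import",
        "--connect", "jdbc:mysql://mysql01.example.com/sales",  # assumed database
        "--username", "etl_user",                               # assumed user
        "--password-file", "/user/etl_user/.mysql.password",    # assumed HDFS file
        "--table", "orders",                                    # assumed table
        "--target-dir", "/data/raw/orders",                     # assumed HDFS dir
        "--num-mappers", "4",
    ], check=True)

if __name__ == "__main__":
    sqoop_import()
```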
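A minimal sketch of the consumer-group lag check referenced above, using the kafka-python client; broker, topic, and group names are hypothetical, and the same check can also be done with the kafka-consumer-groups.sh tool:

```python
"""Sketch: compute per-partition consumer-group lag for one topic."""
from kafka import KafkaConsumer, TopicPartition

BROKERS = "kafka01.example.com:9092"   # assumed broker
TOPIC = "clickstream"                  # assumed topic
GROUP = "etl-loaders"                  # assumed consumer group

def group_lag():
    consumer = KafkaConsumer(bootstrap_servers=BROKERS, group_id=GROUP,
                             enable_auto_commit=False)
    partitions = [TopicPartition(TOPIC, p)
                  for p in consumer.partitions_for_topic(TOPIC)]
    end_offsets = consumer.end_offsets(partitions)   # latest offset per partition
    for tp in partitions:
        committed = consumer.committed(tp) or 0      # last committed offset
        print(f"partition {tp.partition}: lag = {end_offsets[tp] - committed}")
    consumer.close()

if __name__ == "__main__":
    group_lag()
```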

Hadoop/ AWS Administrator

Confidential, Baltimore, MD

Responsibilities:

  • Installed and configured Hadoop clusters and ecosystem components such as Spark, Hive, Scala, YARN, MapReduce, and HBase.
  • Responsible for creating Hive tables, loading the structured data resulted from Map Reduce jobs into the tables and writing hive queries to further analyze the logs to identify issues and behavioral patterns.
  • Worked on installing cluster, commissioning & decommissioning of DataNodes, NameNode recovery, capacity planning and slots configuration.
  • Troubleshot many cloud-related issues such as DataNodes going down, network failures, and missing data blocks.
  • Involved in deploying a Hadoop cluster using Hortonworks Ambari HDP 2.2, integrated with SiteScope for monitoring and alerting.
  • Worked with various AWS Components such as EC2, S3, IAM, VPC, RDS, Route 53, SQS, SNS.
  • Experienced in running Hadoop streaming jobs to process terabytes of XML format data.
  • Expertise in tuning Spark applications by setting the right batch interval time.
  • Created EC2 instances and managed the volumes using EBS.
  • Designed, developed, and maintained data integration programs in Hadoop and RDBMS environments with both traditional and non-traditional source systems, as well as RDBMS and NoSQL data stores, for data access and analysis.
  • Expertise in using Sqoop to connect to Oracle, MySQL, SQL Server, and Teradata and move the pivoted data to Hive or HBase tables.
  • Updated CloudFormation templates with IAM roles for S3 bucket access, security groups, subnet IDs, EC2 instance types, ports, and AWS tags.
  • Worked on Bitbucket, Git, and Bamboo to deploy EMR clusters.
  • Used the Storm service extensively to connect to ActiveMQ and Kafka to push data to HBase and Hive tables.
  • Created monitors, alarms, and notifications for EC2 hosts using CloudWatch (a minimal boto3 sketch follows this list).
  • Reviewed firewall settings (security groups) and updated them on Amazon AWS.
  • Worked on multiple instances, managing the Elastic Load Balancer and Auto Scaling and setting security groups to design a fault-tolerant and highly available system.
  • Loading data from large data files into Hive tables and HBase NoSQL databases.
  • Created S3 bucket policies and IAM role-based policies and customized the JSON templates (see the bucket-policy sketch after this list).
  • Wrote shell scripts and successfully migrated data from on-prem to AWS EMR (S3).
  • Involved in creating Hive tables and loading and analyzing data using hive queries.
  • Implemented schema extraction for Parquet and Avro file Formats in Hive.
  • Developed automated scripts to install Hadoop clusters.
  • Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, manage and review data backups, manage and review Hadoop log files.
  • Extensively worked with continuous Integration of application using Jenkins.
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Involved in extraction of jobs to Import data into Hadoop file system from oracle traditional systems using Sqoop Import tasks.
  • Used Reporting tools like Tableau to connect with Hive for generating daily reports of data.
  • Collaborated with the infrastructure, network, database, application and BI teams to ensure data quality and availability.
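
A minimal boto3 sketch of the CloudWatch alarm setup referenced above; the instance id, SNS topic ARN, and threshold are hypothetical, and region/credentials are assumed to come from the standard AWS configuration:

```python
"""Sketch: create a CPU-utilization alarm for one EC2 host."""
import boto3

cloudwatch = boto3.client("cloudwatch")

cloudwatch.put_metric_alarm(
    AlarmName="hadoop-dn-high-cpu",                                       # assumed name
    Namespace="AWS/EC2",
    MetricName="CPUUtilization",
    Dimensions=[{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],  # assumed id
    Statistic="Average",
    Period=300,                # five-minute datapoints
    EvaluationPeriods=2,       # two consecutive breaches before alarming
    Threshold=85.0,
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:ops-alerts"],       # assumed topic
)
```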
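A minimal boto3 sketch of attaching a JSON bucket policy, as referenced in the S3/IAM bullet above; the bucket name, account id, and role are hypothetical placeholders:

```python
"""Sketch: build a bucket policy from a JSON template and attach it."""
import json
import boto3

BUCKET = "example-emr-landing"   # assumed bucket

policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "AllowEmrRoleReadWrite",
        "Effect": "Allow",
        "Principal": {"AWS": "arn:aws:iam::123456789012:role/emr-etl-role"},  # assumed role
        "Action": ["s3:GetObject", "s3:PutObject", "s3:ListBucket"],
        "Resource": [f"arn:aws:s3:::{BUCKET}", f"arn:aws:s3:::{BUCKET}/*"],
    }],
}

boto3.client("s3").put_bucket_policy(Bucket=BUCKET, Policy=json.dumps(policy))
```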

Environment: Cloudera Manger 5.10, CDH 5.10, AWS, Yarn, Spark, HDFS, Python, Hive, HBase, Oozie, Tableau, Oracle 12c, Linux.

Hadoop Cluster Engineer

Confidential - San Antonio, TX

Responsibilities:

  • Managed over 2,500 Hadoop ETL jobs in production and a production cluster comprising 220 nodes.
  • Launched and set up Hadoop clusters on AWS as well as on physical servers, which included configuring the different Hadoop components.
  • Created a local YUM repository for installing and updating packages. Configured and deployed the Hive metastore using MySQL and the Thrift server (a minimal hive-site.xml sketch follows this list).
  • Responsible for building a system that ingests terabytes of data per day into Hadoop from a variety of data sources, providing high storage efficiency and an optimized layout for analytics.
  • Developed data pipelines that ingest data from multiple data sources and process them.
  • Configured Kerberos for authentication, Knox for perimeter security and Ranger for granular access in the cluster.
  • Configured and installed several Hadoop clusters in both physical machines as well as the AWS cloud for POCs.
  • Developed simple to complex MapReduce jobs using Hive and Pig. Involved in creating Hive tables and loading and analyzing data using Hive queries.
  • Used Pig to apply transformations, validations, cleaning, and deduplication to data from raw data sources.
  • Worked on installing Spark and on performance tuning. Upgraded the Hadoop cluster from HDP 2.2 to HDP 2.4.
  • Integrated schedulers Tidal and Control- Confidential with the Hadoop clusters to schedule the jobs and dependencies on the cluster.
  • Worked closely with the Continuous Integration team to set up tools like GitHub, Jenkins, and Nexus for scheduling automatic deployments of new or existing code.
  • Performed various configurations, including networking and IPTables, resolving hostnames, user accounts and file permissions, HTTP, FTP, and SSH keyless login.
  • Worked on installing Hadoop services on the cloud integrated with AWS. Integrated the BI tool Tableau to run visualizations over the data.
  • Provided 24x7 on-call support as part of a scheduled rotation with other team members.
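
A minimal sketch of the MySQL-backed Hive metastore configuration referenced above, generating the relevant hive-site.xml properties; host names and credentials are hypothetical placeholders:

```python
#!/usr/bin/env python3
"""Sketch: generate the MySQL metastore properties for hive-site.xml."""
import xml.etree.ElementTree as ET

PROPS = {
    "javax.jdo.option.ConnectionURL":
        "jdbc:mysql://mysql01.example.com/metastore?createDatabaseIfNotExist=true",
    "javax.jdo.option.ConnectionDriverName": "com.mysql.jdbc.Driver",
    "javax.jdo.option.ConnectionUserName": "hive",          # assumed user
    "javax.jdo.option.ConnectionPassword": "CHANGE_ME",     # placeholder
    "hive.metastore.uris": "thrift://metastore01.example.com:9083",
}

configuration = ET.Element("configuration")
for name, value in PROPS.items():
    prop = ET.SubElement(configuration, "property")
    ET.SubElement(prop, "name").text = name
    ET.SubElement(prop, "value").text = value

# In practice this file lives in the Hive conf directory (path varies by distribution).
ET.ElementTree(configuration).write("hive-site.xml", encoding="utf-8",
                                    xml_declaration=True)
```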

Environment: HADOOP HDFS, MAPREDUCE, HIVE, PIG, OOZIE, STORM, AWS S3, EC2, ZOOKEEPER, SPLUNK.

Hadoop/ Linux Administrator

Confidential

Responsibilities:

  • Managed UNIX infrastructure, involving day-to-day maintenance of servers and troubleshooting.
  • Provisioning Red Hat Enterprise Linux Server using PXE Boot according to requirements.
  • Performed Red Hat Linux Kickstart installations on RedHat 4.x/5.x, performed Red Hat Linux Kernel Tuning, memory upgrades.
  • Worked with Logical Volume Manager, creating volume groups and logical volumes, and performed Red Hat Linux kernel tuning.
  • Checked and cleaned file systems whenever they were full. Used Logwatch 7.3, which reports server information on a schedule.
  • Hands-on experience in installation, configuration, maintenance, monitoring, performance tuning, and troubleshooting of Hadoop clusters in different environments such as Development, Test, and Production.
  • Configured the JobTracker to assign MapReduce tasks to TaskTrackers in a cluster of nodes.
  • Implemented Kerberos security in all environments. Defined file system layout and data set permissions.
  • Implemented the Capacity Scheduler to share cluster resources among the users' MapReduce jobs (a minimal queue-configuration sketch follows this list).
  • Worked on importing data from Oracle databases into the Hadoop cluster. Commissioned and decommissioned nodes from time to time.
  • Installation, configuration, and administration of Red Hat Linux servers, support for those servers, and regular upgrades of Red Hat Linux servers using Kickstart-based network installation.
  • Provided 24x7 System Administration support for Red Hat Linux 3.x, 4.x servers and resolved trouble tickets on shift rotation basis.
  • Configured HP ProLiant, Dell Power edge, R series, and Cisco UCS and IBM p-series machines, for production, staging and test environments.
  • Created and cloned Linux virtual machines and templates using VMware Virtual Client 3.5 and migrated servers between ESX hosts.
  • Configured Linux native device mappers (MPIO), EMC power path for RHEL 5.5, 5.6, and 5.7.
  • Used performance monitoring utilities such as iostat, vmstat, top, netstat, and sar.
  • Worked on support for AIX matrix subsystem device drivers. Coordinated with the SAN team for allocation of LUNs to increase file system space.
  • Worked with both physical and virtual computing, from the desktop to the data center, using SUSE Linux. Worked with team members to create, execute, and implement plans.
  • Installation, Configuration, and Troubleshooting of Tivoli Storage Manager.
  • Remediated failed backups and took manual incremental backups of failing servers.
  • Upgraded TSM from 5.1.x to 5.3.x. Worked on HMC configuration and management of the HMC console, which included upgrades and micro-partitioning.
  • Installed adapter cards and cables and configured them. Worked on Integrated Virtual Ethernet and building up VIO servers.
  • Installed SSH keys for successful login of SRM data into the server without prompting for a password, for daily backup of vital data such as processor utilization and disk utilization.
  • Coordinated with application and database teams on troubleshooting applications. Provided redundancy with HBA cards, EtherChannel configuration, and network devices.
  • Configured and administered Fibre Channel adapters and handled the AIX part of the SAN.
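
A minimal sketch of the Capacity Scheduler queue configuration referenced above; the queue names and percentage split are hypothetical, and in practice capacity-scheduler.xml lives in the Hadoop conf directory before the queues are refreshed:

```python
#!/usr/bin/env python3
"""Sketch: write a two-queue capacity-scheduler.xml and refresh the queues."""
import subprocess
import xml.etree.ElementTree as ET

# Assumed split: 60% to the default queue, 40% to a dedicated ETL queue.
PROPS = {
    "yarn.scheduler.capacity.root.queues": "default,etl",
    "yarn.scheduler.capacity.root.default.capacity": "60",
    "yarn.scheduler.capacity.root.etl.capacity": "40",
    "yarn.scheduler.capacity.root.etl.maximum-capacity": "60",
}

configuration = ET.Element("configuration")
for name, value in PROPS.items():
    prop = ET.SubElement(configuration, "property")
    ET.SubElement(prop, "name").text = name
    ET.SubElement(prop, "value").text = value
ET.ElementTree(configuration).write("capacity-scheduler.xml", encoding="utf-8",
                                    xml_declaration=True)

# Reload the scheduler configuration on the ResourceManager.
subprocess.run(["yarn", "rmadmin", "-refreshQueues"], check=True)
```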

Environment: Red Hat Enterprise Linux 3.x/4.x/5.x, Sun Solaris 10, on Dell PowerEdge servers, Hive, HDFS, MapReduce, Sqoop, HBase.
