Hadoop Admin Resume
CA
SUMMARY
- 7+ years of professional IT experience, including a proven 3 years of Hadoop administration on Cloudera (CDH), Hortonworks (HDP), vanilla Apache Hadoop, and MapR distributions, with strong experience in AWS, Kafka, Elasticsearch, DevOps, and Linux administration. Hands-on experience installing, configuring, supporting, and managing Hadoop clusters.
- Knowledge of HBase and ZooKeeper.
- Experience importing and exporting data between HDFS and relational database systems/mainframes using Sqoop (an illustrative command pair appears after this list).
- Experience with Ambari, Nagios, and Ganglia monitoring tools.
- Scheduled Hadoop, Hive, Sqoop, and HBase jobs using Oozie.
- In-depth knowledge of Hadoop ecosystem components: HDFS, YARN, MapReduce, Hive, Hue, Sqoop, Flume, Kafka, Spark, Oozie, NiFi, and Cassandra.
- Experience with Ambari (Hortonworks) for managing the Hadoop ecosystem.
- Experience in performing minor and major upgrades.
- Experience commissioning and decommissioning DataNodes on Hadoop clusters.
- Strong knowledge of configuring NameNode High Availability and NameNode Federation.
- Familiar with writing Oozie workflows and job controllers for automating shell, Hive, and Sqoop actions.
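The Sqoop import/export work referenced above typically follows a pattern like the sketch below; the connection string, credentials, table names, and HDFS paths are hypothetical placeholders, not taken from any actual engagement.

```bash
# Import a relational table into HDFS (hypothetical MySQL source).
sqoop import \
  --connect jdbc:mysql://db.example.com/sales \
  --username etl_user -P \
  --table orders \
  --target-dir /data/staging/orders \
  --num-mappers 4

# Export processed results back to the relational database.
sqoop export \
  --connect jdbc:mysql://db.example.com/sales \
  --username etl_user -P \
  --table orders_summary \
  --export-dir /data/warehouse/orders_summary
```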
PROFESSIONAL EXPERIENCE
Hadoop Admin
Confidential, CA
Responsibilities:
- Analyzed a Hortonworks Hadoop cluster and various big data analytics tools, including Pig, the HBase database, and Sqoop.
- Installed and configured Cloudera, MapR, and Hortonworks clusters, along with Hadoop ecosystem components such as Pig, Hive, HBase, Sqoop, Kafka, Oozie, Flume, and ZooKeeper.
- Administered and maintained Cloudera 6.0 Hadoop clusters; provisioned, patched, and maintained the underlying physical Linux systems.
- Installed, configured, and tested a DataStax Enterprise Cassandra multi-node cluster with 4 data centers of 5 nodes each.
- Responsible for building scalable distributed data solutions using Hadoop.
- Set up Hortonworks clusters and installed all ecosystem components, both through Ambari and manually from the command line.
- Performed a major production upgrade from HDP 1.3 to HDP 2.2, following standard backup policies to ensure high availability of the cluster.
- Monitored multiple Hadoop cluster environments using Ganglia and Nagios; tracked workload, job performance, and capacity planning through Ambari.
- Responsible for cluster maintenance and monitoring, commissioning and decommissioning DataNodes, troubleshooting, cluster planning, and managing and reviewing data backups and log files.
- Developed scripts to track changes in file and directory permissions through HDFS audit logs.
- Configured memory and vCores for dynamic resource pools within the Fair and Capacity Schedulers.
- Implemented test scripts to support test driven development and continuous integration.
- Integrated the LDAP server and Active Directory with Ambari through the command-line interface.
- Coordinated with the Hortonworks support team through the support portal to resolve critical issues during upgrades.
- Installed a Kerberos-secured Kafka cluster (without wire encryption) for a POC and set up Kafka ACLs (see the Kafka sketch after this list).
- Created a NoSQL solution for a legacy RDBMS using Kafka, Spark, Solr, and the HBase indexer for ingestion, with Solr and HBase for real-time querying.
- Designed and configured topics in the new Kafka cluster across all environments.
- Generated consumer group lag reports from Kafka using its API; used Kafka to build real-time data pipelines between clusters.
- Designed and implemented CI/CD pipelines for end-to-end automation; supported server/VM provisioning, middleware installation, and deployment activities via Puppet.
- Wrote Puppet manifests to provision several pre-production environments.
- Wrote Puppet modules to automate the build/deployment process and improve previously manual processes.
- Designed, installed, and implemented automation using Puppet.
- Used Apache NiFi to ingest data from IBM MQ (message queues).
- Implemented NiFi flow topologies to perform cleansing operations before moving data into HDFS.
- Used Apache NiFi to copy data from the local file system into HDP.
- Worked with NiFi to manage the flow of data from source systems to HDFS.
- Experience with job workflow scheduling and dataflow tools such as NiFi.
- Good knowledge of installing, configuring, and maintaining Chef server and workstations.
- Ingested data into HDFS using NiFi with various processors and developed custom input adaptors.
- Created a POC on Hortonworks and recommended best practices for the HDP and HDF platforms and NiFi.
- Imported data from various sources, performed transformations using Hive and MapReduce, loaded the results into HDFS, and extracted data from Oracle into HDFS using Sqoop.
- Converted complex Oracle stored procedure code to Spark and Hive using Python and Java.
- Involved in database migrations to transfer data between databases and in the complete virtualization of many client applications.
- Extensive experience with Oracle, DB2, SQL Server, and MySQL databases; wrote scripts to deploy monitors, checks, and automation of critical system administration functions.
- Expertise in using Sqoop to connect to Oracle, MySQL, SQL Server, and Teradata and move the pivoted data into Hive or HBase tables.
- Responsible for loading data files from various external sources such as Oracle and MySQL into a staging area in MySQL databases.
- Experience with data analysis and developing scripts for Pig and Hive.
- Proficient in PL/SQL programming in Oracle 11g/10g/9i, including Oracle architecture, the data dictionary, and DBMS packages.
- Experience with data flow diagrams, data dictionaries, database normalization techniques, and entity-relationship modeling and design.
- Wrote Hibernate configuration files to connect to the Oracle database and fetch data.
- Experienced in administering, installing, upgrading, and managing MapR 5.1 Hadoop clusters of 200+ nodes across Development, Test, and Production (Operational & Analytics) environments.
- Troubleshooting issues in the execution of MapReduce jobs by inspecting and reviewing log files.
- Worked extensively on Elasticsearch querying and indexing to retrieve documents at high speed.
- Installed, configured, and maintained several Hadoop clusters, including HDFS, YARN, Hive, HBase, Knox, Kafka, Oozie, Ranger, Atlas, Infra Solr, ZooKeeper, and NiFi, in Kerberized environments.
- Installed and configured Hadoop, MapReduce, and HDFS; developed multiple MapReduce jobs in Java for data cleaning.
- Experience managing Hadoop clusters with IBM BigInsights and the Hortonworks Data Platform.
- Regularly commissioned and decommissioned nodes as disk failures occurred, using the MapR File System.
- Worked on cluster installation, commissioning and decommissioning of DataNodes, NameNode recovery, capacity planning, and slot configuration in the MapR Control System (MCS).
- Experience with innovative and, where possible, automated approaches to system administration tasks.
- Set up high availability for a major production cluster and designed automatic failover using ZooKeeper and quorum JournalNodes (see the NameNode HA sketch after this list).
- Used Spark Streaming to divide streaming data into batches as input to the Spark engine for batch processing; mentored the EQM team in creating Hive queries to test use cases.
- Configured Sqoop JDBC drivers for the respective relational databases; controlled parallelism, the distributed cache, the import process, and compression codecs; imported data into Hive and HBase; set up incremental imports, saved jobs and passwords, and free-form query imports; and handled troubleshooting.
- Collected and aggregated large amounts of streaming data into HDFS using Flume; configured multiple agents, sources, sinks, channels, and interceptors, defined channel selectors to multiplex data into different sinks, and tuned log4j properties (a sample agent configuration appears after this list).
- Responsible for implementation and ongoing administration of MapR 4.0.1 infrastructure.
- Maintained operations, installation, and configuration of a 150+ node cluster running the MapR distribution.
- Monitored cluster health and set up alert scripts for memory usage on the edge nodes.
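The Kafka ACL and consumer-lag work above generally comes down to the standard Kafka CLI tools; the principal, topic, group, and host names below are hypothetical placeholders, and the exact script names vary slightly by distribution.

```bash
# Grant a producer principal write access to a topic on a Kerberized cluster
# (ZooKeeper-backed authorizer; names are illustrative).
kafka-acls.sh --authorizer-properties zookeeper.connect=zk1.example.com:2181 \
  --add --allow-principal User:app_producer \
  --operation Write --topic events

# Report per-partition lag for a consumer group.
kafka-consumer-groups.sh --bootstrap-server broker1.example.com:9092 \
  --describe --group events-consumer
```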
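Enabling NameNode HA with quorum JournalNodes, as noted above, typically involves a one-time initialization sequence along these lines (the service IDs nn1/nn2 are assumed, and Ambari automates most of these steps):

```bash
hdfs zkfc -formatZK                 # initialize failover state in ZooKeeper
hdfs namenode -bootstrapStandby     # sync the standby NameNode from the active one
hdfs haadmin -getServiceState nn1   # confirm which NameNode is active/standby
hdfs haadmin -failover nn1 nn2      # exercise a manual failover for testing
```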
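A minimal sketch of the Flume multiplexing setup described above, with hypothetical agent, channel, and sink names and paths; a real configuration would also size the channels and sinks appropriately.

```bash
# Write a Flume agent config that routes events to one of two HDFS sinks
# based on the 'category' header, then start the agent.
cat > /etc/flume/conf/agent1.properties <<'EOF'
agent1.sources  = src1
agent1.channels = ch_app ch_sys
agent1.sinks    = sink_app sink_sys

agent1.sources.src1.type = spooldir
agent1.sources.src1.spoolDir = /var/log/incoming
agent1.sources.src1.channels = ch_app ch_sys
agent1.sources.src1.selector.type = multiplexing
agent1.sources.src1.selector.header = category
agent1.sources.src1.selector.mapping.app = ch_app
agent1.sources.src1.selector.default = ch_sys

agent1.channels.ch_app.type = memory
agent1.channels.ch_sys.type = memory

agent1.sinks.sink_app.type = hdfs
agent1.sinks.sink_app.channel = ch_app
agent1.sinks.sink_app.hdfs.path = /data/raw/app
agent1.sinks.sink_sys.type = hdfs
agent1.sinks.sink_sys.channel = ch_sys
agent1.sinks.sink_sys.hdfs.path = /data/raw/sys
EOF

flume-ng agent --conf /etc/flume/conf \
  --conf-file /etc/flume/conf/agent1.properties --name agent1
```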
Hadoop Admin
Confidential, San Jose, CA
Responsibilities:
- Managed and reviewed Hadoop log files as part of administration for troubleshooting purposes; communicated and escalated issues appropriately.
- Primary tasks and responsibilities centered on O&M support of a secure (Kerberized) Cloudera distribution of Hadoop.
- Installed and configured systems for use with the Cloudera distribution of Hadoop (with consideration given to other variants such as Apache, MapR, Hortonworks, and Pivotal).
- Worked with cloud infrastructure such as Amazon Web Services (AWS) and Rackspace.
- Wrote UNIX scripts to handle data quality issues and to invoke Informatica workflows.
- Primarily used Cloudera Manager, with some command-line administration.
- Responsible for building scalable distributed data solutions using Hadoop.
- Set up Hortonworks clusters and installed all ecosystem components, both through Ambari and manually from the command line.
- Performed a major production upgrade from HDP 1.3 to HDP 2.2, following standard backup policies to ensure high availability of the cluster.
- Involved in deploying a Hadoop cluster using Hortonworks Ambari (HDP 2.2) integrated with SiteScope for monitoring and alerting.
- Launched and set up Hadoop clusters on AWS as well as physical servers, including configuring the different Hadoop components.
- Created a local YUM repository for installing and updating packages; configured and deployed the Hive metastore using MySQL and the Thrift server (a repository setup sketch appears after this list).
- Responsible for building a system that ingests terabytes of data per day into Hadoop from a variety of data sources, providing high storage efficiency and an optimized layout for analytics.
- Developed data pipelines that ingest data from multiple sources and process it.
- Expertise in using Sqoop to connect to Oracle, MySQL, SQL Server, and Teradata and move the pivoted data into Hive or HBase tables.
- Monitored multiple Hadoop cluster environments using Ganglia and Nagios; tracked workload, job performance, and capacity planning through Ambari.
- Responsible for cluster maintenance and monitoring, commissioning and decommissioning DataNodes, troubleshooting, cluster planning, and managing and reviewing data backups and log files.
- Developed scripts to track changes in file and directory permissions through HDFS audit logs.
- Deployed Kubernetes in both AWS and Google Cloud; set up clusters and replication and deployed multiple containers.
- Configured memory and vCores for dynamic resource pools within the Fair and Capacity Schedulers.
- Implemented test scripts to support test driven development and continuous integration.
- Integrated the LDAP server and Active Directory with Ambari through the command-line interface.
- Coordinated with the Hortonworks support team through the support portal to resolve critical issues during upgrades.
- Performed smoke tests of each service and client after installation and configuration; involved in loading data from the UNIX file system into HDFS.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data; provided cluster coordination services through ZooKeeper.
- Experience managing and reviewing Hadoop log files; installed the Oozie workflow engine to run multiple MapReduce, Hive, and Pig jobs.
- Installed the Knox gateway on a separate node to provide REST API services and give various HTTP applications access to HDFS.
- Balanced HDFS manually to decrease network utilization and increase job performance (see the rebalancing sketch after this list).
- Commissioned and decommissioned DataNodes from the cluster when problems arose.
- Set up automated processes to archive and clean unwanted data on the cluster, particularly on HDFS and the local file system.
- Set up and managed NameNode High Availability and federation using the quorum journal manager to avoid single points of failure in large clusters.
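The local YUM repository mentioned above can be stood up along these lines; the paths, hostname, and repo ID are hypothetical, and the RPMs would come from the vendor's mirror.

```bash
# Build a local repository from downloaded RPMs and serve it over HTTP.
yum install -y createrepo httpd
mkdir -p /var/www/html/localrepo
cp /tmp/rpms/*.rpm /var/www/html/localrepo/
createrepo /var/www/html/localrepo

# Point client nodes at the repository.
cat > /etc/yum.repos.d/localrepo.repo <<'EOF'
[localrepo]
name=Local package mirror
baseurl=http://repohost.example.com/localrepo
enabled=1
gpgcheck=0
EOF
yum clean all && yum repolist
```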
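Manual rebalancing and DataNode decommissioning, as noted above, usually follow this pattern; the hostname and exclude-file path are hypothetical and must match dfs.hosts.exclude in hdfs-site.xml.

```bash
# Mark a DataNode for decommissioning, then refresh the NameNode's node list.
echo "datanode07.example.com" >> /etc/hadoop/conf/dfs.exclude
hdfs dfsadmin -refreshNodes
hdfs dfsadmin -report | grep -A 3 "datanode07"   # watch decommission progress

# Rebalance so no DataNode deviates more than 10% from average utilization.
hdfs balancer -threshold 10
```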
Hadoop Admin
Confidential, Indianapolis, IN
Responsibilities:
- Worked on 4 Hadoop clusters for different teams, supporting 50+ users of the Hadoop platform, training users to keep Hadoop simple to use, and keeping them up to date on best practices.
- Managed several Hadoop clusters across production, development, and disaster recovery environments.
- Worked with software engineers to investigate problems and make changes to the Hadoop environment and associated applications.
- Expertise in recommending hardware configurations for Hadoop clusters.
- Installed, upgraded, and managed Hadoop clusters on Hortonworks.
- Troubleshot many cloud-related issues such as DataNode outages, network failures, and missing data blocks.
- Managed and reviewed Hadoop and HBase log files.
- Proven results-oriented person with a focus on delivery
- Built and configured log data loading into HDFS using Flume.
- Imported and exported data into HDFS and Hive using Sqoop.
- Managed cluster coordination services through ZooKeeper.
- Provisioned, installed, configured, monitored, and maintained HDFS, YARN, HBase, Flume, Sqoop, Oozie, Pig, Hive, Ranger, Falcon, SmartSense, Storm, and Kafka.
- Recovered from node failures and troubleshot common Hadoop cluster issues.
- Scripted Hadoop package installation and configuration to support fully automated deployments.
- Supported Hadoop developers and assisted in optimizing MapReduce jobs, Pig Latin scripts, Hive scripts, and HBase ingestion as required.
- Implemented Kerberos to authenticate all services in the Hadoop cluster (an illustrative principal/keytab sketch appears after this list).
- Performed system/cluster configuration and health checks.
- Continuously monitored and managed the Hadoop cluster through Ambari.
- Created user accounts and granted users access to the Hadoop cluster.
- Resolved tickets submitted by users, troubleshooting, documenting, and resolving the errors.
- Performed HDFS cluster support and maintenance tasks such as adding and removing nodes without affecting running jobs or data.
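Kerberizing a service, as referenced above, generally boils down to creating a principal and keytab and validating it; the realm, principal, and keytab path below are hypothetical.

```bash
# Create a service principal and export its keytab (names are illustrative).
kadmin -q "addprinc -randkey hdfs/namenode01.example.com@EXAMPLE.COM"
kadmin -q "ktadd -k /etc/security/keytabs/hdfs.service.keytab hdfs/namenode01.example.com@EXAMPLE.COM"

# Verify the keytab and obtain a ticket with it.
klist -kt /etc/security/keytabs/hdfs.service.keytab
kinit -kt /etc/security/keytabs/hdfs.service.keytab hdfs/namenode01.example.com@EXAMPLE.COM
klist
```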
Linux Administrator
Confidential
Responsibilities:
- Experience installing, upgrading, and configuring Red Hat Linux 4.x/5.x/6.x using Kickstart servers and interactive installation.
- Responsible for creating and managing user accounts, security, permissions, disk space, and process monitoring on Solaris, CentOS, and Red Hat Linux.
- Performed administration and monitored job processes using associated commands
- Managed routine system backups, scheduled jobs, and enabled cron jobs.
- Maintained and troubleshot network connectivity.
- Managed patch configuration, version control, and service packs, and reviewed connectivity issues related to security problems.
- Configured DNS, NFS, FTP, remote access, security management, and server hardening.
- Installed, upgraded, and managed packages via RPM and YUM package management.
- Maintained storage with Logical Volume Management (LVM).
- Experience administering, installing, configuring and maintaining Linux
- Created Linux virtual machines using VMware Virtual Center.
- Administered VMware Infrastructure Client 3.5 and vSphere 4.1.
- Installed firmware upgrades and kernel patches and performed system configuration and performance tuning on Unix/Linux systems.
- Installed Red Hat Linux 5/6 using Kickstart servers and interactive installation.
- Supported an infrastructure environment comprising RHEL and Solaris.
- Performed installation, configuration, and OS upgrades on RHEL 5.x/6.x/7.x and SUSE 11.x/12.x.
- Implemented and administered VMware ESX 4.x, 5.x, and 6 to run Windows, CentOS, SUSE, and Red Hat Linux servers in development and test environments.
- Created, extended, reduced, and administered Logical Volume Manager (LVM) volumes in the RHEL environment (an illustrative command sequence appears after this list).
- Responsible for large-scale Puppet implementation and maintenance, including Puppet manifest creation, testing, and deployment.
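The LVM administration listed above typically follows this lifecycle; the device, volume group, and logical volume names and the sizes are hypothetical.

```bash
# Add a new disk to an existing volume group and carve out a logical volume.
pvcreate /dev/sdb
vgextend vg_data /dev/sdb
lvcreate -L 50G -n lv_app vg_data
mkfs.xfs /dev/vg_data/lv_app
mount /dev/vg_data/lv_app /app

# Grow the logical volume and the XFS filesystem online.
lvextend -L +20G /dev/vg_data/lv_app
xfs_growfs /app
```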