Hadoop Administrator Resume
SUMMARY
- 6+ years of experience in the IT industry, including 5+ years of Hadoop consulting for telecom and financial clients.
- Hands-on experience installing, configuring, and maintaining clusters for application and development environments. Expertise in Hadoop tools such as HDFS, Hive, Spark, Oozie, YARN, Kafka, Impala, Tez, HBase, ZooKeeper, Hue, Sqoop, Cloudera Manager, and Ambari.
- Experienced in both Cloudera CDH 5.x/CDP 7.x and Hortonworks HDP environments.
- Experience in capacity planning, validating hardware and software requirements, building and configuring small to medium-sized clusters, and managing and performance tuning Hadoop clusters for different use cases.
- Experience configuring external databases to store metadata for Cloudera Manager, Activity Monitor, Reports Manager, Hive, Oozie, Hue, and Sentry.
- Experience configuring NameNode High Availability with a standby NameNode, and service-level High Availability for Hive Metastore, HiveServer2, and YARN ResourceManager.
- Highly experienced in manually setting up Hadoop configuration files such as core-site.xml, hdfs-site.xml, hadoop-env.sh, mapred-site.xml, and yarn-site.xml.
- Optimized Hive query performance, tuned performance at the cluster level, and added users to clusters.
- Good understanding of Navigator components for security, data management, data governance, and data optimization.
- Experience using version control tools such as Git and Bitbucket, and installing ETL tools such as Talend and managing Talend jobs.
- Designed, configured, and managed backup and disaster recovery using data replication, snapshots, and Cloudera BDR utilities.
- Experienced in developing and deploying code using Spark SQL.
- Comfortable creating RDDs, converting them into DataFrames and Datasets, and saving the output to Hive.
- Actively participated in meetings with application owners and big data architects, and reviewed use cases for compliance with resource requirements and technology stacks.
- Experience in monitoring and troubleshooting Hive, YARN, and Spark jobs.
- Experience analyzing log files and identifying the root causes of job, service, and use case failures.
- Resolved issues by communicating with dependent and support teams, and logged cases according to priority.
- Experience implementing rack awareness for data locality optimization.
- Hands-on experience managing cluster resources using YARN components such as ResourceManager, NodeManager, resource pools, and schedulers (Capacity, FIFO, Fair, and DRF).
- Involved in cluster-level security management, including the security perimeter (authentication: Cloudera Manager, LDAP, and Kerberos), access (authorization: Sentry and FACL), data governance (audit and lineage: Navigator), and data encryption (data confidentiality at rest: Cloudera Navigator; data confidentiality in motion: SSL/TLS).
- Implemented role-based authorization for HDFS, Hive, and Impala using Apache Sentry and Apache Ranger.
- Experienced in managing Kerberos-enabled CDH environments and upgrading them to newer versions.
- Implemented TLS on all CDH services managed by Cloudera Manager.
- Participated in upgrading CDH and CM across major and minor versions.
- Worked on cloud platforms such as AWS and its services, including EC2, S3, and EMR.
- Commissioned and decommissioned DataNodes in response to hardware failures and network or OS-related issues.
- Migrated data between and across clusters using DistCp and SCP/SFTP.
- Used Sqoop to import and export data between HDFS and RDBMS.
- Automated workflows using shell scripts to pull data from various databases into Hadoop.
- Involved in data ingestion using Sqoop, Spark, and ETL tools such as Talend.
- Experience automating Hive jobs, regular Spark processes, Spark ETL processes, and Sqoop jobs using Oozie, and scheduling them according to client requirements.
- Good working knowledge of Python and Java.
- Highly experienced in troubleshooting data pipeline issues in NiFi and monitoring the pipelines daily.
- Hands-on experience in Linux administration on RHEL and CentOS.
- Good understanding of documentation, standard practices, and company policies for meeting SLAs and OLAs.
- Participated in application on-boarding meetings with application owners and architects, helping them identify and review the technology stack, use cases, and resource requirement estimates.
- Additional responsibilities included interacting with the offshore team daily, communicating requirements, delegating tasks to offshore and on-site team members, and reviewing their deliverables.
- Experience in documentation following standard practices, compliance policies and workarounds.
- Skilled in performance optimization and technical improvements, with an understanding of cost-effective decision making and usability.
PROFESSIONAL EXPERIENCE
HADOOP ADMINISTRATOR
Confidential
Responsibilities:
- Extensively involved in the installation and configuration of the Cloudera distribution of Hadoop and its components, such as NameNode, Standby NameNode, ResourceManager, NodeManager, and DataNodes.
- Responsible for performance testing and benchmarking of the cluster.
- Worked extensively on importing metadata into Hive and migrating existing tables and applications to Hive and HBase.
- Supported data analysts in running Hadoop-related jobs.
- Implemented the Fair Scheduler to allocate a fair share of resources to small jobs.
- Involved in running Hadoop jobs for processing millions of records of text data.
- Responsible for scheduling jobs in Hadoop using the FIFO, Fair, and Capacity schedulers.
- Expertise in Hadoop cluster capacity planning, performance tuning, monitoring, and troubleshooting.
- Worked in a Hadoop production environment and implemented NameNode HA to avoid a single point of failure.
- Experience working on LDAP user accounts and configuring LDAP authentication on client machines.
- Automated day-to-day activities using shell scripting, and used Cloudera Manager to monitor the health of Hadoop daemon services and respond to warning or failure conditions.
- Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, managing and reviewing data backups, and managing and reviewing Hadoop log files.
- Involved in planning and installing Hadoop cluster infrastructure, including resource capacity and build plans for cluster installations.
- Resolved user-submitted tickets and P1 issues by troubleshooting, documenting, and resolving errors.
- Installed and configured Hive in the Hadoop cluster and helped business users and application teams fine-tune their HiveQL for better performance and efficient use of cluster resources.
- Tuned the performance of the Hadoop cluster, MapReduce jobs, and real-time applications, applying best practices to fix design flaws.
- Implemented Oozie workflows for ETL processes for critical data feeds across the platform.
- Implemented the Kerberos authentication protocol on an existing cluster.
- Built high availability for the production cluster and designed automatic failover using the ZooKeeper Failover Controller (ZKFC) and Quorum JournalNodes.
- Worked closely with ETL tools such as Talend (data processes with Talend) and SAS.
- Provided 24x7 system support and maintenance for Customer Experience Business Services.
Environment: CDH 5.x, CDP 7.x, Hadoop, HDFS, Hive, YARN, Oozie, Hue, Sqoop, Impala, Spark, Kerberos, Kafka, HBase, LDAP, Sentry, Talend, SAS, AutoSys, SQL Developer, MySQL/Oracle, Java, Python, and Red Hat Linux.
HADOOP CONSULTANT
Confidential
Responsibilities:
- Experience in managing scalable Hadoop cluster environments.
- Involved in managing, administering and monitoring clusters in Hadoop Infrastructure.
- Performed regular maintenance, commissioning and decommissioning nodes as disk failures occurred.
- Used Sqoop to import and export data between HDFS and RDBMS.
- Worked closely with the infrastructure, network, database, application, and business intelligence teams to ensure high data quality and availability.
- Responsible for troubleshooting issues in the execution of MapReduce jobs by inspecting and reviewing log files.
- Collaborated with application teams to install operating system and Hadoop updates, patches, and version upgrades when required.
- Experience in HDFS maintenance and administration.
- Automated the movement of data between different data sources and systems using Apache NiFi, making data ingestion fast, easy, and secure.
- Managed connectivity and security for nodes on the Hadoop cluster.
- Experience in commissioning and decommissioning of nodes from cluster.
- Experience in Name Node HA implementation.
- Worked on solutions architected to process massive amounts of data on corporate and AWS cloud-based servers.
- Worked with data delivery teams to set up new Hadoop users.
- Installed the Oozie workflow engine to run multiple MapReduce, Hive, and HBase jobs.
- Experience in HDFS data storage and support for running MapReduce jobs.
- Performance tuning and troubleshooting of MR jobs by analyzing and reviewing Hadoop log files.
- Maintained and monitored clusters. Loaded data into the cluster using Spark RDDs, and from Hive and relational database management systems using Sqoop.
- Good, detailed understanding of DAGs and their steps.
- Imported and exported data between MySQL/Oracle and Hive using Sqoop.
- Experience using DistCp and SCP/SFTP to migrate data between and across clusters.
- Hands-on experience analyzing log files for Hadoop ecosystem services.
- Coordinated root cause analysis efforts to minimize future system issues.
- Created and managed job schedules using AutoSys.
- Highly involved in operating and troubleshooting Hadoop clusters.
- Troubleshot hardware issues and worked closely with various vendors on hardware, OS, and Hadoop issues.
Environment: Cloudera 5.x, HDFS, YARN, Hive, Impala, Spark, Sqoop, NiFi, HBase, Kafka, ZooKeeper, Chef, Ganglia, Tableau, MicroStrategy, shell scripting, Python, and Red Hat Linux.
