Hadoop Admin Resume
CA
SUMMARY
- IT professional with 7+ years of experience in industries including Banking and Healthcare, and a Cloudera Certified Hadoop Administrator with 5+ years of experience in activities such as installation and configuration of clusters using Apache Hadoop, Cloudera (CDH), Hortonworks (HDP), and AWS Elastic MapReduce (EMR).
- Worked in complete Software Development Life Cycle (analysis, design, development, testing, implementation and support) using Agile Methodologies.
- Deployed Hadoop clusters in on-premises data centers and private cloud environments.
- Hands-on experience in installing, configuring, supporting, and managing Hadoop clusters using Apache, Cloudera (CDH3, CDH4), and YARN-based distributions (CDH 5.x).
- Hands on experience in installing, configuring, and using Hadoop ecosystem components like HDFS, Yarn, MapReduce, Spark, HBase, Oozie, Hive, Sqoop, Flume, Storm, Kafka and Sentry.
- Set up automated 24x7 monitoring and escalation infrastructure for Hadoop clusters using Nagios and Ganglia.
- Experienced in commissioning, decommissioning, balancing, and managing nodes, and in tuning servers for optimal cluster performance.
- Experienced in understanding the security requirements for Hadoop and integrating with Kerberos authentication infrastructure: KDC server setup and creating realms/domains.
- Experienced in creating service principals and user principals and establishing cross-realm trust.
- Strong knowledge of enforcing role-based access control (RBAC) on Hive data and metadata in a Hadoop cluster using Sentry.
- Experienced in setting up the High-Availability Hadoop Clusters and BDR clusters.
- Hadoop cluster capacity planning, performance tuning, cluster monitoring, and troubleshooting.
- Good experience on configuring and managing the backup and disaster recovery for Hadoop data.
- Hands-on experience in managing, reviewing, and analyzing log files for Hadoop and ecosystem services to perform root cause analysis.
- Involved in analyzing system failures, identifying root causes, and recommending courses of action; documented system processes and procedures for future use.
- Experienced in HDFS data storage and in supporting MapReduce and Spark jobs.
- Experienced in importing and exporting data between RDBMS and HDFS using Sqoop, and in scheduling Hadoop/Hive/Sqoop/HBase jobs using Oozie.
- Experienced in rack awareness configuration for quick availability and processing of data (a topology-script sketch follows this list).
- Involved in balancing the loads on server, tuning of server for optimal performance of the cluster.
- Hands-on experience in Linux admin activities such as user management, scheduling cron jobs, setting up Linux environments, passwordless SSH, creating file systems, disabling firewalls, tuning swappiness, managing SELinux, and installing Java (a setup sketch follows this list).
- Experienced in configuration management tools such as Puppet, Ansible, and Chef.
- Good understanding of deploying Hadoop clusters using automated Puppet scripts.
- Experienced in hardware recommendations, performance tuning, and benchmarking.
- Experienced in IP management (IP addressing, subnetting, Ethernet bonding, static IPs).
- Flexible with Unix/Linux and Windows environments, working with operating systems such as CentOS 5/6, Ubuntu 10/11, and RedHat 6/7.
- Experienced in Linux Storage Management. Configuring RAID Levels, Logical Volumes.
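For illustration, a minimal rack-awareness topology script, assuming host-to-rack pairs live in a flat lookup file (the paths and file names here are hypothetical); HDFS invokes such a script via the net.topology.script.file.name property in core-site.xml:

    #!/usr/bin/env bash
    # topology.sh - resolves each host/IP passed in by HDFS to a rack label.
    # /etc/hadoop/conf/rack-map.txt is an assumed file of "host rack" pairs.
    MAP=/etc/hadoop/conf/rack-map.txt
    for host in "$@"; do
      rack=$(awk -v h="$host" '$1 == h {print $2}' "$MAP")
      echo "${rack:-/default-rack}"   # fall back when a host is unmapped
    done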
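Likewise, a sketch of the routine Linux setup steps mentioned above (hostnames and script paths are hypothetical):

    # Passwordless SSH: generate a key once, then copy it to every node
    ssh-keygen -t rsa -b 4096 -N '' -f ~/.ssh/id_rsa
    ssh-copy-id admin@node01.example.com   # repeat per node

    # Lower swappiness for Hadoop workloads and persist the setting
    sudo sysctl vm.swappiness=1
    echo 'vm.swappiness=1' | sudo tee -a /etc/sysctl.conf

    # Cron: append a nightly 2 AM log-cleanup job to the current crontab
    (crontab -l 2>/dev/null; echo '0 2 * * * /opt/scripts/clean_logs.sh') | crontab -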
TECHNICAL SKILLS
Scripting Languages: Shell, Bash
Hadoop Ecosystem: Hadoop 2.7.x, Spark 2.1.0, MapReduce, Hive 2.1.1, Sqoop 1.99.7, Flume 1.7.0, Kafka 2.1.0, Oozie 3.1.3, YARN 0.21.3, Pig 0.14, Zookeeper 3.4.6
Databases: MySQL 5.x, Oracle 11g, HBase 1.3.0, Cassandra 3.10, PL/SQL 11g, MS SQL Server
Hadoop Distributions: Cloudera, Hortonworks, AWS EMR, Apache Hadoop
IDE Applications: Eclipse 4.6, NetBeans
Collaboration: Git 2.12.0, ScalaTest 3.0.1
Operating Systems: Windows 10, macOS, Ubuntu, CentOS, Red Hat
Data Analysis & Viz: Tableau
Cloud Environment: AWS EC2, S3, IAM, VPC, Route 53, CloudWatch, CloudFormation
Web Services: RESTful, SOAP
PROFESSIONAL EXPERIENCE
Confidential, CA
Hadoop Admin
Responsibilities:
- Provided infrastructure support for multiple clusters: Production (Prod), Pre-Production (Pre-Prod), Quality (QA), and Disaster Recovery (DR).
- Installed and configured Hadoop clusters across various environments through Cloudera Manager.
- Installed and configured MySQL and enabled high availability.
- Installed and configured the Sentry server to enable schema-level security.
- Installed and configured Hadoop services HDFS, Yarn, MapReduce, Spark, HBase, Oozie, Hive, Sqoop, Flume, Kafka and Sentry.
- Configured the Fair Scheduler in the cluster, created resource pools, and managed dynamic allocation of resources while monitoring resource-intensive jobs (a pool-configuration sketch follows this job entry).
- Implemented high availability and automatic failover for the NameNode using ZooKeeper services to eliminate the single point of failure.
- Day-to-day responsibilities included resolving Hadoop developer issues, providing prompt solutions to reduce impact, and documenting fixes to prevent recurrence.
- Interacted with Cloudera support, logged issues in the Cloudera portal, and fixed them per recommendations.
- Experienced in upgrades, patching, and rolling upgrade activities without data loss and with proper backup plans.
- Integrated external components such as TIBCO and Tableau with Hadoop using HiveServer2.
- Implemented the HDFS snapshot feature and migrated data across clusters using DistCp (a snapshot-and-copy sketch follows this job entry).
- Performed both major and minor upgrades to the existing Cloudera Hadoop cluster.
- Integrated Hadoop with Active Directory and enabled Kerberos for Authentication.
- Built a new sandbox cluster for testing and moved data from the secure cluster to the insecure sandbox cluster using DistCp (distributed copy).
- Installed Kafka cluster with separate nodes for brokers.
- Performed Kafka operations on regular basis.
- Expertise in performance tuning; optimized Hadoop clusters to achieve high performance.
- Implemented schedulers on the ResourceManager to share cluster resources.
- Monitored Hadoop clusters using Cloudera Manager and provided 24x7 on-call support.
- Expertise in designing and implementing a disaster recovery plan for the Hadoop cluster.
- Extensive hands-on experience with Hadoop file system commands for file handling operations.
- Provided user and application support on the Hadoop infrastructure.
- Prepared System Design document with all functional implementations.
- Worked with Sqoop import and export functionality to transfer large datasets between traditional databases and HDFS (an import/export sketch follows this job entry).
- Experience working with Amazon EC2, S3, and Glacier.
- Experience creating lifecycle policies in AWS S3 to move backups to Glacier (see the AWS CLI sketch after this job entry).
- Created monitors, alarms, and notifications for EC2 hosts using CloudWatch.
- Created new IAM users and groups and defined roles, policies, and identity providers.
- Experience enabling MFA for IAM users and securing S3 buckets.
- Defined AWS security groups, which acted as virtual firewalls controlling the traffic allowed to reach EC2 instances; used default and custom Virtual Private Clouds (VPCs) to create private cloud environments with public and private subnets.
Environment: Hadoop HDFS, MapReduce, Hive, Pig, Oozie, Sqoop, Cloudera Manager, Storm, AWS S3, EC2, IAM, ZooKeeper, Spark
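As referenced above, a minimal Fair Scheduler allocation sketch (queue names, weights, and the file path are hypothetical; on CDH these pools are normally managed through Cloudera Manager's Dynamic Resource Pools page):

    # Write an illustrative allocation file for the Fair Scheduler
    cat > /etc/hadoop/conf/fair-scheduler.xml <<'EOF'
    <?xml version="1.0"?>
    <allocations>
      <queue name="etl">
        <weight>3.0</weight>              <!-- etl gets 3x the share of adhoc -->
        <maxRunningApps>20</maxRunningApps>
      </queue>
      <queue name="adhoc">
        <weight>1.0</weight>
      </queue>
      <queuePlacementPolicy>
        <rule name="specified"/>          <!-- honor an explicitly requested queue -->
        <rule name="user"/>               <!-- else place jobs in a per-user queue -->
        <rule name="default"/>
      </queuePlacementPolicy>
    </allocations>
    EOF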
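The snapshot-based migration mentioned above, sketched with hypothetical paths and NameNode URIs:

    # Enable and take a point-in-time snapshot on the source directory
    hdfs dfsadmin -allowSnapshot /data/warehouse
    hdfs dfs -createSnapshot /data/warehouse nightly-0115

    # Copy the immutable snapshot to the DR cluster, preserving file attributes
    hadoop distcp -update -p \
      hdfs://prod-nn:8020/data/warehouse/.snapshot/nightly-0115 \
      hdfs://dr-nn:8020/data/warehouse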
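A Sqoop import/export example (the JDBC URL, credentials, and table names are hypothetical):

    # Import a table from MySQL into HDFS with four parallel mappers
    sqoop import \
      --connect jdbc:mysql://db01.example.com/sales \
      --username etl_user -P \
      --table orders \
      --target-dir /data/raw/orders \
      --num-mappers 4

    # Export aggregated results back to the RDBMS
    sqoop export \
      --connect jdbc:mysql://db01.example.com/sales \
      --username etl_user -P \
      --table order_summary \
      --export-dir /data/out/order_summary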
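And a sketch of the S3 lifecycle and CloudWatch work (the bucket, prefix, instance ID, and SNS topic ARN are hypothetical):

    # Lifecycle rule: move objects under backups/ to Glacier after 30 days
    cat > lifecycle.json <<'EOF'
    {"Rules": [{"ID": "archive-backups", "Status": "Enabled",
                "Filter": {"Prefix": "backups/"},
                "Transitions": [{"Days": 30, "StorageClass": "GLACIER"}]}]}
    EOF
    aws s3api put-bucket-lifecycle-configuration \
      --bucket my-backup-bucket --lifecycle-configuration file://lifecycle.json

    # CloudWatch alarm: notify when an EC2 host averages >80% CPU for 10 minutes
    aws cloudwatch put-metric-alarm \
      --alarm-name ec2-high-cpu \
      --namespace AWS/EC2 --metric-name CPUUtilization \
      --dimensions Name=InstanceId,Value=i-0123456789abcdef0 \
      --statistic Average --period 300 --evaluation-periods 2 \
      --threshold 80 --comparison-operator GreaterThanThreshold \
      --alarm-actions arn:aws:sns:us-west-2:111122223333:ops-alerts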
Confidential
Hadoop Admin
Responsibilities:
- Installed and configured Cloudera Manager for easy management of the existing Hadoop cluster.
- Conducted root cause analysis (RCA) to identify data issues and resolve production problems.
- Worked on Hive optimization techniques to improve the performance of long-running jobs.
- Experienced in managing and reviewing Hadoop log files.
- Worked with Sqoop to import and export data between various RDBMS and HDFS/Hive.
- Set up HA for the major production cluster and designed automatic failover using ZooKeeper and quorum journal nodes (a configuration sketch follows this job entry).
- Experience with HBase high availability, verified manually using failover tests.
- Created queues and allocated cluster resources to prioritize jobs.
- Experience upgrading the cluster to CDH 5.8.2 and Cloudera Manager 5.9.1.
- Provisioned, installed, configured, monitored, and maintained HDFS, YARN, HBase, Flume, Sqoop, Oozie, Pig, Hive, Falcon, SmartSense, Storm, Kafka, and Spark.
- Implemented Kerberos for authenticating all the services in Hadoop Cluster.
- Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
- Created user accounts and granted users access to the Hadoop cluster.
Environment: Hadoop HDFS, MapReduce, Hive, Pig, Flume, Oozie, Sqoop, Spark, Cloudera Manager
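A sketch of the NameNode HA settings behind the failover design above (the nameservice, hostnames, and ports are hypothetical, and Cloudera Manager renders the real config files):

    # Key hdfs-site.xml properties for HA with quorum journal nodes (excerpt)
    cat > hdfs-site-ha-excerpt.xml <<'EOF'
    <property><name>dfs.nameservices</name><value>prodcluster</value></property>
    <property><name>dfs.ha.namenodes.prodcluster</name><value>nn1,nn2</value></property>
    <property><name>dfs.namenode.shared.edits.dir</name>
      <value>qjournal://jn1:8485;jn2:8485;jn3:8485/prodcluster</value></property>
    <property><name>dfs.ha.automatic-failover.enabled</name><value>true</value></property>
    EOF

    # After enabling ZKFC, confirm which NameNode is active
    hdfs haadmin -getServiceState nn1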
Confidential, Westbury, NY
Hadoop Admin
Responsibilities:
- Involved in deploying a Hadoop cluster using Hortonworks Ambari (HDP 2.2) integrated with SiteScope for monitoring and alerting.
- Launched and set up Hadoop clusters on physical servers, including configuring the various Hadoop components.
- Created a local YUM repository for installing and updating packages (a repository sketch follows this job entry).
- Developed data pipelines that ingests data from multiple data sources and process them.
- Expertise in using Sqoop to connect to Oracle, MySQL, SQL Server, and Teradata and move pivoted data into Hive or HBase tables.
- Implemented Kerberos authentication: KDC server setup, creating realms/domains, managing principals, and generating and managing keytab files for each service using keytab tools (a kadmin sketch follows this job entry).
- Configured Knox for perimeter security and Ranger for granular access in the cluster.
- Configured and installed several Hadoop clusters in both on-premises and AWS cloud for POCs.
- Configured and deployed the Hive metastore using MySQL and the Thrift server.
- Involved in creating Hive tables and loading and analyzing data using Hive queries.
- Extensively used Sqoop to move data from relational databases to HDFS.
- Used Flume to move data from web logs onto HDFS (a minimal agent configuration follows this job entry).
- Used Pig to apply transformations, validation, cleaning, and deduplication to data from upstream sources.
- Actively monitored the Hadoop Cluster with Hortonworks distribution with HDP 2.4.
- Performed various configurations, including networking and iptables, resolving hostnames, user accounts and file permissions, HTTP, FTP, and SSH keyless login.
- Worked on a minor upgrade from HDP 2.2.2 to HDP 2.2.4.
- Upgraded the Hadoop cluster from HDP 2.2 to HDP 2.4, and later from HDP 2.4 to HDP 2.5.
- Integrated BI tool Tableau to run visualizations over the data.
- Resolved hardware-related issues and assessed tickets on a daily basis.
- Automated administration tasks using scripts and scheduled jobs using cron.
- Provided 24x7 on-call support as part of a scheduled rotation with other team members.
Environment: Hadoop HDFS, MapReduce, Hive, Pig, Oozie, Sqoop, Ambari, Storm, AWS S3, EC2, Identity and Access Management (IAM), ZooKeeper, NiFi
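The local YUM repository mentioned above, sketched with hypothetical paths and hostnames:

    # Build a repo from downloaded RPMs and serve it over HTTP
    sudo yum -y install createrepo httpd
    sudo mkdir -p /var/www/html/hdp-repo
    sudo cp /tmp/rpms/*.rpm /var/www/html/hdp-repo/
    sudo createrepo /var/www/html/hdp-repo

    # Point each cluster node at the repo
    cat <<'EOF' | sudo tee /etc/yum.repos.d/local-hdp.repo
    [local-hdp]
    name=Local HDP Repository
    baseurl=http://repo01.example.com/hdp-repo
    gpgcheck=0
    enabled=1
    EOF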
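A kadmin sketch of the principal and keytab workflow above (the realm, host, and keytab paths are hypothetical):

    # Create a service principal with a random key and export its keytab
    kadmin.local -q "addprinc -randkey hdfs/dn01.example.com@EXAMPLE.COM"
    kadmin.local -q "xst -k /etc/security/keytabs/hdfs.keytab hdfs/dn01.example.com@EXAMPLE.COM"

    # Verify the keytab contents and test authentication with it
    klist -kt /etc/security/keytabs/hdfs.keytab
    kinit -kt /etc/security/keytabs/hdfs.keytab hdfs/dn01.example.com@EXAMPLE.COM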
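A minimal Flume agent for the web-log ingestion above (the agent name, log path, and HDFS directory are hypothetical):

    cat > weblog-agent.conf <<'EOF'
    a1.sources = r1
    a1.channels = c1
    a1.sinks = k1
    a1.sources.r1.type = exec
    a1.sources.r1.command = tail -F /var/log/httpd/access_log
    a1.sources.r1.channels = c1
    a1.channels.c1.type = memory
    a1.channels.c1.capacity = 10000
    a1.sinks.k1.type = hdfs
    a1.sinks.k1.channel = c1
    a1.sinks.k1.hdfs.path = /data/weblogs/%Y-%m-%d
    a1.sinks.k1.hdfs.fileType = DataStream
    a1.sinks.k1.hdfs.useLocalTimeStamp = true
    EOF
    # Start the agent (the conf dir path is illustrative)
    flume-ng agent -n a1 -c /etc/flume/conf -f weblog-agent.conf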
Confidential
DBA
Responsibilities:
- Maintained SQL Server databases; ensured accuracy of information and automated numerous functions.
- Fine-tuned database objects and servers to ensure efficient data retrieval.
- Monitored and optimized system performance using SQL Profiler and the Database Engine Tuning Advisor.
- Designed and implemented incremental and full backup policies and procedures (a backup sketch follows this job entry).
- Created and implemented database design solutions in collaboration with programming team.
- Developed user defined functions and triggers to implement the requirements of the business.
- Performed database logical and physical design, maintenance, tuning, archiving, backups, replication, recovery, software upgrades, capacity planning and optimization for SQL Server database.
- Performed database consistency checks using DBCC utilities, along with performance baselining and performance tuning.
- Provided production-level support for onsite and offshore clients.
- Created SQL Server Performance Dashboard reports for monitoring.
- Managed database security.
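A sketch of the backup and consistency-check jobs above, driven through sqlcmd (the server, database, and file paths are hypothetical):

    # Nightly full backup with checksum verification
    sqlcmd -S DBSERVER01 -E -Q "BACKUP DATABASE SalesDB TO DISK = N'E:\Backups\SalesDB_full.bak' WITH INIT, CHECKSUM"

    # Weekly consistency check with DBCC
    sqlcmd -S DBSERVER01 -E -Q "DBCC CHECKDB (SalesDB) WITH NO_INFOMSGS"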
Confidential
SQL Developer
Responsibilities:
- Developed stored procedures, functions, and database triggers; maintained referential integrity and implemented complex business logic (a trigger sketch follows this section).
- Involved in installing and configuring SQL Server 2005 with the latest service packs.
- Created and executed SSIS packages to populate data from the various data sources.
- Created SSIS packages using the SSIS designer to export heterogeneous data from OLE DB sources (Oracle) and Excel spreadsheets to SQL Server 2005.
- Migrated DTS packages to SSIS packages and modified those packages.
- Designed ETL packages dealing with different data sources and loaded the data into target destinations by performing various transformations using SSIS.
- Experience creating multiple SSRS reports in drill-down mode using tables, crosstabs, and charts; designed, deployed, and maintained various SSRS reports in SQL Server 2005.
- Designed and implemented parameterized and cascading parameterized reports using SSRS.
- Managed server security, created new logins and users, and changed user roles.
- Involved in developing logical and physical database models using Erwin.
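Finally, a minimal audit-trigger sketch deployed through sqlcmd (the database, tables, and columns are hypothetical):

    sqlcmd -S DBSERVER01 -E -d SalesDB <<'SQL'
    -- Log every update to Orders into an audit table
    CREATE TRIGGER trg_orders_audit ON dbo.Orders
    AFTER UPDATE
    AS
    BEGIN
        SET NOCOUNT ON;
        INSERT INTO dbo.OrdersAudit (OrderID, ChangedAt)
        SELECT OrderID, GETDATE() FROM inserted;
    END
    GO
    SQL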