Hadoop Admin Resume
San Jose, CA
SUMMARY:
- 7+ years of professional IT experience, including around 5 years of experience in Hadoop Administration on Cloudera (CDH), Hortonworks (HDP), vanilla Apache Hadoop, and MapR distributions.
- Strong experience in AWS, Kafka, ElasticSearch, and DevOps, and 2+ years in Linux Administration. Hands-on experience in installation, configuration, support, and management of Hadoop clusters.
- Experience in Hadoop infrastructure including MapReduce, Hive, Oozie, Sqoop, HBase, Pig, HDFS, YARN, Hue, Spark, Kafka, and Key-Value Store Indexer in a direct client role.
- Strong experience in Linux/UNIX administration, with expertise in Red Hat Enterprise Linux 4, 5, and 6 and familiarity with Solaris 9 & 10.
- Experience with Ambari (Hortonworks) for management of the Hadoop ecosystem.
- Strong knowledge of Hadoop platforms and other distributed data processing platforms.
- Experience in performing minor and major upgrades.
- Experience in performing commissioning and decommissioning of data nodes on Hadoop cluster.
- Familiar with writing Oozie workflows and job controllers for job automation - shell, Hive, and Sqoop automation.
TECHNICAL SKILLS:
Hadoop/Big Data Technologies: HDFS, MapReduce, Pig, Hive, HBase, Sqoop, Zookeeper, Oozie, Hue, HCatalog, Storm, Kafka, Key-Value Store Indexer, and Flume.
Databases: MySQL, Oracle 8i/9i/10g, SQL Server, PL/SQL.
NoSQL Databases: HBase, Cassandra, Cloudera Impala, MongoDB.
Cluster Management Tools: HDP Ambari, Cloudera Manager, Hue, SolrCloud.
Scripting & Automation: Shell scripting, HTML scripting, Puppet, Ansible.
Web/Application Servers: Apache Tomcat, JBoss, and Apache HTTP web server.
IDEs & Tools: NetBeans, Eclipse, Visual Studio, Microsoft SQL Server, MS Office.
Security & Monitoring: Kerberos, Nagios & Ganglia.
Languages & Frameworks: Java, HTML, MVC, Struts, Hibernate, Servlets, Spring, Web services.
Operating Systems: Windows XP, 7, 8, UNIX, Mac, MS-DOS.
PROFESSIONAL EXPERIENCE:
Hadoop Admin
Confidential - San Jose, CA
Responsibilities:
- Configured, maintained, and monitored the Hadoop cluster using Cloudera Manager on the CDH5 distribution.
- Responsible for cluster configuration maintenance, troubleshooting, and tuning.
- Collaborated with application teams to install operating system and Hadoop updates, patches, and version upgrades when required; point of contact for vendor escalations.
- Upgraded Cloudera Manager from version 5.8 to 5.9.
- Involved in analyzing system failures, identifying root causes, and recommending corrective actions.
- Monitored workload, job performance and capacity planning using Cloudera Manager.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Interacted with Cloudera support, logged issues in the Cloudera portal, and fixed them per the recommendations.
- Imported logs from web servers with Flume to ingest the data into HDFS.
- Worked with application teams to install operating system and Hadoop updates, patches, and version upgrades as required.
- Commissioned and decommissioned nodes on the CDH5 Hadoop cluster on Red Hat Linux (a command-line sketch follows this list).
- Expertise in capacity planning, configuration, and operationalization of small to medium-sized Big Data Hadoop clusters.
- Responsible for building scalable distributed data solutions using Hadoop.
- Integrated the Hadoop cluster with Kerberos for secure authentication and authorization, and monitored connectivity.
- Built and deployed Hadoop clusters with different Hadoop components (HDFS, YARN, HBase, and ZooKeeper).
- Used Flume extensively in gathering and moving log data files from Application Servers to a central location in Hadoop Distributed File System (HDFS).
- Installed and configured other open-source software such as Pig, Hive, HBase, Flume, and Sqoop.
- Implemented custom interceptors for Flume to filter data and defined channel selectors to multiplex the data into different sinks.
- Deployed Puppet, Puppet Dashboard, and Puppet DB for configuration management to existing infrastructure.
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
- Implemented the Capacity Scheduler to share cluster resources among the MapReduce jobs submitted by users.
- Designed the cluster so that only one Secondary NameNode daemon could run at any given time.
- Rack-aware configuration, configuring client machines, and configuring monitoring and management tools.
- Knowledge of installation and configuration of Cloudera Hadoop in single-node and clustered environments.
- Worked on NoSQL databases including HBase, MongoDB, and Cassandra. Implemented a multi-datacenter, multi-rack Cassandra cluster.
- Partitioned and queried the data in Hive for further analysis by the BI team. Extended the functionality of Hive and Pig with custom UDFs and UDAFs.
- Worked on analyzing the Hadoop cluster and different big data analytic tools including Pig, the HBase database, and Sqoop. Used Kafka and Flume to ingest real-time data streams from different data sources and store them in HDFS and HBase.
- Troubleshot and debugged Hadoop ecosystem runtime issues, recovering from node failures and common Hadoop problems.
- Monitored the Hadoop cluster through Cloudera Manager and implemented alerts based on error messages. Provided cluster usage metrics reports to management and charged back customers based on their usage.
- Set up Hadoop clusters (Hortonworks/Cloudera) and performed upgrades and configuration changes of Hadoop clusters. Used Linux commands to maintain Red Hat Linux servers and added them to the Hadoop environment for various data operations.
- Spun up clusters in Azure using Cloudera Director; implemented this as a POC for the cloud migration project.
- Migration of Existing Infrastructure on Azure Service Manager (ASM) to Azure Resource Manager (ARM).
- Converted VMware (VMDK) images to Azure (VHD) using Microsoft Virtual Machine Converter (MVMC).
- Cluster maintenance as well as creation and removal of nodes using tools like Ganglia, Cloudera Manager Enterprise.
- Handling the data exchange between HDFS and different web sources using Flume and Sqoop.
- Monitored data streaming between web sources and HDFS and verified its functioning through monitoring tools.
- Importing and exporting data into HDFS and Hive using Sqoop.
- Administered large MapR Hadoop environments; built and supported cluster setup, performance tuning, and monitoring in an enterprise environment.
- Worked as Hadoop administrator on the MapR Hadoop distribution across 5 clusters, ranging from POC clusters to PROD clusters containing more than 1000 nodes.
- Built Big Data processing containers and pipelines using Docker/ECS/ECR and Kinesis for data acquisition and transformation, NoSQL/DynamoDB for data persistence, and RDS/PostgreSQL and Redshift for reporting data marts.
- Strong in databases like MySQL, Teradata, Oracle, MS SQL.
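A minimal command-line sketch of the node decommissioning/commissioning flow referenced in the list above (hostnames and file paths are placeholders; on CDH this is normally driven through Cloudera Manager):

    # Add the host to the HDFS exclude file (path assumed to match dfs.hosts.exclude)
    echo "worker-node-07.example.com" >> /etc/hadoop/conf/dfs.exclude

    # Tell the NameNode to re-read the include/exclude lists and begin decommissioning
    hdfs dfsadmin -refreshNodes

    # Watch progress until the node reports "Decommissioned"
    hdfs dfsadmin -report | grep -A 3 "worker-node-07"

    # To commission a new node: remove it from the exclude file, add it to the
    # include/slaves file, refresh again, then rebalance block placement
    hdfs dfsadmin -refreshNodes
    hdfs balancer -threshold 10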
Hadoop Admin
Confidential - Flint, MI
Responsibilities:
- Experienced as an admin on the Hortonworks (HDP 2.5.3) distribution across 5 clusters, ranging from POC to PROD.
- Performed cluster capacity planning based on data usage.
- Designed and configured the bastion/edge node setup.
- Designed and configured HA for the Hive & HBase services.
- Identified root causes from ZooKeeper and Spark logs (the Spark and ZooKeeper logs are .out files only), collected all log files, and integrated them with CloudWatch (AWS EC2).
- Changed the ZooKeeper and JournalNode edit directories (ZooKeeper and the JournalNodes have multiple directories).
- Responsible for cluster maintenance, monitoring, commissioning and decommissioning DataNodes, troubleshooting, managing and reviewing data backups, and managing and reviewing log files.
- Day-to-day responsibilities included solving developer issues, deploying code from one environment to another, providing access to new users, providing immediate solutions to reduce impact, and documenting issues to prevent recurrence.
- Imported data from MySQL into HDFS using Sqoop on a regular basis (a sketch of the import follows this list).
- Automated jobs end to end, from pulling data from databases to loading data into SQL Server, using shell scripts.
- Experienced in adding/installing new components and removing them through Ambari.
- Monitoring systems and services through Ambari dashboard to make the clusters available for the business.
- Architecture design and implementation of deployment, configuration management, backup, and disaster recovery systems and procedures.
- Hands-on experience with cluster upgrades and patch upgrades without any data loss and with proper backup plans.
- Changed configurations based on user requirements to improve job performance.
- Experienced in configuring Ambari alerts (critical & warning) for various components and managing those alerts.
- Provided security and authentication with Ranger, where Ranger Admin provides administration and UserSync adds new users to the cluster.
- Good troubleshooting skills with Hue, which provides a GUI for developers and business users for day-to-day activities.
- Developed MapReduce programs to cleanse data in HDFS obtained from heterogeneous data sources, making it suitable for ingestion into Hive schemas for analysis.
- Implemented complex MapReduce programs to perform map-side joins using the distributed cache.
- Implemented NameNode HA in all environments to provide high availability of clusters.
- Involved in snapshots and mirroring to maintain backups of cluster data, including remote backups.
- Experienced in managing and reviewing log files (identifying the max backup index and max backup size log4j properties of all services in Hadoop).
- Helping the users in production deployments throughout the process.
- Experienced in production support, which involves solving user incidents ranging from Sev1 to Sev5.
- Managed and reviewed Log files as a part of administration for troubleshooting purposes. Communicate and escalate issues appropriately.
- As an admin, followed standard backup policies to ensure high availability of the cluster.
- Involved in analyzing system failures, identifying root causes, and recommending corrective actions.
- Documented the systems processes and procedures for future references.
- Worked with systems engineering team to plan and deploy new environments and expand existing clusters.
- Monitored multiple cluster environments using Ambari alerts and metrics.
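A minimal sketch of the scheduled Sqoop import mentioned in the list above (the connection string, credentials, table, and target directory are placeholder values):

    #!/bin/bash
    # Nightly Sqoop import from MySQL into a dated HDFS directory (illustrative only)
    sqoop import \
      --connect jdbc:mysql://mysql-host.example.com:3306/salesdb \
      --username etl_user \
      --password-file /user/etl_user/.mysql.password \
      --table orders \
      --target-dir /data/raw/orders/$(date +%Y-%m-%d) \
      --num-mappers 4 \
      --fields-terminated-by '\t'

A script like this can be scheduled from cron or wrapped in an Oozie shell action for the end-to-end automation described above.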
Hadoop Admin
Confidential
Responsibilities:
- Installed, Configured and Maintained the Hadoop cluster for application development and Hadoop ecosystem components like Hive, Pig, HBase, Zookeeper and Sqoop.
- In-depth understanding of Hadoop architecture and various components such as HDFS, NameNode, DataNode, ResourceManager, NodeManager, and the YARN/MapReduce programming paradigm.
- Work closely with our partners and clients to develop and support ongoing API integrations.
- Extensively worked on commissioning and decommissioning of cluster nodes, file system integrity checks and maintaining cluster data replication.
- Setup monthly cadence with Hortonworks to review upcoming releases and technologies and review issues or needs.
- Configured Journal nodes and Zookeeper Services for the cluster using Hortonworks.
- Responsible for Installing, setup and Configuring Apache Kafka and Apache Zookeeper.
- Responsible for efficient operations of multiple Cassandra clusters.
- Implemented a Python script that calculates cycle time from the REST API and fixes incorrect cycle time data in the Oracle database.
- Leveraged Chef to manage and maintain builds in various environments; planned hardware and software installation on the production cluster and coordinated with multiple teams to get it done.
- Worked on installing cluster, commissioning & decommissioning of Data Nodes, Name Node recovery, capacity planning, and slots configuration in MapR Control System (MCS).
- Worked on analyzing Hadoop cluster and different big data analytic tools including Pig, Hive and Sqoop.
- Involved in developing new workflow MapReduce jobs using the Oozie framework.
- Collected log data from web servers and integrated it into HDFS using Flume.
- Involved in creating a Spark cluster in HDInsight by creating Azure compute resources with Spark installed and configured.
- Consumed REST-based microservices with RestTemplate based on RESTful APIs.
- Involved and experienced in Cassandra cluster connectivity and security.
- Very good understanding and knowledge of assigning the number of mappers and reducers in a MapReduce cluster.
- Set up HDFS quotas to enforce fair sharing of storage resources (example commands follow this list).
- Strong knowledge in configuring and maintaining YARN schedulers (Fair and Capacity).
- Wrote shell scripts to monitor the health of Hadoop daemon services and respond accordingly to any warning or failure conditions (a sketch follows this list).
- Integrated Apache Storm with Kafka to perform web analytics. Uploaded clickstream data from Kafka to HDFS, HBase, and Hive by integrating with Storm.
- Experience in projects involving movement of data from other databases to Cassandra with basic knowledge of Cassandra Data Modeling.
- Worked with Kafka's explicit support for partitioning messages over Kafka servers and distributing consumption over a cluster of consumer machines while maintaining per-partition ordering semantics.
- Involved in setting up HBase which includes master and region server configuration, High availability configuration, performance tuning and administration.
- Created user accounts and provided access to the Hadoop cluster.
- Involved in loading data from UNIX file system to HDFS.
- Worked on ETL process and handled importing data from various data sources, performed transformations.
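The HDFS quota setup referenced above boils down to commands along these lines (paths and limits are placeholder values):

    # Cap a project directory at 1,000,000 names and 10 TB of raw space
    hdfs dfsadmin -setQuota 1000000 /data/projects/teamA
    hdfs dfsadmin -setSpaceQuota 10t /data/projects/teamA

    # Verify the quotas and current usage
    hdfs dfs -count -q -h /data/projects/teamA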
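A minimal sketch of the daemon health-check script mentioned in the list above (the daemon list, log path, and alerting command are assumptions):

    #!/bin/bash
    # Check that core Hadoop daemons are running via jps and log a warning if any are missing
    REQUIRED="NameNode ResourceManager DataNode NodeManager"
    RUNNING=$(jps)

    for daemon in $REQUIRED; do
      if ! echo "$RUNNING" | grep -q "$daemon"; then
        echo "$(date) WARNING: $daemon is not running on $(hostname)" >> /var/log/hadoop-healthcheck.log
        # mail -s "Hadoop daemon down: $daemon" ops-team@example.com < /dev/null  # placeholder alert hook
      fi
    done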
Linux/Unix Admin
Confidential
Responsibilities:
- Installation of Red Hat Enterprise Linux 4.x, 5.x using Kickstart and PXE on HP Blade Servers.
- Responsible for creating and managing user accounts, security, rights, disk space and process monitoring in Solaris, CentOS and Redhat Linux
- Performed administration and monitored job processes using associated commands
- Managed routine system backups, scheduled jobs, and enabled cron jobs
- Maintained and troubleshot network connectivity
- Managed patch configuration, version control, and service packs, and reviewed connectivity issues related to security problems
- Installed and configured MySQL on Windows Server nodes.
- Configured DNS, NFS, FTP, remote access, security management, and server hardening
- Installed, upgraded, and managed packages via RPM and YUM package management
- Logical Volume Management maintenance
- Experience administering, installing, configuring and maintaining Linux
- Created Linux Virtual Machines using VMware Virtual Center
- Administered VMware Infrastructure Client 3.5 and vSphere 4.1
- Installed firmware upgrades and kernel patches, and performed system configuration and performance tuning on Unix/Linux systems
- Installed Red Hat Linux 5/6 using Kickstart servers and interactive installation.
- Supporting infrastructure environment comprising of RHEL and Solaris.
- Installation, Configuration, and OS upgrades on RHEL 5.X/6.X/7.X, SUSE 11.X, 12.X.
- Implemented and administered VMware ESX 4.x, 5.x, and 6 for running Windows, CentOS, SUSE, and Red Hat Linux servers on development and test servers.
- Created, extended, reduced, and administered Logical Volume Manager (LVM) volumes in the RHEL environment (a command sketch follows this list).
- Responsible for large-scale Puppet implementation and maintenance, including Puppet manifest creation, testing, and deployment.
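A minimal sketch of the LVM administration referenced in the list above (device names, volume names, and sizes are placeholders):

    # Create a physical volume, volume group, and logical volume, then format it
    pvcreate /dev/sdb1
    vgcreate vg_data /dev/sdb1
    lvcreate -L 50G -n lv_app vg_data
    mkfs.ext4 /dev/vg_data/lv_app

    # Extend the logical volume by 20 GB and grow the ext4 filesystem online
    lvextend -L +20G /dev/vg_data/lv_app
    resize2fs /dev/vg_data/lv_app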