Senior Hadoop Administrator Resume
Oldsmar, FL
SUMMARY:
- 8 years of experience in the design, development, and implementation of robust technology systems, with specialized expertise in Hadoop Administration and Linux Administration. Able to understand business and technical requirements quickly; excellent communication skills and work ethic; able to work independently.
- 4 years of experience in Hadoop Administration & Big Data technologies and 4 years of experience in Linux administration.
- Experience with the complete Software Development Lifecycle, including design, development, testing, and implementation of moderately to highly complex systems.
- Hands-on experience installing, configuring, supporting, and managing Hadoop clusters using the Hortonworks and Cloudera distributions.
- Hadoop cluster capacity planning, performance tuning, monitoring, and troubleshooting.
- Designed Big Data solutions for traditional enterprise businesses.
- Backup configuration and recovery from NameNode failures.
- Excellent command of backup, recovery, and disaster recovery procedures, and of implementing backup and recovery strategies for offline and online backups.
- Involved in benchmarking Hadoop/HBase cluster file systems with various batch jobs and workloads.
- Prepared Hadoop clusters for development teams working on POCs.
- Experience in minor and major upgrades of Hadoop and the Hadoop ecosystem.
- Experience monitoring and troubleshooting issues with Linux memory, CPU, OS, storage and network.
- Hands-on experience analyzing log files for Hadoop and ecosystem services and identifying root causes.
- Experience commissioning, decommissioning, balancing, and managing nodes, and tuning servers for optimal cluster performance.
- As an administrator, involved in cluster maintenance, troubleshooting, and monitoring, and followed proper backup and recovery strategies.
- Good experience setting up Linux environments: passwordless SSH, creating file systems, disabling firewalls, tuning swappiness, disabling SELinux, and installing Java (a sample node-preparation sketch follows this summary).
- Good experience planning, installing, and configuring Hadoop clusters on the Cloudera and Hortonworks distributions.
- Installing and configuring Hadoop ecosystem components such as Pig and Hive.
- Hands-on experience installing, configuring, and managing Hue and HCatalog.
- Experience importing and exporting data with Sqoop between HDFS and relational database systems/mainframes.
- Experience importing logs using Flume.
- Optimizing the performance of HBase/Hive/Pig jobs.
- Hands-on experience with ZooKeeper and ZKFC for managing and configuring NameNode failover scenarios.
- Strong experience in Linux administration activities on RHEL and CentOS.
- Experience in deploying Hadoop 2.0 (YARN).
- Familiar with writing Oozie workflows and Job Controllers for job automation.
- Hands-on experience provisioning and managing multi-tenant Hadoop clusters on public cloud environments (Amazon Web Services EC2) and on private cloud infrastructure (the OpenStack cloud platform).
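As a rough illustration of the Linux node-preparation steps listed above (passwordless SSH, swappiness, SELinux, firewall), the commands below are a minimal sketch only; the host name, user, and values are placeholders, not details from any environment described in this resume.

    # Minimal node-preparation sketch; host name, user, and values are illustrative only
    ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa          # one-time key pair on the admin node
    ssh-copy-id hadoop@datanode01.example.com          # enable passwordless SSH to a worker node
    sudo sysctl -w vm.swappiness=1                     # discourage swapping on Hadoop nodes
    echo "vm.swappiness=1" | sudo tee -a /etc/sysctl.conf
    sudo setenforce 0                                  # set SELinux to permissive until reboot
    sudo sed -i 's/^SELINUX=enforcing/SELINUX=permissive/' /etc/selinux/config
    sudo service iptables stop && sudo chkconfig iptables off   # RHEL/CentOS 6-style firewall disable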
TECHNICAL SKILLS:
Hadoop Framework: HDFS, MapReduce, Pig, Hive, HBase, Sqoop, ZooKeeper, Oozie, Hue, HCatalog, Storm, Kafka, Spark, Key-Value Store Indexer, Flume.
NoSQL Databases: HBase
Programming Language: Java, HTML
Microsoft: MS Office, MS Project, MS Visio, MS Visual Studio 2003/2005/2008
Databases: MySQL, Oracle 8i/9i/10g, SQL Server, PL/SQL Developer.
Operating Systems: Linux, CentOS, RHEL, Windows 2000/2003/2008/XP/Vista
Scripting: Shell scripting, HTML, Puppet
Programming: C, C++, Core Java, PL/SQL.
Web Servers: Apache Tomcat, JBoss, and Apache HTTP Server
Cluster Management Tools: HDP Ambari, Cloudera Manager, Hue, SolrCloud.
IDEs & Tools: NetBeans, Eclipse, Visual Studio, Microsoft SQL Server, MS Office
PROFESSIONAL EXPERIENCE:
Senior Hadoop Administrator
Confidential, Oldsmar, FL
Responsibilities:
- Currently working as an administrator on the Cloudera (CDH 5.5.1) distribution for 6 clusters ranging from POC to PROD.
- Responsible for cluster maintenance, monitoring, commissioning and decommissioning DataNodes, troubleshooting, and managing and reviewing data backups and log files.
- Day-to-day responsibilities include resolving developer issues, deploying code from one environment to another, providing access to new users, providing quick solutions to reduce impact, and documenting issues to prevent recurrence.
- Adding/installing new components and removing them through Cloudera Manager.
- Collaborating with application teams to install operating system and Hadoop updates, patches, and version upgrades.
- Monitored workload, job performance and capacity planning using Cloudera Manager.
- Involved in analyzing system failures, identifying root causes, and recommending courses of action.
- Interacting with Cloudera support, logging issues in the Cloudera portal, and fixing them per the recommendations.
- Imported logs from web servers with Flume to ingest the data into HDFS.
- Used Flume with a spooling-directory source to load data from the local file system into HDFS (a sample agent configuration appears after this list).
- Exported data from HDFS into relational databases with Sqoop; parsed, cleansed, and mined useful and meaningful data in HDFS using MapReduce for further analysis.
- Fine-tuned Hive jobs for optimized performance.
- Implemented custom Flume interceptors to filter data and defined channel selectors to multiplex the data into different sinks.
- Partitioned and queried the data in Hive for further analysis by the BI team.
- Extending the functionality of Hive and Pig with custom UDFs and UDAFs.
- Involved in extracting the data from various sources into Hadoop HDFS for processing.
- Worked on analyzing the Hadoop cluster and different big data analytics tools, including Pig, the HBase database, and Sqoop.
- Monitored the Solr dashboards and statistics and reviewed the Solr servers.
- Created and deployed the corresponding SolrCloud collections.
- Apache Solr administration and configuration experience.
- Experience integrating Solr with HBase using the Lily HBase Indexer (Key-Value Store Indexer).
- Created and truncated HBase tables in Hue and took backups of submitterId(s).
- Configured and managed user permissions in Hue.
- Responsible for building scalable distributed data solutions using Hadoop.
- Commissioned and decommissioned nodes on the CDH5 Hadoop cluster on Red Hat Linux.
- Involved in loading data from the Linux file system to HDFS.
- Created and managed cron jobs.
- Implemented test scripts to support test driven development and continuous integration.
- Worked on tuning the performance of Pig queries.
- Worked with application teams to install operating system and Hadoop updates, patches, and version upgrades as required.
- Experience configuring Storm to load data from MySQL to HBase using JMS.
- Responsible for managing data coming from different sources.
- Involved in loading data from UNIX file system to HDFS.
- Experience in managing and reviewing Hadoop log files.
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
- Installed the Oozie workflow engine to run multiple Hive and Pig jobs.
- Analyzed large amounts of data sets to determine optimal way to aggregate and report on it.
- Supported setting up the QA environment and updating configurations for implementing scripts with Pig and Sqoop.
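As an illustration of the spooling-directory ingestion mentioned above, the sketch below shows a minimal Flume agent definition and launch command; the agent name, directories, and HDFS path are assumptions for illustration, not the actual project configuration.

    # Illustrative Flume spooling-directory agent; names and paths are placeholders
    cat > /etc/flume-ng/conf/spool-agent.conf <<'EOF'
    agent1.sources  = spoolSrc
    agent1.channels = memCh
    agent1.sinks    = hdfsSink

    agent1.sources.spoolSrc.type     = spooldir
    agent1.sources.spoolSrc.spoolDir = /var/log/app/incoming
    agent1.sources.spoolSrc.channels = memCh

    agent1.channels.memCh.type     = memory
    agent1.channels.memCh.capacity = 10000

    agent1.sinks.hdfsSink.type                   = hdfs
    agent1.sinks.hdfsSink.hdfs.path              = /data/logs/%Y-%m-%d
    agent1.sinks.hdfsSink.hdfs.fileType          = DataStream
    agent1.sinks.hdfsSink.hdfs.useLocalTimeStamp = true
    agent1.sinks.hdfsSink.channel                = memCh
    EOF
    flume-ng agent --name agent1 --conf /etc/flume-ng/conf --conf-file /etc/flume-ng/conf/spool-agent.conf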
Environment: HDFS, MapReduce, Hive 1.1.0, Hue 3.9.0, Pig, Flume, Oozie, Sqoop, CDH5, Apache Hadoop 2.6, Spark, Solr, Storm, Cloudera Manager, Red Hat, MySQL, and Oracle.
Senior Hadoop Administrator
Confidential, Tallahassee, FL
Responsibilities:
- Worked as an administrator on the Hortonworks distribution for 4 clusters ranging from POC to PROD.
- Responsible for cluster maintenance, monitoring, commissioning and decommissioning DataNodes, troubleshooting, and managing and reviewing data backups and log files.
- Day-to-day responsibilities include resolving developer issues, deploying code from one environment to another, providing access to new users, providing quick solutions to reduce impact, and documenting issues to prevent recurrence.
- Experienced in adding/installing new components and removing them through Ambari.
- Monitored systems and services through the Ambari dashboard to keep the clusters available for the business.
- Architecture design and implementation of deployment, configuration management, backup, and disaster recovery systems and procedures.
- Hands-on experience with cluster upgrades and patch upgrades without data loss and with proper backup plans.
- Changed configurations based on user requirements to improve job performance.
- Experienced in configuring Ambari alerts for various components and managing those alerts.
- Provided security with Apache Ranger, where Ranger Admin handles policy administration and Usersync adds new users to the cluster.
- Good troubleshooting skills on Hue, which provides a GUI for developers and business users for their day-to-day activities.
- Developed MapReduce programs to cleanse the data in HDFS obtained from heterogeneous data sources to make it suitable for ingestion into Hive schema for analysis.
- Implemented complex MapReduce programs to perform map-side joins using the distributed cache.
- Set up Flume for different sources to bring log messages from external systems into HDFS.
- Implemented NameNode HA in all environments to provide high availability of the clusters.
- Implemented the Capacity Scheduler in all environments to provide resources based on allocation.
- Created queues and allocated cluster resources to prioritize jobs.
- Experienced in setting up projects and volumes for new projects.
- Involved in snapshots and mirroring to maintain backups of cluster data, including remotely.
- Implemented SFTP for projects to transfer data from external servers to the cluster servers.
- Experienced in managing and reviewing log files.
- Working experience creating and maintaining MySQL databases, setting up users, and backing up cluster metadata databases with cron jobs (a sample cron entry is sketched after this list).
- Set up MySQL master-slave replication and helped business applications maintain their data in MySQL servers.
- Helped users with production deployments throughout the process.
- Experienced in production support, resolving user incidents ranging from sev1 to sev5.
- Managed and reviewed log files as part of administration for troubleshooting purposes; communicated and escalated issues appropriately.
- As an administrator, followed standard backup policies to ensure high availability of the cluster.
- Involved in analyzing system failures, identifying root causes, and recommending courses of action.
- Documented system processes and procedures for future reference.
- Worked with systems engineering team to plan and deploy new environments and expand existing clusters.
- Monitored multiple cluster environments using Ambari Alerts, Ambari Metrics, and Nagios.
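A minimal sketch of the kind of cron-driven metadata backup described above; the database names, credentials file, and backup path are assumptions for illustration only.

    # Illustrative /etc/cron.d entry: nightly dump of cluster metadata databases at 02:00
    # Database names, credentials file, and paths are placeholders
    0 2 * * * hadoop /usr/bin/mysqldump --defaults-extra-file=/etc/backup/my.cnf \
      --single-transaction --databases hive ambari oozie \
      | gzip > /backups/metadata/metadb-$(date +\%F).sql.gz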
Environment: Hadoop HDFS, MapReduce, Hive, Pig, Flume, Oozie, Sqoop, Eclipse, Hortonworks, Ambari.
Hadoop Administrator
Confidential, Carrollton, TX
Responsibilities:
- Hadoop installation and configuration of multiple nodes using the Cloudera platform.
- Major and minor upgrades and patch updates.
- Handling the installation and configuration of a Hadoop cluster.
- Building and maintaining scalable data pipelines using the Hadoop ecosystem and open-source components such as Hive and HBase.
- Handling the data exchange between HDFS and different web sources using Flume and Sqoop.
- Monitoring the data streaming between web sources and HDFS.
- Monitoring the Hadoop cluster functioning through monitoring tools.
- Close monitoring and analysis of the MapReduce job executions on cluster at task level.
- Provided inputs to development regarding efficient utilization of resources such as memory and CPU, based on the running statistics of map and reduce tasks.
- Changed cluster configuration properties based on the volume of data being processed by the cluster.
- Set up automated processes to analyze system and Hadoop log files for predefined errors and send alerts to the appropriate groups (a sample check is sketched after this list).
- Excellent working knowledge on SQL with databases.
- Commissioned and decommissioned DataNodes from the cluster in case of problems.
- Set up automated processes to archive/clean unwanted data on the cluster, in particular on the NameNode and Secondary NameNode.
- Set up and managed NameNode HA to avoid single points of failure in large clusters.
- Held regular discussions with other technical teams regarding upgrades, process changes, special processing, and feedback.
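The following is a minimal sketch of the automated log check described above; the log directory, error patterns, and mail alias are assumptions for illustration.

    #!/bin/bash
    # Illustrative log-scan sketch; directory, patterns, and recipient are placeholders
    LOGDIR=/var/log/hadoop-hdfs
    PATTERNS='FATAL|ERROR|OutOfMemoryError|Corrupt block'
    ALERT_TO=hadoop-ops@example.com
    matches=$(grep -R -E -h "$PATTERNS" "$LOGDIR" --include='*.log' | tail -n 50)
    if [ -n "$matches" ]; then
        echo "$matches" | mail -s "Hadoop log alert on $(hostname)" "$ALERT_TO"
    fi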
Environment: Java (JDK 1.7), Linux, Shell Scripting, Teradata, SQL server, Cloudera Hadoop, Flume, Sqoop, Pig, Hive, Zookeeper and HBase.
Linux/ Hadoop Administrator
Confidential, Chicago, IL.
Responsibilities:
- Managed UNIX infrastructure, involving day-to-day server maintenance and troubleshooting.
- Provisioned Red Hat Enterprise Linux servers using PXE boot according to requirements.
- Performed Red Hat Linux Kickstart installations on RedHat 4.x/5.x, kernel tuning, and memory upgrades.
- Worked with Logical Volume Manager, creating volume groups and logical volumes.
- Checked and cleaned file systems whenever they filled up; used Logwatch 7.3, which reports server information on a schedule.
- Hands-on experience in installation, configuration, maintenance, monitoring, performance tuning, and troubleshooting of Hadoop clusters in different environments such as development, test, and production.
- Configured the JobTracker to assign MapReduce tasks to TaskTrackers in the cluster of nodes.
- Implemented Kerberos security in all environments.
- Defined file system layout and data set permissions.
- Implemented the Capacity Scheduler to share cluster resources among the MapReduce jobs submitted by users.
- Worked on importing data from Oracle databases into the Hadoop cluster.
- Managed and reviewed data backups and log files and worked on deploying Java applications on cluster.
- Commissioned and decommissioned nodes from time to time (decommissioning steps are sketched below).
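A minimal sketch of the decommissioning flow referenced above; the host name and exclude-file path are placeholders, and the MRv1-era commands reflect the JobTracker/TaskTracker setup of this cluster.

    # Illustrative decommission steps; host name and paths are placeholders
    echo "datanode07.example.com" >> /etc/hadoop/conf/dfs.exclude   # file referenced by dfs.hosts.exclude
    hadoop dfsadmin -refreshNodes                                    # NameNode re-reads the exclude list
    hadoop dfsadmin -report | grep -A 3 datanode07                   # wait for "Decommissioned" status
    hadoop mradmin -refreshNodes                                     # retire the TaskTracker as well (MRv1)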
Environment: Red Hat Enterprise Linux 3.x/4.x/5.x, Sun Solaris 10 on Dell PowerEdge servers, Hive, HDFS, MapReduce, Sqoop, HBase.
Hadoop Developer
Confidential, Philadelphia, PA
Responsibilities:
- Maintained the datasets and loaded data from legacy sources into Oracle databases.
- Monitored the DataStage environment to process the regular feed files for the ETL process.
- Scheduled jobs in Tivoli with dependencies to run the process.
- Integrated the entire system, bridging the gap between technology and domain knowledge as much as possible to build an automated, robust system.
- Created Hive managed and external tables to support the ETL load process.
- Created HQL scripts to perform transformations and loads in batches.
- Used Flume to transport logs to HDFS.
- Imported data using Sqoop to load data from Oracle to HDFS on a regular basis (a sample import command is sketched after this list). Installed and configured Hadoop MapReduce and HDFS, and developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Installed and configured Pig and wrote Pig Latin scripts.
- Wrote MapReduce jobs using Pig Latin.
- Used Impala queries for ad-hoc data analysis.
- Solid understanding of the REST architectural style and its application to well-performing web sites for global usage.
- Involved in ETL, data integration, and migration.
- Designed ETL flows for several newly onboarding Hadoop applications.
- Reviewed ETL application use cases before onboarding to Hadoop.
- Developed scripts and batch jobs to schedule various Hadoop programs.
- Wrote Hive queries for data analysis to meet business requirements.
- Involved in performance activities such as partitioning and indexing tables.
- Involved in creating custom UDFs to support extended business processes.
- Coordinated with SMEs and architects to build a robust system in Hive.
- Wrote and enhanced shell scripts to support file handling.
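As a sketch of the regular Oracle-to-HDFS imports mentioned above; the JDBC URL, credentials, table, and target directory are placeholders rather than actual project values.

    # Illustrative Sqoop import; connection details, table, and paths are placeholders
    sqoop import \
      --connect jdbc:oracle:thin:@//dbhost.example.com:1521/ORCL \
      --username etl_user --password-file /user/etl/.ora_pass \
      --table FEED_TRANSACTIONS --num-mappers 4 \
      --target-dir /data/raw/feed_transactions/$(date +%F) \
      --fields-terminated-by '\t'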
Environment: HDFS, MapReduce, Hive, Pig, Impala, Sqoop, Oozie, Flume, Hue, Java, MySQL, REST.
Hadoop Developer
Confidential, San Francisco, CA
Responsibilities:
- Involved in requirements gathering and business analysis, and translated business requirements into technical designs in Hadoop and Big Data.
- Developed a data pipeline using Flume, Sqoop, Pig, and Java MapReduce to ingest behavioral data into HDFS for analysis.
- Imported and exported data between HDFS and databases using Sqoop.
- Wrote Hive jobs to parse the logs and structure them in tabular format to facilitate effective querying of the log data.
- Developed workflows in Control-M to automate loading data into HDFS and preprocessing it with Pig.
- Cluster coordination services through ZooKeeper.
- Used Maven extensively to build JAR files of MapReduce programs and deployed them to the cluster.
- Created a customized BI tool for the management team that performs query analytics using HiveQL.
- Created partitions and buckets based on state for further processing using bucket-based Hive joins (sample DDL is sketched after this list).
- Developed a suite of unit test cases for Mapper, Reducer, and Driver classes using the MR testing library.
- Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs such as Java MapReduce, Hive, Pig, and Sqoop.
- Created a data pipeline of MapReduce programs using chained mappers.
- Implemented an optimized join by combining different data sets to get the top claims by state using MapReduce.
- Implemented MapReduce programs to perform map-side joins using the distributed cache in Java.
- Modelled Hive partitions extensively for data separation and faster data processing and followed Pig and Hive best practices for tuning.
- Gained good experience with various NoSQL databases.
- Experienced in handling administration activities using Cloudera Manager.
- Supported MapReduce programs running on the cluster.
- Involved in loading data from UNIX file system to HDFS.
- Installed and configured Hive and wrote Hive UDFs.
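A minimal sketch of the state-partitioned, bucketed Hive layout mentioned above; the table, columns, source table, and bucket count are assumptions for illustration only.

    # Illustrative Hive DDL and dynamic-partition load; table and column names are placeholders
    hive -e "
    CREATE TABLE IF NOT EXISTS claims_by_state (
      claim_id BIGINT,
      amount   DOUBLE,
      filed_dt STRING
    )
    PARTITIONED BY (state STRING)
    CLUSTERED BY (claim_id) INTO 32 BUCKETS
    STORED AS ORC;

    SET hive.exec.dynamic.partition.mode=nonstrict;
    SET hive.enforce.bucketing=true;
    INSERT OVERWRITE TABLE claims_by_state PARTITION (state)
    SELECT claim_id, amount, filed_dt, state FROM claims_raw;
    "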
Environment: RHEL, HDFS, MapReduce, Hive, Pig, Sqoop, Flume, Oozie, Mahout, HBase, Maven, Apache Hadoop, Java, JDK 1.6, J2EE, JDBC, Servlets, JSP, Spring 2.0, Linux, XML, WebLogic, SOAP, WSDL, ZooKeeper, NoSQL, Cloudera, Impala, Tableau, MySQL.