Sr. Hadoop Administrator Resume
Minneapolis, MN
SUMMARY
- 8+ years of experience, including 4 years in Hadoop and related technologies.
- Excellent knowledge of Hadoop architecture and ecosystem components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, YARN, and the MapReduce programming paradigm.
- Knowledge in installing, configuring, and using Hadoop ecosystem components like Hadoop MapReduce, HDFS, HBase, Oozie, Hive, HCatalog, Hue, Impala, Mahout, Sentry, ZooKeeper, Sqoop, Pig, Flume, and Whirr.
- Experience in managing and reviewing Hadoop log files.
- Experience in analyzing data using HiveQL, Pig Latin, HBase, and custom MapReduce programs in Java.
- Extending Hive and Pig core functionality by writing custom UDFs.
- Setting up automated 24x7 monitoring and escalation infrastructure for Hadoop clusters using Nagios and Ganglia.
- Experience in installing, configuring, supporting, and managing Cloudera Hadoop Distribution (CDH3, CDH4, and CDH5) and Hortonworks (HDP 1.3 and HDP 2.1) clusters.
- Experience in configuring Zookeeper to provide Cluster coordination services.
- Loading logs from multiple sources directly into HDFS using tools like Flume.
- Good experience in performing minor and major upgrades.
- Experience in benchmarking and in performing backup and recovery of NameNode metadata and data residing in the cluster.
- Familiar with commissioning and decommissioning nodes on Hadoop clusters.
- Adept at configuring NameNode High Availability.
- Worked on disaster management for Hadoop clusters.
- Worked with Puppet for application deployment.
- Hands-on experience in application development using Java and RDBMS, and knowledge of Linux shell scripting.
- Worked on Amazon Web Services (AWS).
- Experience with distributed computation tools such as Apache Spark and Hadoop.
- Capturing data from existing databases that provide SQL interfaces using Sqoop.
- Experience in shell (bash, sh), Hive, Sqoop, and Pig Latin scripting.
- Extensively worked on database applications using DB2, Oracle, SQL*Plus, PL/SQL, and SQL*Loader.
- Excellent interpersonal and communication skills; creative, research-minded, technically competent, and results-oriented, with problem-solving and leadership skills.
SKILLS
Hadoop Ecosystem: Spark, Hive, Pig, Sqoop, Flume, HBase, HCatalog, Hue, Impala, Mahout, Oozie, Whirr, Sentry, and ZooKeeper
Operating System: Linux (RHEL, Ubuntu, CentOS), Windows (XP/7/8), VMware
Languages: Java, C / C++, Shell scripting (bash)
Databases: MySQL, Cassandra, HBase
Tools: Rational Rose, Eclipse, NetBeans, BI & EDW
Web development: HTML, XML, CSS
Monitoring Tools: Cloudera Manager, Ganglia, Nagios
WORK EXPERIENCE
Confidential, Minneapolis, MN
SR. HADOOP ADMINISTRATOR
Responsibilities:
- Performed a major upgrade of the cluster from CDH3u6 to CDH4.2.0.
- Implemented NameNode High Availability on the Hadoop cluster to eliminate the single point of failure.
- Installed Cloudera Manager on an already existing Hadoop cluster.
- Involved in efficiently collecting and aggregating large amounts of streaming log data into the Hadoop cluster using Apache Flume.
- Studied user behavior patterns by analyzing data stored in HDFS using Hive.
- Launched the R statistical tool for statistical computing and graphics.
- Worked with data delivery teams to set up new Hadoop users, including setting up Linux users and testing HDFS and Hive.
- Supported developers in deploying their jobs.
- Installed Redis and configured HA for it, replacing the RabbitMQ setup previously used for mirroring data.
- Worked with the IBM BigInsights distribution used at Geico.
- Installed and configured GDE for encrypting the data.
- Upgraded BigInsights cluster from 2.1 to 4.0.
- Performed cluster maintenance, including adding and removing nodes.
- Monitored Hadoop cluster connectivity and security.
- Managed and reviewed Hadoop log files.
- Handled file system management and monitoring.
- Rewrote existing SQL queries as Hive queries in HiveQL.
- Exported analyzed data mined from large data volumes to MySQL using Sqoop.
- Developed custom MapReduce programs and custom User Defined Functions (UDFs) in Hive to transform large volumes of data according to business requirements.
- Involved in installing and configuring Kerberos to secure the Hadoop cluster and provide user authentication.
- Worked on installation of DataStax Cassandra cluster.
- Worked with big data analysts, designers, and scientists to troubleshoot MapReduce job failures and issues with Hive, Pig, Flume, etc.
Environment: Hadoop, Cloudera, Hive, Oozie, Sqoop, Flume, Cloudera Manager, Shell Script.
Confidential, Charlotte, NC
HADOOP ENGINEER
Responsibilities:
- Upgraded the Hadoop cluster from CDH3 to CDH4 and set up a high-availability cluster.
- Integrated Hive with existing applications.
- Gathered business requirements from business partners and subject matter experts.
- Setting up automated 24x7 monitoring and escalation infrastructure for Hadoop cluster using Nagios and Ganglia.
- Involved in installing Hadoop ecosystem components and NoSQL (HBase).
- Managed and monitored Hadoop and ecosystem tools.
- Responsible for managing data coming from different sources.
- Supported applications running on production clusters.
- Involved in HDFS maintenance and administration.
- Worked on setting up high availability for a major production cluster and designed automatic failover control using ZooKeeper and quorum journal nodes.
- Implemented the Fair Scheduler on the JobTracker to allocate a fair share of resources to small jobs.
- Performed operating system installation, Hadoop version updates using automation tools.
- Configured Oozie for workflow automation and coordination.
- Implemented rack aware topology on the Hadoop cluster.
- Imported and exported structured data between relational databases and HDFS/Hive using Sqoop.
- Configured ZooKeeper to implement node coordination in support of clustering.
- Configured Flume for efficiently collecting, aggregating and moving large amounts of log data from many different sources to HDFS.
- Involved in collecting and aggregating large amounts of streaming data into HDFS using Flume and defined channel selectors to multiplex data into different sinks.
- Worked on developing scripts for performing benchmarking with TeraSort/TeraGen.
- Implemented Kerberos Security Authentication protocol for existing cluster.
- Troubleshot production-level issues in the cluster and its functionality.
- Backed up data on a regular basis to a remote cluster using DistCp.
- Wrote UDFs and MapReduce jobs using Java and Pig Latin.
- Imported data from RDBMS to HDFS on a regular basis using Sqoop.
- Developed scripts and batch jobs to schedule various Hadoop jobs.
- Automated jobs using the Oozie workflow engine to chain together shell scripts, Flume, MapReduce jobs, and Hive and Pig scripts.
- Wrote Hive queries for data analysis to meet the business requirements.
- Created partitioned Hive tables and worked on them using HiveQL.
Environment: Hadoop, MapReduce, HDFS, Hive, Java (JDK1.6), Pig, Sqoop, Impala, HCatalog, XML, MySQL
Confidential, West Chester, PA
BIG DATA ANALYST
Responsibilities:
- Involved in review of functional and non-functional requirements.
- Designed and implemented MapReduce jobs for analyzing the data collected by the Flume server.
- Worked actively with the Hadoop administration team to debug slow-running MR jobs and apply the necessary optimizations.
- Designed and implemented APIs to retrieve data from the Hadoop platform for the web application.
- Created Hive tables and worked on them using HiveQL.
- Used Apache Log4J for logging.
- Involved in fixing bugs in various modules raised by the testing teams during the integration testing phase.
- Facilitated knowledge transfer sessions.
Environment: Hadoop 0.23, Cloudera CDH3, Java 1.6, Eclipse, Log4J, Pig 0.12, Hive 0.13.1, Ubuntu 12.04.x, shell scripting
Confidential
SYSTEM ADMIN
Responsibilities:
- Installed and maintained Red Hat and CentOS Linux servers.
- Installed CentOS on multiple servers using Preboot Execution Environment (PXE) boot and the Kickstart method.
- Responsible for performance tuning and troubleshooting Linux servers.
- Scheduled data backups using crontab.
- Added, removed, and updated user account information, reset passwords, etc.
- Maintained the SQL server and granted database authentication to required users.
- Applied Operating System updates, patches and configuration changes.
- Used different methodologies to increase the performance and reliability of the IT infrastructure.
- Responsible for system performance tuning; successfully engineered a virtual private network (VPN).
- Installed, configured, and maintained SAN and NAS storage.
Confidential, PA
SYSTEM ANALYST/ADMIN
Responsibilities:
- Worked on installation, configuration, and upgrading of Oracle server software and related products.
- Responsible for installation, administration, and maintenance of Linux servers.
- Established and maintained sound backup and recovery policies and procedures.
- Handled database design and implementation.
- Implemented and maintained database security (created and maintained users and roles, assigned privileges).
- Performed database tuning and performance monitoring.
- Planned for growth and changes (capacity planning).
- Worked as part of a team and provided 24x7 support when required.
- Performed general technical troubleshooting on trouble tickets to bring them to resolution.
