Hadoop Administrator Resume
SUMMARY
- 7+ years of professional experience in analysis, design, development, implementation, integration and testing of client-server applications using Object-Oriented Analysis and Design (OOAD), with 5+ years of experience in deploying, maintaining, monitoring and upgrading Hadoop clusters (Apache Hadoop, Cloudera).
- Experience in managing, troubleshooting and securing networks.
- Experience working with, extending, and enhancing monitoring systems like Nagios.
- Storage experience with JBOD, NFS, SAN and RAID
- Hands-on experience using Hadoop ecosystem components like Hadoop MapReduce, HDFS, ZooKeeper, Oozie, Hive, Sqoop and Pig.
- Worked in multi-cluster environments and set up the Cloudera Hadoop ecosystem.
- In-depth understanding of data structures and algorithms.
- Experience in setting up monitoring tools like Nagios and Ganglia for Hadoop.
- Experience in importing and exporting data from different databases like SQL Server and Oracle into HDFS and Hive using Sqoop (see the sketch after this list).
- Experience in configuring ZooKeeper to provide cluster coordination services.
- Experience in providing security for Hadoop clusters with Kerberos.
- Experience in setting up high-availability Hadoop clusters.
- Experience in writing UDFs for Hive and Pig.
- Experience in upgrading existing Hadoop clusters to the latest releases.
- Working experience with Amazon Web Services (AWS) offerings like S3.
- Experience in developing Shell Scripts for system management.
- Setting up and maintaining NoSQL databases like Cassandra and MongoDB.
- Experience in deploying Cassandra clusters (Apache Cassandra, DataStax).
- Monitoring Cassandra clusters using OpsCenter.
- Ability to diagnose network problems
- Understanding of TCP/IP networking and its security considerations
- Excellent at communicating with clients, customers, managers and other teams in the enterprise at all levels.
- Effective problem-solving skills and outstanding interpersonal skills; able to work independently as well as within a team environment, and driven to meet deadlines.
- Motivated to produce robust, high-performance software.
- Experience in Software Development Lifecycle (SDLC), application design, functional and technical specs, and use case development using UML.
- Client interaction for requirements gathering, analysis and testing.
- Excellent programming skills in writing and maintaining Oracle PL/SQL procedures, triggers, cursors, views and complex SQL queries.
- Ability to learn and use new technologies quickly.
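A minimal sketch of the kind of Sqoop import referenced above, assuming a hypothetical Oracle host, credentials and table; all names are placeholders:

    # Import an Oracle table into HDFS and expose it as a Hive table
    sqoop import \
      --connect jdbc:oracle:thin:@//dbhost.example.com:1521/ORCL \
      --username etl_user -P \
      --table SALES.ORDERS \
      --target-dir /user/etl/orders \
      --hive-import --hive-table orders \
      --num-mappers 4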
TECHNICAL SKILLS
Big Data Technologies: MapReduce, Hive, Pig, ZooKeeper, Sqoop, Oozie, Flume, HBase
Big Data Frameworks: HDFS, YARN, Spark
Hadoop Distributions: Cloudera (CDH3, CDH4, CDH5), Hortonworks
Programming Languages: Core Java, Shell Scripting, PowerShell
Operating Systems: Windows 98/2000/XP/Vista/NT/8.1, Red Hat Linux/CentOS 4, 5, Unix
Database: Oracle 10g/11g, T-SQL, MongoDB, PL/SQL
ETL Stack: Informatica 8x/9x, SSIS, SSRS, SSAS, IBM BigInsights
Business Modeling Tools: UML, MS Office, Remedy, ServiceNow, MS Visio
PROFESSIONAL EXPERIENCE
Confidential, OK
Hadoop Administrator
Responsibilities:
- Responsible for architecting Hadoop clusters and translating functional and technical requirements into detailed architecture and design.
- Installed and configured a fully distributed, multi-node Hadoop cluster with a large number of nodes.
- Provided Hadoop, OS and hardware optimizations.
- Set up machines with network controls, static IPs, disabled firewalls and swap memory.
- Installed and configured Hadoop ecosystem components like MapReduce, Hive, Pig, Sqoop, HBase, ZooKeeper and Oozie.
- Involved in testing HDFS, Hive, Pig and MapReduce access for the new users.
- Cluster maintenance as well as creation and removal of nodes using Cloudera Manager Enterprise and Hortonworks tooling.
- Worked on setting up high availability for a major production cluster and designed automatic failover control using ZooKeeper and quorum journal nodes (setup commands are sketched after this list).
- Implemented the Fair Scheduler on the JobTracker to allocate a fair share of resources to small jobs.
- Performed operating system installation, Hadoop version updates using automation tools.
- Configured Oozie for workflow automation and coordination.
- Implemented rack-aware topology on the Hadoop cluster.
- Imported and exported structured data from different relational databases into HDFS and Hive using Sqoop.
- Configured ZooKeeper to implement node coordination in clustering support.
- Configured Flume for efficiently collecting, aggregating and moving large amounts of log data from many different sources to HDFS.
- Involved in collecting and aggregating large amounts of streaming data into HDFS using Flume and defined channel selectors to multiplex data into different sinks.
- Developed scripts for benchmarking with TeraSort/TeraGen (see the sketch after this list).
- Implemented Kerberos Security Authentication protocol for existing cluster.
- Good experience troubleshooting production-level issues in the cluster and its functionality.
- Involved with statistical computing tools like IBM InfoSphere BigInsights for data visualization, sampling, analysis, model selection, estimator bias, variance calculation and significance testing.
- Backed up data on regular basis to a remote cluster using DistCp.
- Regularly commissioned and decommissioned nodes depending on the amount of data.
- Monitored and configured a test cluster on Amazon Web Services for further testing and gradual migration.
- Installed and maintained a Puppet-based configuration management system.
- Deployed Puppet, Puppet Dashboard, and PuppetDB for configuration management to existing infrastructure.
- Used Puppet configuration management to manage the cluster.
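A hedged sketch of the commands used to bring up automatic NameNode failover with ZooKeeper and quorum journal nodes, assuming the HA properties (nameservice, JournalNode quorum, ZKFC settings) are already defined in hdfs-site.xml and core-site.xml:

    # On each JournalNode host: start the JournalNode daemon
    hadoop-daemon.sh start journalnode
    # On the primary NameNode: format (new cluster only) and start
    hdfs namenode -format
    hadoop-daemon.sh start namenode
    # On the standby NameNode: pull over the metadata, then start
    hdfs namenode -bootstrapStandby
    hadoop-daemon.sh start namenode
    # Initialize failover state in ZooKeeper, then start a ZKFC on each NameNode
    hdfs zkfc -formatZK
    hadoop-daemon.sh start zkfc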
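A minimal sketch of the TeraGen/TeraSort benchmarking script mentioned above; the examples jar path and row count are assumptions that vary by distribution:

    #!/bin/bash
    # Generate ~100 GB of input (10^9 rows x 100 bytes), sort it, then validate
    JAR=/usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar
    hadoop jar "$JAR" teragen 1000000000 /benchmarks/teragen
    hadoop jar "$JAR" terasort /benchmarks/teragen /benchmarks/terasort
    hadoop jar "$JAR" teravalidate /benchmarks/terasort /benchmarks/teravalidate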
Environment: Hadoop HDFS, Hortonworks, MapReduce, Hive, Pig, Oozie, Sqoop, Cloudera Manager
Confidential, FL
Hadoop Administrator
Responsibilities:
- Hands-on experience with Apache and Hortonworks Hadoop ecosystem components such as Sqoop, HBase and MapReduce.
- Installation, configuration and troubleshooting of an Apache Hadoop cluster (48+ nodes) for application development.
- Installation, monitoring, managing, troubleshooting, applying patches in different environments such as Development Cluster, Test Cluster and Production environments.
- Excellent understanding of Hadoop architecture and various components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode and MapReduce.
- Experience in managing Hadoop infrastructure, including commissioning, decommissioning and rack topology implementation.
- Experience in managing the cluster resources by implementing fair scheduler and capacity scheduler.
- Experience in implementing NameNode high availability and Hadoop cluster capacity planning.
- Developed automated Unix shell scripts for running the balancer, file system health checks and user/group creation on HDFS (see the sketch after this list).
- Experience in managing and reviewing Hadoop log files.
- Experience in upgrading Hadoop clusters through both minor and major version upgrades.
- Experience in designing and building disaster recovery planning across the data centers to provide business continuity.
- Experience in the complete software development life cycle (SDLC), from requirements gathering through testing, implementation and post-implementation support.
- Expertise in the design and implementation of Slowly Changing Dimensions (SCD) Type 1, Type 2 and Type 3.
- Demonstrated understanding of the concepts, best practices and functions needed to implement a Big Data solution in a corporate environment.
- Helped design scalable Big Data clusters and solutions.
- Implemented the Fair Scheduler to share cluster resources among users' MapReduce jobs.
- Experience writing automation scripts and setting up cron jobs to keep the cluster stable and healthy (a sample cron entry appears in the sketch after this list).
- Monitored and controlled local file system disk space usage and local log files, cleaning log files with automated scripts.
- As a Hadoop admin, monitored cluster health on a daily basis, tuned performance-related configuration parameters and backed up configuration XML files.
- Worked with Hadoop developers and designers to troubleshoot MapReduce job failures and other issues.
- Worked with network and Linux system engineers to define optimal network configurations, server hardware and operating systems.
- Worked with expert consulting resources on Hadoop integration points with ETL, BI and EDW teams.
- Evaluated and proposed new tools and technologies to meet the needs of the organization.
- Provided analytic support to marketing, finance, product development, fraud, customer service, engineering and operations using IBM InfoSphere BigInsights.
- Production support responsibilities included cluster maintenance and 24/7 on-call support on a weekly rotation.
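A hedged sketch of the maintenance script and cron entry referenced above; thresholds, paths and the log location are assumptions (CDH3-era `hadoop` subcommands shown):

    #!/bin/bash
    # nightly_hdfs_maint.sh - balancer run, health check and report, all logged
    LOG=/var/log/hadoop/nightly_maint.log
    {
      date
      hadoop balancer -threshold 10        # rebalance blocks across DataNodes
      hadoop fsck / | tail -20             # filesystem health summary
      hadoop dfsadmin -report | head -20   # capacity and live-node summary
      # per-user HDFS home creation, e.g.:
      # hadoop fs -mkdir /user/jdoe && hadoop fs -chown jdoe:analysts /user/jdoe
    } >> "$LOG" 2>&1

    # crontab entry: run every night at 01:30
    # 30 1 * * * /opt/scripts/nightly_hdfs_maint.sh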
Environment: RHEL, CentOS, Ubuntu, CDH3, Apache Hadoop, HDFS, MapReduce, HBase, Shell Scripts
Confidential, IA
Hadoop Administrator
Responsibilities:
- Provisioned and administered, from the ground up, Apache Hadoop and Cloudera clusters for BI and other product/systems development.
- Worked on administration activities, providing installation, upgrades, patching and configuration for all hardware and software Hadoop components.
- Work closely with other administrators (database, storage, Windows) and developers to keep the Hadoop environment running efficiently and effectively.
- Configured and implemented Apache Hadoop technologies, i.e., the Hadoop Distributed File System (HDFS), the MapReduce framework, Pig, Hive, HCatalog, Sqoop and Flume.
- Helped in setting up Rack topology in the cluster.
- Helped with day-to-day operational support.
- Performed a minor upgrade from CDH3u4 to CDH3u6.
- Upgraded the Hadoop cluster from CDH3 to CDH4.
- Implemented the Fair Scheduler on the JobTracker to allocate a fair share of resources to small jobs.
- Implemented Kerberos for authenticating all the services in Hadoop Cluster.
- Deployed a Network File System (NFS) mount for NameNode metadata backup.
- Moved data from a MySQL database to HDFS and vice versa using Sqoop.
- Monitored multiple Hadoop cluster environments using Ganglia and Nagios; monitored workload, job performance and capacity planning.
- Worked with application teams to install OS level updates, patches and version upgrades required for Hadoop cluster environments.
- Created a local YUM repository for installing and updating packages (see the sketch after this list).
- Copied data from one cluster to another using DistCp and automated the procedure with shell scripts (also sketched after this list).
- Configured and deployed the Hive metastore using MySQL and the Thrift server.
- Worked with the Linux administration team to prepare and configure the systems to support Hadoop deployment
- Created volume groups, logical volumes and partitions on the Linux servers and mounted file systems on the created partitions.
- Worked with various tools and technologies to improve cluster performance, ETL & BI environments.
- Designed and allocated HDFS quotas for multiple groups (example commands appear after this list).
- Performed various configurations, including networking and iptables, hostname resolution, user accounts and file permissions, HTTP, FTP and passwordless SSH login.
- Designed and implemented solutions with NoSQL databases such as MongoDB and Cassandra.
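A minimal sketch of setting up the local YUM repository mentioned above; paths, the web-server layout and the repo name are assumptions:

    # Build repo metadata over a directory of downloaded RPMs served by httpd
    yum install -y createrepo httpd
    mkdir -p /var/www/html/cdh-local
    cp /tmp/rpms/*.rpm /var/www/html/cdh-local/
    createrepo /var/www/html/cdh-local

    # Client-side repo definition, pushed to every cluster node
    cat > /etc/yum.repos.d/cdh-local.repo <<'EOF'
    [cdh-local]
    name=Local CDH mirror
    baseurl=http://repohost.example.com/cdh-local
    enabled=1
    gpgcheck=0
    EOF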
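A hedged sketch of the DistCp copy between clusters and its shell automation; NameNode URIs, paths and the alert address are placeholders:

    #!/bin/bash
    # distcp_backup.sh - copy a dated snapshot of /data to the remote cluster
    SRC=hdfs://prod-nn.example.com:8020/data
    DST=hdfs://dr-nn.example.com:8020/backup/$(date +%F)
    if ! hadoop distcp -update "$SRC" "$DST"; then
      echo "DistCp to $DST failed on $(hostname)" \
        | mail -s "DistCp backup FAILED" hadoop-ops@example.com
    fi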
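Example quota commands for the group directories mentioned above; directory names and limits are hypothetical, and the `10t` suffix form depends on the installed Hadoop version (raw byte counts also work):

    # Cap /user/marketing at 1M names and 10 TB of raw space (replicas included)
    hadoop dfsadmin -setQuota 1000000 /user/marketing
    hadoop dfsadmin -setSpaceQuota 10t /user/marketing
    hadoop fs -count -q /user/marketing    # verify quota and current usage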
Environment: Linux, HDFS, Pig, Hive, Sqoop, MapReduce, KDC, Nagios, Ganglia, Oozie, Apache Hadoop, Cloudera
Confidential
Hadoop Administrator
Responsibilities:
- Worked on administration, monitoring and fine-tuning of an existing Cloudera Hadoop cluster used by internal and external users as a Data and Analytics as a Service platform.
- Worked on Cloudera cluster backup and recovery, performance monitoring, load balancing, rebalancing and tuning (a balancer sketch follows this list), capacity planning and disk space management.
- Assisted in the design, development and architecture of Hadoop and HBase systems.
- Coordinated with technical teams to install Hadoop and related third-party applications on systems.
- Formulated procedures for planning and executing system upgrades for all existing Hadoop clusters.
- Supported technical team members for automation, installation and configuration tasks.
- Suggested improvement processes for all process automation scripts and tasks.
- Provided technical assistance for the configuration, administration and monitoring of Hadoop clusters.
- Conducted detailed analysis of system and application architecture components as per functional requirements.
- Participated in evaluation and selection of new technologies to support system efficiency.
- Assisted in creation of ETL processes for transformation of data sources from existing RDBMS systems.
- Designed and developed scalable, custom Hadoop solutions as per dynamic data needs.
- Coordinated with technical team for production deployment of software applications for maintenance.
- Provided operational support services relating to Hadoop infrastructure and application installation.
- Supported technical team members in the management and review of Hadoop log files and data backups.
- Participated in development and execution of system and disaster recovery processes.
- Formulated procedures for installing Hadoop patches, updates and version upgrades.
- Automated processes for the troubleshooting, resolution and tuning of Hadoop clusters.
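A small sketch of the rebalancing step referenced above; the bandwidth cap and threshold are assumptions:

    # Cap balancer network usage per DataNode (bytes/sec), then rebalance
    hdfs dfsadmin -setBalancerBandwidth 104857600   # ~100 MB/s
    hdfs balancer -threshold 5   # move blocks until each node is within 5% of average use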
Environment: Cloudera 4.x, Java, HDFS, MapReduce, Hive, Pig, Sqoop, Flume, Oozie, HBase, ETL
Confidential
UNIX Administrator
Responsibilities:
- Installing and upgrading OE and Red Hat Linux and Solaris 8 SPARC on servers like HP DL380 G3, G4 and G5 and Dell PowerEdge servers.
- Experience with LDOMs; created sparse-root and whole-root zones, administered the zones for web, application and database servers, and worked with SMF on Solaris 10 (SunOS 5.10).
- Experience working in AWS Cloud Environment like EC2 & EBS.
- Implemented and administered VMware ESX 3.0 for running Windows, CentOS, SUSE and Red Hat Linux servers on development and test servers.
- Installed and configured Apache on Linux and Solaris, configured virtual hosts and applied SSL certificates.
- Implemented JumpStart for Solaris and Kickstart for Red Hat environments.
- Experience working with HP LVM and Red Hat LVM.
- Experience in implementing P2P and P2V migrations.
- Involved in installing and configuring CentOS and SUSE 11 and 12 servers on HP x86 servers.
- Implemented HA using Red Hat Cluster and VERITAS Cluster Server 4.0 for the WebLogic agent.
- Managing DNS and NIS servers and troubleshooting those servers.
- Troubleshooting application issues on Apache web servers as well as database servers running on Linux and Solaris.
- Experience migrating Oracle and MySQL data using Double-Take products.
- Used Sun Volume Manager on Solaris and LVM on Linux and Solaris to create volumes with layouts like RAID 1, 5, 10 and 15 (see the LVM sketch after this list).
- Recompiling the Linux kernel to remove services and applications that are not required.
- Performed performance analysis using tools like prstat, mpstat, iostat, sar, vmstat, truss and DTrace (a quick triage sketch follows this list).
- Experience working on LDAP user accounts and configuring LDAP on client machines.
- Worked on patch management tools like Sun Update Manager.
- Experience supporting middleware servers running Apache, Tomcat and Java applications.
- Worked on day-to-day administration tasks and resolved tickets using Remedy.
- Used HP Service Center and the change management system for ticketing.
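A minimal sketch of the Linux LVM work described above, creating a mirrored (RAID 1-style) logical volume; disk names, sizes and the mount point are placeholders:

    # Create physical volumes, a volume group, and a mirrored logical volume
    pvcreate /dev/sdb /dev/sdc
    vgcreate datavg /dev/sdb /dev/sdc
    lvcreate -m 1 --mirrorlog core -L 100G -n datalv datavg   # RAID 1-style mirror
    mkfs.ext3 /dev/datavg/datalv
    mkdir -p /data && mount /dev/datavg/datalv /data
    echo '/dev/datavg/datalv /data ext3 defaults 0 2' >> /etc/fstab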
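A hedged sketch of a quick performance-triage pass using the Linux-side tools listed above; intervals, counts and the output path are arbitrary:

    #!/bin/bash
    # perf_snapshot.sh - capture a short system-load snapshot for triage
    OUT=/var/tmp/perf_$(date +%Y%m%d_%H%M).log
    {
      uptime
      vmstat 2 5       # run queue, memory, CPU over 10 seconds
      iostat -x 2 5    # per-device utilization and await times
      sar -n DEV 2 5   # NIC throughput
      ps -eo pid,pcpu,pmem,comm --sort=-pcpu | head -15
    } > "$OUT" 2>&1
    echo "snapshot written to $OUT"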
Environment: Red Hat Linux/CentOS 4, 5, Logical Volume Manager, Hadoop, VMware ESX 3.0, Apache and Tomcat web servers, Oracle 9i, Oracle RAC, HPSM, HPSA.
