Sr. Hadoop Admin Resume
Irving, Texas
SUMMARY:
- 7+ years of professional experience in the IT industry as a Linux/Hadoop Administrator, providing production support for various applications on Red Hat Enterprise Linux, Sun Solaris, Windows, and the Cloudera, Hortonworks, and MapR distributions of Hadoop.
- 4+ years of experience in configuring, installing, benchmarking, and managing Apache Hadoop across distributions including Cloudera, Hortonworks, and MapR.
- 3+ years of experience in installing, patching, upgrading, and configuring Linux-based operating systems such as Red Hat Enterprise Linux (RHEL) and CentOS on large clusters.
- Experience in the full lifecycle development process, including planning, design, development, testing, and implementation of moderate to advanced complexity systems.
- Excellent understanding and knowledge of Hadoop architecture and various components such as HDFS, NameNode, DataNode, MapReduce, YARN, JobTracker, TaskTracker, NodeManager, ResourceManager, and ApplicationMaster.
- Hands-on experience in installing and configuring Hadoop ecosystem components like MapReduce, Spark, HDFS, HBase, Oozie, Hive, HCatalog, Pig, Impala, ZooKeeper, Kafka, and Sqoop.
- Experience in improving Hadoop cluster performance by tuning the OS kernel, storage, networking, HDFS, and MapReduce through appropriate configuration parameters.
- Experience in Hadoop cluster installation, configuration, maintenance, monitoring, troubleshooting, and performance tuning of the Hadoop ecosystem.
- Experience in understanding the security requirements for Hadoop and integrating with Kerberos authentication infrastructure: KDC server setup and creating and managing the Kerberos realm.
- Experience in analyzing data using HiveQL, Pig Latin, HBase, and custom MapReduce programs in Java.
- Experience in securing Hadoop clusters using Apache Sentry.
- Experience in importing and exporting data between relational databases/mainframes and HDFS using Sqoop, and ingesting logs using Flume (see the sketch after this list).
- Experience in configuring ZooKeeper to provide high availability and cluster service coordination.
- Strong knowledge of NameNode High Availability and recovery of NameNode metadata and data residing in the cluster.
- Experience in performing minor and major upgrades on Hadoop clusters and commissioning/decommissioning of DataNodes.
- Strong knowledge of and experience in benchmarking and performing backup/disaster recovery of NameNode metadata and other important data on the cluster.
- Expert knowledge and hands-on experience in importing and exporting data from different sources using services like Sqoop and Flume.
- Experience in configuring and managing security for Hadoop clusters using Kerberos, with integration with LDAP/AD at an enterprise level.
- Experience with scripting languages like shell, Python, and JavaScript.
- Creating and maintaining user accounts, profiles, security, disk space and process monitoring.
- Experience with Logical Volume Manager (LVM): creating, mounting, and unmounting file systems.
- Expert in installing, configuring, and maintaining Apache/Tomcat, Samba, Sendmail, and WebSphere application servers.
- Experience in networking concepts such as DNS, NIS, NFS, and DHCP; troubleshooting network problems such as TCP/IP issues; and providing support for users in solving their problems.
- Experience in using automation tools like Puppet and Chef.
- Excellent interpersonal and communication skills, creative, research-minded, technically competent and result-oriented with problem solving and leadership skills.
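A minimal sketch of the kind of Sqoop transfer described above, assuming a hypothetical MySQL source (host, database, table, and user names are illustrative):

```bash
# Import a relational table into HDFS (hypothetical connection details)
sqoop import \
  --connect jdbc:mysql://dbhost:3306/sales \
  --username etl_user -P \
  --table orders \
  --target-dir /data/raw/orders \
  --num-mappers 4

# Export processed results back to the relational database
sqoop export \
  --connect jdbc:mysql://dbhost:3306/sales \
  --username etl_user -P \
  --table order_summary \
  --export-dir /data/out/order_summary
```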
TECHNICAL SKILLS:
Operating Systems: Red Hat Linux 5.x/6.x, CentOS, Windows 95/98/NT/2000, Windows Vista, Windows 7.
Scripting: Unix Shell Script, Python, JavaScript.
Hadoop Ecosystem: Hive, Pig, Flume, Oozie, Sqoop, Spark, Kafka, Impala, ZooKeeper, Mahout, HCatalog, and HBase.
Monitoring Tools: Ganglia, Nagios, CloudWatch.
Database: Oracle (SQL) 10g, MySQL, SQL Server 2008, Teradata, PostgreSQL.
Configuration Management Tools: Puppet, Chef
Management: Cloudera Manager, Hortonworks Ambari, MapR
Other Software: MS Office, Packet Tracer, ns-2, SQL Developer, HP Fortify, Eclipse.
PROFESSIONAL EXPERIENCE:
Confidential, Irving, Texas
Sr. Hadoop Admin
Responsibilities:
- Responsible for building scalable distributed data solutions using Hadoop .
- Installed and configured Hadoop clusters and ecosystem components like Spark, Hive, Scala, YARN, MapReduce, and HBase.
- Developed automated scripts to install Hadoop clusters.
- Worked with business stakeholders to understand requirements and business use cases.
- Responsible for creating Hive tables, loading the structured data produced by MapReduce jobs into those tables, and writing Hive queries to further analyze the logs to identify issues and behavioral patterns (see the sketch after this list).
- Developed Spark scripts using Scala shell commands as per requirements.
- Worked on cluster installation, commissioning and decommissioning of DataNodes, NameNode recovery, capacity planning, and slot configuration.
- Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
- Experienced in performance tuning of Spark applications, including setting the right batch interval.
- Designed, developed, and maintained data integration programs in Hadoop and RDBMS environments, with both traditional and non-traditional source systems, and with RDBMS and NoSQL data stores for data access and analysis.
- Loaded data from large data files into Hive tables and HBase NoSQL databases.
- Worked on a POC comparing the processing time of Impala with Apache Hive for batch applications, to decide on adopting Impala in the project.
- Involved in creating Hive tables, and loading and analyzing data using Hive queries.
- Implemented schema extraction for Parquet and Avro file Formats in Hive.
- Good experience with Talend open studio for designing ETL Jobs for Processing of data.
- Good experience with continuous Integration of application using Jenkins.
- Used Reporting tools like Tableau to connect with Hive for generating daily reports of data.
- Collaborated with the infrastructure, network, database, application and BI teams to ensure data quality and availability.
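A minimal sketch of the Hive table creation and log analysis described above; the schema, table name, and paths are hypothetical:

```bash
# Create an external table over MapReduce output and summarize it (illustrative schema)
hive -e "
CREATE EXTERNAL TABLE IF NOT EXISTS web_logs (
  host STRING,
  request_time STRING,
  url STRING,
  status INT
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/data/processed/web_logs';

-- Count responses by status code to spot error patterns
SELECT status, COUNT(*) AS hits
FROM web_logs
GROUP BY status;
"
```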
Environment: Hadoop YARN, Spark Streaming, Spark SQL, Scala, Python, Hive, HBase, GoldenGate, Elasticsearch, Tableau, TWS, Jenkins, MapR 5.2.1, Oracle 12c, Linux.
Confidential, New Brunswick, NJ
Hadoop Admin
Responsibilities:
- Installed and configured various components of Hadoop ecosystem and maintained their integrity on Cloudera.
- Extensively involved in Cluster Capacity planning, Hardware planning, Installation, Performance Tuning of the Hadoop Cluster.
- Experience in installing, configuring, and optimizing Cloudera Hadoop (CDH4) and Hortonworks (HDP 2.2.4.2) in a multi-cluster environment.
- Performed benchmark tests on Hadoop clusters and tweaked the solution based on test results.
- Experience in dealing with structured, semi-structured and unstructured data in Hadoop.
- Experienced in adding/installing new components and removing them through Cloudera Manager.
- Worked with big data developers, designers, and scientists in troubleshooting MapReduce job failures and issues with Hive, Pig, and Flume.
- Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team.
- Monitoring systems and services through the Cloudera Manager dashboard to make the clusters available for the business.
- Involved in upgrading clusters to Cloudera Distributed versions and deployed into CDH5.
- Extensively working on Spark using Python for testing and development environments.
- Executed tasks for upgrading cluster on the staging platform before doing it on production cluster.
- Changed configurations based on user requirements for better job performance.
- Designed, configured and managed the backup and disaster recovery for HDFS data.
- Experience in configuring Storm to load data from MySQL to HBase.
- Used Impala to write and query Hadoop data in HDFS or HBase and to pull data from Hive tables.
- Monitored multiple cluster environments using Cloudera Manager alerts, metrics, and Nagios.
- Set up Flume for different sources to bring log messages from outside into Hadoop HDFS.
- Dumped data from one cluster to another using DistCp and automated the dumping procedure using shell scripts, as shown below.
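A minimal sketch of such an automated DistCp dump, assuming hypothetical NameNode addresses, paths, and alert address:

```bash
#!/bin/bash
# Copy a dataset from the primary cluster to the backup cluster (hypothetical hosts/paths)
SRC=hdfs://prod-nn:8020/data/warehouse
DST=hdfs://backup-nn:8020/data/warehouse
LOG=/var/log/distcp/warehouse_$(date +%F).log
mkdir -p /var/log/distcp

# -update copies only changed files; -p preserves permissions and timestamps
hadoop distcp -update -p "$SRC" "$DST" > "$LOG" 2>&1 || \
  echo "DistCp to backup cluster failed, see $LOG" | mail -s "DistCp failure" ops@example.com
```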
Environment: Cloudera Distribution of Hadoop 5 (CDH 5), Cloudera Manager, Hortonworks, Ambari, Spark, HDFS, MapReduce, YARN, Pig, Hive, HBase, Flume, Sqoop, Windows 2000/2003, Unix, Linux, Shell Scripting, Oozie, Oracle 10g, MySQL, Impala, Nagios, Sentry, Kerberos.
Confidential
Hadoop Admin
Responsibilities:
- Experience in Cluster Planning, Performance tuning, Monitoring and Troubleshooting the Hadoop Cluster.
- Responsible for building a Hortonworks cluster from scratch on HDP 2.x and deploying a Hadoop cluster using CDH4 integrated with Nagios and Ganglia.
- Good experience on cluster audit findings and tuning configuration parameters.
- Experience in configuring MySQL to store the hive metadata.
- Helped in setting up Rack topology in the cluster.
- Built high availability for a major production cluster and designed automatic failover control using ZooKeeper Failover Controller (ZKFC) and Quorum Journal Nodes.
- Commissioning and Decommissioning Nodes from time to time.
- Installed and Configured Hadoop monitoring and Administrating tools: Nagios and Ganglia.
- Implemented Kerberos for authenticating all the services in Hadoop Cluster.
- Responsible for adding/installing new services and removing them through Ambari.
- Configured various views in Ambari such as the Hive view, Tez view, and YARN Queue Manager.
- Worked with data delivery teams to set up new Hadoop users, which included setting up Linux users, setting up Kerberos principals, and testing HDFS, Hive, Pig, and MapReduce access for the new users (see the sketch after this list).
- Periodically reviewed Hadoop-related logs, fixing errors and preventing them by analyzing the warnings.
- Responsible for setting log retention policies and the trash interval time period.
- Experience in configuring Zookeeper to coordinate the servers in clusters to maintain the data consistency.
- Monitoring the data streaming between web sources and HDFS.
- Installed the Oozie workflow engine to run multiple Hive and Pig jobs.
- Backed up data from the active cluster to a backup cluster using DistCp.
- Worked on analyzing data with Hive and Pig.
- Hands-on experience working with Hadoop ecosystem components like MapReduce, HDFS, ZooKeeper, Oozie, Hive, Sqoop, Pig, and Flume.
- Deployed a Network File System (NFS) mount for NameNode metadata backup.
- Monitor Hadoop cluster connectivity and security.
- Installed operating system and Hadoop updates, patches, and version upgrades when required.
- Dumped data from HDFS to a MySQL database and vice versa using Sqoop.
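A minimal sketch of the new-user setup described above; the user name, realm, and groups are hypothetical:

```bash
# Create the Linux account and a matching Kerberos principal (hypothetical names/realm)
useradd -m analyst1
kadmin.local -q "addprinc -randkey analyst1@EXAMPLE.COM"
kadmin.local -q "xst -k /home/analyst1/analyst1.keytab analyst1@EXAMPLE.COM"
chown analyst1:analyst1 /home/analyst1/analyst1.keytab

# Provision the HDFS home directory and verify access as the new user
hdfs dfs -mkdir -p /user/analyst1
hdfs dfs -chown analyst1:hadoop /user/analyst1
su - analyst1 -c "kinit -kt ~/analyst1.keytab analyst1@EXAMPLE.COM && hdfs dfs -ls /user/analyst1"
```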
Environment: Hortonworks, Ambari, HDFS, ZooKeeper, Unix/Linux, MapReduce, YARN, Pig, Hive, HBase, Flume, Sqoop, Shell Scripting, Oozie, Kerberos, Nagios & Ganglia.
Confidential, Santa Clara, CA
Hadoop Admin
Responsibilities:
- Used the Hortonworks distribution of Hadoop to store and process huge volumes of data generated from different enterprises.
- Experience in installing, configuring, monitoring HDP stacks.
- Good experience on cluster audit findings and tuning configuration parameters.
- Implemented Kerberos security in all environments.
- Experience in cluster planning, performance tuning, Monitoring, and troubleshooting the Hadoop cluster.
- Responsible for cluster MapReduce maintenance tasks: commissioning and decommissioning TaskTrackers and managing MapReduce jobs.
- Implemented the Capacity Scheduler to share cluster resources among the MapReduce jobs submitted by users.
- Experience in configuring MySQL to store the hive metadata.
- Demonstrated an understanding of concepts, best practices, and functions to implement a Big Data solution in a corporate environment.
- Worked with data delivery teams to set up new Hadoop users, which included setting up Linux users, setting up Kerberos principals, and testing MFS and Hive access.
- Helped design scalable Big Data clusters and solutions.
- Commissioned and decommissioned nodes from time to time (see the sketch after this list).
- Monitored and controlled local file system disk space usage and log files, cleaning log files with automated scripts.
- As a Hadoop admin, monitored cluster health status on a daily basis, tuned system performance-related configuration parameters, and backed up configuration XML files.
- Worked with Hadoop developers and designers to troubleshoot MapReduce job failures and issues and to help developers.
- Worked with network and Linux system engineers/admin to define optimum network configurations, server hardware and operating system.
- Evaluate and propose new tools and technologies to meet the needs of the organization.
- Production support responsibilities included cluster maintenance.
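A minimal sketch of decommissioning a DataNode, assuming the cluster's dfs.hosts.exclude property already points at the exclude file (hostname and path are hypothetical):

```bash
# Add the node to the HDFS exclude file and ask the NameNode to re-read it
echo "dn07.example.com" >> /etc/hadoop/conf/dfs.exclude
hdfs dfsadmin -refreshNodes

# Watch the node drain; it is safe to stop once it reports "Decommissioned"
hdfs dfsadmin -report | grep -A 2 "dn07.example.com"

# Refresh YARN so no new containers are scheduled on the node
yarn rmadmin -refreshNodes
```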
Environment: Hortonworks 2.2.1, Ambari, HDFS, Hive, Pig, MapReduce, Sqoop, HBase, Kerberos, MySQL, Shell Scripting, RedHat Linux.
Confidential, Pleasanton, CA
Hadoop Admin
Responsibilities:
- Involved in Installing, Configuring Hadoop Eco System, and Cloudera Manager using CDH4.
- Good understanding of and experience with Hadoop stack internals, Hive, Pig, and MapReduce; involved in defining job flows.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Developed Pig Latin scripts to extract the data from the web server output files to load into HDFS.
- Involved in managing and reviewing Hadoop log files.
- Involved in running Hadoop streaming jobs to process terabytes of text data.
- Loaded large sets of structured, semi-structured, and unstructured data.
- Responsible to manage data coming from different sources.
- Supported MapReduce programs running on the cluster.
- Involved in loading data from the UNIX file system to HDFS (see the sketch after this list); handled SQL installation and DB backups.
- Installed and configured Hive and wrote HiveQL scripts.
- Installed and configured Hadoop MapReduce, HDFS, developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Wrote MapReduce jobs to discover trends in data usage by users.
- Implemented HBase Co-processors to notify Support team when inserting data into HBase Tables.
- Experienced with working on Avro Data files using Avro Serialization system.
- Solved the small-file problem using SequenceFile processing in MapReduce.
- Monitor System health and logs and respond accordingly to any warning or failure conditions.
- Performed cluster co-ordination through Zookeeper.
- Involved in support and monitoring production Linux Systems.
- Expertise in managing archive logs and monitoring jobs.
- Monitoring Linux daily jobs and monitoring log management system.
- Expertise in troubleshooting and able to work with a team to fix large production issues.
- Expertise in creating and managing DB tables, indexes, and views.
- User creation and managing user accounts and permissions on Linux level and DB level.
- Extracted large data sets from different sources in different data-source formats, including relational databases, XML, and flat files, using ETL processing.
- Translated coded values, encoded free-form values, and validated the relevant data from tables and reference files for slowly changing dimensions.
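A minimal sketch of moving a file from the local UNIX file system into HDFS and then into a Hive table; the paths and table name are hypothetical, and the table is assumed to already exist:

```bash
# Stage the local file in HDFS (hypothetical paths)
hdfs dfs -mkdir -p /data/staging/clickstream
hdfs dfs -put /var/exports/clickstream_20140301.csv /data/staging/clickstream/

# Move the staged file into the Hive table (LOAD DATA INPATH moves, not copies)
hive -e "LOAD DATA INPATH '/data/staging/clickstream/clickstream_20140301.csv' INTO TABLE clickstream;"
```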
Environment: Cloudera Distribution, Apache Hadoop, Cloudera Manager, MapReduce, HDFS, Pig, Hive, ETL, HBase, ZooKeeper.
Confidential
Responsibilities:
- Installation, configuration and administration of Linux (RHEL 4/5) and Solaris 8/9/10 servers.
- Maintained and supported mission-critical front-end and back-end production environments.
- Configured RedHat Kickstart server for installing multiple production servers.
- Provided Tier 2 support to issues escalated from Technical Support team and interfaced with development teams, software vendors, or other Tier 2 teams to resolve issues.
- Installing and partitioning disk drives. Creating, mounting and maintaining file systems to ensure access to system, application and user data.
- Maintained and installed RPM and YUM packages and other server software.
- Creating users, assigning groups and home directories, setting quota and permissions; administering file systems and recognizing file access problems.
- Experience in managing and scheduling cron jobs, such as enabling system logging and network logging of servers for maintenance, performance tuning, and testing.
- Maintained appropriate file and system security: monitoring and controlling system access, changing permissions and ownership of files and directories, maintaining passwords, assigning special privileges to selected users, and controlling file access.
- Extensive use of LVM, creating volume groups and logical volumes (see the sketch after this list).
- Served as AWS administrator, managing AWS resources including EC2 (Amazon Elastic Compute Cloud).
- Performed various configurations including networking and IPTables, resolving hostnames, and SSH key-less login.
- Configured several load-balanced web servers using load balancers such as Amazon's Elastic Load Balancer.
- Build, configure, deploy, support, and maintain enterprise class servers and operating systems.
- Built CentOS 5/6, Oracle Enterprise Linux 5, and RHEL 4/5 servers from scratch.
- Organized the patch depots and acted as POC for patch-related issues.
- Configured and installed Red Hat Linux 5/6, CentOS 5, and Oracle Enterprise Linux 5 using Kickstart to reduce installation issues.
- Attended team meetings, change control meetings to update installation progress and for upcoming changes in environment.
- Handled patch upgrades and firmware upgrades on RHEL and Oracle Enterprise Linux servers.
- User and Group administration on RHEL Systems.
- Creation of various user profiles and environment variables to ensure security.
- Server hardening and security configurations as per the client specifications.
- A solid understanding of networking/distributed computing environment concepts, including principles of routing, bridging and switching, client/server programming, and the design of consistent network-wide file system layouts.
- Strong understanding of Network Infrastructure Environment.
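A minimal sketch of the LVM workflow described above, assuming a spare disk partition at a hypothetical device path (device, volume, and mount names are illustrative):

```bash
# Initialize the disk for LVM and build a volume group (hypothetical device/names)
pvcreate /dev/sdb1
vgcreate appvg /dev/sdb1

# Carve out a logical volume, create a file system, and mount it
lvcreate -L 50G -n applv appvg
mkfs.ext4 /dev/appvg/applv
mkdir -p /app/data
mount /dev/appvg/applv /app/data

# Persist the mount across reboots
echo "/dev/appvg/applv /app/data ext4 defaults 0 2" >> /etc/fstab
```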
Environment: Red Hat Linux, AIX, RHEL, Oracle 9i/10g, Samba, NT/2000 Server, VMware 2.x, Tomcat 3.x/4.x/5.x, Apache Server 1.x/2.x, MQ V5.3/V6.0, DB2 Connect, Korn, Bash.