Hadoop Administrator Resume
Texas
SUMMARY:
- Around 6 years of experience in IT, with over 4 years of hands-on experience as a Hadoop Administrator.
- Hands-on experience in deploying and managing multi-node development, testing, and production Hadoop clusters with different Hadoop components (Hive, Pig, Sqoop, Oozie, Flume, ZooKeeper, HBase) using Cloudera Manager and Hortonworks Ambari.
- Hands-on experience with Big Data technologies/frameworks such as Hadoop, HDFS, YARN, MapReduce, HBase, Hive, Pig, Sqoop, NoSQL, Flume, and Oozie.
- Proficiency with application servers such as WebSphere, WebLogic, JBoss, and Tomcat.
- Hadoop cluster capacity planning, performance tuning, monitoring, and troubleshooting.
- Experience in commissioning, decommissioning, balancing, and managing nodes, and tuning servers for optimal cluster performance.
- As an administrator, involved in cluster maintenance, troubleshooting, and monitoring, and followed proper backup and recovery strategies.
- Used Namespace support to map Phoenix schemas to HBase namespaces.
- Designed and implemented database software migration procedures and guidelines.
- Performed administrative tasks on Hadoop clusters using Hortonworks.
- Expertise with tools in the Hadoop ecosystem, including Pig, Hive, HDFS, MapReduce, Sqoop, Apache NiFi, Spark, Kafka, YARN, Oozie, and ZooKeeper.
- Installed and monitored Hadoop cluster resources using Ganglia and Nagios.
- Experience in designing and implementing secure Hadoop clusters using Kerberos.
- Experience working with Hadoop architecture and Big Data users to implement new Hadoop ecosystem technologies that support multi-tenant clusters.
- Skilled in monitoring servers using Nagios, Datadog, and CloudWatch, and using the EFK stack (Elasticsearch, Fluentd, Kibana).
- Implemented DB2/LUW replication, federation, and partitioning (DPF).
- Areas of expertise include: database installation/upgrade, backup/recovery.
- Hands-on experience in installing, configuring, supporting, and managing Hadoop clusters using Apache, Cloudera (CDH3, CDH4), and YARN distributions (CDH 5.x).
- Experience in capacity planning, HDFS management, and YARN resource management.
- Hands-on experience configuring Hadoop clusters in enterprise environments and on VMware and Amazon Web Services (AWS) using EC2 instances.
- Installed and configured Hortonworks HDP 2.3.0 using Ambari 2.1.1.
- Hands-on experience upgrading clusters from HDP 2.0 to HDP 2.3.
- Strong experience and expertise with data warehouse tools, including ETL tools such as DataStage and BI tools such as SSRS and Tableau.
- Expertise in interactive data visualization and analysis with BI tools such as Tableau.
- Worked with different relational database systems such as Oracle (PL/SQL). Used Unix shell scripting and Python, and have experience working on AWS EMR instances.
- Used NoSQL databases such as Cassandra and MongoDB, and designed tables.
- Worked on setting up NameNode High Availability for a major production cluster and designed automatic failover control using ZooKeeper and Quorum Journal Nodes.
- Implemented automatic failover using ZooKeeper and the ZooKeeper Failover Controller.
- Familiar with writing Oozie workflows and Job Controllers for job automation.
- Experience in dealing with structured, semi-structured, and unstructured data in the Hadoop ecosystem.
- Imported data from various data sources, transformed it using Hive and Pig, and loaded it into HBase.
- Experience in importing and exporting data using Sqoop between HDFS and relational database systems; a representative command sketch appears at the end of this summary.
- Analysed the client's existing Hadoop infrastructure to understand the performance bottlenecks and provided performance tuning accordingly.
- Good understanding of deploying Hadoop clusters using automated Puppet scripts.
- Troubleshooting, Security, Backup, Disaster Recovery, Performance Monitoring on Linux systems.
- Worked with the Linux administration team to prepare and configure the systems to support Hadoop deployment.
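A minimal sketch of the Sqoop import/export and HDFS balancing commands referenced above; the connection string, credentials, table names, and paths are hypothetical placeholders, not values from an actual engagement.

    # Import a table from an RDBMS into HDFS (placeholder connection details).
    sqoop import --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
        --username etl_user --password-file /user/etl/.db_password \
        --table CUSTOMERS --target-dir /data/raw/customers --num-mappers 4

    # Export aggregated results back to the RDBMS.
    sqoop export --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
        --username etl_user --password-file /user/etl/.db_password \
        --table CUSTOMER_SUMMARY --export-dir /data/summary/customers

    # Rebalance HDFS blocks after commissioning new DataNodes (10% utilization threshold).
    hdfs balancer -threshold 10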
TECHNICAL SKILLS:
Hadoop Ecosystem: Hadoop 2.2, HDFS, MapReduce, Hive, Pig, ZooKeeper, Sqoop, Oozie, YARN, Apache NiFi, Apache Phoenix, Spark, Kafka, Storm, Ambari 1.0-2.1.1, Kerberos, Flume.
Hadoop Management & Security: Hortonworks, Cloudera Manager.
Web Technologies: HTML, XHTML, XML, XSL, CSS, JavaScript
Server Side Scripting: Shell, Perl, Python.
Database: Oracle 10g, Microsoft SQL Server, MySQL, DB2, SQL, RDBMS.
Web Servers: Apache Tomcat 5.x, BEA WebLogic 8.x, IBM WebSphere 6.0/5.1.1
Programming Languages: C, Java, PL/SQL, NoSQL
Databases: HBase, MongoDB
Virtualization: VMware ESXi, vSphere, vCenter Server.
SDLC Methodology: Agile (SCRUM), Waterfall.
Operating Systems: Windows 2000 Server, Windows 2000 Advanced Server, Windows Server 2003, CentOS, Windows 98/XP, UNIX, Linux (RHEL)
WORK EXPERIENCE:
Hadoop Administrator
Confidential, Texas
Responsibilities:
- Installed, configured, and maintained Apache Hadoop clusters for application development and Hadoop tools such as Hive, Pig, HBase, ZooKeeper, and Sqoop.
- Experience in cluster coordination using Zookeeper.
- Expertise with NoSQL databases like HBase, Cassandra, DynamoDB (AWS), and MongoDB.
- Involved in creating a Spark cluster in HDInsight by creating Azure compute resources with Spark installed and configured.
- Worked on security components such as Kerberos, Ranger, Sentry, and HDFS encryption.
- Created Oozie workflows to automate data ingestion using Sqoop and process incremental log data ingested by Flume using Pig.
- Loaded log data into HDFS using Flume and Kafka, and performed ETL integrations.
- Involved in migrating ETL processes from Oracle to Hive to simplify data manipulation.
- Extracted data from Teradata into HDFS using Sqoop.
- Also worked with core HDP components such as YARN and HDFS to architect the platform.
- Completed end-to-end design and integration of Apache NiFi.
- Involved in setting up Linux users, setting up Kerberos principals, and testing HDFS, Hive, Pig, and MapReduce access for the new users (see the command sketch at the end of this section).
- Used the Phoenix-Hive integration to join very large tables to other large tables.
- Used Phoenix to execute aggregate queries through server-side hooks (coprocessors).
- Installed and configured Hortonworks Distribution Platform (HDP 2.3) on Amazon EC2 instances.
- Used Python to instantiate multi-threaded applications and run them alongside other applications.
- Worked on cluster installation, commissioning and decommissioning of DataNodes, NameNode recovery, capacity planning, and Cassandra and slot configuration.
- Created an HDInsight cluster in Azure (a Microsoft-specific tool) as part of the deployment, and performed component unit testing using the Azure Emulator.
- Designed the Elasticsearch configuration files based on the number of hosts available, naming the cluster and nodes accordingly.
- Worked with support teams on troubleshooting LDAP and SiteMinder issues for newer organization-level initiatives.
- Created interactive dashboards with parameters, actions, and calculated fields using Tableau Desktop.
- Provided support on Kerberos-related issues and coordinated Hadoop installations, upgrades, and patch installations in the environment.
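A minimal sketch of the new-user onboarding flow described above (creating a Kerberos principal, exporting a keytab, and verifying HDFS and Hive access); the realm, principal, host names, and keytab path are assumed placeholders.

    # Create a principal and export its keytab (placeholder realm and user).
    kadmin -p admin/admin@EXAMPLE.COM -q "addprinc -randkey jdoe@EXAMPLE.COM"
    kadmin -p admin/admin@EXAMPLE.COM -q "xst -k /etc/security/keytabs/jdoe.keytab jdoe@EXAMPLE.COM"

    # Authenticate as the new user and verify HDFS and Hive access.
    kinit -kt /etc/security/keytabs/jdoe.keytab jdoe@EXAMPLE.COM
    hdfs dfs -ls /user/jdoe
    beeline -u "jdbc:hive2://hiveserver:10000/default;principal=hive/_HOST@EXAMPLE.COM" -e "show databases;"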
Environment: Hive, Pig, HBase, ZooKeeper, Sqoop, ETL, Azure, Hortonworks, Apache Phoenix, Ambari 2.0, Apache NiFi, Linux CentOS, Splunk, MongoDB, Elasticsearch, Teradata, Puppet, Kerberos, Kafka, Cassandra, Linux/Unix, Python, Agile/Scrum.
Hadoop Administrator
Confidential, MN
Responsibilities:
- Involved in capacity planning with respect to the growing data size and the existing cluster size.
- Worked on analysing the Hadoop cluster and different big data analytic tools, including Pig, HBase, NoSQL databases, Flume, Oozie, and Sqoop.
- Experience in designing, implementing, and maintaining high-performing Big Data/Hadoop clusters and integrating them with existing infrastructure.
- Deployed and tested the application on WebSphere Application Server.
- Configured SSL for Ambari, Ranger, Hive and Knox.
- Experience in methodologies such as Agile, Scrum, and Test-driven development.
- Created principals for new users in Kerberos, and implemented and maintained the Kerberos cluster integrated with Active Directory (AD).
- Worked on a data pipeline using Kafka and Storm to store data into HDFS.
- Created event-processing data pipelines and handled messaging services using Apache Kafka.
- Involved in migrating a Java test framework to Python Flask.
- Shell scripting for Linux/Unix systems administration and related tasks; point of contact for vendor escalation.
- Monitored and analysed MapReduce jobs, looking out for potential issues and addressing them.
- Collected log data from web servers and integrated it into HDFS using Flume.
- Moved data from Oracle, Teradata, and MySQL into HDFS using Sqoop, and imported various formats of flat files into HDFS.
- Assisted in discussions of redesigning LDAP architecture for older environments.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Used the Phoenix Query Server, which creates a new ZooKeeper connection for each client session.
- Used Phoenix support for updatable views to extend the primary key of a base table.
- Commissioned and decommissioned Hadoop cluster nodes, including load balancing of HDFS block data.
- Worked in an Agile/Scrum environment and used Jenkins and GitHub for continuous integration and deployment.
- Good knowledge of implementing NameNode Federation and NameNode/Hadoop cluster High Availability using ZooKeeper and the Quorum Journal Manager.
- Good knowledge in adding security to the cluster using Kerberos and Sentry.
- Monitored multiple Hadoop clusters environments using Ganglia and Nagios. Monitored workload, job performance and capacity planning using Cloudera Manager.
- Exported the patterns analyzed back to Teradata using Sqoop.
- Hands-on experience setting up ACLs (Access Control Lists) to secure access to the HDFS file system; see the command sketch at the end of this section.
- Analysed escalated incidents within the Azure SQL database.
- Captured data logs from web servers into HDFS using Flume and Splunk for analysis.
- Fine-tuned the Hadoop cluster by setting the proper number of map and reduce slots for the TaskTrackers.
- Experience in tuning the heap size to avoid any disk spills and to avoid OOM issues.
- Familiar with job scheduling using Fair Scheduler so that CPU time is well distributed amongst all the jobs.
- Experience managing users and permissions on the cluster, using different authentication methods.
- Involved in regular Hadoop Cluster maintenance such as updating system packages.
- Experience in managing and analysing Hadoop log files to troubleshoot issues.
- Good knowledge of NoSQL databases such as HBase and MongoDB.
- Worked on the Hortonworks Hadoop distribution, which managed services such as HDFS and MapReduce2.
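A sketch of the HDFS ACL setup mentioned above; the group name and directory are hypothetical, and ACLs are assumed to be enabled via dfs.namenode.acls.enabled in hdfs-site.xml.

    # Grant a project group read/execute access to a shared dataset directory,
    # and set a default ACL so new files inherit the same access.
    hdfs dfs -setfacl -R -m group:analytics:r-x /data/shared/weblogs
    hdfs dfs -setfacl -m default:group:analytics:r-x /data/shared/weblogs

    # Verify the effective ACL entries.
    hdfs dfs -getfacl /data/shared/weblogs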
Environment: Hadoop, YARN, Hive, HBase, Flume, Hortonworks, Apache Phoenix, Kafka, ZooKeeper, Oozie, Sqoop, MapReduce, Ambari, HDFS, Teradata, Splunk, Elasticsearch, Jenkins, GitHub, Kerberos, MySQL, Apache NiFi, NoSQL, MongoDB, Java, Shell Script, Python, Linux/Unix.
Hadoop Administrator
Confidential, Greenville, South Carolina
Responsibilities:
- Responsible for architecting Hadoop cluster.
- Involved in source system analysis, data analysis, and data modelling for ETL (Extract, Transform and Load) and HiveQL.
- Strong experience in installation and configuration of Hadoop ecosystem components such as YARN, HBase, Flume, Hive, Pig, and Sqoop.
- Expertise in Hadoop cluster tasks such as adding and removing nodes without affecting running jobs or data.
- Managed and reviewed Hadoop log files.
- Loaded log data into HDFS using Flume. Worked extensively on creating MapReduce jobs to power data for search and aggregation.
- Worked extensively with Sqoop for importing data.
- Designed a data warehouse using Hive.
- Created partitioned tables in Hive (see the sketch at the end of this section).
- Mentored analysts and the test team in writing Hive queries.
- Extensively used Pig for data cleansing.
- Scheduled the Oozie workflow engine to run multiple Hive and Pig jobs, which run independently based on time and data availability.
- Worked on Oozie workflows for daily incremental loads, which get data from Teradata and import it into Hive tables.
- Built and published customized interactive reports and dashboards using Tableau Server.
- Developed Pig scripts to transform the data into a structured format, automated through Oozie coordinators.
- Worked on pulling data from relational databases into Hive on the Hadoop cluster using Sqoop import, for visualization and analysis.
- Used Flume to collect, aggregate, and store web log data from different sources such as web servers, mobile, and network devices, and pushed it to HDFS.
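An illustrative sketch of the partitioned Hive warehouse tables and daily loads described above; the table name, columns, and staging path are hypothetical.

    # Create a partitioned table and load one day of staged data into it.
    hive -e "
    CREATE TABLE IF NOT EXISTS weblogs (
      host    STRING,
      request STRING,
      status  INT
    )
    PARTITIONED BY (log_date STRING)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    STORED AS TEXTFILE;

    LOAD DATA INPATH '/data/staging/weblogs/2016-03-01'
    INTO TABLE weblogs PARTITION (log_date = '2016-03-01');
    "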
Environment: Hadoop, HDFS, MapReduce, Hive, Hortonworks, Pig, Kafka, Oozie, ZooKeeper, Sqoop, Nagios, Cloudera Manager, MySQL, NoSQL, MongoDB, Java, Linux/Unix.
Linux Administrator
Confidential
Responsibilities:
- Administration of RHEL, which includes installation, testing, tuning, upgrading and loading patches, troubleshooting both physical and virtual server issues.
- Creating, cloning Linux Virtual Machines.
- Installing Red Hat Linux using Kickstart and applying security policies for hardening the servers based on company policies.
- RPM and YUM package installations, patch and other server management.
- Managing routine system backups, scheduling jobs (such as disabling and enabling cron jobs), and enabling system and network logging of servers for maintenance, performance tuning, and testing.
- Tech and non-tech refresh of Linux servers, including new hardware, OS upgrades, application installation, and testing.
- Set up user and group login IDs, printing parameters, network configuration, and passwords; resolved permissions issues; and managed user and group quotas.
- Installing MySQL on Linux and customizing the MySQL database parameters.
- Working with the ServiceNow incident tool.
- Creating physical volumes, volume groups, and logical volumes (see the sketch at the end of this section).
- Samba Server configuration with Samba Clients.
- Knowledge of iptables and SELinux.
- Migrated existing Linux file systems to standard ext3.
- Configuration and administration of NFS, FTP, Samba, and NIS.
- Maintenance of DNS, DHCP, and Apache services on Linux machines.
- Installing and configuring Apache and supporting it on Linux production servers.
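A representative sketch of the LVM and ext3 tasks listed above; the disk device, volume group, logical volume, and mount point are placeholders.

    # Create a physical volume, volume group, and logical volume (placeholder device).
    pvcreate /dev/sdb1
    vgcreate vg_data /dev/sdb1
    lvcreate -L 50G -n lv_app vg_data

    # Create a standard ext3 file system on the new volume and mount it.
    mkfs.ext3 /dev/vg_data/lv_app
    mkdir -p /app
    mount /dev/vg_data/lv_app /app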
Environment: Red Hat Enterprise Linux servers, VERITAS Cluster Server 5.0, Windows 2003 Server, shell programming, Unix/Linux.