Hadoop Admin Resume
SUMMARY:
- Over 7 years of experience, including 4+ years with the Hadoop ecosystem, installing and administering UNIX/Linux servers and configuring Hadoop ecosystem components in existing clusters.
- Experience in configuring, installing and managing MapR, Hortonworks & Cloudera Distributions.
- Hands-on experience installing, configuring, monitoring, and using Hadoop components such as MapReduce, HDFS, HBase, Hive, Sqoop, Pig, ZooKeeper, Oozie, Apache Spark, and Impala.
- Working experience building and supporting large-scale Hadoop environments, including design, configuration, installation, performance tuning, and monitoring.
- Experience in setting up automated monitoring and escalation infrastructure for Hadoop Cluster using Ganglia and Nagios.
- Hands-on experience configuring Hadoop clusters both in professional on-premises environments and on Amazon Web Services (AWS) using EC2 instances.
- Good knowledge of data warehousing, ETL development, distributed computing, and large-scale data processing.
- Good knowledge of the design and implementation of big data pipelines.
- Worked in agile projects delivering an end-to-end continuous integration/continuous delivery pipeline by integrating tools like Jenkins, Puppet, and AWS for VM provisioning.
- Experience with the complete software development lifecycle, including design, development, testing, and implementation of moderately to highly complex systems.
- Hands-on experience in installation, configuration, support, and management of Hadoop clusters using the Apache, Hortonworks, Cloudera, and MapR distributions.
- Extensive experience in installing, configuring and administrating Hadoop cluster for major Hadoop distributions like CDH5 and HDP.
- Excellent knowledge of NoSQL databases like HBase and Cassandra.
- Experience in administration of Kafka and Flume streaming using Cloudera Distribution.
- Experience in job scheduling with the Fair, Capacity, and FIFO schedulers, and in inter-cluster data copying with the DistCp tool.
- Experience in designing and implementing secure Hadoop clusters using Kerberos (a sketch follows this summary).
- Experience in AWS CloudFront, including creating and managing distributions to provide access to S3 buckets or HTTP servers running on EC2 instances.
- Involved in implementing security on HDP and HDF Hadoop clusters, with Kerberos for authentication, Ranger for authorization, and LDAP integration for Ambari, Ranger, and NiFi.
- Responsible for support of Hadoop Production environment which includes Hive, YARN, Spark, Impala, Kafka, SOLR, Oozie, Sentry, Encryption, HBase, etc.
- Migrating applications from existing systems like MySQL, Oracle, DB2 and Teradata to Hadoop.
- Good hands-on experience in Linux administration and in troubleshooting network- and OS-level issues.
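A minimal sketch of the Kerberos workflow behind the secure-cluster work above, assuming an MIT KDC; the principal, realm, and keytab path are placeholders:

  # Obtain a ticket from a service keytab, then verify authenticated HDFS access
  kinit -kt /etc/security/keytabs/svc.keytab svc@EXAMPLE.COM
  klist                  # confirm the ticket was granted
  hdfs dfs -ls /user     # succeeds only with a valid Kerberos ticket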
TECHNICAL SKILLS:
Big Data Tools: HDFS, MapReduce, YARN, Hive, Pig, Sqoop, Flume, Oozie, Kafka, Hortonworks, Ambari, Knox, Phoenix, Impala, Storm.
Hadoop Distribution: Cloudera Distribution of Hadoop (CDH), Hortonworks HDP (Ambari 2.6.5)
Operating Systems: UNIX, Linux, Windows XP, Windows Vista, Windows 2003 Server
Servers: WebLogic, WebSphere, and JBoss application servers; Tomcat and Nginx web servers.
Programming Languages/Scripting: Java, PL/SQL, Shell Script, Perl, Python.
Tools: Interwoven TeamSite, GMS, BMC Remedy, Eclipse, Toad, SQL Server Management Studio, Jenkins, GitHub, Ranger, TestNG, JUnit.
Databases: MySQL, Oracle, Teradata, InfluxDB; NoSQL: HBase, MongoDB, Cassandra, Couchbase.
Processes: Systems Administration, Incident Management, Release Management, Change Management.
WORK EXPERIENCE:
Hadoop Admin
Confidential
Responsibilities:
- Maintenance of the cluster: adding and removing nodes using cluster monitoring tools (Cloudera Manager and Ganglia), configuring NameNode high availability, and tracking all running jobs (node removal is sketched at the end of this section).
- Implementation, management, and administration of the overall ecosystem.
- Day-to-day running of the clusters, ensuring availability on a continuous basis.
- Capacity planning and estimating requirements for increasing or decreasing cluster capacity.
- Management and review of log files.
- Implementation of recovery and backup tasks.
- Implementation of resource and security management.
- Built new Dev/QA environments on a new Cloudera 5.15.2 Hadoop cluster.
- Enabled Kerberos and TLS security.
- Created policies for role-based access control (RBAC).
- Created AD groups for the respective teams.
- Built logical and physical diagrams for Dev/QA and provided hardware recommendations.
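A hedged sketch of the node-removal step referenced above; the hostname and exclude-file path are placeholders, and on Cloudera-managed clusters the same flow is usually driven from Cloudera Manager:

  # Mark the DataNode for decommissioning in the HDFS exclude file
  echo "dn07.example.com" >> /etc/hadoop/conf/dfs.exclude
  # Ask the NameNode to re-read its host lists and start draining blocks
  hdfs dfsadmin -refreshNodes
  # Watch progress until the node reports Decommissioned
  hdfs dfsadmin -report | grep -A 4 dn07.example.com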
Hadoop Admin
Confidential, Chandler, AZ
Responsibilities:
- Deployed HDInsight clusters in Microsoft Azure, ranging from Dev clusters to the Prod cluster.
- Implemented the Enterprise Security Package (ESP) on the HDInsight cluster.
- Involved in creating role-based access policies.
- Created keytabs for all application-specific service accounts (sketched at the end of this section).
- Involved in cluster tuning for Hive, Tez, YARN, HDFS, and Spark.
- Configured Hive LLAP for better query performance.
- Worked closely with the developer team to troubleshoot all ongoing issues.
- Worked on RStudio connectivity to a secured HDP 3.1 cluster.
- Worked on Zeppelin with the Spark (Livy) and Hive interpreters.
- Installed and configured Kerberos clients on user machines to connect to the secured Hadoop cluster.
- Installed ggplot2 libraries on all R nodes so the analytics team could develop graphs in Zeppelin.
- Involved in performance tuning of the spark.livy2 interpreter while troubleshooting ongoing issues.
- Worked on Spark SQL and Hive query execution.
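A minimal sketch of the keytab-creation step above, assuming an MIT KDC with admin rights; principals and paths are placeholder assumptions:

  # Export a keytab for an application service account
  kadmin -p admin/admin@EXAMPLE.COM -q "ktadd -k /etc/security/keytabs/appsvc.keytab appsvc@EXAMPLE.COM"
  # Lock the keytab down to the service account and verify its contents
  chown appsvc /etc/security/keytabs/appsvc.keytab
  chmod 400 /etc/security/keytabs/appsvc.keytab
  klist -kt /etc/security/keytabs/appsvc.keytab
  # Smoke-test authentication with the new keytab
  kinit -kt /etc/security/keytabs/appsvc.keytab appsvc@EXAMPLE.COM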
Hadoop Admin
Confidential, Denver, CO
Responsibilities:
- Installed a Hadoop cluster on Hortonworks distribution HDP 2.6.
- Implemented LDAP integration with Ambari.
- Responsible for cluster maintenance: commissioning and decommissioning DataNodes, cluster monitoring, troubleshooting, and managing and reviewing Hadoop log files.
- Working as a Hadoop administrator on the Hortonworks distribution of Hadoop.
- Planned and installed Hadoop clusters from PoC through Prod environments.
- Implemented HA for NameNode, YARN, Hive, and Spark.
- Upgraded the Big Data DEV & PROD clusters from HDP 2.3.x to HDP 2.6.x.
- Upgraded the Big Data DEV & PROD clusters from Ambari 2.1.x to 2.6.x.x.
- Installed an Elasticsearch cluster using Ansible.
- Installation, configuration, and administration experience with the Hortonworks Ambari big data platform and Apache Hadoop on Red Hat and CentOS as data storage, retrieval, and processing systems.
- Experience in designing, installing, configuring, supporting, and managing Hadoop clusters using the Apache and Hortonworks distributions.
- Created an Ansible playbook for Elasticsearch configuration.
- Troubleshot and debugged cluster problems on RHEL-based Linux infrastructure.
- Configured the YARN Capacity Scheduler based on infrastructure needs and performed YARN tuning (sketched at the end of this section).
- Reviewed the NameNode web UI for information about DataNode volume failures.
- Purged older log files.
- Implemented ACLs and audits with Ranger to meet compliance specifications.
- Installed Kafka Manager to manage and monitor the Kafka cluster, including monitoring consumer lag (a lag-check sketch follows below).
Environment: Hortonworks, Hive, HDFS, Impala, Spark, YARN, Kafka, Solr, Oozie, Sentry, Encryption, HBase.
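A hedged sketch of the Capacity Scheduler tuning above; the queue names and percentages are illustrative assumptions:

  # Illustrative capacity-scheduler.xml queue settings (edited via Ambari on HDP):
  #   yarn.scheduler.capacity.root.queues          = etl,adhoc
  #   yarn.scheduler.capacity.root.etl.capacity    = 70
  #   yarn.scheduler.capacity.root.adhoc.capacity  = 30
  # Apply the queue changes without restarting the ResourceManager
  yarn rmadmin -refreshQueues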
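A sketch of the consumer-lag check noted above; the broker host and group name are placeholders:

  # Show per-partition offsets and lag for one consumer group
  kafka-consumer-groups.sh --bootstrap-server broker01.example.com:9092 \
    --describe --group etl-consumers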
Hadoop Admin
Confidential, St. Louis, MO
Responsibilities:
- Deployed a Cloudera Distribution Hadoop cluster and installed ecosystem components on Linux servers: HDFS, YARN, ZooKeeper, HBase, Hive, MapReduce, Pig, Kafka, Confluent Kafka, Storm, and Spark.
- Configured the Capacity Scheduler on the ResourceManager to provide a way to share large cluster resources.
- Deployed NameNode high availability for the major production cluster.
- Installed Kafka cluster with separate nodes for brokers.
- Experienced in writing automatic scripts for monitoring the file systems and key MapR services.
- Configured Oozie for workflow automation and coordination.
- Troubleshot production-level issues in the cluster and its functionality.
- Backed up data on a regular basis to a remote cluster using DistCp (sketched at the end of this section).
- Involved in installing and configuring Confluent Kafka in the R&D line and validating the installation with the HDFS and Hive connectors.
- Set up the lab cluster and installed all ecosystem components, both through MapR and manually through the command line.
- Implemented high-availability and automatic-failover infrastructure, utilizing ZooKeeper services, to remove the NameNode as a single point of failure.
- Used Sqoop to connect to Oracle, MySQL, and Teradata and move data into Hive/HBase tables (sketched at the end of this section).
- Used Apache Kafka to import real-time network log data into HDFS.
- Performed disk space management for users and groups in the cluster.
- Used Storm and Kafka services to push data to HBase and Hive tables.
- Documented slides and presentations on the Confluence page.
- Used Kafka to allow a single cluster to serve as the central data backbone for a large organization.
- Added nodes to the cluster and decommissioned nodes from it whenever required.
- Used the Sqoop and DistCp utilities for data copying and migration.
- Worked on end-to-end data flow management from sources to a NoSQL database (MongoDB) using Oozie.
- Worked with the Continuous Integration team to set up GitHub for scheduling automatic deployments of new and existing code in production.
- Monitored multiple Hadoop cluster environments using Nagios; monitored workload, job performance, and capacity planning using MapR Control System.
- Monitored Hadoop cluster connectivity and security.
- Managed and reviewed Hadoop log files.
- Performed file system management and monitoring.
- Used the Avro SerDe packaged with Hive for serialization and deserialization, to parse the contents of streamed log data.
Environment: Kafka, HBase, Hive, Pig, Sqoop, Yarn, Apache Oozie workflow scheduler, Flume, Zookeeper, Kerberos, Nagios, MapR.
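A minimal sketch of the DistCp backup flow above; the NameNode hosts and paths are placeholder assumptions:

  # Copy a day's data to the remote cluster; -update skips files that are
  # already present and unchanged, -pbp preserves block size and permissions
  hadoop distcp -update -pbp \
    hdfs://prod-nn.example.com:8020/data/events/2019-06-01 \
    hdfs://dr-nn.example.com:8020/backup/events/2019-06-01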
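A sketch of the Sqoop ingestion pattern above; the JDBC URL, credentials handling, and table names are placeholders:

  # Import an Oracle table into Hive; -P prompts for the password
  sqoop import \
    --connect jdbc:oracle:thin:@db01.example.com:1521/ORCL \
    --username etl_user -P \
    --table CUSTOMERS \
    --hive-import --hive-table stage.customers \
    --num-mappers 4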
Hadoop Admin
Confidential, Atlanta, GA
Responsibilities:
- Experience in managing scalable Hadoop cluster environments.
- Involved in managing, administering and monitoring clusters in Hadoop Infrastructure.
- Diligently teaming with the infrastructure, network, database, application and business intelligence teams to guarantee high data quality and availability.
- Collaborating with application teams to install operating system and Hadoop updates, patches, and version upgrades when required.
- Experience in HDFS maintenance and administration.
- Managing nodes on Hadoop cluster connectivity and security.
- Experience in commissioning and decommissioning of nodes from cluster.
- Experience in Name Node HA implementation.
- Working on architecting solutions that process massive amounts of data on corporate and AWS cloud-based servers.
- Working with data delivery teams to setup new Hadoop users.
- Installed the Oozie workflow engine to run multiple MapReduce, Hive, and Pig jobs.
- Configured the Hive Metastore for the Hadoop ecosystem and management tools.
- Hands-on experience with the Nagios and Ganglia monitoring tools.
- Experience in HDFS data storage and support for running MapReduce jobs.
- Performing tuning and troubleshooting of MR jobs by analyzing and reviewing Hadoop log files.
- Installing and configuring Hadoop ecosystem components like Sqoop, Pig, Flume, and Hive.
- Maintaining and monitoring clusters; loaded data into the cluster from dynamically generated files using Flume and from relational database management systems using Sqoop (a Flume sketch follows this section).
- Experience in using DistCp to migrate data within and across clusters.
- Installed and configured ZooKeeper.
- Hands-on experience in analyzing log files for Hadoop ecosystem services.
- Coordinate root cause analysis efforts to minimize future system issues.
- Troubleshooting of hardware issues, working closely with various vendors on hardware, OS, and Hadoop issues.
Environment: Cloudera 4.2, HDFS, Hive, Pig, Sqoop, HBase, Chef, RHEL, Mahout, Tableau, MicroStrategy, Shell Scripting, Red Hat Linux.
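A hedged sketch of the Flume flow above (local files spooled into HDFS); the agent name, directories, and paths are assumptions:

  # Contents of /etc/flume/conf/spool-to-hdfs.conf (placeholder path):
  #   a1.sources  = src1
  #   a1.channels = ch1
  #   a1.sinks    = sink1
  #   a1.sources.src1.type     = spooldir
  #   a1.sources.src1.spoolDir = /var/data/incoming
  #   a1.sources.src1.channels = ch1
  #   a1.channels.ch1.type     = memory
  #   a1.sinks.sink1.type      = hdfs
  #   a1.sinks.sink1.hdfs.path = /data/landing/%Y-%m-%d
  #   a1.sinks.sink1.hdfs.useLocalTimeStamp = true
  #   a1.sinks.sink1.channel   = ch1
  # Start the agent with that configuration
  flume-ng agent -c /etc/flume/conf -f /etc/flume/conf/spool-to-hdfs.conf -n a1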
Linux Systems Administrator
Confidential
Responsibilities:
- Installation, configuration, and troubleshooting of Red Hat 4.x and 5.x, Ubuntu 12.x, and HP-UX 11.x on various hardware platforms.
- Installed and configured JBoss application server on various Linux servers
- Configured Kickstart for RHEL (4 and 5), Jumpstart for Solaris, and NIM for AIX to perform image installations over the network.
- Involved in configuring and troubleshooting multipathing on Solaris and bonding on Linux servers.
- Configured DNS Servers and Clients, and involved in troubleshooting DNS issues
- Worked with Red Hat Linux tools like RPM to install packages and patches on Red Hat Linux servers.
- Developed scripts to automate administration tasks such as customizing user environments, and performed monitoring and tuning with nfsstat, netstat, iostat, and vmstat (a monitoring sketch follows this section).
- Performed network and system troubleshooting activities under day to day responsibilities.
- Created Virtual server on VMware ESX/ESXi based host and installed operating system on Guest Servers.
- Installed and configured RPM packages using the YUM package manager.
- Involved in developing custom scripts using Shell (bash, ksh) to automate jobs.
- Defined and developed plans for Change, Problem, and Incident Management processes based on ITIL.
- Networking communication skills and protocols such as TCP/IP, Telnet, FTP, NDM, SSH, and rlogin.
- Installed, configured, maintained, and administered the network servers DNS, NIS, NFS, and Sendmail, and the application servers Apache and JBoss, on Linux.
- Extensive use of Logical Volume Manager (LVM): creating volume groups and logical volumes, and disk mirroring, on HP-UX, AIX, and Linux (an LVM sketch follows this section).
- Worked with clusters, adding multiple IP addresses to servers via virtual network interfaces to minimize network traffic (load-balancing and failover clusters).
- User and security administration: adding users to the system and password management.
- System monitoring and log management on UNIX and Linux servers, including crash and swap management, password recovery, and performance tuning.
Environment: Red hat, Ubuntu, JBoss, RPM, VMware, ESXi, LVM, DNS, NIS, HP-UX.
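A minimal sketch of the kind of performance-monitoring script described above; the log path and sampling intervals are illustrative:

  #!/bin/bash
  # Append a timestamped CPU/memory/disk/NFS snapshot to a log for trend review
  LOG=/var/log/sys-snapshot.log    # placeholder path
  {
    date
    vmstat 1 3         # CPU and memory, three one-second samples
    iostat -dx 1 3     # per-device I/O utilization
    nfsstat -c         # NFS client call statistics
  } >> "$LOG" 2>&1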
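A sketch of the LVM mirroring workflow above; device names, sizes, and the mount point are placeholders:

  # Build a mirrored logical volume from two new disks, then mount it
  pvcreate /dev/sdb /dev/sdc
  vgcreate appvg /dev/sdb /dev/sdc
  lvcreate -L 50G -m1 -n applv appvg    # -m1 keeps one mirror copy
  mkfs.ext4 /dev/appvg/applv
  mount /dev/appvg/applv /app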