
Hadoop Kafka Administrator Resume

Charlotte, North Carolina

SUMMARY:

  • Over 9 years of experience in software administration and development, including 4+ years developing large-scale applications using Hadoop and other big data tools.
  • In-depth knowledge of the Hadoop ecosystem: HDFS, YARN, MapReduce, Hive, Hue, Sqoop, Flume, Kafka, Spark, Oozie, NiFi and Cassandra.
  • Experience with Ambari (Hortonworks) for management of the Hadoop ecosystem.
  • Expertise in setting up Hadoop security: data encryption and authorization using Kerberos, TLS/SSL and Apache Sentry respectively.
  • Extensive hands-on administration experience with Hortonworks.
  • Practical knowledge of the functionality of every Hadoop daemon, the interactions between them, resource utilization, and dynamic tuning to keep the cluster available and efficient.
  • Designed and provisioned virtual networks in AWS using VPC, subnets, network ACLs, internet gateways, route tables and NAT gateways.
  • Strong knowledge of Hadoop HDFS architecture and the MapReduce framework.
  • Experienced in developing MapReduce programs using Apache Hadoop for working with Big Data.
  • Good understanding of XML methodologies (XML, XSL, XSD) including Web Services and SOAP.
  • Experience in administering the Linux systems to deploy Hadoop cluster and monitoring the cluster using Nagios and Ganglia.
  • Experience in performing backup and disaster recovery of Name node metadata and important sensitive data residing on cluster.
  • Architected and implemented automated server provisioning using Puppet.
  • Experience in performing minor and major upgrades.
  • Experience in performing commissioning and decommissioning of data nodes on Hadoop cluster.
  • Strong knowledge in configuring Name Node High Availability and Name Node Federation.
  • Familiar with writing Oozie workflows and job controllers for automating shell, Hive and Sqoop jobs.

TECHNICAL SKILLS:

Big Data Tools: HDFS, MapReduce, YARN, Hive, Pig, Sqoop, Flume, Oozie, Kafka, Hortonworks, Ambari, Impala, Storm.

Hadoop Distribution: Cloudera Distribution of Hadoop (CDH), Hortonworks (HDP)

Operating Systems: UNIX, Linux, Windows XP, Windows Vista, Windows 2003 Server

Programming Languages: Java, PL/SQL, Shell Script, Perl, Python.

Processes: Incident Management, Release Management, Change Management

PROFESSIONAL EXPERIENCE:

Hadoop Kafka Administrator

Confidential, Charlotte, North Carolina

Responsibilities:

  • Experience in architecting, designing, installing, configuring and managing Apache Hadoop clusters on the MapR, Hortonworks and Cloudera distributions.
  • Experienced in administering, installing, upgrading and managing Hadoop distributions with MapR 5.1 on clusters of 200+ nodes across Development, Test and Production (Operational & Analytics) environments.
  • Troubleshooting issues in the execution of MapReduce jobs by inspecting and reviewing log files.
  • Extensively worked on Elasticsearch querying and indexing to retrieve documents at high speed.
  • Set up, configured and monitored a Kafka environment on Windows from scratch.
  • Created a data pipeline through Kafka connecting two different client applications.
  • Set up 3 instances in the UAT/Staging environment and 5 instances in the Production environment.
  • Responsible for building components to connect to other microservices using Kafka, Elasticsearch and REST.
  • Developed plugins to deserialize data in non-native Kafka environments.
  • Developed machine learning algorithms for the internal search engine.
  • Maintaining and troubleshooting network connectivity
  • Managed patch configuration, version control and service packs, and reviewed connectivity issues related to security problems.
  • Monitoring using the ELK stack, i.e., Elasticsearch, Logstash and Kibana.
  • Knowledge of working with on-premise servers as well as cloud-based servers.
  • Hands-on experience in standing up and administering an on-premise Kafka platform.
  • Created backups for all instances in the Kafka environment.
  • Experience managing Kafka clusters in both Windows and Linux environments.
  • Worked with Apache ActiveMQ, a popular open source messaging and Enterprise Integration Patterns provider that is fast, supports many cross-language clients and protocols, and fully supports JMS 1.1 and J2EE 1.4.
  • Worked with RabbitMQ, a traditional message broker implementing a variety of messaging protocols; originally developed for AMQP, an open wire protocol with powerful routing features, it brought cross-language messaging flexibility that Java-only standards like JMS could not offer.
  • Knowledge of Kafka API.
  • Designed and implemented topic configuration in the new Kafka cluster across all environments.
  • Exposure to and knowledge of managing a streaming platform on cloud providers (Azure, AWS & EMC).
  • Worked with tools including, but not limited to: Kafka, ZooKeeper, Console Producer, Console Consumer, Kafka Tool, Filebeat, Metricbeat, Elasticsearch, Logstash, Kibana, Spring Tool Suite and Apache Tomcat Server.
  • Operations - enabled JMX metrics.
  • Operations - involved in data cleanup of generated JSON and XML responses.
  • Secured the Kafka cluster with Kerberos; also implemented Kafka security features using SSL without Kerberos. For more fine-grained security, set up Kerberos users and groups to enable more advanced security features.
  • Integrated Apache Kafka for data ingestion
  • Generated consumer group lag metrics from Kafka using its API; used Kafka for building real-time data pipelines between clusters (a lag-check sketch follows this list).
  • Created POCs for multiple use cases related to CBRE's home-built application SEQUENTRA and the client's LEASE ACCELERATOR.
  • Complete knowledge regarding Elasticsearch, Logstash and Kibana.
  • Installed a Hadoop cluster and worked with big data analysis tools including Hive.
  • Created and wrote shell scripts (ksh, Bash), Ruby, Python and PowerShell scripts for setting up baselines, branching, merging and automation processes across environments using SCM tools like Git, Subversion (SVN), Stash and TFS on Linux and Windows platforms.
  • Designed, built and managed the ELK (Elasticsearch, Logstash, Kibana) cluster for centralized logging and search functionality for the application.
  • Responsible for designing and deploying new ELK clusters (Elasticsearch, Logstash, Kibana, Beats, Kafka, ZooKeeper, etc.).
  • Installed a Kerberos-secured Kafka cluster with no encryption on Dev and Prod, and set up Kafka ACLs on it.
  • Set up a no-authentication Kafka listener in parallel with the Kerberos (SASL) listener, and tested a non-authenticated (anonymous) user in parallel with a Kerberos user.
  • Installed Ranger in all environments as a second level of security for the Kafka broker.
  • Involved in the data ingestion process to the Production cluster.
  • Worked on Oozie Job Scheduler
  • Worked on Spark transformations, RDD operations and DataFrames, and validated the Spark plug-in for the Avro data format (receiving gzip-compressed data and producing Avro data into HDFS files).
  • Installed Docker for utilizing ELK, Influxdb, and Kerberos.
  • Involved in defining test automation strategy and test scenarios, created automated test cases, test plans and executed tests using Selenium WebDriver and JAVA. Architected Selenium framework which has integrations for API automation, database automation and mobile automation.
  • Executed and maintained Selenium test automation scripts.
  • Created a database in InfluxDB, worked on the interface created for Kafka, and checked the measurements in the databases.
  • Created a Bash script with Awk-formatted text to send metrics to InfluxDB (an InfluxDB sketch follows this list).
  • Enabled InfluxDB and configured the InfluxDB data source in the Grafana interface.
  • Deployed Elasticsearch 5.3.0 and InfluxDB 1.2 on the Prod machine in Docker containers.
  • Created a cron job that executes a program to start the ingestion process; the data is read in, converted to Avro and written to HDFS files (see the cron sketch after this list).
  • Upgraded HDP 2.5 to 2.6 in all environments; performed software patches and upgrades.
  • Worked on the Kafka backup index, minimized logs via the Log4j appender, and pointed Ambari server logs to NAS storage.
  • Deployed Data lake cluster with Hortonworks Ambari on AWS using EC2 and S3.
  • Installed the Apache Kafka cluster and Confluent Kafka open source in different environments.
  • Open source Kafka or the Confluent distribution can be installed on Windows and Linux/Unix systems.
  • Implemented a real-time log analytics pipeline using Confluent Kafka, Storm, Elasticsearch, Logstash, Kibana and Greenplum.
  • Installed JDK 1.8 or later and made it accessible across the box.
  • Downloaded open source Apache Kafka and Apache ZooKeeper and configured them on the boxes where the cluster runs. Once both Kafka and ZooKeeper are up and running, topics can be created and data produced and consumed. To secure the cluster, plug in the security configuration with SSL encryption, SASL authentication and ACLs (a basic bring-up sketch follows this list).
  • Finally, creating backups, adding clients and configs, patching and monitoring.
  • The initial design can start with a single-node or three-node cluster, adding nodes as required.
  • Typical node specification: 24 CPU cores, 32/64 GB RAM, and 500 GB (minimum) to 2 TB of storage.
  • Primarily used for functional data flow in parallel processing and as a distributed streaming platform.
  • Kafka replaces the traditional pub-sub model with ease, offering fault tolerance, high throughput and low latency.
  • Installed and developed different POC's for different application/infrastructure teams both in Apache Kafka and Confluent open source for multiple clients.
  • Installing, monitoring and maintenance of the clusters in all environments.
  • Installed single-node single-broker and multi-node multi-broker clusters, encrypted with SSL/TLS and authenticated with SASL/PLAINTEXT, SASL/SCRAM and SASL/GSSAPI (Kerberos) (a listener/ACL sketch follows this list).
  • Integrated topic-level security; the cluster is fully up and running 24/7.
  • Installed Confluent Enterprise in Docker and Kubernetes on an 18-node cluster.
  • Installed Confluent Kafka, applied security to it and monitoring with Confluent control center.
  • Involved in clustering with Cloudera and Hortonworks without exposing ZooKeeper; provided the cluster to end users via Kafka Connect for communication.
  • Set up cluster redundancy, used monitoring tools like Yahoo Kafka Manager, and performed performance tuning to deliver data in near real time with minimal latency.
  • Supported the Docker team in installing a multi-node Apache Kafka cluster and enabled security in the DEV environment.
  • Worked on disk space issues in the Production environment by monitoring how quickly space fills and reviewing what is being logged, and created a long-term fix for the issue (minimizing Info, Debug, Fatal and Audit logs).
  • Installed Kafka Manager for monitoring consumer lag and Kafka metrics; also used it for adding topics, partitions, etc.
  • Installed the Confluent Kafka open source and enterprise editions on Kubernetes using Helm charts on a 10-node cluster, applied SASL/PLAIN and SASL/SCRAM security, and exposed the cluster for outside access.
  • Successfully secured the Kafka cluster with SASL/PLAINTEXT, SASL/SCRAM and SASL/GSSAPI (Kerberos).
  • Analyzed the Hadoop cluster and different big data analytic tools including Pig, Hive and Sqoop; created a POC on Hortonworks and suggested best practices for the HDP and HDF platforms.
  • Set up Hortonworks infrastructure from cluster configuration down to the node level.
  • Installed Ambari server in the cloud.
  • Set up security using Kerberos and AD on Hortonworks and Cloudera CDH clusters.
  • Assigned user access for multiple user logins.
  • Tested all services such as Hadoop, ZooKeeper, Spark, HiveServer and Hive Metastore.
  • Worked on SNMP trap issues in the Production cluster; worked on heap optimization and changed some configurations for hardware optimization.
  • Worked on Ambari Views in Production.
  • Implemented Rack Awareness in Production Environment.
  • Worked on Nagios Monitoring tool.
  • Worked with the Hortonworks support team on Grafana consumer lag issues (currently no consumer lag is generated in Grafana visualizations within HDP).
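
A minimal bring-up sketch for the open source Kafka install described in the bullets above, using Kafka 0.10/1.x-era commands; the install directory, ports, topic name and hostnames are illustrative assumptions, not the actual deployment.

```bash
#!/usr/bin/env bash
# Minimal single-broker bring-up (illustrative paths and ports).
KAFKA_HOME=/opt/kafka            # assumed install directory
cd "$KAFKA_HOME"

# 1. Start ZooKeeper (port 2181), then the Kafka broker (port 9092).
bin/zookeeper-server-start.sh -daemon config/zookeeper.properties
bin/kafka-server-start.sh -daemon config/server.properties

# 2. Create a topic once both services are up.
bin/kafka-topics.sh --create --zookeeper localhost:2181 \
  --topic demo-events --partitions 3 --replication-factor 1

# 3. Produce and consume a test message with the console clients.
echo "hello kafka" | bin/kafka-console-producer.sh \
  --broker-list localhost:9092 --topic demo-events
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 \
  --topic demo-events --from-beginning --max-messages 1
```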
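
A sketch of the broker-side listener settings and a topic ACL behind the SSL/SASL bullets above; the property names are standard Kafka server.properties settings from that era, while the keystore paths, passwords, principal and topic name are placeholders.

```bash
# Illustrative security settings appended to config/server.properties
# (placeholder paths, passwords and hostnames).
cat >> /opt/kafka/config/server.properties <<'EOF'
listeners=SASL_SSL://0.0.0.0:9093,PLAINTEXT://0.0.0.0:9092
advertised.listeners=SASL_SSL://broker1.example.com:9093,PLAINTEXT://broker1.example.com:9092
security.inter.broker.protocol=SASL_SSL
sasl.enabled.mechanisms=GSSAPI,SCRAM-SHA-256
sasl.mechanism.inter.broker.protocol=GSSAPI
ssl.keystore.location=/etc/security/kafka.broker.keystore.jks
ssl.keystore.password=changeit
ssl.truststore.location=/etc/security/kafka.broker.truststore.jks
ssl.truststore.password=changeit
authorizer.class.name=kafka.security.auth.SimpleAclAuthorizer
EOF

# Grant a producer principal access to one topic (ACLs stored in ZooKeeper).
/opt/kafka/bin/kafka-acls.sh \
  --authorizer-properties zookeeper.connect=localhost:2181 \
  --add --allow-principal User:app1 --producer --topic demo-events
```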
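
A sketch of checking consumer group lag from the command line, as referenced in the consumer-lag bullet above; the bootstrap server and group name are placeholders, and the LAG column position can vary slightly between Kafka versions.

```bash
#!/usr/bin/env bash
# List consumer groups, then report and total the lag for one group (illustrative names).
BOOTSTRAP=localhost:9092
GROUP=demo-consumer-group

/opt/kafka/bin/kafka-consumer-groups.sh --bootstrap-server "$BOOTSTRAP" --list

# In this output format LAG is the 5th column; sum it for a quick total-lag figure.
/opt/kafka/bin/kafka-consumer-groups.sh --bootstrap-server "$BOOTSTRAP" \
  --describe --group "$GROUP" \
  | awk 'NR>1 && $5 ~ /^[0-9]+$/ {lag+=$5} END {print "total lag:", lag}'
```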
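
A sketch of the Bash/Awk metrics feed into InfluxDB mentioned above, using the InfluxDB 1.x HTTP write API; the database name, measurement name and URL are assumptions.

```bash
#!/usr/bin/env bash
# Push per-filesystem disk usage to InfluxDB as line protocol (illustrative names).
INFLUX_URL="http://localhost:8086/write?db=kafka_metrics"   # assumed database
HOST=$(hostname -s)

# df output -> "disk_used,host=<host>,mount=<mount> pct=<value>" lines,
# one per mounted filesystem, posted in a single batch.
df -P | awk -v host="$HOST" 'NR>1 {
  gsub("%", "", $5)
  printf "disk_used,host=%s,mount=%s pct=%s\n", host, $6, $5
}' | curl -s -XPOST "$INFLUX_URL" --data-binary @-
```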
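
A sketch of the cron-driven ingestion described above; the schedule, script path, converter program and HDFS directory are placeholders rather than the actual pipeline.

```bash
# Crontab entry (illustrative): run the ingestion every hour at minute 15.
# 15 * * * * /opt/ingest/run_ingest.sh >> /var/log/ingest/run.log 2>&1

#!/usr/bin/env bash
# /opt/ingest/run_ingest.sh - read raw files, convert to Avro, land them in HDFS.
set -euo pipefail
RAW_DIR=/data/incoming
OUT_DIR=/data/avro
HDFS_DIR=/warehouse/raw_events/$(date +%Y-%m-%d)

/opt/ingest/convert_to_avro --in "$RAW_DIR" --out "$OUT_DIR"   # assumed converter program

hdfs dfs -mkdir -p "$HDFS_DIR"
hdfs dfs -put -f "$OUT_DIR"/*.avro "$HDFS_DIR"/
```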

Hadoop Administrator

Confidential, San Francisco, CA

Responsibilities:

  • Installed and Configured Hortonworks Data Platform (HDP) and Apache Ambari.
  • Hadoop installation and configuration of multiple nodes using the Cloudera platform.
  • Worked on installing and configuring of CDH 5.8, 5.9 and 5.10 Hadoop Cluster on AWS using Cloudera Director, Cloudera Manager.
  • Installed and Configured Hadoop monitoring and administrating tools like Cloudera Manager, Nagios and Ganglia.
  • Upgraded the Cloudera Hadoop ecosystems in the cluster using Cloudera distribution packages.
  • Used Sqoop to import and export data between HDFS and RDBMS (a Sqoop sketch follows this list).
  • Performed stress testing, performance testing and benchmarking of the cluster.
  • Commissioned and decommissioned DataNodes in the cluster when problems arose.
  • Debugged and solved major issues with Cloudera Manager by interacting with the Cloudera team.
  • To analyze data migrated to HDFS, used the Hive data warehouse tool and developed Hive queries.
  • Handled cluster administration, releases and upgrades; managed multiple Hadoop clusters, the largest with a capacity of 7 PB (400+ nodes) with PAM enabled; worked on the Hortonworks distribution.
  • Responsible for implementation and ongoing administration of Hadoop infrastructure.
  • Maintained, audited and built new clusters for testing purposes using Ambari on Hortonworks.
  • Created a POC on Hortonworks and suggested best practices for the HDP and HDF platforms and NiFi.
  • Set up Hortonworks infrastructure from cluster configuration down to the node level.
  • Installed and configured Hadoop ecosystem components (MapReduce, Pig, Sqoop, Hive, Kafka) both manually and using Ambari server.
  • Supported setting up the QA environment and updating configurations for implementing scripts with Pig and Sqoop; worked on tuning the performance of Pig queries.
  • Converted ETL operations to the Hadoop system using Pig Latin operations, transformations and functions.
  • Implemented best income logic using Pig scripts and UDFs
  • Capturing data from existing databases that provide SQL interfaces using Sqoop.
  • Worked on the YARN capacity scheduler, creating queues to allocate resource guarantees to specific groups (a queue configuration sketch follows this list).
  • Implemented the Hadoop stack and various big data analytic tools; migrated data from different databases to Hadoop (HDFS).
  • Responsible for adding new ecosystem components such as Spark, Storm, Flume and Knox with the required custom configurations.
  • Installed and configured Kafka Cluster.
  • Integrated Apache Storm with Kafka to perform web analytics. Uploaded click stream data from Kafka to HDFS, HBase and Hive by integrating with Storm.
  • Helped the team to increase cluster size. The configuration for additional data nodes was managed using Puppet manifests.
  • Strong knowledge of open source system monitoring and event handling tools like Nagios and Ganglia.
  • Integrated BI and Analytical tools like Tableau, Business Objects, and SAS etc. with Hadoop Cluster.
  • Planning and implementation of data migration from existing staging to production cluster. Even migrated data from existing databases to cloud (S3 and AWS RDS).
  • Component unit testing using Azure Emulator.
  • Analyze escalated incidences within the Azure SQL database. Implemented test scripts to support test driven development and continuous integration.
  • Installed and configured Apache Ranger and Apache Knox for securing HDFS, HIVE and HBASE.
  • Developed Python, shell/Perl and PowerShell scripts for automation purposes.
  • Streamlined the process to support sprint based releases to production and improved current state of release management using Git & Jenkins.
  • Migrated an existing legacy infrastructure and recreated the entire environment within AWS.
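
A sketch of the Sqoop import/export flow referenced above; the JDBC URL, credentials, tables and HDFS paths are placeholders.

```bash
#!/usr/bin/env bash
# Import an RDBMS table into HDFS, then export processed results back (illustrative details).
JDBC_URL="jdbc:mysql://dbhost.example.com:3306/sales"

# RDBMS -> HDFS
sqoop import \
  --connect "$JDBC_URL" --username etl_user -P \
  --table orders \
  --target-dir /data/raw/orders \
  --num-mappers 4

# HDFS -> RDBMS (e.g. aggregates produced by Hive or MapReduce)
sqoop export \
  --connect "$JDBC_URL" --username etl_user -P \
  --table order_summary \
  --export-dir /data/out/order_summary \
  --input-fields-terminated-by ','
```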
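
A sketch of carving YARN capacity scheduler queues as described above; the queue names and percentages are illustrative, and in practice the properties live in capacity-scheduler.xml on the ResourceManager (or are set through Ambari).

```bash
#!/usr/bin/env bash
# Write an illustrative queue layout (two queues under root with guaranteed capacities)
# to a staging file; these <property> blocks are then merged into capacity-scheduler.xml.
cat > /tmp/capacity-queues.xml <<'EOF'
<property>
  <name>yarn.scheduler.capacity.root.queues</name>
  <value>etl,adhoc</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.etl.capacity</name>
  <value>70</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.adhoc.capacity</name>
  <value>30</value>
</property>
EOF

# After merging the queue definitions into capacity-scheduler.xml,
# refresh the scheduler without restarting the ResourceManager.
yarn rmadmin -refreshQueues
```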

Hadoop Admin

Confidential, Oak Brook, IL

Responsibilities:

  • Installed and Configured Hadoop monitoring and administrating tools like Cloudera Manager, Nagios and Ganglia.
  • Cluster maintenance, monitoring, troubleshooting, managing and reviewing data backups, and managing and reviewing log files using Hortonworks and MapR.
  • Implemented and configured High Availability Hadoop Cluster using Hortonworks Distribution and MapR.
  • Experience working on Hadoop components like HDFS, YARN, Tez, Hive, HBase, Pig, Sqoop, Oozie, Zookeeper, Storm, Flume, Ambari Infra, Ambari Metrics, Kafka.
  • Experience in configuring Zookeeper to coordinate the servers in clusters to maintain the data consistency.
  • Experience in using Flume to stream data into HDFS from various sources. Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs such as Java MapReduce, Hive and Sqoop as well as system-specific jobs.
  • Deployed Network file system for Name Node Metadata backup.
  • Moved data between HDFS and a MySQL database in both directions using Sqoop.
  • Backed up data from the active cluster to a backup cluster using DistCp (a DistCp sketch follows this list).
  • Periodically reviewed Hadoop-related logs, fixed errors and prevented errors by analyzing warnings.
  • Closely monitored and analyzed MapReduce job executions on the cluster at the task level.
  • Implemented the Fair Scheduler on the JobTracker to allocate a fair share of resources to small jobs.
  • Implemented an instance of Zookeeper for Kafka Brokers.
  • Implemented automatic failover with ZooKeeper and the ZooKeeper Failover Controller.
  • Worked in Kerberos, Active Directory/LDAP, Unix based File System.
  • Implemented Kerberos for authenticating all the services in Hadoop Cluster.
  • Performed both major and minor upgrades to the existing cluster and rolling back to the previous version.
  • Implemented commissioning and decommissioning of DataNodes, killed unresponsive TaskTrackers and dealt with blacklisted TaskTrackers (a decommissioning sketch follows this list).
  • Performance tuning of jobs when YARN jobs are slow, Tez jobs are slow or data loading is slow.
  • Managed alerts on the Ambari page and took corrective and preventive actions.
  • HDFS disk space management; generated HDFS disk utilization reports for capacity planning.
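
A sketch of the DistCp backup referenced above; the NameNode URIs and paths are placeholders.

```bash
#!/usr/bin/env bash
# Copy a warehouse directory from the active cluster to the backup cluster (illustrative URIs).
SRC=hdfs://active-nn.example.com:8020/warehouse/events
DST=hdfs://backup-nn.example.com:8020/backup/events

# -update copies only new/changed files, -p preserves permissions, -m caps the mapper count.
hadoop distcp -update -p -m 20 "$SRC" "$DST"
```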
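
A sketch of the DataNode decommissioning flow mentioned above, assuming hdfs-site.xml already points dfs.hosts.exclude at the excludes file; the hostname and file path are illustrative.

```bash
#!/usr/bin/env bash
# Gracefully decommission a DataNode (illustrative hostname and exclude-file path).
EXCLUDES=/etc/hadoop/conf/dfs.exclude     # file referenced by dfs.hosts.exclude
NODE=datanode17.example.com

echo "$NODE" >> "$EXCLUDES"

# Ask the NameNode to re-read its host lists; the node shows "Decommission in progress"
# while its blocks are re-replicated, then "Decommissioned" when it is safe to remove.
hdfs dfsadmin -refreshNodes
hdfs dfsadmin -report | grep -A 3 "$NODE"
```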

Hadoop Admin

Confidential, Chicago, IL

Responsibilities:

  • Managed and reviewed Hadoop Log files as a part of administration for troubleshooting purposes. Communicate and escalate issues appropriately.
  • Performed a major upgrade in the production environment from HDP 1.3 to HDP 2.2. As an admin, followed standard backup policies to ensure high availability of the cluster (a pre-upgrade metadata backup sketch follows this list).
  • Monitored multiple Hadoop clusters environments using Ganglia and Nagios. Monitored workload, job performance and capacity planning using Ambari.
  • Monitored the servers and Linux scripts regularly, performed troubleshooting, and tested and installed the latest software on servers for end users. Responsible for patching Linux servers and applying patches to the cluster. Responsible for building scalable distributed data solutions using Hadoop.
  • Continuous monitoring and managing the Hadoop cluster through Ganglia and Nagios.
  • Installed the Oozie workflow engine to run multiple Hive and Pig jobs, which run independently based on time and data availability; also performed major and minor upgrades to the Hadoop cluster.
  • Upgraded the Cloudera Hadoop ecosystems in the cluster using Cloudera distribution packages.
  • Commissioned and decommissioned the Data Nodes in the cluster in case of the problems.
  • Involved in the architecture of our storage service to meet changing requirements for scaling, reliability, performance and manageability.
  • Experience in scripting languages such as Python, Perl and shell script.
  • Experience in architecting, designing, installing, configuring and managing Apache Hadoop clusters on the MapR, Hortonworks and Cloudera distributions.
  • Experience with bulk load tools such as DW Loader and moving data from PDW to the Hadoop archive.
  • Understanding of Hadoop's multiple data processing engines, such as interactive SQL, real-time streaming, data science and batch processing, handling data stored in a single platform on YARN.
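
A sketch of a standard pre-upgrade backup of NameNode metadata, in line with the backup policy mentioned above; the backup destination and dfs.namenode.name.dir location are placeholders.

```bash
#!/usr/bin/env bash
# Checkpoint and back up NameNode metadata before a major upgrade (illustrative paths).
BACKUP_DIR=/backups/namenode/$(date +%Y%m%d)
mkdir -p "$BACKUP_DIR"

# Enter safe mode so the namespace is quiescent, force a checkpoint,
# copy the fsimage/edits directory, then leave safe mode.
hdfs dfsadmin -safemode enter
hdfs dfsadmin -saveNamespace
cp -r /hadoop/hdfs/namenode/current "$BACKUP_DIR"/     # dfs.namenode.name.dir/current
hdfs dfsadmin -safemode leave
```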

Confidential

Linux/Unix Administrator

Responsibilities:

  • Experience installing, upgrading and configuring RedHat Linux 4.x, 5.x, 6.x using Kickstart Servers and Interactive Installation
  • Responsible for creating and managing user accounts, security, rights, disk space and process monitoring in Solaris, CentOS and Redhat Linux
  • Performed administration and monitored job processes using associated commands
  • Managed routine system backups, scheduled jobs and enabled cron jobs.
  • Maintaining and troubleshooting network connectivity
  • Managed patch configuration, version control and service packs, and reviewed connectivity issues related to security problems.
  • Configured DNS, NFS, FTP, remote access, security management and server hardening.
  • Installs, upgrades and manages packages via RPM and YUM package management
  • Logical Volume Management maintenance
  • Experience administering, installing, configuring and maintaining Linux
  • Creates Linux Virtual Machines using VMware Virtual Center
  • Administers VMware Infrastructure Client 3.5 and vSphere 4.1
  • Installs Firmware Upgrades, kernel patches, systems configuration, performance tuning on Unix/Linux systems
  • Installing Red Hat Linux 5/6 using kickstart servers and interactive installation.
  • Supporting infrastructure environment comprising of RHEL and Solaris.
  • Installation, Configuration, and OS upgrades on RHEL 5.X/6.X/7.X, SUSE 11.X, 12.X.
  • Implemented and administered VMware ESX 4.x 5.x and 6 for running the Windows, Centos, SUSE and Red Hat Linux Servers on development and test servers.
  • Created, extended, reduced and administered Logical Volume Manager (LVM) volumes in the RHEL environment (an LVM sketch follows this list).
  • Responsible for large-scale Puppet implementation and maintenance. Puppet manifests creation, testing and implementation.
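
A sketch of the LVM create/extend operations listed above; the device, volume group, mount point and sizes are illustrative.

```bash
#!/usr/bin/env bash
# Create a logical volume on a new disk, then extend it later (illustrative names/sizes).
pvcreate /dev/sdb
vgcreate vg_data /dev/sdb
lvcreate -n lv_app -L 50G vg_data
mkfs.ext4 /dev/vg_data/lv_app
mount /dev/vg_data/lv_app /app

# Grow the volume by 20G and resize the ext4 filesystem online.
lvextend -L +20G /dev/vg_data/lv_app
resize2fs /dev/vg_data/lv_app
```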

Linux Admin

Confidential

Responsibilities:

  • Gained troubleshooting and problem-solving skills, including application- and network-level troubleshooting.
  • Gained knowledge and experience in writing shell scripts to automate tasks.
  • Identified and triaged outages; monitored and remediated system and network performance.
  • Developed tools to automate the deployment, administration and monitoring of a large-scale Linux environment (a monitoring script sketch follows this list).
  • Performing server tuning, operating system upgrades.
  • Participating in the planning phase for system requirements on various projects for deployment of business functions.
  • Participating in 24x7 on-call rotation and maintenance windows.
  • Communication & coordination with internal / external groups and operations.
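
A sketch of the kind of shell automation described above: a simple service health check that could be run from cron; the service list and alert recipient are placeholders.

```bash
#!/usr/bin/env bash
# Restart critical services if they are down and notify on-call (illustrative values).
RECIPIENT=oncall@example.com
SERVICES=(sshd crond ntpd)

for svc in "${SERVICES[@]}"; do
  if ! pgrep -x "$svc" > /dev/null; then
    service "$svc" start
    echo "$svc was down on $(hostname -s) and has been restarted" \
      | mail -s "Service alert: $svc" "$RECIPIENT"
  fi
done
```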
