
Hadoop / Kafka Admin Resume


San Jose, CA

SUMMARY

  • 7+ years of experience in Hadoop/Linux/UNIX administration, with expertise in Red Hat Enterprise Linux 4, 5, and 6, familiarity with Solaris 9 & 10 and IBM AIX 6, and Hadoop distributions including Cloudera, MapR, and Hortonworks.
  • Excellent understanding of Hadoop v1.0 and v2.0 cluster architecture and the Hadoop ecosystem (Hive, HBase, Flume, Sqoop, Hadoop API, HDFS, MapReduce).
  • Able to deploy Hadoop clusters, keep track of jobs, monitor critical parts of the cluster, take backups, and configure NameNode high availability.
  • Worked in multi-cluster environments, setting up the Cloudera and Hortonworks Hadoop ecosystems.
  • Involved in maintaining Hadoop clusters in development and test environments.
  • Performed upgrades, patches and bug fixes in HDP and CDH clusters.
  • Coordinated with technical teams for installation of Hadoop and third party related applications on systems.
  • Good working experience with Hadoop architecture, HDFS, MapReduce, and other components of the Cloudera Hadoop ecosystem.
  • Experience in loading data into HDFS using Sqoop and Flume (gathering logs from multiple systems and inserting them into HDFS).
  • Created and managed the database objects such as tables, indexes and views.
  • Experience in Building, installing, configuring, tuning and monitoring of a Cloudera distribution of Hadoop cluster with 150 nodes.
  • Experience in administering Hadoop production clusters: adding and removing data nodes, recovering the NameNode, HDFS administration, and troubleshooting MapReduce job failures.
  • Experience in configuring and setting up various components of the Hadoop ecosystem using Pivotal manager.
  • Experience in configuring, installing, and managing MapR, Hortonworks, and Cloudera distributions; extensive experience in understanding clients' big data business requirements and translating them into Hadoop-centric technologies.
  • Experience in commissioning, decommissioning, balancing, and managing nodes, and tuning servers for optimal cluster performance.
  • Performed capacity planning, estimating the space, computer hardware, software, and connection infrastructure resources needed over a future period of time.
  • Performance tuned Hadoop clusters by gathering and analyzing data on the existing infrastructure.
  • Strong experience/expertise in data warehouse tools, including ETL tools such as Ab Initio and Informatica and BI tools such as Cognos, MicroStrategy, and Tableau.
  • Experience in importing and exporting data using Sqoop from HDFS to relational database systems/mainframes and vice versa.
  • Experience in using the MapR File System, Ambari, and Cloudera Manager for installation and management of Hadoop clusters.
  • Data migration from existing data stores to Hadoop using Sqoop.
  • Wrote shell scripts to dump shared data from MySQL servers to HDFS (a sketch follows this summary list).
  • Good experience in understanding clients' big data business requirements and translating them into Hadoop-centric technologies.
  • Experience in managing and reviewing Hadoop log files and securing Hadoop clusters with Kerberos.
  • Hands-on experience with open-source monitoring tools, including Nagios and Ganglia.
  • Good knowledge of NoSQL databases such as Cassandra, HBase, and MongoDB.
  • Monitored and managed Linux servers (hardware profiles, resource usage, service status, etc.), including server backup and restore, server status reporting, managing user accounts, password policies, and file permissions.
  • Familiar with data architecture including data ingestion pipeline design, Hadoop information architecture, data modeling and data mining, machine learning and advanced data processing. Experience optimizing ETL workflows.
  • Analyzed clients' existing Hadoop infrastructure, identified performance bottlenecks, and provided performance tuning accordingly.
  • Experience in benchmarking and performing backup and recovery of NameNode metadata and data residing in the cluster.
  • Experience in OpenStack cloud computing, virtualization, Linux systems administration, and configuration management.
  • Experience in security hardening of UNIX, Linux, and Windows servers.
  • Helped establish standards, policies, and procedures for all aspects of the UNIX server environment (e.g., configuration, administration, documentation, etc.).
  • Experienced in Linux administration tasks such as IP management (IP addressing, subnetting, Ethernet bonding, and static IP).
  • Managed and configured intermediary tools such as the Exceed on Demand (EOD) client remote access solution for dependable, managed application access for X Window systems, including UNIX, Linux, Oracle Solaris, AIX, and HP-UX.
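A minimal sketch of the kind of shell script referenced above for dumping shared MySQL data into HDFS; the hostnames, database, table, and HDFS paths are illustrative assumptions only.

```bash
#!/bin/bash
# Dump a MySQL table to a local TSV file and load it into HDFS.
# Host, database/table, and HDFS target are illustrative placeholders;
# MySQL credentials are assumed to come from ~/.my.cnf.
set -euo pipefail

DB_HOST="mysql01.example.com"
DB_NAME="shared_data"
TABLE="orders"
EXPORT_DIR="/tmp/${TABLE}_$(date +%Y%m%d)"
HDFS_TARGET="/data/raw/${DB_NAME}/${TABLE}/$(date +%Y%m%d)"

mkdir -p "${EXPORT_DIR}"

# Export the table as tab-separated values.
mysql -h "${DB_HOST}" -D "${DB_NAME}" --batch --silent \
      -e "SELECT * FROM ${TABLE};" > "${EXPORT_DIR}/${TABLE}.tsv"

# Create the HDFS directory and upload the extract.
hdfs dfs -mkdir -p "${HDFS_TARGET}"
hdfs dfs -put -f "${EXPORT_DIR}/${TABLE}.tsv" "${HDFS_TARGET}/"

# Clean up the local staging copy once the upload succeeds.
rm -rf "${EXPORT_DIR}"
```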

TECHNICAL SKILLS

Monitoring Tools: Cloudera Manager, Ambari, Nagios, Ganglia

Scripting Languages: Shell Scripting, Puppet, Python, Bash, CSH.

Hadoop Distribution: Hortonworks Data Platform (HDP), Cloudera Distribution (CDH), Snowflake.

Programming Languages: C, Java, SQL, and PL/SQL.

Front End Technologies: HTML, XHTML, XML.

Application Servers: Apache Tomcat, WebLogic Server, WebSphere.

Databases: Oracle 11g, MySQL, MS SQL Server, IBM DB2.

NoSQL Databases: HBase, Cassandra, MongoDB

Operating Systems: Linux, UNIX, Mac OS, Windows NT/98/2000/XP/Vista, Windows 7, Windows 8.

Networks: HTTP, HTTPS, FTP, UDP, TCP/IP, SNMP, SMTP.

Security: Kerberos, Ranger, Knox, Falcon.

PROFESSIONAL EXPERIENCE

Hadoop / Kafka Admin

Confidential, San Jose, CA

Responsibilities:

  • Migrated a legacy system to a cloud-native Kafka data streaming system on the AWS platform (Dockerized microservices on EC2 instances).
  • Experience in administration of Kafka and Flume streaming using Cloudera Distribution.
  • Installed a Kerberos-secured Kafka cluster (without encryption) for a POC and set up Kafka ACLs (a topic/ACL sketch follows this list).
  • Created a NoSQL solution for a legacy RDBMS using Kafka, Spark, Solr, and the HBase indexer for ingestion, with Solr and HBase for real-time querying.
  • Installed and configured Kafka broker and ZooKeeper cluster environments.
  • Installed and configured the Confluent Operator Helm bundle on GCP.
  • Installed and configured Kafka and ZooKeeper using Docker containers.
  • Managed Kafka log files, segments, storage issues, and index files.
  • Validated Kafka configurations and tuned high-throughput parameters.
  • Monitored partitions, replication, ISR, high water marks, and consumer lag.
  • Managed producer and consumer configurations and resolved consumer offset issues.
  • Set up Kafka Connect source/sink connectors for MySQL, Elasticsearch, syslog, etc. (a connector registration sketch also follows this list).
  • Experience with open source Kafka distributions as well as enterprise Kafka products
  • Responsible for Kafka tuning, capacity planning, disaster recovery, replication, and troubleshooting.
  • Implementing Kafka security, limiting bandwidth usage, enforcing client quotas, backup, and restoration.
  • In-depth understanding of the internals of Kafka cluster management, ZooKeeper, partitioning, schema registry, topic replication, and cross-cluster mirroring (MirrorMaker).
  • Integrated Kafka with Hadoop, Storm, Spark, Flume, and Cassandra.
  • Involved in building the data engineering platform on AWS for ingesting, aggregating, and visualizing streaming real-time data from multiple sources.
  • Developed Spark Streaming jobs that stream data from Kafka topics and perform transformations on the data.
  • Worked extensively on the Spark framework using Scala to perform ETL operations.
  • Involved in end-to-end development, testing, and deployment of Spark jobs, including performance tuning.
  • Worked on developing parsers using the Scala API for parsing data from different sources and data formats such as byte code, JSON, and CSV.
  • Designed and configured topics in the new Kafka cluster across all environments.
  • Worked extensively on optimizing and tuning the Spark Streaming application to provide real-time access to data.
  • Managed Amazon Web Services (AWS): ELB, EC2, S3, EMR, and CloudWatch.
  • Worked on both the receiver-based approach and the direct stream approach for streaming real-time data from Kafka using Spark Streaming.
  • Installed Kafka Manager for monitoring consumer lag and Kafka metrics; also used it for adding topics, partitions, etc.
  • Involved in multiple code improvements that significantly reduced processing time for a single streaming batch, optimizing the performance of the pipeline.
  • Hands-on experience working with the Amazon EMR framework and transferring data to EC2 servers.
  • Worked on developing a parser for converting network data in byte code format to JSON format using the Scala API.
  • Developed automated scripts for provisioning of the clusters for Kafka, Zookeeper, Elastic Search.
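A hedged sketch of the topic and ACL setup on a Kerberos-secured Kafka cluster, as referenced above; the ZooKeeper host, topic, principals, and consumer group are illustrative assumptions.

```bash
# Create a topic and grant producer/consumer principals access on a
# Kerberos-secured cluster. Hostnames, principals, topic, and group names
# are placeholders; older Kafka releases address ZooKeeper directly.
kafka-topics.sh --create \
  --zookeeper zk01.example.com:2181 \
  --replication-factor 3 --partitions 6 \
  --topic events.clickstream

# Allow the producer principal to write to the topic.
kafka-acls.sh --authorizer-properties zookeeper.connect=zk01.example.com:2181 \
  --add --allow-principal User:clickstream-producer \
  --operation Write --operation Describe \
  --topic events.clickstream

# Allow the consumer principal to read the topic with its consumer group.
kafka-acls.sh --authorizer-properties zookeeper.connect=zk01.example.com:2181 \
  --add --allow-principal User:clickstream-consumer \
  --operation Read --operation Describe \
  --topic events.clickstream \
  --group clickstream-app
```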
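A sketch of registering a Kafka Connect JDBC source connector for MySQL through the Connect REST API; the worker URL, connection string, credentials, table, and topic prefix are assumptions, and the connector class assumes the Confluent JDBC connector plugin is installed on the worker.

```bash
# Register a JDBC source connector that streams a MySQL table into Kafka.
# All names and the password are illustrative placeholders.
curl -s -X POST http://connect01.example.com:8083/connectors \
  -H "Content-Type: application/json" \
  -d '{
    "name": "mysql-orders-source",
    "config": {
      "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
      "connection.url": "jdbc:mysql://mysql01.example.com:3306/shared_data",
      "connection.user": "connect",
      "connection.password": "********",
      "table.whitelist": "orders",
      "mode": "incrementing",
      "incrementing.column.name": "id",
      "topic.prefix": "mysql-"
    }
  }'
```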

Environment: Scala, Spark, Spark Streaming, Kafka, Elasticsearch, ZooKeeper, Python, Java, Shell Scripting, AWS EMR.

Hadoop / Kafka Administrator

Confidential, Englewood, NJ

Responsibilities:

  • Extensively involved in installation and configuration of the Cloudera distribution Hadoop NameNode, Secondary NameNode, ResourceManager, NodeManagers, and DataNodes. Performed stress testing, performance testing, and benchmarking for the cluster.
  • Installed patches and packages on Unix/Linux servers. Worked with development on the design and ongoing operation of several clusters utilizing Cloudera's Distribution Including Apache Hadoop.
  • Worked extensively on importing metadata into Hive and migrated existing tables and applications to work on Hive and Spark.
  • Responsible for migrating from Hadoop to the Spark framework, using in-memory distributed computing for real-time fraud detection.
  • Provided 24x7 system support/maintenance for Customer Experience Business Services.
  • Supported data analysts in running MapReduce programs.
  • Worked with enterprise container services, currently using AWS Fargate; implemented a microservices framework with Spring Boot, Node.js, and the OpenShift Container Platform (OCP).
  • Managed the OpenShift cluster, which includes scaling the AWS app nodes up and down.
  • Implemented the Fair Scheduler on the JobTracker to allocate a fair share of resources to small jobs.
  • Involved in running Hadoop jobs for processing millions of records of text data. Troubleshot build issues during the Jenkins build process. Implemented Docker to create containers for Tomcat servers and Jenkins.
  • Responsible for scheduling jobs in Hadoop using FIFO, Fair scheduler and Capacity scheduler
  • Expertise in Hadoop Cluster capacity planning, performance tuning, cluster Monitoring, Troubleshooting.
  • Worked on a live Big Data Hadoop production environment with 220 nodes.
  • Implemented NameNode HA to avoid a single point of failure.
  • Experience working with LDAP user accounts and configuring LDAP on client machines.
  • Automated day-to-day activities using shell scripting and used Cloudera Manager to monitor the health of Hadoop daemon services and respond accordingly to any warning or failure conditions (a health-check script sketch follows this list).
  • Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, managing and reviewing data backups, and managing and reviewing Hadoop log files.
  • Managed OpenShift masters and nodes through upgrades, decommissioning them from active participation by evacuating the nodes and then upgrading them.
  • Involved in planning the Hadoop cluster infrastructure, resources capacity and build plan for Hadoop cluster installations.
  • Resolving tickets submitted by users, P1 issues, troubleshoot the error documenting, resolving the errors.
  • Installed and configured Hive in the Hadoop cluster and helped business users/application teams fine-tune their HiveQL for optimal performance and efficient use of cluster resources.
  • Installed and configured Ganglia Monitoring system to get metrics and monitoring the Hadoop cluster. Also configured Hadoop logs rotation and monitoring them frequently.
  • Performed performance tuning of the Hadoop cluster and MapReduce jobs, as well as real-time applications, applying best practices to fix design flaws.
  • Worked on the OpenShift platform managing Docker containers and Kubernetes clusters, and created Kubernetes clusters using Ansible playbooks (launch-instan deploy-docker.yml, deploy-kubernetes.yml) on Exoscale.
  • Implemented Oozie workflows for the ETL process for critical data feeds across the platform.
  • Configured Ethernet bonding for all nodes to double the network bandwidth.
  • Implemented the Kerberos security authentication protocol for the existing cluster.
  • Built high availability for the major production cluster and designed automatic failover control using the ZooKeeper Failover Controller (ZKFC) and Quorum Journal nodes (an HA setup sketch follows this list).
  • Worked on Hive to expose data for further analysis and to transform files from different analytical formats into Parquet files.
  • Worked closely with business stakeholders, BI analysts, developers, and SAS users to establish SLAs and acceptable performance metrics for the Hadoop-as-a-service offering.
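A minimal health-check script of the kind run from cron against the Hadoop daemons, as mentioned above; the daemon list for this node, the usage threshold, and the mail recipient are illustrative assumptions.

```bash
#!/bin/bash
# Lightweight health check for Hadoop daemons on this host, intended for cron.
set -u

ALERTS=""

# Check that the expected daemons are running on this node (list is per-node).
for daemon in NameNode ResourceManager; do
  if ! jps | grep -qw "${daemon}"; then
    ALERTS+="Daemon not running: ${daemon}\n"
  fi
done

# Check overall HDFS usage from the dfsadmin report ("DFS Used%" line).
DFS_USED_PCT=$(hdfs dfsadmin -report 2>/dev/null \
  | awk '/^DFS Used%/ {gsub(/%/, "", $3); print int($3); exit}')
if [[ -n "${DFS_USED_PCT}" && "${DFS_USED_PCT}" -ge 80 ]]; then
  ALERTS+="HDFS usage at ${DFS_USED_PCT}%\n"
fi

# Mail a warning if anything looked unhealthy.
if [[ -n "${ALERTS}" ]]; then
  echo -e "${ALERTS}" | mail -s "Hadoop health warning on $(hostname)" hadoop-admins@example.com
fi
```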
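A hedged sketch of the one-time commands behind the automatic failover setup described above (ZKFC plus Quorum Journal nodes); it assumes the HA nameservice, NameNode IDs nn1/nn2, journal node quorum, and ZooKeeper quorum are already defined in hdfs-site.xml and core-site.xml.

```bash
# Initialize the HA state znode in ZooKeeper (run once, on the active NameNode host).
hdfs zkfc -formatZK

# On the standby NameNode host, copy over the current namespace metadata.
hdfs namenode -bootstrapStandby

# Start the ZKFC daemon on both NameNode hosts (Cloudera Manager or the
# distribution's service scripts normally handle this step).
hadoop-daemon.sh start zkfc

# Verify which NameNode is active and which is standby.
hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2
```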

Environment: Hadoop, Apache Pig, Hive, Oozie, Sqoop, Spark, HBase, LDAP, CDH 5, Unravel, Splunk, Tomcat, and Java.

Hadoop Admin

Confidential - Dallas, TX

Responsibilities:

  • Installed, Configured and Managed Hadoop cluster.
  • Involved in implementing High-Availability and automatic failover of Hadoop Clusters.
  • Hadoop Cluster capacity planning, performance tuning, cluster Monitoring, Troubleshooting.
  • Designed and configured topics in the new Kafka cluster across all environments.
  • Successfully secured the Kafka cluster with Kerberos.
  • Commissioning, Decommissioning, Balancing, and Managing Nodes and tuning server for optimal performance of the cluster.
  • Involved in hardware recommendations, performance tuning and benchmarking.
  • Integrated Hadoop cluster with AD and enabled Kerberos for Authentication.
  • Implemented Kerberos for authenticating all the services in Hadoop Cluster.
  • Helped developers with Solr indexing.
  • Helped with creating Solr collections.
  • Worked on a POC to install a Hadoop cluster in the AWS environment to replace on-premises hardware.
  • System/cluster configuration and health check-up.
  • Created user accounts and granted users access to the Hadoop cluster.
  • Scheduled and managed Oozie jobs to automate sequences of rotational activities.
  • Worked with application teams to install operating system and Hadoop updates, patches, and version upgrades as required.
  • Imported data from MySQL and Oracle into HDFS using Sqoop (a Sqoop import sketch follows this list).
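A hedged sketch of a Sqoop import of this kind; the JDBC URL, credentials, table, mapper count, and target directory are illustrative placeholders.

```bash
# Import a MySQL table into HDFS with Sqoop.
sqoop import \
  --connect jdbc:mysql://mysql01.example.com:3306/sales \
  --username sqoop_user \
  --password-file /user/sqoop/.mysql_password \
  --table orders \
  --target-dir /data/raw/sales/orders \
  --num-mappers 4 \
  --fields-terminated-by '\t'

# The Oracle variant only changes the JDBC URL, e.g.:
#   --connect jdbc:oracle:thin:@oracle01.example.com:1521/ORCLPDB1
```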

Environment: Hadoop, Hortonworks Ambari 2.6, Kafka, Spark, MapReduce, Hive, HDFS, AWS, Ranger, Knox, Solr, Pig, Sqoop, Oozie, Flume, ZooKeeper, NoSQL, Red Hat Linux, Unix.

Hadoop / Kafka Admin

Confidential, Nashua, NH

Responsibilities:

  • Involved in Capacity Planning.
  • Installed and configured Hadoop clusters in lower and production environments.
  • Added data nodes and installed slave and client components.
  • Helped the development team extract data using Sqoop and with the import process.
  • Configured Apache NiFi on the existing Hadoop cluster.
  • Set up Kerberos and the Knox gateway and created Ranger policies to support the Helix application.
  • Performed performance tuning and tweaked memory settings.
  • Created capacity queues to allocate cluster resources to different teams (a queue configuration sketch follows this list).
  • Running smoke tests for every patch/release.
  • Monitoring Oozie jobs, Talend jobs and MR/TEZ jobs.
  • Installed multiple EDGE nodes on all environments to support multiple development teams.
  • Installed and configured HUE with LDAP on edge nodes.
  • Worked with development to design Hive table structures.
  • Worked with development to create databases and table structures.
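A hedged sketch of the capacity-queue setup; the queue names and percentages are illustrative assumptions, and on HDP these properties are normally managed through Ambari rather than edited by hand.

```bash
# Queue properties to merge into capacity-scheduler.xml (inside <configuration>).
cat > /tmp/team-queues.xml <<'EOF'
<property>
  <name>yarn.scheduler.capacity.root.queues</name>
  <value>etl,analytics,adhoc</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.etl.capacity</name>
  <value>50</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.analytics.capacity</name>
  <value>30</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.adhoc.capacity</name>
  <value>20</value>
</property>
EOF

# After merging the properties and distributing the config, reload the queues
# without restarting the ResourceManager.
yarn rmadmin -refreshQueues
```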

Environment: Hadoop, HDFS, MapReduce, HBase, Tez, Hive, Pig, Sqoop, Solr, HDP 2.6, NiFi, Kafka.

Hadoop /Kafka Admin

Confidential - San Jose, CA

Responsibilities:

  • Created a local YUM repository for installing and updating packages (a repository setup sketch follows this list).
  • Good experience on cluster audit findings and tuning configuration parameters.
  • Implemented Kerberos security in all environments.
  • Defined file system layout and data set permissions.
  • Implemented the Capacity Scheduler to share cluster resources across the MapReduce jobs submitted by users.
  • Worked with development teams to import and export data between an Oracle database and HDFS/Hive using Sqoop.
  • Involved in performance testing of the Hadoop production cluster using TeraGen, TeraSort, and TeraValidate (a benchmark run sketch follows this list).
  • Manage and review data backups.
  • We had GF1 and GF2 active and passive on production clusters.
  • Commissioned and decommissioned nodes on an as-needed basis.
  • Administered back end services and databases in the virtual environment.
  • Implemented system wide monitoring and alerts.
  • Worked with Hadoop developers to troubleshoot and resolve MapReduce job failures and issues.
  • Evaluated and proposed new tools and technologies to meet the needs of the organization.
  • Production support responsibilities included cluster maintenance.
  • Developed Splunk queries to generate reports.
  • Created dashboards in Splunk and ran SPL queries.
  • Created various metrics in Splunk.
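A hedged sketch of building the local YUM repository mentioned above; the directory, repo id, and base URL are illustrative placeholders.

```bash
# Build a local YUM repository from downloaded RPMs and register it on a client.
mkdir -p /var/www/html/localrepo
cp /root/rpms/*.rpm /var/www/html/localrepo/

# Generate the repodata metadata (requires the createrepo package).
createrepo /var/www/html/localrepo

# Point clients at the repo (served here over HTTP by the same host).
cat > /etc/yum.repos.d/localrepo.repo <<'EOF'
[localrepo]
name=Local package repository
baseurl=http://repo01.example.com/localrepo
enabled=1
gpgcheck=0
EOF

yum clean all
yum repolist
```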
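A sketch of the TeraGen/TeraSort/TeraValidate benchmark run referenced above; the examples-jar path is the usual CDH parcel location, and the row count and HDFS paths are illustrative assumptions.

```bash
EXAMPLES_JAR=/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar

# Generate 100 GB of input data (1,000,000,000 rows x 100 bytes each).
hadoop jar "${EXAMPLES_JAR}" teragen 1000000000 /benchmarks/teragen

# Sort the generated data.
hadoop jar "${EXAMPLES_JAR}" terasort /benchmarks/teragen /benchmarks/terasort

# Validate that the output is globally sorted.
hadoop jar "${EXAMPLES_JAR}" teravalidate /benchmarks/terasort /benchmarks/teravalidate
```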

Environment: Cloudera CDH 5.x, HDFS, MapReduce, Hive, Isilon 7.x/8.x, Hue 3.9.0, Pig, Flume, Oozie, Sqoop, Apache Hadoop 2.6, Sentry, Kerberos, Impala, Solr, Cloudera Manager, Red Hat Linux, MySQL, and Oracle.

Linux Hadoop Admin

Confidential - Mooresville, NC

Responsibilities:

  • Experience working in Linux-based environments such as Red Hat and CentOS.
  • Product development and administration of managed services, dedicated Linux servers, and web hosting platforms.
  • Performed software installation, upgrades/patches, performance tuning, and troubleshooting of all the Linux servers in the environment.
  • Performed vulnerability testing of all the Linux servers and provided appropriate solutions.
  • Experienced in Red Hat Linux package administration using YUM.
  • Linux Installation and configuration from scratch and regular monitoring.
  • Experienced in package management using RPM, YUM, and up2date on Red Hat Linux.
  • Performed automated installations of Operating Systems using kickstart for Linux and remote monitoring and management of server hardware.
  • Manage software licenses, monitor network performance and application usage, and make software purchases.
  • Help maintain and troubleshoot UNIX and Linux environment.
  • Setup, Configured, and maintained UNIX and Linux servers.
  • Experience in system builds, server builds/installs, upgrades, troubleshooting, security, backup, disaster recovery, performance monitoring, and fine-tuning.
  • Monitored virtual memory performance, swap space, and disk and CPU utilization.
  • Managed database workload batches with automated shell scripts and cron schedules (a batch script sketch follows this list).
  • Created and managed users and groups for permissions.
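A small sketch of the kind of cron-driven database batch script mentioned above; the database, stored procedure, backup path, and retention window are illustrative assumptions.

```bash
#!/bin/bash
# /opt/scripts/db_batch.sh -- nightly database workload batch (sketch).
# Scheduled via cron with an entry such as:
#   30 1 * * * /opt/scripts/db_batch.sh >> /var/log/db_batch.log 2>&1
set -euo pipefail

# Run the nightly rollup inside MySQL (credentials come from ~/.my.cnf;
# nightly_rollup() is a hypothetical stored procedure).
mysql shared_data -e "CALL nightly_rollup();"

# Purge database dump files older than 14 days.
find /var/backups/db -name '*.sql.gz' -mtime +14 -delete
```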

Environment: Red Hat Linux (RHEL), Cluster Server, VMware, Global File System, Red Hat Cluster Server.
