
Hadoop/Kafka Admin Resume


Sunnyvale, CA

SUMMARY:

  • 10+ years of comprehensive experience in Hadoop, Big Data & Analytics environments.
  • Hands-on experience installing, configuring and using Hadoop ecosystem components like HDFS, Hadoop MapReduce, YARN, Zookeeper, Sqoop, Impala, Flume, Hive, Pig, HBase, Spark and Oozie.
  • Extensive experience working with various Hadoop distributions, including enterprise versions of Cloudera (CDH4/CDH5) and Hortonworks, with good knowledge of the MapR distribution, IBM BigInsights and Amazon EMR (Elastic MapReduce).
  • Experienced in Apache Spark for implementing advanced procedures like text analytics and processing using its in-memory computing capabilities; experienced in development and testing of ETL methodologies across all phases of data warehousing.
  • Experience in administration activities of RDBMS databases such as MS SQL Server.
  • Ability to quickly understand, learn and implement the new system design, data models in a professional work environment.
  • Worked with distributed frameworks such as Apache Spark and Presto on Amazon EMR and Redshift, and interacted with data in other AWS data stores such as Amazon S3 and Amazon DynamoDB.
  • Experienced in installation, configuration and maintenance of Elasticsearch clusters.
  • Experience with Hortonworks and Cloudera Manager administration; experienced in installing and updating Hadoop and its related components in single-node as well as multi-node cluster environments using Apache, Cloudera and Hortonworks distributions.
  • Worked on cluster installation, commissioning and decommissioning of DataNodes, NameNode recovery, capacity planning, and slot configuration (a decommissioning sketch follows this list).
  • Working experience designing and implementing complete end-to-end Hadoop infrastructure.
  • Worked with the systems engineering team to plan and deploy new Hadoop environments and expand existing Hadoop clusters.
  • Worked closely with the data modelers to model the new incoming data sets.
  • Experience in administering the Linux systems to deploy Hadoop cluster and monitoring the cluster using Nagios and Ganglia.
  • Involved in creating custom UDFs for Pig and Hive to incorporate methods and functionality of Python/Java into Pig Latin.
  • Involved in running Hadoop jobs for processing millions of records of text data.
  • Troubleshot build issues during the Jenkins build process.
  • Implemented Docker to create containers for Tomcat servers and Jenkins.
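For illustration of the DataNode decommissioning noted above, here is a minimal shell sketch, assuming the cluster's exclude file lives at /etc/hadoop/conf/dfs.exclude; the path and hostname are placeholders, not values from any specific cluster:

```sh
# Add the node to the exclude file referenced by dfs.hosts.exclude
# (file location is an assumption and varies per cluster).
echo "worker-node-07.example.com" >> /etc/hadoop/conf/dfs.exclude

# Tell the NameNode to re-read its include/exclude lists; the node enters
# "Decommission In Progress" while its blocks are re-replicated elsewhere.
hdfs dfsadmin -refreshNodes

# Confirm the node reports "Decommissioned" before shutting it down.
hdfs dfsadmin -report | grep -A 3 "worker-node-07.example.com"
```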

TECHNICAL SKILLS:

Hadoop stack: HDFS, MapReduce, YARN, Pig, Hive, HBase, Oozie, Sqoop, Spark, Flume, Zookeeper & Dremio

Kafka stack: Brokers, Zookeeper, Rest proxy, Schema Registry, Confluent control center, Kafka connect, Replicator

AWS Components: EC2, Simple Storage Service (S3), EBS, VPC, ELB, RDS, IAM, CloudWatch

Languages: Unix Shell Script, JavaScript, Python, Java, Pig, MySQL, HiveQL, CSS, HTML

Hadoop Management: Cloudera Manager, Apache Ambari, Ganglia, Nagios

Hadoop Distributions: Cloudera (CDH4, CDH5), Hortonworks (HDP 2.2 to HDP 2.4)

DevOps/Automation tools: Ansible, Terraform

Databases: MySQL, PL/SQL, MS SQL Server, Presto, Oracle 10g/11g

Security: Kerberos, Knox, Ranger

Operating Systems: Linux (RHEL/Ubuntu/CentOS), Windows (XP/7/8/10)

WORK EXPERIENCE:

Hadoop/Kafka Admin

Confidential - Sunnyvale, CA

Responsibilities:

  • Management and support of Hadoop services including HDFS, Hive, Impala, and Spark.
  • Involved in the analysis, design, development and testing phases of the Software Development Life Cycle (SDLC) using an agile software development methodology.

  • Worked on Big Data infrastructure for batch processing and real-time processing, and built scalable distributed data solutions using Hadoop.

  • Automation management with DevOps tools (Jenkins, Ansible, Docker, Kubernetes).
  • Used Hive to do transformations, joins, filtering and some pre-aggregations after storing the data in HDFS.
  • Used Scala to convert Hive/SQL queries into RDD transformations in Apache Spark.
  • Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive and involved in creating Hive Tables, loading with data and writing Hive queries which will invoke and run MapReduce jobs in the backend.
  • Involved in designing a multi-data-center Kafka cluster and monitoring it.
  • Responsible for importing real-time data from various sources into Kafka clusters.
  • Developed code to write canonical model JSON records from numerous input sources to Kafka queues.
  • Designed workflows and coordinators in Oozie to automate and parallelize Hive jobs on an Apache Hadoop environment from Hortonworks.
  • Wrote Ansible scripts for CI/CD and to deploy and manage Elasticsearch, Kibana, Beats and Logstash.
  • Created and updated development documentation as required through changes in the code.
  • Created automation in a Jenkins Declarative Build Pipeline to update and tag git branches based on publicly promoted releases.
  • Implemented the workflows using Apache Oozie framework to automate tasks.
  • Developed Oozie workflow for scheduling & orchestrating the ETL process.
  • Created data pipelines per the business requirements and scheduled them using Oozie coordinators (see the sketch after this list).

  • Involved in importing data from various sources into the Cassandra cluster using Sqoop.
  • Developed multiple scripts for analyzing data using Hive and Pig and integrating with HBase.
  • Used Sqoop to import data into HDFS and Hive from other data systems.
  • Involved in running all the Hive scripts through Hive, Impala, Hive on Spark and some through Spark SQL.
  • Developed Pig scripts, Pig UDFs, Hive scripts and Hive UDFs to load data files.
  • Used Zookeeper to coordinate cluster services.
  • Worked extensively with importing metadata into Hive using Python and migrated existing tables and applications to work on AWS cloud (S3).
  • Analyzed the SQL scripts and designed the solution to implement using PySpark.
  • Worked with No-SQL databases like HBase in creating HBase tables to load large sets of semi-structured data coming from various sources.
  • Involved in daily SCRUM meetings to discuss the development/progress and was active in making scrum meetings more productive.
  • Developed and implemented scripts for automation of cluster builds.
  • Updated configurations and applied security patches across all environments to achieve and maintain standards.
  • Installed Oozie workflow engine to run multiple Hive jobs.
  • Implemented firewalls for our different workflows.
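As referenced in the data-pipeline bullet above, the sketch below shows one way such a pipeline could be scheduled with an Oozie coordinator from the command line. The hostnames, HDFS paths and dates are placeholders, and the coordinator/workflow XML is assumed to already be staged in HDFS:

```sh
# Placeholder job.properties for a coordinator that runs a Hive workflow.
cat > job.properties <<'EOF'
nameNode=hdfs://nn1.example.com:8020
jobTracker=rm1.example.com:8050
oozie.coord.application.path=${nameNode}/user/etl/pipelines/daily/coordinator.xml
start=2019-01-01T00:00Z
end=2019-12-31T00:00Z
EOF

# Submit and start the coordinator; Oozie materializes the Hive actions on schedule.
oozie job -oozie http://oozie.example.com:11000/oozie -config job.properties -run

# List recent coordinator jobs to confirm the submission.
oozie jobs -oozie http://oozie.example.com:11000/oozie -jobtype coordinator -len 5
```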

Hadoop/Kafka Admin

Confidential - Dallas, TX

Responsibilities:

  • Worked as Hadoop Admin responsible for everything related to clusters totaling 100 nodes, ranging from POC (proof-of-concept) to PROD clusters.
  • Installation, configuration, upgrade and administration of Windows, Sun Solaris and Red Hat Linux.

  • Hands-on experience in installation, configuration, management and support of full-stack Hadoop clusters, both on premise and in the cloud, using Hortonworks and Cloudera bundles.

  • Experience administering Jenkins for continuous integration (CI), delivery and build automation.
  • Updated Spock unit-tests for Jenkins Declarative Build Pipeline.
  • Used Storm and Kafka Services to push data to HBase and Hive tables.
  • Expert in utilizing Kafka as a publish-subscribe messaging system.
  • Involved in installing and configuring Confluent Kafka in the R&D environment and validated the installation with the HDFS and Hive connectors.
  • Worked as admin on the Cloudera (CDH 5.5.2) distribution for clusters ranging from POC to PROD.

  • Responsible for Cluster maintenance, Monitoring, commissioning and decommissioning Data nodes, Troubleshooting, Manage and review data backups, Manage & review log files.
  • Set up Hortonworks Infrastructure from configuring clusters to Node security using Kerberos.
  • Worked extensively with AWS components and related services such as Airflow, Elastic MapReduce (EMR), Athena, and Snowflake.
  • Created and maintained various Shell and Python scripts for automating various processes.
  • Involved in developing custom scripts using Shell (bash, ksh) to automate jobs.
  • Installed MySQL on Linux and customized the MySQL DB parameters.
  • Installed Kafka cluster with separate nodes for brokers.
  • Installing and configuring Kafka and monitoring the cluster using Nagios and Ganglia.

  • Responsible for installation, configuration and management of Linux servers and POC clusters in the VMware environment.

  • Configured Apache and supported it on Linux production servers.
  • Involved in file movements between HDFS and AWS S3 and worked extensively with S3 buckets in AWS.

  • Experience setting up Kafka clusters for publishing topics, and familiar with the lambda architecture (a topic-validation sketch appears at the end of this section).
  • Adding/installation of new components and removal of them through Cloudera Manager.

  • Collaborating with application teams to install operating system and Hadoop updates, patches, version upgrades.
  • Level 2/3 SME for the current Big Data clusters at the client site and set up standard troubleshooting techniques.
  • Managed server instances on the Amazon Web Services (AWS) platform.
  • Loaded log data into HDFS using Flume and Kafka and performed ETL integrations.

  • Experience with designing and building solutions for data ingestion both real time & batch using Sqoop/PIG/Impala/Kafka.
  • Involved in analyzing system failures, identifying root causes, and recommending courses of action.
  • Interacted with Cloudera support, logged issues in the Cloudera portal, and fixed them per the recommendations.
  • Worked extensively with importing metadata into Hive using Python and migrated existing tables and applications to work on AWS cloud (S3).
  • Imported logs from web servers with Flume to ingest the data into HDFS.
  • Implemented Kerberos for authenticating all the services in Hadoop Cluster.
  • Parsed, cleansed and mined useful and meaningful data in HDFS using MapReduce for further analysis, and fine-tuned Hive jobs for optimized performance.
  • Worked on analyzing the Hadoop cluster and different big data analytic tools including Pig, the HBase database and Sqoop.
  • Installed Oozie workflow engine to run multiple Hive and Pig jobs.
  • Troubleshooting, debugging & fixing Talend specific issues, while maintaining the health and performance of the ETL environment.
  • Supported in setting up QA environment and updating configurations for implementing scripts with Pig and Sqoop.
  • Part of the development team that developed a CI/CD pipeline on AWS, which cut code release time by half.
  • Developed and implemented platform architecture per established standards.
  • Configured NameNode high availability and NameNode federation.
  • Installed Oozie workflow engine to run multiple Hive jobs.

Environment: HDFS, Docker, Puppet, MapReduce, Hive 1.1.0, Hue 3.9.0, Pig, Flume, Oozie, Sqoop, CDH5, Apache Hadoop 2.6, Spark, AWS, Solr, Storm, Knox, Cloudera.
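As referenced in the Kafka bullet above, here is a hedged sketch of creating and smoke-testing a topic after broker installation. The hosts, topic name and sizing are placeholders, and on newer Kafka versions --bootstrap-server replaces --zookeeper for topic creation:

```sh
# Create a topic on the freshly installed cluster (placeholder values).
kafka-topics --create \
  --zookeeper zk1.example.com:2181 \
  --replication-factor 3 --partitions 6 \
  --topic clickstream

# Produce one test record...
echo '{"event":"ping"}' | kafka-console-producer \
  --broker-list broker1.example.com:9092 --topic clickstream

# ...and read it back to confirm brokers, topic and replication are healthy.
kafka-console-consumer \
  --bootstrap-server broker1.example.com:9092 \
  --topic clickstream --from-beginning --max-messages 1
```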

Hadoop / Kafka Administrator

Confidential -Houston, TX

Responsibilities:

  • Installed and configured Hadoop MapReduce and HDFS, and developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
  • Worked on installing and configuring HDP (Hortonworks) 2.x clusters in Dev and Production environments.

  • Worked on Capacity planning for the Production Cluster.
  • Installed HUE Browser.
  • Involved in loading data from the UNIX file system to HDFS, creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs.
  • Experience in MapR, Cloudera, & EMR Hadoop distributions.
  • Worked on installation of Hortonworks 2.1 on AWS Linux servers and configured Oozie jobs.

  • Created a complete processing engine based on the Hortonworks distribution, enhanced for performance.
  • Performed cluster upgrades from HDP 2.1 to HDP 2.3.
  • Configured queues in the Capacity Scheduler and took snapshot backups of HBase tables (a snapshot sketch appears at the end of this section).
  • Worked on fixing cluster issues and configuring high availability for the NameNode in HDP 2.1.

  • Involved in cluster monitoring, backup, restore and troubleshooting activities.
  • Involved in the MapR to Hortonworks migration.
  • Worked as Hadoop administrator on the MapR Hadoop distribution for 5 clusters, ranging from POC to PROD, containing more than 1,000 nodes.
  • Implemented manifest files in Puppet for automated orchestration of Hadoop and Cassandra clusters.

  • Worked on installing clusters, commissioning and decommissioning of DataNodes, NameNode recovery, capacity planning, Cassandra, and slot configuration.
  • Responsible for implementation and ongoing administration of Hadoop infrastructure.
  • Managed and reviewed Hadoop log files.
  • Administration of Hbase, Hive, Sqoop, HDFS, and MapR.
  • Importing and exporting data from different relational databases like MySQL into HDFS and HBase using Sqoop.

  • Worked on Configuring Kerberos Authentication in the cluster.
  • Experience in using the MapR File System, Ambari and Cloudera Manager for installation and management of Hadoop clusters.
  • Very good experience with all the Hadoop ecosystem components in UNIX environments.
  • Experience with UNIX administration.
  • Worked on installing and configuring Solr 5.2.1 in Hadoop cluster.
  • Hands on experience in installation, configuration, management and development of big data solutions using Hortonworks distributions.
  • Worked on indexing the HBase tables using Solr, including indexing JSON data and nested data.
  • Hands-on experience installing and configuring Spark and Impala.
  • Successfully installed and configured queues in the Capacity Scheduler and the Oozie scheduler.
  • Worked on configuring queues, performance optimization of Hive queries, tuning at the cluster level, and adding users to the clusters.
  • Responsible for Cluster maintenance, Monitoring, commissioning and decommissioning Data nodes, Troubleshooting, Manage and review data backups, Manage & review log files.
  • Day-to-day responsibilities included solving developer issues, deploying code from one environment to another, providing access to new users, providing immediate solutions to reduce impact, and documenting issues to prevent recurrence.
  • Adding/installation of new components and removal of them through Ambari.
  • Collaborating with application teams to install operating system and Hadoop updates, patches, version upgrades.
  • Monitored workload, job performance and capacity planning.
  • Involved in analyzing system failures, identifying root causes, and recommending courses of action.
  • Creating and deploying a corresponding SolrCloud collection.
  • Creating collections and configurations, and registering a Lily HBase Indexer configuration with the Lily HBase Indexer Service.
  • Creating and managing the Cron jobs.

Environment: Hadoop, MapReduce, YARN, Hive, HDFS, Pig, Sqoop, Solr, Oozie, Impala, Spark, Hortonworks, Flume, HBase, Zookeeper, Unix/Linux, Hue (Beeswax), AWS.
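As noted in the Capacity Scheduler/snapshot bullet above, here is a minimal sketch of an HBase snapshot backup; the table, snapshot and backup-cluster names are placeholders:

```sh
# Take a snapshot of a table and list existing snapshots.
echo "snapshot 'events', 'events_snap_20190101'" | hbase shell
echo "list_snapshots" | hbase shell

# Copy the snapshot to a backup cluster with a MapReduce job.
hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot \
  -snapshot events_snap_20190101 \
  -copy-to hdfs://backup-nn.example.com:8020/hbase \
  -mappers 4
```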

Hadoop/Linux Admin

Confidential

Responsibilities:

  • Involved in cluster monitoring, backup, restore and troubleshooting activities.
  • Administration of RHEL, which includes installation, testing, tuning, upgrading and loading patches, troubleshooting both physical and virtual server issues.
  • Responsible for implementation and ongoing administration of Hadoop infrastructure.
  • Managed 100+ UNIX servers running RHEL and HP-UX on Oracle and HP hardware.
  • Imported and exported data from relational databases like MySQL into HDFS and HBase using Sqoop (see the sketch at the end of this section).

  • Wrote shell scripts to automate routine day-to-day processes.
  • Worked on Configuring Kerberos Authentication in the cluster.
  • Very good experience with all the Hadoop ecosystem components in UNIX environments.
  • Experience with UNIX administration.
  • Worked on installing and configuring Solr 5.2.1 in Hadoop cluster.
  • Worked on indexing the HBase tables using Solr and indexing the JSON data and nested data.

  • Provided 24x7 System Administration support for Red Hat Linux 3.x, 4.x servers and resolved trouble tickets on shift rotation basis.
  • Configured HP ProLiant, Dell PowerEdge R-series, Cisco UCS and IBM p-Series machines for production, staging and test environments.
  • Created and cloned Linux virtual machines and templates using VMware Virtual Client 3.5 and migrated servers between ESX hosts.

  • Configured Linux native device mapper (MPIO) and EMC PowerPath for RHEL 5.5, 5.6, 5.7 and 6.
Environment: IBM AIX, Red Hat Linux, IBM Netezza, Sun Solaris, SUSE Linux, p-Series servers, VMware vSphere, VMs, Opsview, Nagios, TAD4D, ITM, INFO, TEM, PureScale, PureFlex, PureApp.
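As referenced in the Sqoop bullet above, here is a hedged sketch of the kind of MySQL import described; the connection string, credentials, table and key column are placeholders:

```sh
# Import a MySQL table into HDFS (placeholder connection details).
sqoop import \
  --connect jdbc:mysql://db1.example.com:3306/sales \
  --username etl_user --password-file /user/etl/.mysql_password \
  --table orders \
  --target-dir /data/raw/orders \
  --num-mappers 4

# Variant that lands the same table directly into an HBase table.
sqoop import \
  --connect jdbc:mysql://db1.example.com:3306/sales \
  --username etl_user --password-file /user/etl/.mysql_password \
  --table orders \
  --hbase-table orders \
  --column-family cf \
  --hbase-row-key order_id
```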
