
Kafka Engineer Resume

SUMMARY

  • 9+ years of professional IT experience, including Java application development, database management, and big data technologies with Hadoop.
  • 3+ years of experience in big data analytics using various Hadoop ecosystem tools and the Spark framework.

PROFESSIONAL EXPERIENCE

Confidential

Kafka Engineer

Responsibilities:

  • Led the installation, configuration, and deployment of product software on new edge nodes that connect to the Kafka cluster for data acquisition.
  • Integrated Flume with Kafka; monitored and troubleshot the Kafka-Flume-HDFS pipeline for real-time data ingestion into HDFS.
  • Implemented a near-real-time data pipeline using a framework based on Kafka and Spark.
  • Used Kafka, a publish-subscribe messaging system, creating topics with producers and consumers to ingest data into the application for Spark to process; created Kafka topics for application and system logs.
  • Designed and implemented large-scale pub-sub message queues using Apache Kafka.
  • Configured ZooKeeper, Kafka, and Logstash clusters for data ingestion; worked on Elasticsearch performance and optimization; used Kafka for live data streaming.
  • Set up and optimized the ELK (Elasticsearch, Logstash, Kibana) stack and integrated Apache Kafka for data ingestion.
  • Developed NiFi workflows to pick up data from a REST API server, the data lake, and an SFTP server and send it to the Kafka broker.
  • Implemented Spark-Kafka streaming to consume data from Kafka and feed it into the Spark pipeline.
  • Implemented NiFi-to-Spark streaming directly, without Kafka in between, to offer clients multiple options within a single Confidential.
  • Developed real-time streaming applications integrated with Kafka and NiFi to handle high-volume, high-velocity data streams in a scalable, reliable, and fault-tolerant manner for Confidential campaign management analytics.
  • Designed and implemented big data ingestion pipelines to ingest multi-terabyte data from various data sources using Kafka and Spark Streaming, including data quality checks and transformations, with output stored in efficient storage formats. Performed data wrangling on multi-terabyte datasets from various data sources for a variety of downstream purposes, such as analytics using PySpark.
  • Designed solutions for high-volume data stream ingestion, processing, and low-latency data provisioning using Hadoop ecosystem tools: Hive, Pig, Sqoop, Kafka, Python, Spark, Scala, NoSQL, NiFi, and Druid.
  • Created Kafka producer API to send live-stream data into various Kafka topics.
  • Developed Spark-Streaming applications to consume the data from Kafka topics and to insert the processed streams to HBase.
  • Streamed data in real time using Spark with Kafka; responsible for handling streaming data from web server console logs.
  • Ingested structured, semi-structured, and unstructured datasets into the Indie-Data Lake using an open-source Hadoop distribution and open-source Apache tools such as Flume and Sqoop into the Hive environment.
  • Developed Spark code using Scala and Spark SQL for faster testing and data processing.
  • Experience with batch processing of data sources using Apache Spark.
  • Developed predictive analytics using the Apache Spark Scala API.
  • Developed MapReduce jobs in the Java API to parse raw data and store the refined data.
  • Developed Kafka producers and consumers, HBase clients, and Spark jobs using the Scala APIs, along with components on HDFS and Hive.
  • Automated and scheduled Sqoop jobs using Unix shell scripts.
  • Involved in identifying job dependencies to design workflows for Oozie and YARN resource management.
  • Responsible for managing existing data extraction jobs while also playing a vital role in building new data pipelines from various structured and unstructured sources into Hadoop. Worked on a product team using Agile Scrum methodology to design, develop, deploy, and support solutions that leverage the Client's big data platform.
  • Integrated Apache Spark with Kafka to perform web analytics. Uploaded click stream data from Kafka to Hdfs, HBase and Hive by integrating with Spark.
  • Designed and coded from specifications; analyzed, evaluated, tested, debugged, documented, and implemented complex software applications.
  • Developed Sqoop Scripts to extract data from DB2 EDW source databases onto HDFS.
  • Tuned Hive and Pig to improve performance; solved performance issues in both scripts through an understanding of joins, grouping, and aggregation and how they translate to MapReduce jobs.
  • Created partitions and buckets based on state for further processing using bucket-based Hive joins.
  • Implemented Cloudera Manager on existing cluster.
  • Extensively worked with Cloudera Distribution Hadoop (CDH 5.x and CDH 4.x).
  • Debugged and fixed wrong or missing data problems in both Oracle Database and Teradata.
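The pub-sub pattern that runs through the bullets above — producers appending to Kafka topics, consumer groups reading them at their own committed offsets to feed Spark — can be sketched with an in-memory stand-in. This is an illustrative toy, not the real Kafka client API; the class and method names are invented for the sketch:

```python
from collections import defaultdict

class ToyBroker:
    """In-memory stand-in for a Kafka-style topic log (illustrative only)."""

    def __init__(self):
        self.topics = defaultdict(list)   # topic name -> append-only log
        self.offsets = defaultdict(int)   # (group, topic) -> next unread offset

    def produce(self, topic, record):
        self.topics[topic].append(record)  # producers only ever append

    def consume(self, group, topic, max_records=10):
        off = self.offsets[(group, topic)]
        batch = self.topics[topic][off:off + max_records]
        self.offsets[(group, topic)] = off + len(batch)  # "commit" the offset
        return batch

broker = ToyBroker()
broker.produce("app-logs", "login ok")
broker.produce("app-logs", "login failed")
print(broker.consume("spark-job", "app-logs"))  # both records
print(broker.consume("spark-job", "app-logs"))  # empty: offset already committed
```

Because each group tracks its own offset, a second consumer group (say, a log-archival job) re-reads the same topic from the beginning without disturbing the Spark consumer, which is the property that makes Kafka topics reusable across pipelines.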

Confidential - Austin, TX

Kafka Admin

Responsibilities:

  • Experience in setup, configuration and management of security for Hadoop clusters using Kerberos.
  • Hands-on experience in creating backups and mirroring of Kafka cluster brokers; broker, topic, and hardware sizing; performance monitoring; broker and topic security; and consumer/producer access management (ACLs).
  • Installed the New Relic Kafka plugin for monitoring of the Kafka cluster.
  • Hands-on experience resolving incident tickets related to Hadoop components such as HBase, YARN, Hive, and Kafka, and performing root cause analysis.
  • Automated deployments on AWS using GitHub and Jenkins.
  • Set up the CI/CD pipelines using Jenkins, Maven, GitHub and AWS.
  • Worked on High availability cluster setup, maintenance, and ongoing support
  • Experience with open-source Kafka, ZooKeeper, and Kafka Connect.
  • Used GitHub for version control and Jira for issue and project tracking.
  • Developed JUnit test cases for unit and integration testing.
  • Worked on required enhancements on the application as stories using agile software development methodology by running time-boxed iterations.
  • Took on an additional role as Scrum Master, facilitating daily scrums, discovery sessions, planning and estimation sessions, showcases, and retrospective sessions.
  • Developed the DAO layer to persist data into Cassandra DB.
  • Created shell scripts and PL/SQL scripts that were executed daily to refresh data feeds from multiple systems.
  • Responsible for ingesting mobile and web survey/feedback data into the Big Data / AI Platform composed of the SMACK framework (Spark, Mesosphere, Akka, Cassandra, Kafka), Druid, and Grafana on the DC/OS platform in the AWS Cloud.
  • Installed and deployed Kafka, Zookeeper, ELK, Grafana, Prometheus and Datadog using Ansible playbooks.
  • Reviewed Kafka cluster configurations and provided best practices to get peak performance.
  • Configured alerting rules and set up PagerDuty alerting in Grafana for Kafka, ZooKeeper, Druid, Cassandra, Spark, and various microservices.
  • Upgraded Zookeeper clusters across all environments from version 3.4.6 to version 3.4.10.
  • Upgraded multiple Kafka clusters across multiple environments from 0.10.0.1 to 0.11.0.2 and then to the then-latest 1.1.1 version with no data loss and zero downtime.
  • Improved the performance of the Kafka cluster by fine tuning the Kafka Configurations at producer, consumer and broker level.
  • Deployed Kafka manager for getting better insights into our Kafka clusters.
  • Fixed Kafka and Zookeeper related Production issues across multiple clusters.
  • Installed, configured and maintained replication tools like uReplicator and Mirror Maker to support High Availability (HA) of our Kafka Clusters.
  • Installed and configured Filebeat on Kafka, Druid, and Cassandra EC2 instances to ship logs to an Elasticsearch index.
  • Configured and created Kibana dashboards for ERROR Logs to get better insights into our big data pipeline.
  • Installed Curator for periodic cleanup of old logs from ES Index.
  • Developed a Python script to auto-scale Kafka consumer microservices running on DC/OS when consumer lag exceeds a configured threshold.
  • Developed a Python script to auto-scale Marathon apps when resource usage (CPU, memory) exceeds configured thresholds.
  • Upgraded Grafana and Prometheus to their latest versions.
  • Deployed superset in DCOS as a marathon app using Docker to get dashboards and visualizations on top of Druid.
  • Helped in configuring Kafka producer and consumer microservices to stream the data to and from Kafka topics.
  • Worked on managing marathon apps running on DCOS.
  • Managed monitoring and alerting for Cassandra, Druid, and Kafka in our pipeline using Grafana, Datadog, and Prometheus.
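The lag-based autoscaler described above reduces to a small decision rule: scale out while consumer lag sits above a threshold, scale in once it falls well below. The function and parameter names here are hypothetical; the real script would read lag from Kafka consumer-group metrics and apply the decision via the Marathon scaling API:

```python
def desired_instances(lag, threshold, current, min_inst=1, max_inst=10):
    """Hypothetical scaling rule for a Kafka consumer service (sketch only).

    lag:       total consumer-group lag, in messages
    threshold: lag level that triggers scale-out
    current:   number of instances currently running
    """
    if lag > threshold and current < max_inst:
        return current + 1                        # scale out: falling behind
    if lag < threshold // 2 and current > min_inst:
        return current - 1                        # scale in: comfortably caught up
    return current                                # hold steady

print(desired_instances(lag=50_000, threshold=10_000, current=3))  # -> 4
print(desired_instances(lag=2_000, threshold=10_000, current=3))   # -> 2
```

The half-threshold scale-in band is a common hysteresis trick: without the gap between the scale-out and scale-in limits, a lag value hovering near the threshold would cause the service to flap between instance counts.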
