
Apache Kafka

Overview

Apache Kafka is a distributed event streaming platform capable of handling high-throughput, low-latency data streams. Originally developed by LinkedIn and later open-sourced, Kafka is widely used for building real-time data pipelines and streaming applications.

Key Concepts

Topics

  • Definition: Topics are the categories to which records are sent.
  • Structure: A topic is divided into partitions for scalability and parallelism.
  • Replication: Each partition can be replicated across multiple brokers for fault tolerance (see the sketch after this list).
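
As a concrete illustration, the sketch below creates a topic with three partitions and a replication factor of two. It uses the confluent-kafka Python client as one possible choice; the broker address ("localhost:9092") and the topic name ("orders") are placeholder assumptions, not part of these notes.

    from confluent_kafka.admin import AdminClient, NewTopic

    # Placeholder broker address; adjust for a real cluster.
    admin = AdminClient({"bootstrap.servers": "localhost:9092"})

    # "orders" gets 3 partitions for parallelism; each partition is
    # replicated to 2 brokers for fault tolerance.
    futures = admin.create_topics(
        [NewTopic("orders", num_partitions=3, replication_factor=2)]
    )
    for topic, future in futures.items():
        future.result()  # raises on failure (e.g. topic already exists)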

Producers

  • Role: Send records to Kafka topics.
  • Mechanism: Records with the same key are routed to the same partition (a partition can also be chosen explicitly), which preserves ordering within that partition; see the sketch after this list.
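
A minimal producer sketch, again assuming the confluent-kafka client and the placeholder "orders" topic from above: records sharing a key land in the same partition, and a partition can also be addressed explicitly.

    from confluent_kafka import Producer

    producer = Producer({"bootstrap.servers": "localhost:9092"})

    # Same key => same partition => these two records stay ordered
    # relative to each other.
    producer.produce("orders", key="customer-42", value=b"order created")
    producer.produce("orders", key="customer-42", value=b"order shipped")

    # A partition can also be chosen explicitly.
    producer.produce("orders", value=b"audit event", partition=0)

    producer.flush()  # block until every queued record is acknowledged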

Consumers

  • Role: Read records from Kafka topics.
  • Consumer Groups: Consumers can join a consumer group; the topic’s partitions are divided among the group’s members, load-balancing consumption (see the sketch after this list).
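
A consumer-group sketch under the same placeholder assumptions: every process started with the group.id "order-processors" shares the partitions of "orders", so running this script twice halves each instance’s load.

    from confluent_kafka import Consumer, KafkaException

    consumer = Consumer({
        "bootstrap.servers": "localhost:9092",
        "group.id": "order-processors",   # members of this group share partitions
        "auto.offset.reset": "earliest",  # start from the beginning if no offset yet
    })
    consumer.subscribe(["orders"])

    try:
        while True:
            msg = consumer.poll(1.0)  # wait up to 1 s for a record
            if msg is None:
                continue
            if msg.error():
                raise KafkaException(msg.error())
            print(msg.partition(), msg.offset(), msg.value())
    finally:
        consumer.close()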

Brokers

  • Definition: Kafka brokers are servers that store data and serve clients.
  • Cluster: A Kafka cluster is composed of multiple brokers working together (see the sketch after this list).
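
For illustration, the cluster’s broker list can be inspected through the same client’s metadata call (the broker address is again a placeholder):

    from confluent_kafka.admin import AdminClient

    admin = AdminClient({"bootstrap.servers": "localhost:9092"})

    # list_topics() returns cluster metadata, including every broker
    # currently registered in the cluster.
    metadata = admin.list_topics(timeout=10)
    for broker_id, broker in metadata.brokers.items():
        print(f"broker {broker_id} at {broker.host}:{broker.port}")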

Zookeeper

  • Role: Manages the Kafka cluster metadata and configurations.
  • Coordination: Handles leader election for partitions and tracks which brokers are alive.

Models Supported

Important

  • Publish-Subscribe Model: Kafka supports a publish-subscribe messaging model where producers publish messages to topics, and multiple consumers can subscribe to those topics to receive the messages.
  • Point-to-Point Model: Through consumer groups, Kafka also supports a point-to-point model in which each message is processed by only one consumer within a group (both models are sketched after this list).
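
Both models fall out of a single setting, group.id. A sketch with the placeholder names used above: consumers in different groups each see every message (publish-subscribe), while consumers sharing a group split the partitions so each message reaches exactly one of them (point-to-point).

    from confluent_kafka import Consumer

    def make_consumer(group_id):
        return Consumer({
            "bootstrap.servers": "localhost:9092",
            "group.id": group_id,
        })

    # Publish-subscribe: distinct groups, each group receives every message.
    analytics = make_consumer("analytics")
    billing = make_consumer("billing")

    # Point-to-point: one shared group; partitions (and thus messages)
    # are split between the two workers.
    worker_a = make_consumer("workers")
    worker_b = make_consumer("workers")

    for c in (analytics, billing, worker_a, worker_b):
        c.subscribe(["orders"])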

How It Works

Data Flow

  1. Producers: Send messages to a Kafka topic.
  2. Brokers: Receive each message and append it to one of the topic’s partitions.
  3. Consumers: Subscribe to the topic and read messages from its partitions (the sketch after this list traces the round trip).
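
The flow can be traced with a delivery callback: the broker’s acknowledgement reports the partition and offset at which the record was stored, which is exactly where a consumer will later read it. Names and addresses remain the placeholders used above.

    from confluent_kafka import Producer

    def on_delivery(err, msg):
        # Called once the broker has stored (or rejected) the record.
        if err is not None:
            print("delivery failed:", err)
        else:
            print(f"stored in partition {msg.partition()} at offset {msg.offset()}")

    producer = Producer({"bootstrap.servers": "localhost:9092"})
    producer.produce("orders", value=b"order created", on_delivery=on_delivery)
    producer.flush()  # wait for the acknowledgement, firing the callback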

Quality of Service

Kafka supports three quality-of-service (delivery) levels:

  • At-most-once: offsets are committed before processing, so messages may be lost but are never redelivered.
  • At-least-once: offsets are committed after processing, so messages are never lost but may be redelivered.
  • Exactly-once: idempotent and transactional producers, together with read-committed consumers, ensure each message takes effect exactly once.
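
A sketch of the two most common configurations, with the same placeholder names; handle() is a hypothetical processing function, not a library call. Committing offsets only after processing gives at-least-once; enabling idempotence deduplicates producer retries, a building block of exactly-once.

    from confluent_kafka import Consumer, Producer

    def handle(msg):
        print(msg.value())  # stand-in for real processing

    # At-least-once: turn off auto-commit and commit only after the
    # record is processed. A crash mid-processing causes redelivery,
    # never loss (duplicates are possible).
    consumer = Consumer({
        "bootstrap.servers": "localhost:9092",
        "group.id": "order-processors",
        "enable.auto.commit": False,
    })
    consumer.subscribe(["orders"])
    msg = consumer.poll(10.0)
    if msg is not None and not msg.error():
        handle(msg)
        consumer.commit(message=msg)  # commit only after success

    # Idempotent producer: broker-side deduplication of retries.
    producer = Producer({
        "bootstrap.servers": "localhost:9092",
        "enable.idempotence": True,
    })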

Use Cases

  • Real-time Data Processing: Processing streams of data in real time, such as log aggregation and monitoring.
  • Data Integration: Integrating data across various systems and applications.
  • Event Sourcing: Storing state changes as a sequence of events, enabling replay and reconstruction of system state.
