Kafka Series [01]: Core Components And How They Work
![Kafka Series [01]: Core Components And How They Work](/_next/image/?url=https%3A%2F%2Fxwljupeuknmxjrtxryak.supabase.co%2Fstorage%2Fv1%2Fobject%2Fpublic%2Fcover-image%2F4162a182-b61e-40ee-92a3-bbc64e3ab77c%2F64e4e609-5438-416c-a6d8-86ccac8e2c87%2FScreenshot%25202026-03-24%2520004300.png&w=3840&q=75)
Kafka Clusters
A Kafka cluster is a group of brokers that work together to provide fault tolerance, scalability, and high availability.
A cluster can contain one or more brokers:
Single-Broker Cluster: The Kafka cluster has only one broker
Multi-Broker Cluster: The Kafka cluster has two or more brokers
Kafka Brokers
A Kafka server that stores data, serves client read/write requests, and manages replication for reliability.
Kafka Topics & Kafka Partitions
A topic is a logical abstraction, while the actual data is physically stored in partitions as append-only log files on disk.
Each partition:
Is an append-only log file stored on disk
Maintains strict ordering of messages
Is distributed across brokers
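The append-only, offset-ordered nature of a partition can be sketched in a few lines. This is a toy model for illustration only; real partitions are segment files managed by the broker, and the `Partition` class below is not a Kafka API.

```python
# Toy model of a Kafka partition: an append-only log where each
# record gets a monotonically increasing offset.
class Partition:
    def __init__(self):
        self.log = []  # append-only: records are never modified in place

    def append(self, record):
        offset = len(self.log)  # next offset = current length of the log
        self.log.append(record)
        return offset

    def read(self, offset):
        return self.log[offset]

p = Partition()
print(p.append("order-1"))  # -> 0
print(p.append("order-2"))  # -> 1
print(p.read(0))            # -> order-1
```

Because records are only ever appended, ordering within a single partition is guaranteed; ordering across partitions is not.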
Producer
A client application that publishes messages to a Kafka topic.
The sending process consists of four main steps:
Step 1: Create a ProducerRecord, which must include a topic and a value. The partition and key are optional.
Step 2: Serialize the record into byte arrays before sending it over the network.
Step 3: Determine the target partition.
If partition is explicitly specified, it is used directly.
If a key is provided, the partition is selected based on a hash of the key.
Otherwise, Kafka uses a sticky partitioning strategy to optimize batching.
Step 4: The record is added to an internal buffer and grouped into batches before being sent to the broker. The leader broker writes the data to its log, replicates it to followers if required, and returns metadata (such as offset and partition) to the producer.
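Step 3 above can be sketched as a small routing function. Note this is a simplification: Kafka's default partitioner actually hashes key bytes with murmur2, and sticky partitioning picks a partition per batch rather than a fixed one; `hashlib.md5` and the `sticky` parameter here are stand-ins for illustration.

```python
import hashlib

def choose_partition(key, num_partitions, explicit_partition=None, sticky=0):
    """Mirror of the producer's three partition-selection rules (simplified)."""
    if explicit_partition is not None:
        # Rule 1: an explicitly specified partition is used directly.
        return explicit_partition
    if key is not None:
        # Rule 2: hash the key so the same key always lands on the same partition.
        digest = hashlib.md5(key.encode()).digest()
        return int.from_bytes(digest[:4], "big") % num_partitions
    # Rule 3: no key -- "stick" to one partition to fill batches efficiently.
    return sticky

print(choose_partition("user-42", 6, explicit_partition=3))               # -> 3
print(choose_partition("user-42", 6) == choose_partition("user-42", 6))   # -> True
print(choose_partition(None, 6, sticky=2))                                # -> 2
```

The key-hashing rule is what gives Kafka per-key ordering: all messages with the same key land in the same partition, and each partition preserves order.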
Consumer
A client application that receives messages from a Kafka topic.
The receiving process consists of five main steps:
Step 1: The consumer subscribes to one or more topics and becomes part of a consumer group.
Step 2: Once connected, Kafka assigns partitions to each consumer within the group.
Each partition is consumed by only one consumer at a time within the same group.
Step 3: The consumer actively pulls data by calling consumer.poll().
Step 4: The consumer processes each message based on business logic.
Step 5: After processing, the consumer commits offsets to mark messages as consumed.
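The assign/poll/process/commit cycle above can be sketched with plain Python. This is a conceptual simulation, not the Kafka client API: `assign_partitions` is a hypothetical helper doing simple round-robin assignment, whereas real consumer groups negotiate assignment through the group coordinator.

```python
def assign_partitions(partitions, consumers):
    # Step 2 (simplified): round-robin so each partition has exactly
    # one consumer within the group.
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment

partitions = [0, 1, 2, 3]
consumers = ["c1", "c2"]
assignment = assign_partitions(partitions, consumers)
print(assignment)  # -> {'c1': [0, 2], 'c2': [1, 3]}

# Steps 3-5 for one consumer, against toy in-memory partition logs:
log = {0: ["a", "b"], 2: ["c"]}
committed = {}                        # partition -> next offset to read
for p in assignment["c1"]:
    for offset, msg in enumerate(log.get(p, [])):
        processed = msg.upper()       # Step 4: business logic
        committed[p] = offset + 1     # Step 5: commit the *next* offset
print(committed)  # -> {0: 2, 2: 1}
```

Committing the next offset (rather than the last one processed) matches Kafka's convention: a committed offset marks where the consumer should resume after a restart or rebalance.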
ZooKeeper
Manages and coordinates Kafka brokers, handling configuration, metadata, leader election, and cluster state.
Starting from Kafka 3.3, KRaft replaces ZooKeeper by embedding the metadata quorum directly inside Kafka using the Raft consensus protocol.
Offsets
Unique IDs for each message in a partition, used by consumers to track read progress.
Initially, the offset pointer points to the first message. As soon as the consumer reads that message, the offset pointer moves to the next message in the sequence.
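The advancing offset pointer can be shown in a few lines. This is a toy walk over an in-memory list standing in for one partition's log:

```python
# Toy offset pointer: starts at the first message, advances after each read.
messages = ["m0", "m1", "m2"]   # one partition's log
offset = 0                      # initially points at the first message
read = []
while offset < len(messages):
    read.append(messages[offset])
    offset += 1                 # pointer moves to the next message

print(read)    # -> ['m0', 'm1', 'm2']
print(offset)  # -> 3 (the next offset to read; nothing is there yet)
```

Because the offset is just a position the consumer tracks, a consumer can also rewind it to re-read messages, which is how Kafka supports replay.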
Summary
In Apache Kafka, core components such as brokers, topics, partitions, producers, and consumers work together to form a scalable and fault-tolerant streaming system.
By organizing data into partitioned logs and coordinating producers and consumers through consumer groups and offsets, Kafka enables reliable, ordered, and high-throughput data processing.