Design a distributed event streaming platform like Apache Kafka that provides durable, high-throughput, fault-tolerant message streaming. Producers publish messages to partitioned topics stored as append-only commit logs; consumers in groups read messages in order and track progress via offsets; partitions are replicated across brokers with automatic leader election; the system supports exactly-once semantics, log compaction, and scales to millions of messages per second.
| Metric | Value |
|---|---|
| Messages per second (large cluster) | 10+ million |
| Data throughput (single broker) | 600 MB/s write, 1+ GB/s read |
| Brokers per cluster | 10–1,000+ |
| Topics per cluster | 10,000+ |
| Partitions per cluster | 100,000+ (millions with KRaft) |
| Replication factor (typical) | 3 |
| Message retention | 7 days (configurable; tiered storage for infinite) |
| Consumer groups | Thousands per cluster |
| Average message size | 1 KB (range: 100 bytes – 1 MB) |
| End-to-end latency (p99) | < 10ms (acks=1), < 50ms (acks=all) |
Publish messages: producers send messages (key-value byte arrays) to named topics; messages are durably stored and acknowledged; support synchronous and asynchronous publishing
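A minimal sketch of both publishing modes with the standard Java client; the broker address, topic name `orders`, and keys are illustrative placeholders:

```java
import java.util.Properties;
import java.util.concurrent.Future;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

public class ProducerExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092");   // assumed broker address
        props.put("key.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");
        props.put("acks", "all");   // wait for all in-sync replicas before acknowledging

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Synchronous publish: block until the broker acknowledges the write.
            Future<RecordMetadata> future =
                producer.send(new ProducerRecord<>("orders", "order-42", "created"));
            RecordMetadata meta = future.get();
            System.out.printf("sync: partition=%d offset=%d%n", meta.partition(), meta.offset());

            // Asynchronous publish: the callback fires when the ack (or an error) arrives.
            producer.send(new ProducerRecord<>("orders", "order-43", "created"),
                          (metadata, exception) -> {
                if (exception != null) {
                    exception.printStackTrace();
                } else {
                    System.out.printf("async: partition=%d offset=%d%n",
                                      metadata.partition(), metadata.offset());
                }
            });
            producer.flush();
        }
    }
}
```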
Subscribe and consume: consumers subscribe to topics and poll for new messages; support consumer groups where each partition is consumed by exactly one consumer in the group (load balancing); independent groups consume independently
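A minimal consumer-group sketch with the same Java client; the group id `order-processors` is an assumed name, and every consumer started with that id splits the topic's partitions among the group:

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class ConsumerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092");   // assumed broker address
        props.put("group.id", "order-processors");        // consumers sharing this id share the partitions
        props.put("key.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("enable.auto.commit", "false");          // commit offsets explicitly below

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("orders"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("partition=%d offset=%d key=%s value=%s%n",
                                      record.partition(), record.offset(),
                                      record.key(), record.value());
                }
                consumer.commitSync();   // mark everything returned by this poll as processed
            }
        }
    }
}
```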
Topic partitioning: each topic divided into configurable number of partitions; messages distributed across partitions by key hash (or round-robin if no key); partitions are the unit of parallelism
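An illustrative sketch of key-based partition selection; the hash below is a stand-in (the real default partitioner uses murmur2), but the property it demonstrates is the important one: a fixed key always maps to the same partition, which is what yields the per-key ordering guarantee described below:

```java
import java.nio.charset.StandardCharsets;

// Illustrative only: same key -> same partition, so per-key order is preserved.
public final class SimplePartitioner {
    public static int partitionFor(String key, int numPartitions) {
        if (key == null) {
            // No key: real clients spread these records out (round-robin / sticky batching).
            return (int) (Math.random() * numPartitions);
        }
        byte[] bytes = key.getBytes(StandardCharsets.UTF_8);
        int hash = 0;
        for (byte b : bytes) {
            hash = 31 * hash + b;   // stand-in hash; Kafka's default partitioner uses murmur2 here
        }
        return Math.abs(hash % numPartitions);
    }

    public static void main(String[] args) {
        // "order-42" maps to the same partition every time for a fixed partition count.
        System.out.println(partitionFor("order-42", 12));
    }
}
```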
Durable storage: messages persisted to disk in an append-only log (commit log); retained for a configurable period (e.g., 7 days) or until a size limit; consumers can replay from any offset within the retention window
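A toy model of one partition's on-disk log, just to make the append-only idea concrete; a real broker splits the log into segments with sparse, memory-mapped indexes and expires whole segments according to the retention policy:

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Toy commit log for a single partition: records are only ever appended, and an
// in-memory index maps logical offsets to byte positions in the file.
public class AppendOnlyLog implements AutoCloseable {
    private final RandomAccessFile file;
    private final List<Long> offsetToPosition = new ArrayList<>();

    public AppendOnlyLog(Path path) throws IOException {
        this.file = new RandomAccessFile(path.toFile(), "rw");
    }

    /** Appends a record and returns its offset (its position in the partition). */
    public long append(byte[] record) throws IOException {
        long position = file.length();
        file.seek(position);
        file.writeInt(record.length);
        file.write(record);
        offsetToPosition.add(position);
        return offsetToPosition.size() - 1;   // offsets are dense and monotonically increasing
    }

    /** Reads the record at a given offset -- this is what lets consumers replay. */
    public byte[] read(long offset) throws IOException {
        file.seek(offsetToPosition.get((int) offset));
        byte[] record = new byte[file.readInt()];
        file.readFully(record);
        return record;
    }

    @Override
    public void close() throws IOException {
        file.close();
    }
}
```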
Replication: each partition replicated across multiple brokers (configurable replication factor, e.g., 3); one replica is the leader (handles reads/writes), others are followers; automatic leader election on broker failure
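Creating a replicated topic with the Java `Admin` client; the partition count, replication factor, and topic name are illustrative:

```java
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateReplicatedTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092");   // assumed broker address

        try (Admin admin = Admin.create(props)) {
            // 12 partitions, each copied to 3 brokers; one replica per partition acts as leader.
            NewTopic topic = new NewTopic("orders", 12, (short) 3);
            admin.createTopics(List.of(topic)).all().get();
        }
    }
}
```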
Ordering guarantees: messages with the same key always go to the same partition; within a partition, messages are strictly ordered (FIFO); no ordering guarantee across partitions
Consumer offsets: track each consumer group's progress per partition (committed offset); consumers can seek to any offset (replay, skip); offsets stored in an internal __consumer_offsets topic
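A sketch of offset-based replay with the Java client: manually assign a partition and seek to an arbitrary offset (the topic, partition number, and offset are placeholders):

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class ReplayFromOffset {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092");
        props.put("group.id", "replay-job");
        props.put("key.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            TopicPartition partition = new TopicPartition("orders", 0);
            consumer.assign(List.of(partition));   // manual assignment, no group rebalancing
            consumer.seek(partition, 1_000L);      // jump back to offset 1000 and re-read from there
            var records = consumer.poll(Duration.ofSeconds(1));
            records.forEach(r -> System.out.printf("offset=%d value=%s%n", r.offset(), r.value()));
        }
    }
}
```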
Exactly-once semantics (EOS): support idempotent producers (dedup retries) and transactional produce-consume (atomic read-process-write across topics); prevents duplicate processing in stream pipelines
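A sketch of the transactional read-process-write pattern with the Java producer; the topic names, `transactional.id`, consumer group, and hard-coded offset are illustrative:

```java
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerGroupMetadata;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.KafkaException;
import org.apache.kafka.common.TopicPartition;

public class TransactionalPipeline {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092");
        props.put("key.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");
        props.put("enable.idempotence", "true");            // broker dedupes producer retries
        props.put("transactional.id", "order-enricher-1");  // stable id so the broker can fence zombies

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.initTransactions();
            producer.beginTransaction();
            try {
                // Output records and the consumed offsets commit atomically, or not at all.
                producer.send(new ProducerRecord<>("orders-enriched", "order-42", "enriched"));
                producer.sendOffsetsToTransaction(
                        Map.of(new TopicPartition("orders", 0), new OffsetAndMetadata(101L)),
                        new ConsumerGroupMetadata("order-enrichers"));
                producer.commitTransaction();
            } catch (KafkaException e) {
                producer.abortTransaction();
                throw e;
            }
        }
    }
}
```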
Consumer group rebalancing: when consumers join/leave a group, partitions are automatically redistributed among remaining consumers; support sticky, range, and cooperative rebalancing strategies
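Opting into cooperative (incremental) rebalancing and observing partition movement through a rebalance listener; the group and topic names are placeholders:

```java
import java.util.Collection;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class CooperativeConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092");
        props.put("group.id", "order-processors");
        props.put("key.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");
        // Incremental (cooperative) rebalancing: only the partitions that actually move
        // are revoked, instead of stopping the whole group on every membership change.
        props.put("partition.assignment.strategy",
                  "org.apache.kafka.clients.consumer.CooperativeStickyAssignor");

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(List.of("orders"), new ConsumerRebalanceListener() {
            @Override
            public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
                System.out.println("giving up: " + partitions);   // commit/flush in-flight work here
            }

            @Override
            public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
                System.out.println("now owning: " + partitions);
            }
        });
        // ... poll loop as in the consumer example above ...
    }
}
```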
Schema registry: optional schema management for messages (Avro, Protobuf, JSON Schema); enforce schema compatibility (backward, forward, full); evolve schemas without breaking consumers
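One common setup, assuming Confluent's Schema Registry and its Avro serializer (a third-party client, not part of Apache Kafka itself); the registry URL and broker address are placeholders:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;

public class AvroProducerConfig {
    public static KafkaProducer<String, Object> build() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092");            // assumed broker address
        props.put("key.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");
        // Serializer from Confluent's schema-registry client: it registers the record's
        // Avro schema on first use and rejects writes that violate the compatibility rule
        // configured for the subject (backward, forward, or full).
        props.put("value.serializer",
                  "io.confluent.kafka.serializers.KafkaAvroSerializer");
        props.put("schema.registry.url", "http://schema-registry:8081");  // assumed registry address
        return new KafkaProducer<>(props);
    }
}
```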
Non-functional requirements define the system qualities critical to your users. Frame them as 'The system should be able to...' statements. These will guide your deep dives later.
Think about CAP theorem trade-offs, scalability limits, latency targets, durability guarantees, security requirements, fault tolerance, and compliance needs.
Frame NFRs for this specific system. 'P99 end-to-end delivery latency under 50ms' is far more valuable than just 'low latency'.
Add concrete numbers: 'P99 response time < 500ms', '99.9% availability', '10M DAU'. This drives architectural decisions.
Choose the 3-5 most critical NFRs. Every system should be 'scalable', but what makes THIS system's scaling uniquely challenging?