Design a distributed event streaming platform like Apache Kafka that provides durable, high-throughput, fault-tolerant message streaming. Producers publish messages to partitioned topics stored as append-only commit logs; consumers in groups read messages in order and track progress via offsets; partitions are replicated across brokers with automatic leader election; the system supports exactly-once semantics, log compaction, and scales to millions of messages per second.
| Metric | Value |
|---|---|
| Messages per second (large cluster) | 10+ million |
| Data throughput (single broker) | 600 MB/s write, 1+ GB/s read |
| Brokers per cluster | 10–1,000+ |
| Topics per cluster | 10,000+ |
| Partitions per cluster | 100,000+ (millions with KRaft) |
| Replication factor (typical) | 3 |
| Message retention | 7 days (configurable; tiered storage for infinite) |
| Consumer groups | Thousands per cluster |
| Average message size | 1 KB (range: 100 bytes – 1 MB) |
| End-to-end latency (p99) | < 10ms (acks=1), < 50ms (acks=all) |
Publish messages: producers send messages (key-value byte arrays) to named topics; messages are durably stored and acknowledged; support synchronous and asynchronous publishing
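A minimal sketch of both publishing modes with the standard Java client; the broker address, topic name `orders`, and keys are illustrative placeholders:

```java
import java.util.Properties;
import java.util.concurrent.Future;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

public class ProducerExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092");   // assumed broker address
        props.put("key.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");
        props.put("acks", "all");   // wait for all in-sync replicas before acknowledging

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Synchronous publish: block until the broker acknowledges the write.
            Future<RecordMetadata> future =
                producer.send(new ProducerRecord<>("orders", "order-42", "created"));
            RecordMetadata meta = future.get();
            System.out.printf("sync: partition=%d offset=%d%n", meta.partition(), meta.offset());

            // Asynchronous publish: the callback fires when the ack (or an error) arrives.
            producer.send(new ProducerRecord<>("orders", "order-43", "created"),
                          (metadata, exception) -> {
                if (exception != null) {
                    exception.printStackTrace();
                } else {
                    System.out.printf("async: partition=%d offset=%d%n",
                                      metadata.partition(), metadata.offset());
                }
            });
            producer.flush();
        }
    }
}
```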
Subscribe and consume: consumers subscribe to topics and poll for new messages; support consumer groups where each partition is consumed by exactly one consumer in the group (load balancing); independent groups consume independently
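A minimal consumer-group sketch with the same Java client; the group id `order-processors` is an assumed name, and every consumer started with that id splits the topic's partitions among the group:

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class ConsumerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092");   // assumed broker address
        props.put("group.id", "order-processors");        // consumers sharing this id share the partitions
        props.put("key.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("enable.auto.commit", "false");          // commit offsets explicitly below

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("orders"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("partition=%d offset=%d key=%s value=%s%n",
                                      record.partition(), record.offset(),
                                      record.key(), record.value());
                }
                consumer.commitSync();   // mark everything returned by this poll as processed
            }
        }
    }
}
```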
Topic partitioning: each topic divided into configurable number of partitions; messages distributed across partitions by key hash (or round-robin if no key); partitions are the unit of parallelism
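An illustrative sketch of key-based partition selection; the hash below is a stand-in (the real default partitioner uses murmur2), but the property it demonstrates is the important one: a fixed key always maps to the same partition, which is what yields the per-key ordering guarantee described below:

```java
import java.nio.charset.StandardCharsets;

// Illustrative only: same key -> same partition, so per-key order is preserved.
public final class SimplePartitioner {
    public static int partitionFor(String key, int numPartitions) {
        if (key == null) {
            // No key: real clients spread these records out (round-robin / sticky batching).
            return (int) (Math.random() * numPartitions);
        }
        byte[] bytes = key.getBytes(StandardCharsets.UTF_8);
        int hash = 0;
        for (byte b : bytes) {
            hash = 31 * hash + b;   // stand-in hash; Kafka's default partitioner uses murmur2 here
        }
        return Math.abs(hash % numPartitions);
    }

    public static void main(String[] args) {
        // "order-42" maps to the same partition every time for a fixed partition count.
        System.out.println(partitionFor("order-42", 12));
    }
}
```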
Durable storage: messages persisted to disk in an append-only log (commit log); retained for a configurable period (e.g., 7 days) or until a size limit; consumers can replay from any offset within the retention window
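A toy model of one partition's on-disk log, just to make the append-only idea concrete; a real broker splits the log into segments with sparse, memory-mapped indexes and expires whole segments according to the retention policy:

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Toy commit log for a single partition: records are only ever appended, and an
// in-memory index maps logical offsets to byte positions in the file.
public class AppendOnlyLog implements AutoCloseable {
    private final RandomAccessFile file;
    private final List<Long> offsetToPosition = new ArrayList<>();

    public AppendOnlyLog(Path path) throws IOException {
        this.file = new RandomAccessFile(path.toFile(), "rw");
    }

    /** Appends a record and returns its offset (its position in the partition). */
    public long append(byte[] record) throws IOException {
        long position = file.length();
        file.seek(position);
        file.writeInt(record.length);
        file.write(record);
        offsetToPosition.add(position);
        return offsetToPosition.size() - 1;   // offsets are dense and monotonically increasing
    }

    /** Reads the record at a given offset -- this is what lets consumers replay. */
    public byte[] read(long offset) throws IOException {
        file.seek(offsetToPosition.get((int) offset));
        byte[] record = new byte[file.readInt()];
        file.readFully(record);
        return record;
    }

    @Override
    public void close() throws IOException {
        file.close();
    }
}
```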
Replication: each partition replicated across multiple brokers (configurable replication factor, e.g., 3); one replica is the leader (handles reads/writes), others are followers; automatic leader election on broker failure
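Creating a replicated topic with the Java `Admin` client; the partition count, replication factor, and topic name are illustrative:

```java
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateReplicatedTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092");   // assumed broker address

        try (Admin admin = Admin.create(props)) {
            // 12 partitions, each copied to 3 brokers; one replica per partition acts as leader.
            NewTopic topic = new NewTopic("orders", 12, (short) 3);
            admin.createTopics(List.of(topic)).all().get();
        }
    }
}
```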
Ordering guarantees: messages with the same key always go to the same partition; within a partition, messages are strictly ordered (FIFO); no ordering guarantee across partitions
Consumer offsets: track each consumer group's progress per partition (committed offset); consumers can seek to any offset (replay, skip); offsets stored in an internal __consumer_offsets topic
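A sketch of offset-based replay with the Java client: manually assign a partition and seek to an arbitrary offset (the topic, partition number, and offset are placeholders):

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class ReplayFromOffset {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092");
        props.put("group.id", "replay-job");
        props.put("key.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            TopicPartition partition = new TopicPartition("orders", 0);
            consumer.assign(List.of(partition));   // manual assignment, no group rebalancing
            consumer.seek(partition, 1_000L);      // jump back to offset 1000 and re-read from there
            var records = consumer.poll(Duration.ofSeconds(1));
            records.forEach(r -> System.out.printf("offset=%d value=%s%n", r.offset(), r.value()));
        }
    }
}
```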
Exactly-once semantics (EOS): support idempotent producers (dedup retries) and transactional produce-consume (atomic read-process-write across topics); prevents duplicate processing in stream pipelines
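A sketch of the transactional read-process-write pattern with the Java producer; the topic names, `transactional.id`, consumer group, and hard-coded offset are illustrative:

```java
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerGroupMetadata;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.KafkaException;
import org.apache.kafka.common.TopicPartition;

public class TransactionalPipeline {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092");
        props.put("key.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");
        props.put("enable.idempotence", "true");            // broker dedupes producer retries
        props.put("transactional.id", "order-enricher-1");  // stable id so the broker can fence zombies

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.initTransactions();
            producer.beginTransaction();
            try {
                // Output records and the consumed offsets commit atomically, or not at all.
                producer.send(new ProducerRecord<>("orders-enriched", "order-42", "enriched"));
                producer.sendOffsetsToTransaction(
                        Map.of(new TopicPartition("orders", 0), new OffsetAndMetadata(101L)),
                        new ConsumerGroupMetadata("order-enrichers"));
                producer.commitTransaction();
            } catch (KafkaException e) {
                producer.abortTransaction();
                throw e;
            }
        }
    }
}
```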
Consumer group rebalancing: when consumers join/leave a group, partitions are automatically redistributed among remaining consumers; support sticky, range, and cooperative rebalancing strategies
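Opting into cooperative (incremental) rebalancing and observing partition movement through a rebalance listener; the group and topic names are placeholders:

```java
import java.util.Collection;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class CooperativeConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092");
        props.put("group.id", "order-processors");
        props.put("key.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");
        // Incremental (cooperative) rebalancing: only the partitions that actually move
        // are revoked, instead of stopping the whole group on every membership change.
        props.put("partition.assignment.strategy",
                  "org.apache.kafka.clients.consumer.CooperativeStickyAssignor");

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(List.of("orders"), new ConsumerRebalanceListener() {
            @Override
            public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
                System.out.println("giving up: " + partitions);   // commit/flush in-flight work here
            }

            @Override
            public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
                System.out.println("now owning: " + partitions);
            }
        });
        // ... poll loop as in the consumer example above ...
    }
}
```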
Schema registry: optional schema management for messages (Avro, Protobuf, JSON Schema); enforce schema compatibility (backward, forward, full); evolve schemas without breaking consumers
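One common setup, assuming Confluent's Schema Registry and its Avro serializer (a third-party client, not part of Apache Kafka itself); the registry URL and broker address are placeholders:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;

public class AvroProducerConfig {
    public static KafkaProducer<String, Object> build() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092");            // assumed broker address
        props.put("key.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");
        // Serializer from Confluent's schema-registry client: it registers the record's
        // Avro schema on first use and rejects writes that violate the compatibility rule
        // configured for the subject (backward, forward, or full).
        props.put("value.serializer",
                  "io.confluent.kafka.serializers.KafkaAvroSerializer");
        props.put("schema.registry.url", "http://schema-registry:8081");  // assumed registry address
        return new KafkaProducer<>(props);
    }
}
```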
Non-functional requirements define the system qualities critical to your users. Frame them as 'The system should be able to...' statements. These will guide your deep dives later.
Think about CAP theorem trade-offs, scalability limits, latency targets, durability guarantees, security requirements, fault tolerance, and compliance needs.
Frame NFRs for this specific system. 'P99 end-to-end delivery latency under 50ms' is far more valuable than just 'low latency'.
Add concrete numbers: 'P99 response time < 500ms', '99.9% availability', '10M DAU'. This drives architectural decisions.
Choose the 3-5 most critical NFRs. Every system should be 'scalable', but what makes THIS system's scaling uniquely challenging?