When LinkedIn engineers confronted the challenge of processing billions of events daily—user activity, system metrics, operational data—traditional message queues buckled under the pressure. Their solution, Apache Kafka, emerged as a fundamentally different approach to messaging: a distributed, fault-tolerant commit log designed for massive throughput and real-time data streaming.
Kafka isn't just another message broker. It represents a paradigm shift in how we think about data flow in distributed systems. Rather than treating messages as transient items to be consumed and discarded, Kafka treats them as durable, ordered records that form the single source of truth for your data pipeline. This log-centric philosophy has made Kafka the backbone of modern data infrastructure at companies like Netflix, Uber, Airbnb, and LinkedIn itself.
By the end of this page, you will understand Kafka's log-based architecture, its high-throughput design principles, partitioning and replication strategies, the consumer group model, and how Kafka achieves exactly-once semantics. You'll gain the depth required to architect Kafka-based solutions for real-world distributed systems.
At the heart of Kafka lies a deceptively simple abstraction: the append-only log. Understanding this foundational concept is essential to grasping why Kafka behaves differently from traditional message queues.
What is a commit log?
A commit log is an ordered, immutable sequence of records. New records are always appended to the end—never modified or inserted in the middle. Each record receives a unique, monotonically increasing sequence number called an offset. This offset serves as the record's permanent address within the log.
This design mirrors the write-ahead logs (WAL) used by databases like PostgreSQL and MySQL for crash recovery. But Kafka elevates the log from an internal implementation detail to the primary data structure that applications interact with directly.
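To make the abstraction concrete, here is a toy in-memory sketch of an append-only log (illustrative only; Kafka persists segmented logs to disk and adds replication, retention, and much more):

import java.util.ArrayList;
import java.util.List;

// Toy append-only log: each appended record receives the next offset,
// and records are never modified after being written.
class AppendOnlyLog {
    private final List<String> records = new ArrayList<>();

    long append(String record) {
        records.add(record);
        return records.size() - 1;  // the record's permanent offset
    }

    String read(long offset) {
        return records.get((int) offset);  // reads never mutate the log
    }
}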
+----------------------------------------------+
| Kafka Topic: user-events                     |
+----------------------------------------------+
| Partition 0                                  |
| +----+----+----+----+----+----+----+----+   |
| | 0  | 1  | 2  | 3  | 4  | 5  | 6  | 7  |   |
| +----+----+----+----+----+----+----+----+   |
|    ↑                                 ↑      |
|  oldest                            newest   |
|                                              |
| Each cell = one message with unique offset   |
+----------------------------------------------+

Key Properties:
• Append-only: New messages added only at the end
• Immutable: Once written, messages cannot be modified
• Ordered: Offset provides strict ordering within partition
• Durable: Messages persisted to disk, replicated
• Retained: Messages kept for configurable retention period

Why logs matter for distributed systems:
The log abstraction provides several critical properties that make distributed systems easier to build:
Ordering guarantees: Within a partition, messages maintain strict ordering based on their offset. This enables consumers to process events in the exact sequence they occurred.
Replay capability: Unlike traditional queues where messages disappear after consumption, Kafka retains messages for a configurable period. Consumers can "rewind" to any offset and reprocess historical data, which is invaluable for debugging, rebuilding derived state, or onboarding new consumers (see the sketch after this list).
Decoupled consumption: Multiple consumers can read from the same log independently, each tracking their own offset. The log doesn't care how many consumers exist or how fast they read.
Crash recovery: If a consumer fails, it can resume from its last committed offset. No messages are lost; processing continues exactly where it left off.
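As a concrete illustration of replay, a consumer can seek back over retained history. A minimal sketch with the standard Java client, assuming a configured KafkaConsumer<String, String> named consumer and a hypothetical topic name:

// Rewind to the beginning of the assigned partitions and reprocess history
consumer.subscribe(List.of("user-events"));
consumer.poll(Duration.ofMillis(0));              // join the group, receive assignment
consumer.seekToBeginning(consumer.assignment());  // jump back to the oldest retained offset
ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
records.forEach(r -> System.out.printf("offset=%d value=%s%n", r.offset(), r.value()));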
Martin Kleppmann famously described Kafka as "a database turned inside out." Traditional databases hide their internal log and expose tables for querying. Kafka exposes the log directly and treats derived views (databases, caches, search indexes) as materialized projections of that log. This inversion enables powerful patterns like Change Data Capture and Event Sourcing.
Kafka's exceptional throughput—capable of handling millions of messages per second on commodity hardware—stems from deliberate architectural choices that optimize for sequential I/O, minimize copies, and leverage modern operating system capabilities.
Sequential I/O over Random I/O:
Hard drives and SSDs exhibit dramatically different performance characteristics for sequential versus random access. Sequential writes to disk can achieve throughputs of 600 MB/s or more, while random writes on spinning disks might manage only around 100 KB/s, a gap of three to four orders of magnitude. Kafka's append-only log structure ensures all disk operations are sequential, converting what would be expensive random-access patterns into efficient sequential scans.
| Operation Type | HDD Throughput | SSD Throughput | Latency Pattern |
|---|---|---|---|
| Sequential Write | 100-200 MB/s | 400-600 MB/s | Predictable, low |
| Random Write | 0.1-1 MB/s | 10-100 MB/s | Unpredictable, high |
| Sequential Read | 100-200 MB/s | 450-550 MB/s | Predictable, low |
| Random Read | 0.1-1 MB/s | 10-100 MB/s | Unpredictable, high |
Zero-Copy Data Transfer:
Traditional network servers read data from disk into application memory, then copy it to socket buffers for transmission—multiple memory copies that consume CPU and bandwidth. Kafka eliminates these copies using the sendfile() system call, allowing the operating system to transfer data directly from disk (or page cache) to the network card without any application-level copying.
This zero-copy mechanism reduces CPU overhead and enables a single Kafka broker to saturate a 10Gbps network link while consuming minimal CPU resources.
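In the JVM, this mechanism is exposed through FileChannel.transferTo, which delegates to sendfile() on Linux. A minimal sketch:

import java.io.FileInputStream;
import java.nio.channels.FileChannel;
import java.nio.channels.SocketChannel;

// Zero-copy style transfer: the kernel moves bytes from the file's
// page cache to the socket without copying through user space.
static void sendFile(String path, SocketChannel socket) throws Exception {
    try (FileChannel file = new FileInputStream(path).getChannel()) {
        long position = 0;
        long remaining = file.size();
        while (remaining > 0) {
            long sent = file.transferTo(position, remaining, socket);
            position += sent;
            remaining -= sent;
        }
    }
}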
Traditional Data Transfer:
1. Disk → Kernel Buffer (DMA copy)
2. Kernel Buffer → User Buffer (CPU copy)
3. User Buffer → Socket Buffer (CPU copy)
4. Socket Buffer → NIC (DMA copy)
Total: 4 copies, 4 context switches

Kafka Zero-Copy (sendfile):
1. Disk → Kernel Buffer (DMA copy)
2. Kernel Buffer → NIC (DMA copy via scatter-gather)
Total: 2 copies, 2 context switches

Result: ~3x throughput improvement, ~60% CPU reduction

Page Cache Utilization:
Kafka delegates caching entirely to the operating system's page cache rather than maintaining its own in-process cache. This design has profound implications: cached data stays warm across broker restarts (the cache lives in the kernel, not the JVM), memory isn't duplicated between the application heap and the OS, and the JVM avoids garbage-collection pressure from managing gigabytes of cached messages.
Batching and Compression:
Kafka producers batch multiple messages together before sending, and these batches can be compressed using algorithms like LZ4, Snappy, or Zstandard. Batching amortizes network round-trip costs, while compression reduces both network bandwidth and disk storage requirements.
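A sketch of producer settings that enable batching and compression (the values are illustrative, not tuned recommendations):

// Illustrative batching and compression settings
props.put("linger.ms", "10");          // wait up to 10 ms to fill a batch before sending
props.put("batch.size", "65536");      // target batch size per partition, in bytes
props.put("compression.type", "lz4");  // compress each batch as a unit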
Kafka brokers benefit from: (1) Multiple high-capacity HDDs or SSDs in JBOD configuration for parallel writes, (2) Moderate RAM (64GB+) for page cache, (3) High-bandwidth network (10Gbps+), (4) Modern multi-core CPUs for compression. Notably, expensive enterprise storage arrays often perform worse than commodity drives due to their random-access optimization.
While a single Kafka partition provides ordering guarantees, it also represents a throughput bottleneck—all reads and writes for that partition funnel through a single broker. Partitioning is Kafka's mechanism for horizontal scaling, distributing a topic's data across multiple brokers for parallel processing.
Partition fundamentals:
A Kafka topic is divided into one or more partitions. Each partition is an independent, ordered log stored on a single broker. Messages within a partition maintain strict ordering; messages across partitions have no ordering guarantees.
Producers determine which partition receives each message using a partitioning strategy:
Key-based partitioning: Messages with the same key always go to the same partition (Kafka's default partitioner hashes the serialized key bytes with murmur2 and takes the result modulo the partition count). This ensures related messages are processed in order.
Round-robin partitioning: Messages without keys are distributed evenly across partitions for load balancing.
Custom partitioners: Applications can implement custom logic for routing messages to specific partitions.
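For example, a custom partitioner implementing the client's Partitioner interface might pin high-priority keys to a dedicated partition. A sketch under stated assumptions (the "vip-" routing rule is hypothetical, and the topic is assumed to have at least two partitions); it would be registered via the producer's partitioner.class setting:

import java.util.Map;
import org.apache.kafka.clients.producer.Partitioner;
import org.apache.kafka.common.Cluster;

// Hypothetical custom partitioner: route "vip-" keys to partition 0,
// hash all other keys across the remaining partitions.
public class VipPartitioner implements Partitioner {
    @Override
    public int partition(String topic, Object key, byte[] keyBytes,
                         Object value, byte[] valueBytes, Cluster cluster) {
        int numPartitions = cluster.partitionsForTopic(topic).size();
        if (key != null && key.toString().startsWith("vip-")) {
            return 0;  // dedicated partition for high-priority traffic
        }
        int hash = (key == null) ? 0 : key.hashCode();
        return 1 + Math.floorMod(hash, numPartitions - 1);
    }

    @Override public void close() {}
    @Override public void configure(Map<String, ?> configs) {}
}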
Topic: orders (3 partitions)

+--------------------------------------------------+
| Producer                                         |
|  Key: "user-123" → hash("user-123") % 3 = 1      |
|  Key: "user-456" → hash("user-456") % 3 = 0      |
|  Key: "user-789" → hash("user-789") % 3 = 2      |
+--------------------------------------------------+
        ↓                  ↓                  ↓
+------------------+ +------------------+ +------------------+
| Partition 0      | | Partition 1      | | Partition 2      |
| (Broker 1)       | | (Broker 2)       | | (Broker 3)       |
|                  | |                  | |                  |
| user-456 orders  | | user-123 orders  | | user-789 orders  |
| ordered in-seq   | | ordered in-seq   | | ordered in-seq   |
+------------------+ +------------------+ +------------------+

Key insight: Orders for the same user are always in the same partition,
guaranteeing they're processed in order.

Parallelism through partitions:
The number of partitions determines the maximum parallelism for consumers. If a topic has 12 partitions, up to 12 consumer instances can read in parallel, each assigned one or more partitions. This creates a natural scaling path: as load grows, add partitions and matching consumer instances.
Choosing the right partition count:
Partition count is a critical design decision with long-term implications:
| Factor | Consideration |
|---|---|
| Throughput | More partitions = more parallel producers and consumers |
| Ordering | Messages with same key should go to same partition |
| Consumer scaling | Maximum consumers = number of partitions |
| End-to-end latency | More partitions = longer replication time |
| Broker memory | Each partition consumes resources (~10KB per partition) |
| Topic creation time | Many partitions slow down topic creation |
You can increase partition count after topic creation, but you cannot decrease it. When partitions increase, existing key-to-partition mappings change—messages with the same key may move to different partitions. For stateful consumers relying on key-based partitioning, this requires careful coordination.
Distributed systems fail—brokers crash, disks corrupt, networks partition. Kafka's replication mechanism ensures data survives these failures while maintaining consistency guarantees.
Leader-Follower replication:
Each partition has one leader and zero or more followers (replicas). All reads and writes go through the leader; followers passively replicate the leader's log. If the leader fails, one of the followers is promoted to leader.
In-Sync Replicas (ISR):
Followers that have fully caught up with the leader form the In-Sync Replica set (ISR). Kafka considers a write "committed" only when all ISR members have acknowledged it. If a follower falls too far behind (configurable via replica.lag.time.max.ms), it's removed from the ISR.
This ISR mechanism balances durability and availability: a committed write is guaranteed to exist on every in-sync replica, while a slow or failed follower is evicted from the ISR rather than stalling the partition indefinitely.
Partition 0: Replication Factor = 3

+-----------------------------------------------+
| Leader (Broker 1)                             |
| Log: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]           |
|                     ↑              ↑          |
|             high-water mark     log end       |
+-----------------------------------------------+
        ↓ replicate           ↓ replicate
+------------------+    +------------------+
| Follower (B2)    |    | Follower (B3)    |
| [0,1,2,3,4,5,6,7]|    | [0,1,2,3,4,5,6]  |
| in-sync (ISR)    |    | in-sync (ISR)    |
+------------------+    +------------------+

High water mark: offset 6, the highest offset replicated to every ISR member
Consumers only see committed messages (offset ≤ 6)
The leader still holds offsets 7-9 that are not yet replicated to the full ISR

Producer acknowledgment modes:
Producers control durability guarantees through the acks setting, summarized in the table below: acks=0 sends without waiting for any acknowledgment, acks=1 waits for the leader alone, and acks=all waits for every in-sync replica.
Combined with min.insync.replicas, you can enforce that writes only succeed if a minimum number of replicas acknowledge:
# Require at least 2 replicas (including leader) to be in-sync
min.insync.replicas=2
# Wait for all in-sync replicas to acknowledge
acks=all
# Result: Write succeeds only if leader + 1 follower confirm
# If fewer than 2 replicas are in-sync, producer receives error
| Setting | Durability | Throughput | Latency | Use Case |
|---|---|---|---|---|
| acks=0 | None | Highest | Lowest | Metrics, logs (loss acceptable) |
| acks=1 | Weak | High | Low | General use, acceptable risk |
| acks=all | Strong | Lower | Higher | Financial, critical data |
| acks=all + min.insync.replicas=2 | Strongest | Lowest | Highest | Zero data loss requirements |
By default, Kafka only elects leaders from the ISR. If all ISR members fail, the partition becomes unavailable. Setting unclean.leader.election.enable=true allows out-of-sync replicas to become leader, restoring availability at the cost of potential data loss. This trade-off between availability and consistency reflects the CAP theorem in practice.
Kafka's consumer model is built around the concept of consumer groups—a mechanism for distributing partition consumption across multiple consumer instances while ensuring each message is processed only once within the group.
How consumer groups work:
Consumers identify themselves with a group.id. Kafka's group coordinator assigns partitions to consumers within the group such that each partition is consumed by exactly one member, a single consumer may own several partitions, and any consumers beyond the partition count remain idle as hot standbys.
When consumers join or leave the group, or when partitions change, Kafka triggers a rebalance to redistribute assignments.
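A minimal consumer joining a group, sketched with the standard Java client (topic, group id, and the process handler are hypothetical):

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("group.id", "order-processing");  // consumers sharing this id split the partitions
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
    consumer.subscribe(List.of("events"));  // the group coordinator assigns partitions
    while (true) {
        for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofMillis(100))) {
            process(record);  // hypothetical per-message handler
        }
    }
}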
Topic: events (4 partitions)

+---------------------------+
| Consumer Group: A         |  (Order Processing)
+---------------------------+
| Consumer 1: P0, P1        |
| Consumer 2: P2, P3        |
+---------------------------+
Each message processed once by group A

+---------------------------+
| Consumer Group: B         |  (Analytics)
+---------------------------+
| Consumer 1: P0,P1,P2,P3   |
+---------------------------+
Each message processed once by group B

Key insight: Both groups receive ALL messages.
Within each group, messages are distributed (work queue pattern).
Across groups, messages are broadcast (pub-sub pattern).

Offset management:
Kafka consumers track their progress through committed offsets. When a consumer restarts, it resumes from its last committed offset. Offset commits can be automatic (committed periodically in the background, trading precision for convenience) or manual (the application controls exactly when progress is recorded):
// Auto-commit enabled
props.put("enable.auto.commit", "true");
props.put("auto.commit.interval.ms", "5000");
// Manual commit for precise control
consumer.commitSync(); // Blocking
consumer.commitAsync(); // Non-blocking
Rebalancing strategies:
Kafka supports several partition assignment strategies: Range (the default, which assigns contiguous blocks of partitions per topic), RoundRobin (which spreads partitions one at a time across consumers), Sticky (which minimizes partition movement during rebalances), and CooperativeSticky (sticky assignment combined with incremental, cooperative rebalancing).
During a rebalance, all consumers in the group stop processing—a "stop-the-world" event. For large consumer groups processing latency-sensitive workloads, this can cause significant disruption. Modern Kafka (2.4+) supports Incremental Cooperative Rebalancing, which allows consumers to continue processing unchanged assignments during rebalance.
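Opting into cooperative rebalancing is a consumer-side setting; a minimal example:

// Enable incremental cooperative rebalancing (Kafka 2.4+)
props.put("partition.assignment.strategy",
          "org.apache.kafka.clients.consumer.CooperativeStickyAssignor");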
"Exactly-once" delivery is the holy grail of distributed messaging—ensuring each message is processed once and only once, even in the presence of failures. While theoretically impossible to achieve in all edge cases, Kafka provides strong exactly-once guarantees within its ecosystem through idempotent producers and transactional messaging.
Idempotent producers:
Network failures can cause producers to retry sending messages, potentially creating duplicates. Kafka's idempotent producer feature assigns each producer a unique Producer ID (PID) and each message a sequence number. The broker detects and discards duplicate messages based on these identifiers.
// Enable idempotent producer
props.put("enable.idempotence", "true");
// Automatically sets:
// - acks=all
// - retries=Integer.MAX_VALUE
// - max.in.flight.requests.per.connection=5 (or less)
Without Idempotence:
Producer sends message A → Network timeout
Producer retries message A → Success
Producer unaware the first send succeeded
Result: Message A stored twice

With Idempotence:
Producer (PID=5) sends A with seq=0 → Network timeout
Producer (PID=5) retries A with seq=0 → Success
Broker: "Already have PID=5, seq=0, ignoring duplicate"
Result: Message A stored exactly once

Transactional messaging:
For workflows that write to multiple partitions or topics, idempotence alone isn't sufficient. Kafka transactions enable atomic writes across multiple operations:
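// Note: this sketch assumes the producer was created with a stable
// transactional.id, e.g. props.put("transactional.id", "order-service-1");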
producer.initTransactions();
try {
producer.beginTransaction();
// Write to multiple topics/partitions atomically
producer.send(new ProducerRecord<>("orders", order));
producer.send(new ProducerRecord<>("inventory", update));
producer.send(new ProducerRecord<>("notifications", alert));
// Commit consumer offsets as part of transaction
producer.sendOffsetsToTransaction(offsets, consumerGroupId);
producer.commitTransaction();
} catch (ProducerFencedException e) {
    // Fatal: another producer with the same transactional.id has taken over
    producer.close();
} catch (KafkaException e) {
    // Recoverable error: abort so consumers never see partial writes
    producer.abortTransaction();
}
Read committed isolation:
Consumers can be configured to only see committed messages, hiding any in-flight transactions:
props.put("isolation.level", "read_committed");
// Consumer only sees messages from committed transactions
// Uncommitted/aborted transaction messages are filtered out
Kafka has evolved from a messaging system into a complete streaming platform with a rich ecosystem of tools for building real-time data pipelines and applications.
Core ecosystem components:
Kafka Streams: A client library for building streaming applications directly on Kafka. Provides stateful processing, windowing, and exactly-once semantics without requiring a separate cluster (see the sketch after this list).
Kafka Connect: A framework for reliably streaming data between Kafka and external systems (databases, search indexes, file systems). Hundreds of pre-built connectors available.
Schema Registry: Manages schemas for Kafka messages (typically Avro, Protobuf, or JSON Schema). Enables schema evolution and compatibility checking.
ksqlDB: A streaming database offering SQL-like syntax for real-time analytics and transformations on Kafka topics.
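To give a feel for the Kafka Streams API, here is a minimal filtering topology (topic names and the filter predicate are hypothetical):

import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

// Minimal Kafka Streams topology: read events, keep only purchase
// events, and write them to a derived topic.
public class PurchaseFilter {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "purchase-filter");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> events = builder.stream("user-events");
        events.filter((key, value) -> value.contains("purchase"))
              .to("purchase-events");

        new KafkaStreams(builder.build(), props).start();
    }
}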
| Use Case | Description | Example |
|---|---|---|
| Event Streaming | Real-time event processing and analytics | User activity tracking, click streams |
| Log Aggregation | Collecting logs from multiple services | Centralized logging pipeline |
| Data Integration | Moving data between systems | Database CDC to data warehouse |
| Stream Processing | Real-time transformations and analytics | Fraud detection, recommendations |
| Event Sourcing | Storing state changes as events | Order management, audit logs |
| Metrics Pipeline | Collecting and processing operational metrics | System monitoring, alerting |
Kafka excels when you need: (1) Very high throughput (millions of messages/sec), (2) Message replay/reprocessing capability, (3) Long message retention (days/weeks), (4) Stream processing or event sourcing, (5) Multiple independent consumers reading same data. Consider alternatives if you primarily need traditional request-reply patterns or complex routing.
Apache Kafka represents a fundamental rethinking of distributed messaging, built around the durable, ordered commit log. The key concepts to carry forward: the append-only log as the core abstraction, sequential I/O and zero-copy transfer for throughput, partitions for horizontal scaling, ISR-based replication for fault tolerance, consumer groups for parallel consumption, and idempotent producers plus transactions for exactly-once semantics.
You now have a deep understanding of Apache Kafka's architecture and design principles. Next, we'll explore RabbitMQ—a message broker with a very different philosophy centered on flexible routing and the AMQP protocol.