Loading learning content...
The MESI protocol is the foundational cache coherence protocol used in virtually all modern multiprocessor systems. Named for its four states—Modified, Exclusive, Shared, and Invalid—MESI provides a complete solution to the cache coherence problem while minimizing unnecessary memory and coherence traffic.
Understanding MESI is essential for anyone working on systems programming, performance optimization, or hardware architecture. Every time you use a lock, access shared data, or wonder why your multi-threaded code is slow, MESI is operating beneath the surface.
This page covers each MESI state in detail, the full state transition diagram, how processors coordinate state changes through snooping and ownership, and the MOESI and MESIF variants used by AMD and Intel respectively. You'll understand how modern CPUs maintain coherence at the protocol level.
Each cache line in a MESI-based cache is tagged with one of four states. These states encode both the line's relationship to memory and to other caches in the system.
| State | Meaning | Memory Match? | Other Caches? | Can Write? |
|---|---|---|---|---|
| M (Modified) | This cache has the ONLY copy, and it's been modified | No (stale) | No copies | Yes (already exclusive) |
| E (Exclusive) | This cache has the ONLY copy, unmodified | Yes (clean) | No copies | Yes (silently → M) |
| S (Shared) | This cache has a copy, others may too | Yes (clean) | Maybe | No (must invalidate first) |
| I (Invalid) | This cache line is not valid/empty | N/A | N/A | No (must fetch first) |
Key Insights:
Modified (M):
Exclusive (E):
Shared (S):
Invalid (I):
The Exclusive state is a key optimization. Without it (just M, S, I), every write by a single processor would require bus transaction to invalidate potential sharers—even when there are none. With E, a processor writing to data that only it has cached can simply flip to M silently. This silent upgrade saves significant coherence traffic for non-shared data.
MESI state transitions are triggered by two types of events:
Let's examine all transitions in detail:
Bus Transaction Types:
| Event | Next State | Bus Action | Explanation |
|---|---|---|---|
| Processor Read | E or S | BusRd | Fetch line. If no other cache has it, go to E. If shared, go to S. |
| Processor Write | M | BusRdX | Fetch line with exclusive access, then write. Go directly to M. |
| Snooped BusRd | I | None | Line not present, ignore. |
| Snooped BusRdX | I | None | Line not present, ignore. |
Let's trace through common scenarios to see MESI in action.
The ping-pong pattern (example 3) is the pathological case for cache coherence. It's what happens when multiple threads use a simple spinlock or frequently update shared counters. Good parallel algorithms minimize this contention through per-thread data, combining trees, or lock-free structures that reduce coherence traffic.
AMD processors use the MOESI protocol, which adds an Owned (O) state to MESI. This optimization improves performance for certain sharing patterns.
| Property | Description |
|---|---|
| Definition | This cache has a modified (dirty) copy, but OTHER caches also have (Shared) copies |
| Memory Status | Memory is STALE (not updated) |
| Other Caches | May have Shared (read-only) copies |
| Responsibility | This cache must supply data on BusRd requests |
| On Eviction | MUST write back to memory (since memory is stale) |
Why Owned State Helps:
In standard MESI, when a Modified line is shared:
With MOESI:
Benefit: Avoids memory write when going from Modified to shared scenario. If P1 was just doing a temporary read, we saved a memory write.
State Transitions with Owned:
1234567891011121314151617181920212223
MOESI State Transitions (key differences from MESI): Modified + Snooped BusRd: MESI: M → S (with memory write-back) MOESI: M → O (NO memory write-back, this cache becomes "Owner") Owned State Behaviors: Owner + PrRd → O (hit, supply data) Owner + PrWr → M (become Modified, invalidate Shared copies) Owner + BusRd → O (supply data to requester, stay Owner) Owner + BusRdX → I (supply data, become Invalid, requester becomes M) Owner + Eviction → Write to memory! (memory was stale) Shared State (in MOESI context): If another cache is Owner, Shared caches don't need to supply data The Owner is responsible for supplying data Benefit Summary:- Reduces memory traffic when Modified data is read by others- Memory is only updated when the Owner is evicted or another writer appears- Common case: P0 modifies, P1 reads once, P0 continues modifying - With MESI: memory written on P1's read - With MOESI: no memory write neededMOESI is particularly beneficial for systems with high memory latency (like NUMA systems) where avoiding memory writes is valuable. AMD's use of MOESI reflects their focus on multi-socket server systems where memory bandwidth is precious. Intel takes a different approach with MESIF, optimizing for different scenarios.
Intel processors use the MESIF protocol, which adds a Forward (F) state to MESI. This optimization addresses a different problem: who should supply data when multiple caches have Shared copies?
| Property | Description |
|---|---|
| Definition | This cache has a clean shared copy AND is responsible for responding to BusRd requests |
| Memory Status | Memory is CURRENT (not stale) |
| Other Caches | May have Shared copies (but they won't respond to snoops) |
| Responsibility | This cache (and only this cache among Shared copies) responds to BusRd |
| On Eviction | Can discard; another Shared copy becomes Forward |
Why Forward State Helps:
The Problem with Standard MESI Shared State:
When multiple caches have a line in Shared state and a new request arrives:
MESIF Solution:
Forward State Assignment:
Intuition: The most recent reader is likely the one with the most "active" interest in this line, so it should be the one to respond to future requests.
MOESI optimizes for reducing memory writes (keeps dirty data in Owner state without updating memory). MESIF optimizes for reducing response conflicts in shared read scenarios. Both are valid approaches; the best choice depends on workload characteristics. AMD's server focus favors MOESI; Intel's varied workloads work well with MESIF.
Implementing a cache coherence protocol involves significant hardware complexity. Key implementation considerations include:
The Role of the Bus (or Network):
Classic snooping protocols assume a shared bus with specific properties:
These properties make coherence straightforward but don't scale. Modern systems use point-to-point interconnects (rings, meshes, crossbars) with additional mechanisms to provide ordering and visibility.
Handling Transient States:
Real implementations have additional transient states for in-progress operations:
I-S: Invalid, waiting for data to become Shared
S-M: Shared, waiting for invalidation acknowledgments to become Modified
M-S: Modified, in process of responding to snoop (becoming Shared)
I-M: Invalid, waiting for data and acks to become Modified
These transient states ensure correct behavior while transactions are in-flight. An incoming snoop to a cache in state I-S must be handled carefully—the line isn't yet valid, but we've committed to fetching it.
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475
// Simplified MESI cache controller state machine (pseudocode) enum State { I, S, E, M };enum BusOp { BusRd, BusRdX, BusUpgr, Flush }; // Handle processor requestfunction handleProcessorRequest(addr, isWrite) { line = findCacheLine(addr); if (line.state == State.I) { if (isWrite) { busRequest(BusRdX, addr); // Get with exclusive intent wait for data; line.state = State.M; } else { busRequest(BusRd, addr); // Get shared wait for data; if (noOtherSharers) { line.state = State.E; // We're the only one } else { line.state = State.S; // Others have copies } } } else if (line.state == State.S) { if (isWrite) { busRequest(BusUpgr, addr); // Already have data, just need exclusive wait for invalidation acks; line.state = State.M; } // Read hit: no action needed } else if (line.state == State.E) { if (isWrite) { line.state = State.M; // Silent upgrade! } // Read hit: no action needed } else if (line.state == State.M) { // Read or write hit: no action needed }} // Handle snooped bus requestfunction handleSnoopedRequest(addr, op, requestor) { line = findCacheLine(addr); if (line == null || line.state == State.I) return; // Not our concern switch (op) { case BusRd: if (line.state == State.M) { supplyData(requestor); // We have dirty data writeBackToMemory(); // Or just supply to requestor line.state = State.S; } else if (line.state == State.E) { supplyData(requestor); line.state = State.S; } // Shared: may or may not supply (depends on protocol variant) break; case BusRdX: if (line.state == State.M) { supplyData(requestor); writeBackToMemory(); } else if (line.state == State.E) { supplyData(requestor); } // In all cases, invalidate our copy line.state = State.I; break; case BusUpgr: // Someone with Shared wants exclusive line.state = State.I; break; }}Cache coherence protocols are notoriously difficult to design correctly. A subtle bug can cause data corruption that's nearly impossible to debug. Formal verification is essential.
Verification Approaches:
1. Formal Methods (Model Checking):
2. Runtime Monitoring:
3. Simulation:
The SWMR Invariant as Verification Target:
The Single-Writer-Multiple-Reader invariant must hold at all times:
Invariant: For any address A at time T:
(CountModified(A) == 1 && CountShared(A) == 0) ||
(CountModified(A) == 0 && CountExclusive(A) <= 1)
If this invariant is violated, even momentarily, data corruption can result. Verification tools exhaustively check that no sequence of operations can violate this invariant.
Major CPU vendors have shipped processors with coherence bugs. Intel's early Pentium Pro had coherence issues in rare corner cases. These bugs are often discovered years later and require microcode patches. The design space is so complex that complete verification is challenging despite significant investment.
We've covered the MESI protocol and its extensions in depth. Let's consolidate:
Module Complete:
This concludes our deep dive into cache coherence. You now understand:
This knowledge is fundamental for understanding operating system kernel development, writing high-performance parallel code, and reasoning about memory system behavior in modern multiprocessors.
Congratulations! You've mastered cache coherence—from the memory wall problem through cache hierarchy design to the MESI protocol that keeps it all consistent. You can now reason about cache behavior, understand coherence overhead, and write cache-friendly parallel code.