Cache Coherence and Memory Consistency

Level: Intermediate

Duration: 90 mins

Topic: Cache Coherence

Coherence Protocols (MESI)

The Four-State Solution

The MESI protocol is the foundational cache coherence protocol: it, or one of its variants, is used in virtually all modern multiprocessor systems. Named for its four states (Modified, Exclusive, Shared, and Invalid), MESI provides a complete solution to the cache coherence problem while minimizing unnecessary memory and coherence traffic.

Understanding MESI is essential for anyone working on systems programming, performance optimization, or hardware architecture. Every time you use a lock, access shared data, or wonder why your multi-threaded code is slow, MESI is operating beneath the surface.

What You Will Learn

This page covers each MESI state in detail, the full state transition diagram, how processors coordinate state changes through snooping and ownership, and the MOESI and MESIF variants used by AMD and Intel respectively. You'll understand how modern CPUs maintain coherence at the protocol level.

MESI States Overview

Each cache line in a MESI-based cache is tagged with one of four states. These states encode both the line's relationship to memory and to other caches in the system.

MESI Cache Line States
State	Meaning	Memory Match?	Other Caches?	Can Write?
M (Modified)	This cache has the ONLY copy, and it's been modified	No (stale)	No copies	Yes (already exclusive)
E (Exclusive)	This cache has the ONLY copy, unmodified	Yes (clean)	No copies	Yes (silently → M)
S (Shared)	This cache has a copy, others may too	Yes (clean)	Maybe	No (must invalidate first)
I (Invalid)	This cache line is not valid/empty	N/A	N/A	No (must fetch first)

Key Insights:

Modified (M):

The cache line has been written since being loaded
This cache is the ONLY valid copy in the entire system
Memory is stale—contains old data
The cache "owns" this line and must supply it to any requester
On eviction, MUST write back to memory

Exclusive (E):

The cache line matches memory content
This cache is the ONLY cache with a copy
Can transition to Modified without any bus transaction (silent upgrade)
This is the "optimization state"—avoids invalidations when writing
On eviction, can simply discard (memory is valid)

Shared (S):

The cache line matches memory content
Other caches MAY have copies (we don't know for sure after evictions)
Read-only access—cannot write without first invalidating other copies
Multiple Shared copies can coexist
On eviction, can simply discard

Invalid (I):

The cache line contains no valid data (or garbage)
May be genuinely empty or may have been invalidated
Any access requires fetching from memory or another cache
The "base state" for unused cache lines

The Value of Exclusive State

The Exclusive state is a key optimization. Without it (just M, S, I), the first write to any line a processor has read would require a bus transaction to invalidate potential sharers, even when there are none. With E, a processor writing to data that only it has cached can simply flip the line to M silently. This silent upgrade saves significant coherence traffic for non-shared data.

State Transitions

MESI state transitions are triggered by two types of events:

Processor Events: Reads and writes by the local processor
Bus Events (Snooped): Requests from other processors observed on the bus

Let's examine all transitions in detail:

Bus Transaction Types:

BusRd: Read request—processor wants to read, needs data
BusRdX: Read-exclusive request—processor wants to write, needs exclusive access
BusUpgr: Upgrade request—already have data (Shared), just need exclusive access
Flush: Supply data to bus (from Modified state), may or may not write to memory

Transitions from Invalid State
Event	Next State	Bus Action	Explanation
Processor Read	E or S	BusRd	Fetch line. If no other cache has it, go to E. If shared, go to S.
Processor Write	M	BusRdX	Fetch line with exclusive access, then write. Go directly to M.
Snooped BusRd	I	None	Line not present, ignore.
Snooped BusRdX	I	None	Line not present, ignore.
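The table above can be read as a small lookup function. The following is a minimal sketch, not a hardware description: the names `State`, `from_invalid`, and `others_have_copy`, and the event labels `PrRd`/`PrWr`, are hypothetical and chosen only for illustration.

```python
from enum import Enum


class State(Enum):
    M = "Modified"
    E = "Exclusive"
    S = "Shared"
    I = "Invalid"


def from_invalid(event, others_have_copy=False):
    """Return (next_state, bus_action) for a line currently in Invalid.

    `others_have_copy` stands in for the bus's "shared" signal and is an
    assumption of this sketch; it only matters for a processor read.
    """
    if event == "PrRd":
        # Read miss: fetch the line; E if we are alone, S otherwise.
        return (State.S if others_have_copy else State.E, "BusRd")
    if event == "PrWr":
        # Write miss: fetch with exclusive access and go straight to M.
        return (State.M, "BusRdX")
    if event in ("BusRd", "BusRdX"):
        # Snooped request for a line we do not hold: ignore it.
        return (State.I, None)
    raise ValueError(f"unknown event: {event}")


print(from_invalid("PrRd"))
print(from_invalid("PrRd", others_have_copy=True))
print(from_invalid("PrWr"))
```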

Worked Examples

Let's trace through common scenarios to see MESI in action.

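Example 1: Single-Writer Pattern

One processor reads and writes to a location; no other processor touches it. The first read misses and issues BusRd; since no other cache holds the line, it is loaded in E. The first write upgrades E → M silently, and every later read or write hits in M. Total bus traffic: one BusRd.

Example 2: Reader-Writer Pattern (Producer-Consumer)

P0 writes, then P1 reads. P0's write misses and issues BusRdX, leaving the line in M. P1's read issues BusRd; P0 snoops it, flushes the dirty data, and drops M → S, while P1 loads the line in S. Each handoff costs a bus transaction and a data transfer out of P0's cache.

Example 3: Ping-Pong (Contention Pattern)

P0 and P1 alternately write to the same location. Every write finds the line Invalid locally (the other processor's last write invalidated it), so every write issues BusRdX; the current holder flushes its data and drops M → I. The line bounces between the two caches and no write ever hits.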

Avoid Ping-Pong Patterns

The ping-pong pattern (example 3) is the pathological case for cache coherence. It's what happens when multiple threads use a simple spinlock or frequently update shared counters. Good parallel algorithms minimize this contention through per-thread data, combining trees, or lock-free structures that reduce coherence traffic.
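The three scenarios can be traced with a small simulation. This is a sketch under simplifying assumptions: a single cache line, an atomic bus, and a write-back to memory whenever a Modified line is flushed. The `Bus` class and `run` helper are hypothetical names, not part of any real tool.

```python
class Bus:
    """Tracks the MESI state of ONE cache line in each cache on an atomic bus."""

    def __init__(self, n_caches):
        self.state = ["I"] * n_caches
        self.transactions = []   # bus operations issued, in order
        self.mem_writes = 0      # write-backs of dirty data to memory

    def _snoop(self, requester, op):
        """Apply `op` to every other cache; return True if any held a copy."""
        had_copy = False
        for c, st in enumerate(self.state):
            if c == requester or st == "I":
                continue
            had_copy = True
            if st == "M":
                self.mem_writes += 1          # Flush: dirty data written back
            # BusRd downgrades holders to S; BusRdX/BusUpgr invalidate them.
            self.state[c] = "S" if op == "BusRd" else "I"
        return had_copy

    def read(self, c):
        if self.state[c] == "I":              # read miss
            self.transactions.append("BusRd")
            shared = self._snoop(c, "BusRd")
            self.state[c] = "S" if shared else "E"
        # M, E, S: read hit, no bus traffic

    def write(self, c):
        st = self.state[c]
        if st == "I":                         # write miss
            self.transactions.append("BusRdX")
            self._snoop(c, "BusRdX")
        elif st == "S":                       # have data, need ownership
            self.transactions.append("BusUpgr")
            self._snoop(c, "BusUpgr")
        # E: silent upgrade; M: write hit
        self.state[c] = "M"


def run(n_caches, ops):
    """ops is a list of ("r" or "w", cache_index) pairs."""
    bus = Bus(n_caches)
    for kind, c in ops:
        (bus.read if kind == "r" else bus.write)(c)
    return bus


single_writer = run(2, [("r", 0), ("w", 0), ("w", 0), ("r", 0)])
producer_consumer = run(2, [("w", 0), ("r", 1)])
ping_pong = run(2, [("w", 0), ("w", 1)] * 3)

for name, b in [("single writer", single_writer),
                ("producer-consumer", producer_consumer),
                ("ping-pong", ping_pong)]:
    print(name, b.state, b.transactions, "memory writes:", b.mem_writes)
```

In this model the single-writer trace issues one bus transaction in total, while the ping-pong trace issues one BusRdX per write, which is the cost the warning above is about.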

MOESI Protocol (AMD Extension)

AMD processors use the MOESI protocol, which adds an Owned (O) state to MESI. This optimization improves performance for certain sharing patterns.

The Owned State
Property	Description
Definition	This cache has a modified (dirty) copy, but OTHER caches also have (Shared) copies
Memory Status	Memory is STALE (not updated)
Other Caches	May have Shared (read-only) copies
Responsibility	This cache must supply data on BusRd requests
On Eviction	MUST write back to memory (since memory is stale)

Why Owned State Helps:

In standard MESI, when a Modified line is shared:

P0 has Modified
P1 reads (BusRd)
P0 must: supply data to P1, write back to memory, become Shared
Both P0 and P1 become Shared; memory is now current

With MOESI:

P0 has Modified
P1 reads (BusRd)
P0: supplies data to P1, becomes Owned (no memory write!)
P1 becomes Shared
Memory is still stale, but P0 is responsible for supplying data

Benefit: No memory write is needed when a Modified line becomes shared. If P1 only needed a one-off read, the write-back is deferred until the line is eventually evicted.

State Transitions with Owned:

moesi_transitions.txt

MOESI State Transitions (key differences from MESI):
 
Modified + Snooped BusRd:
  MESI:  M → S (with memory write-back)
  MOESI: M → O (NO memory write-back, this cache becomes "Owner")
 
Owned State Behaviors:
  Owner + PrRd         → O (hit, supply data)
  Owner + PrWr         → M (become Modified, invalidate Shared copies)
  Owner + BusRd        → O (supply data to requester, stay Owner)
  Owner + BusRdX       → I (supply data, become Invalid, requester becomes M)
  Owner + Eviction     → Write to memory! (memory was stale)
 
Shared State (in MOESI context):
  If another cache is Owner, Shared caches don't need to supply data
  The Owner is responsible for supplying data
 
Benefit Summary:
- Reduces memory traffic when Modified data is read by others
- Memory is updated only when the dirty line is finally evicted (some implementations may also write back when another writer takes the line)
- Common case: P0 modifies, P1 reads once, P0 continues modifying
  - With MESI: memory written on P1's read
  - With MOESI: no memory write needed
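The "common case" in the summary above can be put into numbers with a short sketch. It assumes two caches, one line, and the transition rules listed above; `memory_writes` is a hypothetical helper, and the final eviction of P0's line is included so that the deferred write-back is counted.

```python
def memory_writes(protocol, rounds):
    """Count memory write-backs for: P0 writes, P1 reads, repeated `rounds` times."""
    p0, p1 = "I", "I"
    writes = 0
    for _ in range(rounds):
        # P0 writes: any copy in P1 is invalidated and P0 ends up Modified.
        # (From S or O this needs only an invalidation, not a memory write.)
        p0, p1 = "M", "I"
        # P1 reads: P0 snoops a BusRd while holding the line Modified.
        if protocol == "MESI":
            writes += 1              # M -> S with memory write-back
            p0, p1 = "S", "S"
        elif protocol == "MOESI":
            p0, p1 = "O", "S"        # M -> O, memory stays stale
        else:
            raise ValueError(f"unknown protocol: {protocol}")
    # Finally P0 evicts the line: a dirty (M or O) copy must be written back.
    if p0 in ("M", "O"):
        writes += 1
    return writes


for proto in ("MESI", "MOESI"):
    print(proto, memory_writes(proto, rounds=10))
```

Under these assumptions MESI pays one memory write per round, while MOESI pays a single write at eviction time regardless of the number of rounds.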

AMD's MOESI Advantage

MOESI is particularly beneficial for systems with high memory latency (like NUMA systems) where avoiding memory writes is valuable. AMD's use of MOESI reflects their focus on multi-socket server systems where memory bandwidth is precious. Intel takes a different approach with MESIF, optimizing for different scenarios.

MESIF Protocol (Intel Extension)

Intel processors use the MESIF protocol, which adds a Forward (F) state to MESI. This optimization addresses a different problem: who should supply data when multiple caches have Shared copies?

The Forward State
Property	Description
Definition	This cache has a clean shared copy AND is responsible for responding to BusRd requests
Memory Status	Memory is CURRENT (not stale)
Other Caches	May have Shared copies (but they won't respond to snoops)
Responsibility	This cache (and only this cache among Shared copies) responds to BusRd
On Eviction	Can discard silently; no other copy is promoted, so memory answers the next read and that requester becomes Forward

Why Forward State Helps:

The Problem with Standard MESI Shared State:

When multiple caches have a line in Shared state and a new request arrives:

All Shared caches could respond → wasteful, potential collisions
Only memory responds → slow (memory is slower than cache-to-cache transfer)
Arbitration needed → complex, adds latency

MESIF Solution:

At most ONE cache has the Forward state for a given line (there may be none after the Forward copy is evicted)
Other sharers have Shared state (and do NOT respond to snoops)
Forward cache (or memory, if no Forward) responds to BusRd

Forward State Assignment:

When a cache fetches a line that other caches hold Shared and memory supplies the data → becomes F (with no other copies, it becomes E as in MESI)
When a Forward cache supplies data to another cache → recipient becomes F, supplier becomes S
This "passes the responsibility" to the most recent reader

Intuition: The most recent reader is likely the one with the most "active" interest in this line, so it should be the one to respond to future requests.

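MESIF in Action

Three processors read the same data in turn. P0 reads first: no other cache holds the line, so memory supplies it and P0 takes it in E. P1 reads next: P0 supplies the data and drops to S, and P1 becomes F. P2 reads last: P1, as the Forward cache, supplies the data and drops to S, and P2 becomes F. P0 holds only a Shared copy and stays silent during the last request.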

MOESI vs MESIF: Different Optimizations

MOESI optimizes for reducing memory writes (keeps dirty data in Owner state without updating memory). MESIF optimizes for reducing response conflicts in shared read scenarios. Both are valid approaches; the best choice depends on workload characteristics. AMD's server focus favors MOESI; Intel's varied workloads work well with MESIF.
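The Forward-state handoff for read-only sharing can be sketched as follows. This is an illustration only: it assumes the reader starts in Invalid and handles only clean lines, and the function name `mesif_read` and the dictionary representation are hypothetical.

```python
def mesif_read(states, reader):
    """Apply a read miss by `reader`; `states` maps cache name -> state.

    Returns the name of whoever supplied the data ("memory" or a cache).
    Only clean lines are handled; a Modified holder is out of scope here.
    """
    holders = {c: s for c, s in states.items() if c != reader and s != "I"}
    if not holders:
        states[reader] = "E"          # sole copy, as in plain MESI
        return "memory"

    supplier = "memory"               # used if only plain Shared copies remain
    for cache, st in holders.items():
        if st == "M":
            raise NotImplementedError("dirty lines are not modelled in this sketch")
        if st in ("E", "F"):          # the single designated responder
            states[cache] = "S"       # supplier steps down to Shared
            supplier = cache
    states[reader] = "F"              # most recent reader takes over Forward
    return supplier


states = {"P0": "I", "P1": "I", "P2": "I"}
for reader in ("P0", "P1", "P2"):
    source = mesif_read(states, reader)
    print(f"{reader} reads, data from {source}: {states}")
```

Each request is answered by at most one cache, and the Forward role moves to the most recent reader, matching the assignment rules above.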

Protocol Implementation Details

Implementing a cache coherence protocol involves significant hardware complexity. Key implementation considerations include:

Implementation Considerations

• Snoop Filtering: In systems with many cores, having every cache snoop every transaction is expensive. Snoop filters (or directory-based hints) reduce unnecessary snoops.
• Transaction Ordering: Bus transactions must be ordered consistently. All caches must see transactions in the same order to maintain coherence.
• Atomic Transitions: State transitions and data transfers must be atomic. Between deciding to transition and completing the transition, the state must be protected.
• Deadlock Avoidance: With multiple outstanding requests, circular dependencies can occur. Protocol design must prevent deadlock.
• Retry Mechanisms: If a transaction cannot complete (e.g., target busy), the protocol needs retry mechanisms without blocking the entire system.
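As an illustration of the snoop-filtering idea, here is a minimal software sketch. It assumes the filter is told about every fill and eviction, which real hardware may only approximate; the class `SnoopFilter` and its method names are hypothetical.

```python
class SnoopFilter:
    """Tracks which caches may hold each line, so snoops go only to those."""

    def __init__(self):
        self.holders = {}   # line address -> set of cache ids that may hold it

    def record_fill(self, addr, cache_id):
        """Note that `cache_id` has loaded the line at `addr`."""
        self.holders.setdefault(addr, set()).add(cache_id)

    def record_evict(self, addr, cache_id):
        """Note that `cache_id` no longer holds the line at `addr`."""
        self.holders.get(addr, set()).discard(cache_id)

    def targets(self, addr, requester):
        """Caches that must be snooped for a request to `addr`."""
        return self.holders.get(addr, set()) - {requester}


f = SnoopFilter()
f.record_fill(0x40, 0)
f.record_fill(0x40, 2)
print(f.targets(0x40, requester=0))   # only cache 2 needs to see this request
print(f.targets(0x80, requester=1))   # nobody holds 0x80: no snoops at all
```

Without the filter, both requests would be broadcast to every cache in the system.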

The Role of the Bus (or Network):

Classic snooping protocols assume a shared bus with specific properties:

Broadcast: All caches see all transactions
Atomic: Only one transaction at a time (serialization point)
Ordered: All caches see transactions in the same order

These properties make coherence straightforward but don't scale. Modern systems use point-to-point interconnects (rings, meshes, crossbars) with additional mechanisms to provide ordering and visibility.

Handling Transient States:

Real implementations have additional transient states for in-progress operations:

I-S: Invalid, waiting for data to become Shared
S-M: Shared, waiting for invalidation acknowledgments to become Modified
M-S: Modified, in process of responding to snoop (becoming Shared)
I-M: Invalid, waiting for data and acks to become Modified

These transient states ensure correct behavior while transactions are in-flight. An incoming snoop to a cache in state I-S must be handled carefully—the line isn't yet valid, but we've committed to fetching it.

mesi_state_machine.pseudo

// Simplified MESI cache controller state machine (pseudocode)
 
enum State { I, S, E, M };
enum BusOp { BusRd, BusRdX, BusUpgr, Flush };
 
// Handle processor request
function handleProcessorRequest(addr, isWrite) {
    line = findCacheLine(addr);           // Assumed to return a line in state I on a miss (after allocating a slot)
    
    if (line.state == State.I) {
        if (isWrite) {
            busRequest(BusRdX, addr);     // Get with exclusive intent
            wait for data;
            line.state = State.M;
        } else {
            busRequest(BusRd, addr);       // Get shared
            wait for data;
            if (noOtherSharers) {
                line.state = State.E;      // We're the only one
            } else {
                line.state = State.S;      // Others have copies
            }
        }
    } else if (line.state == State.S) {
        if (isWrite) {
            busRequest(BusUpgr, addr);     // Already have data, just need exclusive
            wait for invalidation acks;
            line.state = State.M;
        }
        // Read hit: no action needed
    } else if (line.state == State.E) {
        if (isWrite) {
            line.state = State.M;          // Silent upgrade!
        }
        // Read hit: no action needed
    } else if (line.state == State.M) {
        // Read or write hit: no action needed
    }
}
 
// Handle snooped bus request
function handleSnoopedRequest(addr, op, requestor) {
    line = findCacheLine(addr);
    if (line == null || line.state == State.I) return;  // Not our concern
    
    switch (op) {
        case BusRd:
            if (line.state == State.M) {
                supplyData(requestor);     // We hold the only valid (dirty) copy
                writeBackToMemory();       // Plain MESI: memory must be current before we drop to S
                line.state = State.S;
            } else if (line.state == State.E) {
                supplyData(requestor);
                line.state = State.S;
            }
            // Shared: may or may not supply (depends on protocol variant)
            break;
            
        case BusRdX:
            if (line.state == State.M) {
                supplyData(requestor);
                writeBackToMemory();
            } else if (line.state == State.E) {
                supplyData(requestor);
            }
            // In all cases, invalidate our copy
            line.state = State.I;
            break;
            
        case BusUpgr:
            // Someone with Shared wants exclusive
            line.state = State.I;
            break;
    }
}

Protocol Verification and Correctness

Cache coherence protocols are notoriously difficult to design correctly. A subtle bug can cause data corruption that's nearly impossible to debug. Formal verification is essential.

Common Protocol Bugs

• Race Conditions: Two caches simultaneously request exclusive access; protocol must serialize correctly.
• Stale Data Reads: A window exists where a cache can read stale data before invalidation arrives.
• Lost Updates: If invalidation acks are lost or mishandled, writes can be lost.
• Livelock: Caches repeatedly interfere with each other's progress without making forward progress.
• Deadlock: Circular dependencies in resource acquisition cause system hang.
• Incorrect State Encoding: With only 2 bits for 4 states, bit flips can cause silent corruption.

Verification Approaches:

1. Formal Methods (Model Checking):

Express protocol as a state machine
Use tools like Murphi, TLA+, or SPIN to explore all possible states
Prove properties like "no two caches have Modified simultaneously"
Industry standard for critical protocols

2. Runtime Monitoring:

Add hardware checks for invariant violations
Assert: if (state == Modified && otherHasLine) PANIC
Expensive in silicon but catches bugs during validation

3. Simulation:

Run millions of random multi-threaded test sequences
Check memory consistency at checkpoints
Can find bugs but cannot prove absence of bugs

The SWMR Invariant as Verification Target:

The Single-Writer-Multiple-Reader invariant must hold at all times:

Invariant: For any address A at time T:
  (CountModified(A) + CountExclusive(A) == 1 && CountShared(A) == 0) ||
  (CountModified(A) + CountExclusive(A) == 0)

If this invariant is violated, even momentarily, data corruption can result. Verification tools exhaustively check that no sequence of operations can violate this invariant.
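A checker for the single-writer-multiple-reader property is easy to sketch in software, for example as an assertion inside a protocol simulator. The version below treats both M and E as write-permission states; the function name `swmr_holds` and the list-of-letters input format are assumptions of this sketch.

```python
from collections import Counter


def swmr_holds(states):
    """Check SWMR for one address.

    `states` is an iterable of per-cache MESI states ("M", "E", "S", "I").
    """
    n = Counter(states)
    writers = n["M"] + n["E"]        # caches that may write without asking
    if writers > 1:
        return False                 # two caches both believe they are alone
    if writers == 1 and n["S"] > 0:
        return False                 # a writer coexists with read-only copies
    return True


print(swmr_holds(["M", "I", "I"]))   # one writer, no readers
print(swmr_holds(["S", "S", "I"]))   # readers only
print(swmr_holds(["M", "S", "I"]))   # violation: writer plus reader
print(swmr_holds(["E", "S", "I"]))   # violation: Exclusive is not exclusive
```

A model checker applies the same predicate to every reachable state of the protocol rather than to a handful of hand-picked snapshots.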

Real-World Protocol Bugs

Major CPU vendors have shipped processors with coherence bugs. Intel's early Pentium Pro had coherence issues in rare corner cases. These bugs are often discovered years later and require microcode patches. The design space is so complex that complete verification is challenging despite significant investment.

Summary: MESI and Variants

We've covered the MESI protocol and its extensions in depth. Let's consolidate:

Key Takeaways

• MESI has four states: Modified (dirty, exclusive), Exclusive (clean, exclusive), Shared (clean, multiple copies), Invalid (not valid).
• Exclusive enables silent upgrades: Writing to Exclusive data becomes Modified with no bus transaction.
• Transitions are triggered by processor ops and snooped bus ops: The protocol responds to both local and remote events.
• MOESI adds Owned state: Allows sharing dirty data without immediately writing to memory. AMD uses this.
• MESIF adds Forward state: Designates one Shared cache to respond to reads, avoiding collisions. Intel uses this.
• Ping-pong patterns are expensive: Multiple writers to the same line cause continuous coherence traffic.
• Implementation is complex: Transient states, atomicity, ordering, and deadlock avoidance require careful design.
• Verification is critical: Formal methods and exhaustive testing are required to ensure correctness.

Module Complete:

This concludes our deep dive into cache coherence. You now understand:

Why caches exist and how they're organized (cache hierarchy, cache lines)
How writes are handled (write-through, write-back)
Why coherence is necessary (the coherence problem)
How coherence is maintained (MESI, MOESI, MESIF protocols)

This knowledge is fundamental for understanding operating system kernel development, writing high-performance parallel code, and reasoning about memory system behavior in modern multiprocessors.

Module Complete!

Congratulations! You've mastered cache coherence—from the memory wall problem through cache hierarchy design to the MESI protocol that keeps it all consistent. You can now reason about cache behavior, understand coherence overhead, and write cache-friendly parallel code.
