In the realm of concurrent programming, there exists a class of code so sensitive that allowing multiple threads or processes to execute it simultaneously can corrupt data, crash systems, or produce results that violate every invariant your program depends upon. This code isn't inherently special—it doesn't contain magic or privileged instructions. What makes it dangerous is context: the fact that it accesses shared resources that multiple execution contexts might touch concurrently.
This sensitive code region has a name that has become foundational to computer science: the critical section. Understanding what a critical section is, why it matters, and how to protect it is the first step toward mastering synchronization—and by extension, mastering concurrent programming itself.
By the end of this page, you will understand the precise definition of a critical section, comprehend why certain code regions become critical, and recognize critical sections in real-world code. You'll develop the mental model that underlies all synchronization mechanisms—from simple locks to sophisticated concurrent data structures.
Let us begin with a precise, formal definition that will serve as our reference throughout this module and beyond.
Definition: A critical section (abbreviated CS) is a segment of code that accesses shared resources—such as shared memory, files, I/O devices, or any mutable state accessible by multiple threads or processes—where the execution of this code by multiple entities simultaneously could lead to a race condition.
The term "critical" captures the essence of the problem: this code region is critical to maintaining correctness because the shared resources it manipulates must remain in a consistent state. If two threads simultaneously execute critical sections that modify the same shared variable, the final value of that variable becomes unpredictable—a situation that violates the fundamental expectation that sequential programs produce deterministic results.
The term 'critical section' was formalized by Edsger W. Dijkstra in his seminal 1965 paper 'Solution of a Problem in Concurrent Programming Control.' This paper not only introduced the terminology but also presented the first known correct software solution to the mutual exclusion problem for two processes. The conceptual framework Dijkstra established remains the foundation of concurrent programming theory today.
The Defining Characteristics:
A code region qualifies as a critical section when it exhibits all of the following characteristics:
Accesses Shared State: The code reads from or writes to memory locations, files, or resources that other concurrent entities can also access.
Ordering Sensitivity: The outcome depends on the relative timing or ordering of operations from different execution contexts. If operations could interleave in different ways, producing different results, the region is critical.
Non-Atomic Execution: The operations within the region are not inherently atomic—they can be interrupted, and other threads can execute between any two instructions.
Invariant Dependency: The code assumes certain invariants about the shared state that could be violated if another thread modifies that state mid-execution.
To truly understand critical sections, we must examine what happens at the machine level when multiple threads access shared data. Consider the deceptively simple operation of incrementing a shared counter:
counter = counter + 1
This single line of high-level code is not atomic. On virtually all processor architectures, it translates to at least three machine-level operations:
```asm
; Assembly representation of counter = counter + 1
; Assume counter is stored at memory address [counter_addr]

MOV EAX, [counter_addr]   ; Step 1: LOAD - Read current value into register
ADD EAX, 1                ; Step 2: MODIFY - Increment the register value
MOV [counter_addr], EAX   ; Step 3: STORE - Write the new value back to memory

; Between any of these instructions, another thread could execute
; This is the 'critical' window where interleaving causes problems
```

The Interleaving Problem:
Now imagine two threads, T1 and T2, both attempting to increment this counter when its initial value is 0. In a correct execution, doing two increments should yield a final value of 2. But observe what happens with unfortunate interleaving:
| Step | Thread T1 | Thread T2 | Register T1 | Register T2 | Memory [counter] |
|---|---|---|---|---|---|
| Initial | — | — | — | — | 0 |
| 1 | LOAD counter | — | 0 | — | 0 |
| 2 | — | LOAD counter | 0 | 0 | 0 |
| 3 | ADD 1 | — | 1 | 0 | 0 |
| 4 | — | ADD 1 | 1 | 1 | 0 |
| 5 | STORE counter | — | 1 | 1 | 1 |
| 6 | — | STORE counter | 1 | 1 | 1 |
| Final | Done | Done | — | — | 1 (should be 2!) |
Both threads read the initial value of 0, both compute 0+1=1, and both write 1 back. One increment is completely lost. This is the canonical example of a lost update race condition, and it illustrates precisely why the increment operation is a critical section.
Key Insight: The critical section isn't just the STORE instruction—it's the entire LOAD-MODIFY-STORE sequence. These three operations form an indivisible logical unit that, from a correctness perspective, must appear to execute atomically. Any interleaving that allows another thread to execute between these steps violates the expected semantics.
High-level programming languages hide the multi-instruction nature of seemingly simple operations. The statement 'x++' looks atomic but isn't. Even 'x = y' can be non-atomic if x and y are larger than the machine's word size (e.g., a 64-bit assignment on a 32-bit architecture). Never assume atomicity—verify it at the machine level or use explicit synchronization.
In real-world codebases, critical sections don't come labeled. Identifying them requires understanding what constitutes shared state and how different threads interact with it. Let's develop a systematic approach to recognizing critical sections.
The Identification Process:
Map Shared Resources: Identify all data structures, variables, files, and resources that multiple threads can access. This includes global variables, heap-allocated objects passed between threads, and any system resources.
Trace Access Patterns: For each shared resource, determine which operations threads perform—reads, writes, or read-modify-write operations.
Analyze Interleaving Scenarios: Consider what happens if operations from different threads interleave. If any interleaving produces an incorrect or inconsistent result, you've found a critical section.
Consider Compound Operations: Even if individual accesses are atomic, sequences of accesses that must remain consistent form critical sections.
```c
// EXAMPLE 1: Classic shared counter (Critical Section)
int shared_counter = 0;

void increment_counter(void) {
    // ====== CRITICAL SECTION START ======
    shared_counter++;  // Not atomic: LOAD -> ADD -> STORE
    // ====== CRITICAL SECTION END ======
}

// EXAMPLE 2: Linked list insertion (Critical Section)
struct Node* head = NULL;

void insert_at_head(int value) {
    struct Node* new_node = malloc(sizeof(struct Node));
    new_node->data = value;
    // ====== CRITICAL SECTION START ======
    new_node->next = head;   // Read head
    head = new_node;         // Write head
    // Both operations must complete without another thread inserting
    // ====== CRITICAL SECTION END ======
}

// EXAMPLE 3: Bank account transfer (Critical Section spans objects)
void transfer(Account* from, Account* to, int amount) {
    // ====== CRITICAL SECTION START ======
    // Both accounts must be modified atomically as a unit
    if (from->balance >= amount) {
        from->balance -= amount;  // Debit source
        to->balance += amount;    // Credit destination
    }
    // If interrupted between these operations, money is lost or created
    // ====== CRITICAL SECTION END ======
}

// EXAMPLE 4: Condition check and action (Check-then-act pattern)
void lazy_init(Object** obj_ptr) {
    // ====== CRITICAL SECTION START ======
    if (*obj_ptr == NULL) {            // Check
        *obj_ptr = create_object();    // Act
    }
    // Another thread could also see NULL and create a duplicate object
    // ====== CRITICAL SECTION END ======
}
```

A powerful way to identify critical sections is to think about invariants—relationships between data items that must always hold. Any code that temporarily violates an invariant while modifying data is a critical section, because another thread observing the intermediate state would see a broken invariant.
A common source of confusion is the relationship between critical sections and atomic operations. Understanding this distinction is essential for making correct design decisions.
Atomic Operations:
An atomic operation is one that, from the perspective of other threads, either has not started or has completely finished—there is no observable intermediate state. Atomicity can be provided by:
Hardware Support: Many processors provide atomic instructions like Compare-And-Swap (CAS), Test-And-Set (TAS), Load-Linked/Store-Conditional (LL/SC), and atomic read-modify-write operations on single memory locations.
Language Guarantees: Some languages guarantee atomicity for certain operations. For example, in Java, reads and writes of reference types and most primitive types (except long and double in older specifications) are atomic.
The Distinction:
Atomic operations are building blocks that can be used to protect critical sections, but they are not the same thing. The critical section is the code that needs protection; atomic operations (or higher-level synchronization primitives built upon them) provide that protection.
When Atomic Operations Are Sufficient:
If your critical section consists of exactly one operation that has an atomic counterpart (e.g., incrementing a counter), you can use the atomic operation directly and eliminate the need for locks entirely:
```c
// Instead of this (requires locking):
lock(counter_lock);
counter++;
unlock(counter_lock);

// Use this (lock-free):
atomic_fetch_add(&counter, 1);
```
When Atomic Operations Are Insufficient:
Atomic operations cannot help when:

- The critical section spans multiple memory locations that must be updated together (e.g., transferring between two account balances).
- The operation is a compound sequence, such as check-then-act or a traversal followed by a modification of a data structure.
- The logic involves conditional branching, I/O, or other side effects that no single hardware instruction can express.
In these cases, you need synchronization primitives (locks, semaphores, monitors) that serialize access to the entire critical section.
Using atomic operations to build 'lock-free' algorithms doesn't eliminate the critical section—it changes how exclusion is achieved. The code still has regions that must execute without interference; the difference is in the mechanism (optimistic concurrency with atomic instructions vs. pessimistic concurrency with locks). Both approaches solve the critical section problem; they just offer different performance and starvation characteristics.
One of the most important practical decisions in concurrent programming is determining the appropriate scope and granularity of critical sections. This decision profoundly affects both correctness and performance.
Scope: The scope of a critical section refers to what shared resources it protects and what operations it encompasses.
Granularity: The granularity refers to how much code (and time) is spent within the critical section.
| Granularity | Advantages | Disadvantages | Use Cases |
|---|---|---|---|
| Coarse-Grained (Large CS) | Simpler reasoning Fewer synchronization points Less deadlock risk | Poor concurrency Long wait times Resource underutilization | Simple applications Low contention scenarios Legacy code maintenance |
| Fine-Grained (Small CS) | Better concurrency Shorter wait times Higher throughput | Complex reasoning More synchronization calls Higher deadlock risk | High-performance systems High contention scenarios Scalable data structures |
| Medium-Grained | Balanced approach Reasonable concurrency Manageable complexity | May not optimize either Requires careful analysis | Most production systems General-purpose code |
The Goldilocks Principle:
Critical sections should be as large as necessary for correctness, but no larger than required. Consider this example:
```c
// TOO COARSE: Entire function is synchronized
void process_item(Item* item) {
    lock(global_lock);
    // ====== CRITICAL SECTION (unnecessarily large) ======
    validate(item);                    // Could be done outside CS
    result = compute(item);            // Expensive, no shared state
    update_shared_statistics(result);  // Actually needs synchronization
    log_result(result);                // Could be done outside CS
    // ======================================================
    unlock(global_lock);
}

// TOO FINE: Misses compound operation requirement
void transfer(Account* from, Account* to, int amount) {
    lock(from->lock);
    from->balance -= amount;  // CS for 'from'
    unlock(from->lock);

    lock(to->lock);
    to->balance += amount;    // Separate CS for 'to'
    unlock(to->lock);
    // INCORRECT! If crash occurs between unlocks, money disappears
}

// JUST RIGHT: Minimal but complete critical section
void process_item_optimized(Item* item) {
    validate(item);           // No shared state - outside CS
    result = compute(item);   // Expensive but thread-local

    lock(stats_lock);
    // ====== CRITICAL SECTION (minimal) ======
    update_shared_statistics(result);  // Only this needs protection
    // =========================================
    unlock(stats_lock);

    log_result(result);       // Thread-safe logger - outside CS
}
```

Every nanosecond spent in a critical section is a nanosecond that other threads may be blocked waiting. If your critical section contains expensive operations (I/O, complex computations, memory allocation), you're serializing work that could potentially run in parallel. Profile your locks and minimize time spent holding them.
Understanding the critical section isn't merely an academic exercise—it's the foundation upon which correct concurrent systems are built. Let us examine why this concept is so central to system correctness.
Correctness Properties:
When we say a concurrent program is "correct," we typically mean it satisfies two categories of properties:
1. Safety Properties (Nothing bad happens):

- Mutual exclusion: at most one thread executes in the critical section for a given shared resource at any time.
- Data integrity: the invariants of shared state hold whenever that state is observable by other threads.

2. Liveness Properties (Something good eventually happens):

- Progress: if no thread is in the critical section, some thread that wants to enter eventually does.
- Bounded waiting: a thread requesting entry is not postponed indefinitely by other threads (no starvation).
The Consequences of Failure:
When critical sections are not properly protected, the failures are often insidious:
Data Corruption: Partial updates leave data structures in invalid states. A linked list might have a cycle; a tree might have orphaned nodes; a database might violate referential integrity.
Heisenbugs: Race conditions produce bugs that appear and disappear based on timing, load, or the presence of debugging tools. Adding a print statement or running under a debugger might hide the bug by changing timing.
Security Vulnerabilities: Race conditions can be exploited. Time-of-check to time-of-use (TOCTOU) vulnerabilities let attackers manipulate checked conditions before they're acted upon.
Non-Reproducible Failures: The same input can produce different outputs depending on thread scheduling. Unit tests pass, but production systems fail intermittently.
The Therac-25 radiation therapy machine incidents (1985-1987) remain one of the most cited examples of race condition consequences. Due to a race condition in the control software, patients received radiation doses up to 100 times the intended amount, resulting in deaths and serious injuries. This tragedy underscores that critical section failures in safety-critical systems can have fatal consequences.
A diagram can crystallize the understanding of critical sections. Consider the classic model of a process executing code that involves shared resources:
The Four Sections Model:
Every process or thread that needs to access shared resources cycles through four logical sections:
Entry Section: Code that requests permission to enter the critical section. This is where synchronization logic resides—acquiring locks, testing conditions, or executing atomic operations to gain exclusive access.
Critical Section: The actual code that accesses shared resources. This must be executed exclusively—no other process should be in its critical section simultaneously (for the same shared resource).
Exit Section: Code that relinquishes the exclusive access obtained in the entry section. This typically involves releasing locks and potentially waking up waiting processes.
Remainder Section: Everything else the process does that doesn't involve shared resources. This code can execute concurrently with any other code—it requires no synchronization.
Key Observations:

- The entry and exit sections are where the synchronization mechanism lives; the critical section itself contains only application logic.
- Mutual exclusion applies per shared resource: threads in critical sections protecting unrelated resources can run concurrently.
- The remainder section requires no coordination, so well-designed programs keep as much work as possible there.
- A thread may spend an arbitrary, even unbounded, amount of time in its remainder section; a correct solution must not depend on every thread repeatedly cycling through its critical section.
We have established the foundational understanding of what critical sections are and why they matter. Let's consolidate the essential concepts:

- A critical section is code that accesses shared mutable state and whose interleaved execution by multiple threads can cause race conditions.
- Even one-line statements like `counter++` decompose into LOAD-MODIFY-STORE sequences with dangerous windows between the steps.
- Critical sections are identified by mapping shared resources, tracing access patterns, and analyzing possible interleavings.
- Atomic operations can replace locks only for single-location, single-operation critical sections; compound operations require synchronization primitives.
- Scope and granularity trade simplicity against concurrency: protect everything correctness requires, and nothing more.
What's Next:
Now that we understand what a critical section is, we need to examine the mechanisms that control access to it. The next page explores the entry section—the code that executes before a process enters its critical section. This is where the art and science of synchronization begins: how does a process request permission to enter, and how does it know when that permission is granted?
We will see that designing correct entry section logic is surprisingly subtle, and doing it wrong leads to violations of mutual exclusion, deadlock, or starvation.
You now understand the formal definition and conceptual framework of critical sections—the fundamental unit of synchronization in concurrent programming. This knowledge forms the bedrock for understanding all synchronization primitives and concurrent algorithms. Next, we explore how to guard entry into these critical regions.