We've studied the critical section, entry section, and exit section in detail. But these components represent only a fraction of what a typical process does. The vast majority of a process's execution time is often spent in the remainder section—the code that doesn't involve shared resources and can execute concurrently without any synchronization.
Understanding the remainder section is crucial for two reasons: first, it completes our conceptual model of process synchronization; second, and more practically, maximizing time spent in the remainder section relative to the critical section is the key to scalable concurrent performance.
By the end of this page, you will understand the formal definition of the remainder section, its role in process structure, why maximizing remainder section work is critical for performance, and how remainder section behavior affects the overall synchronization requirements.
Definition: The remainder section is the portion of a process's code that executes after the exit section and before the next entry section. It encompasses all computation that does not require exclusive access to shared resources.
Formal Position in Process Structure:
```
while (true) {
    ENTRY SECTION        ← Acquire exclusive access
    CRITICAL SECTION     ← Access shared resources
    EXIT SECTION         ← Release access, notify waiters
    REMAINDER SECTION    ← Independent work (no synchronization needed)
}
```
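As a concrete illustration, here is a minimal sketch of this structure using a POSIX mutex (do_independent_work is a hypothetical placeholder for remainder-section work):

```c
#include <pthread.h>

pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
int shared_count = 0;               // The shared resource

void do_independent_work(void);     // Hypothetical: thread-local computation

void* worker(void* arg) {
    for (int i = 0; i < 1000; i++) {
        pthread_mutex_lock(&lock);    // ENTRY SECTION
        shared_count++;               // CRITICAL SECTION
        pthread_mutex_unlock(&lock);  // EXIT SECTION
        do_independent_work();        // REMAINDER SECTION
    }
    return NULL;
}
```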
The remainder section is characterized by three essential properties: it accesses no shared mutable state, so it requires no synchronization; it can execute concurrently with any other process's code, including other processes' critical sections; and it may run for an arbitrarily long time (a process may even stay in it indefinitely) without affecting other processes' progress.
The term 'remainder' emphasizes that this is the 'rest of the work' that doesn't fit into the critical section. In well-designed concurrent programs, the remainder section should constitute the majority of execution time—if most work requires synchronization, concurrency provides little benefit.
It is not always obvious whether a given piece of code belongs in the critical section or the remainder section. Understanding what can safely be moved to the remainder section is a critical design skill.
Candidates for the Remainder Section:
| Code Type | Can Be in Remainder? | Rationale |
|---|---|---|
| Local variable computation | ✅ Yes | Thread-local data cannot cause races |
| Pure functions with immutable inputs | ✅ Yes | No side effects, no shared state |
| I/O to private resources | ✅ Yes | Thread-specific files, sockets |
| Sleeping/yielding | ✅ Yes | Reduces contention when no CS work needed |
| Reads of immutable shared data | ✅ Yes | Data never changes, no race possible |
| Reads of shared mutable data | ⚠️ Sometimes | Only if stale data is acceptable |
| Writes to thread-local copies | ✅ Yes | Changes visible only to current thread |
| Writes to shared data | ❌ No | Must be in critical section |
| Check-then-act on shared data | ❌ No | Race between check and act |
| Complex invariant maintenance | ❌ No | Invariant might be observed mid-update |
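To see why check-then-act cannot live in the remainder section, consider this minimal sketch (tickets_left and ticket_lock are hypothetical names, using the same generic mutex API as the examples below). Two threads can both pass the check before either acts:

```c
int tickets_left = 1;        // Shared mutable data
mutex_t ticket_lock;

// BROKEN: check-then-act outside the critical section
void buy_ticket_broken(void) {
    if (tickets_left > 0) {  // Check... (both threads may see 1)
        tickets_left--;      // ...then act: two threads can both decrement!
    }
}

// CORRECT: the check and the act happen atomically inside the CS
void buy_ticket_correct(void) {
    mutex_lock(&ticket_lock);
    if (tickets_left > 0) {
        tickets_left--;      // Nothing can interleave between check and act
    }
    mutex_unlock(&ticket_lock);
}
```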
```c
// Example: Optimizing by expanding the remainder section

// BEFORE: Excessive code in critical section
void process_request_bad(Request* req) {
    mutex_lock(&processing_mutex);
    // ====== CRITICAL SECTION (too large) ======
    validate_request(req);                  // Pure computation - no shared state!
    parse_request(req);                     // Pure computation - no shared state!
    Response resp = compute_response(req);  // Expensive, but no shared state!
    update_statistics(&global_stats, req);  // Needs synchronization
    write_to_log(req, resp);                // Could use thread-local buffer!
    // ===========================================
    mutex_unlock(&processing_mutex);
    send_response(req->client, resp);
}
// Problem: Most work doesn't need the lock!

// AFTER: Minimized critical section, expanded remainder section
void process_request_good(Request* req) {
    // ====== REMAINDER SECTION (before CS) ======
    // All this work is thread-local, runs while others work
    validate_request(req);                  // Pure function
    parse_request(req);                     // Pure function
    Response resp = compute_response(req);  // Expensive but local

    // ====== ENTRY SECTION ======
    mutex_lock(&processing_mutex);
    // ====== CRITICAL SECTION (minimal) ======
    update_statistics(&global_stats, req);  // Only this needs the lock!
    // ===========================================
    // ====== EXIT SECTION ======
    mutex_unlock(&processing_mutex);

    // ====== REMAINDER SECTION (after CS) ======
    write_to_thread_local_log(req, resp);   // Thread-local buffer
    send_response(req->client, resp);       // Per-connection, no sharing
}
// Benefit: Threads can process in parallel, only statistics update is serialized
```

Always ask: 'Does this code NEED to be in the critical section?' If the answer is no, move it out. Every instruction removed from the critical section directly improves concurrency. This is one of the highest-impact optimizations in concurrent programming.
The ratio of time spent in the remainder section versus the critical section is one of the most important factors in concurrent system performance. This relationship is captured by Amdahl's Law.
Amdahl's Law for Locks:
If a fraction s of execution time is spent in the critical section (serialized), and the remainder (1 - s) can run in parallel, then with n processors:
Speedup = 1 / (s + (1-s)/n)
As n → ∞:
Speedup_max = 1 / s
This means: if 10% of work is in the critical section, you can never achieve more than 10x speedup, no matter how many processors you add.
| % in Critical Section (s) | % in Remainder (1-s) | Max Speedup (1/s) | With 8 CPUs | With 64 CPUs |
|---|---|---|---|---|
| 1% | 99% | 100x | 7.5x | 39.3x |
| 5% | 95% | 20x | 5.9x | 15.4x |
| 10% | 90% | 10x | 4.7x | 8.8x |
| 20% | 80% | 5x | 3.3x | 4.7x |
| 50% | 50% | 2x | 1.8x | 2.0x |
| 90% | 10% | 1.1x | 1.1x | 1.1x |
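These figures follow directly from the formula; a short, self-contained C helper reproduces the table:

```c
#include <stdio.h>

// Speedup under Amdahl's Law: fraction s is serialized (critical
// section), the remaining (1 - s) parallelizes across n processors
double amdahl_speedup(double s, int n) {
    return 1.0 / (s + (1.0 - s) / n);
}

int main(void) {
    double fractions[] = {0.01, 0.05, 0.10, 0.20, 0.50, 0.90};
    for (int i = 0; i < 6; i++) {
        double s = fractions[i];
        printf("s = %4.0f%%  max = %6.1fx  n=8: %4.1fx  n=64: %4.1fx\n",
               s * 100, 1.0 / s, amdahl_speedup(s, 8), amdahl_speedup(s, 64));
    }
    return 0;
}
```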
The Real-World Implications:
This mathematical reality has profound implications:
A common design mistake is protecting all shared data with a single global lock. This effectively makes the entire system serialized—only one thread can do shared-data work at a time. Even if 8 cores are available, the system behaves like it has one core for any locked operations. Identify independent data structures and protect each with its own lock to maximize concurrency.
```c
// ANTIPATTERN: Single global lock
struct GlobalState {
    Counter counter;
    UserDatabase users;
    TransactionLog log;
    SessionCache cache;
} state;

mutex_t global_lock;  // One lock for everything!

void process_user_request(Request* r) {
    mutex_lock(&global_lock);  // Everyone waits here
    // Even if we only touch the cache,
    // we block access to counter, users, and log!
    Session* s = cache_lookup(&state.cache, r->session_id);
    mutex_unlock(&global_lock);
}

// BETTER: Fine-grained locks (replaces the version above)
struct GlobalState {
    struct { Counter counter;    mutex_t lock;  } counter_data;
    struct { UserDatabase users; rwlock_t lock; } user_data;  // Multiple readers OK
    struct { TransactionLog log; mutex_t lock;  } log_data;
    struct { SessionCache cache; mutex_t lock;  } cache_data;
} state;

void process_user_request_better(Request* r) {
    mutex_lock(&state.cache_data.lock);  // Only the cache lock
    // Other threads can access counter, users, log concurrently!
    Session* s = cache_lookup(&state.cache_data.cache, r->session_id);
    mutex_unlock(&state.cache_data.lock);
}
```

The remainder section plays a subtle but important role in the correctness requirements of synchronization primitives, particularly the progress requirement.
Recall the Progress Requirement:
If no process is in its critical section and some processes wish to enter, the selection of which process enters next cannot be postponed indefinitely. Only processes not executing in their remainder sections can participate in this decision.
The last phrase is crucial. Let's understand why.
Why Remainder Sections Are Excluded from Progress Decisions:
Consider what would happen if a process in its remainder section could influence who enters the critical section next: a process busy with a long, independent computation, one with no interest in the critical section at all, could delay the decision indefinitely, leaving processes that do want to enter stuck waiting on it. This would be absurd. A process that doesn't even want the CS would block others who do. The progress requirement explicitly excludes remainder-section processes to avoid exactly this.
```c
// HYPOTHETICAL BROKEN ALGORITHM that violates progress
// by involving remainder section processes

// Each process must "vote" before anyone enters CS
volatile int vote[N];  // 0 = no opinion, 1 = OK to enter

void broken_entry_section(int process_id) {
    // Request votes from ALL processes
    for (int i = 0; i < N; i++) {
        vote[i] = 0;  // Reset votes
    }
    // Wait until all processes vote OK
    for (int i = 0; i < N; i++) {
        while (vote[i] == 0) {
            // BLOCKED! If process i is in its remainder section doing
            // a long computation, it might never vote.
            // Progress is violated!
        }
    }
}

// In its remainder section, each process periodically:
void remainder_section_processing(int process_id) {
    // ... long computation ...

    // Maybe eventually vote
    vote[process_id] = 1;  // But this might take forever!

    // ... more computation ...
}

// FIX: Correct algorithms (like Peterson's) only involve
// processes that are actively trying to enter the CS:
volatile int flag[2];  // flag[i] == 1 means process i wants the CS
volatile int turn;

void peterson_entry(int process_id) {
    int other = 1 - process_id;
    flag[process_id] = 1;  // I WANT IN
    turn = other;
    // Only waits for the OTHER process IF:
    // - Other wants in (flag[other] == 1), AND
    // - It's other's turn
    // If other is in its remainder section, flag[other] == 0, so we enter immediately!
    while (flag[other] == 1 && turn == other)
        ;  // Busy-wait
}
```

A process in its remainder section is expressing: 'I'm not interested in the critical section right now.' Correct synchronization algorithms interpret this as implicit permission for others to proceed. This is why flag variables in Peterson's algorithm are set to 0 in the exit section—it's announcing disinterest, not blocking others.
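For completeness, that exit section is a single assignment against the flag array declared above:

```c
void peterson_exit(int process_id) {
    flag[process_id] = 0;  // Announce disinterest; the other process may now enter
}
```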
The duration of the remainder section—how long a process stays outside the critical section between accesses—has significant implications for system behavior and synchronization mechanism choice.
Short vs. Long Remainder Sections:
When processes quickly return to wanting the CS (short remainder), system behavior differs sharply from the case where processes spend most of their time outside it (long remainder):
| Characteristic | Short Remainder | Long Remainder |
|---|---|---|
| Contention level | High (processes frequently want CS) | Low (CS rarely contested) |
| Preferred lock type | Fair lock (prevents starvation) | Simple spinlock may suffice |
| Wait strategy | Blocking (CPU for others) | Spinning (quick access when free) |
| Throughput sensitivity | CS time dominates | CS time less critical |
| Cache effects | Lock line constantly bouncing | Lock line stays cold |
| Priority issues | Priority inversion more likely | Less likely |
| Scalability | Poor (Amdahl's Law) | Good (mostly parallel) |
Long Remainder Sections (Infrequent Lock Access):
When processes spend most of their time in the remainder section, lock acquisitions become infrequent: the critical section is rarely contested, a simple lock usually suffices, and the system scales well because most work runs in parallel. A common compromise between the two wait strategies in the table is to spin briefly and then block, as sketched below.
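A minimal sketch of this spin-then-block approach using POSIX calls (SPIN_LIMIT is a hypothetical tuning parameter):

```c
#include <pthread.h>
#include <sched.h>

#define SPIN_LIMIT 100  // Hypothetical: how many quick attempts before sleeping

void spin_then_block_lock(pthread_mutex_t* m) {
    // Retry briefly, hoping the holder releases soon...
    for (int i = 0; i < SPIN_LIMIT; i++) {
        if (pthread_mutex_trylock(m) == 0)
            return;       // Acquired without ever sleeping
        sched_yield();    // Let another thread run between attempts
    }
    pthread_mutex_lock(m);  // ...then fall back to a blocking wait
}
```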
Design Implication: Know Your Access Patterns
Choosing the right synchronization mechanism requires understanding how long your processes typically spend in the remainder section relative to the critical section. The following three patterns illustrate common cases:
```c
// Analyzing access patterns to choose synchronization strategy

// Pattern A: SHORT REMAINDER, SHORT CS
// Example: Updating a shared counter for every operation
void pattern_a_counter_update(void) {
    for (int i = 0; i < MILLION; i++) {
        // Remainder: almost nothing
        // CS: increment shared counter
        mutex_lock(&counter_lock);
        shared_counter++;  // 1 instruction
        mutex_unlock(&counter_lock);
        // Back to remainder immediately
    }
}
// Best approach: atomic increment (no lock at all)
// atomic_fetch_add(&shared_counter, 1);

// Pattern B: LONG REMAINDER, SHORT CS
// Example: Worker threads processing independent jobs
void pattern_b_worker_thread(void) {
    while (true) {
        Job* job;

        // ENTRY + CS: Get next job (brief)
        mutex_lock(&queue_lock);
        job = dequeue(&job_queue);
        mutex_unlock(&queue_lock);

        // REMAINDER: Do the work (long)
        process_job(job);  // Might take seconds
        save_results(job);
    }
}
// Best approach: simple mutex, contention is rare

// Pattern C: SHORT REMAINDER, LONG CS
// Example: Database transactions
void pattern_c_database_transaction(void) {
    // Remainder: prepare transaction (brief)
    Transaction* tx = prepare();

    // CS: execute transaction (long)
    db_lock();
    execute_multiple_queries(tx);  // Many operations
    update_indexes(tx);
    flush_to_disk(tx);             // I/O!
    db_unlock();

    // Remainder: notify (brief)
    notify_completion(tx);
}
// Problem: Long CS hurts concurrency
// Better approach: Fine-grained locking (per-table or per-row)
// Or: Optimistic concurrency (check conflicts at commit)
```

When many processes finish their remainder sections at similar times (e.g., timer-based batch processing), they all arrive at the critical section together, forming a 'convoy.' This creates temporary high contention even if average contention is low. Jittering (adding random delays) in the remainder section can break up convoys and smooth access patterns.
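A minimal sketch of jittering (do_remainder_work, update_shared_state, shared_lock, and MAX_JITTER_US are hypothetical placeholders):

```c
#include <stdlib.h>
#include <unistd.h>

#define MAX_JITTER_US 5000          // Hypothetical tuning parameter

void do_remainder_work(void);       // Hypothetical: long independent computation
void update_shared_state(void);     // Hypothetical: the actual CS work
extern mutex_t shared_lock;

void periodic_worker_with_jitter(void) {
    while (1) {
        do_remainder_work();              // REMAINDER: independent work
        usleep(rand() % MAX_JITTER_US);   // Random delay spreads out arrivals
        mutex_lock(&shared_lock);         // A convoy is much less likely now
        update_shared_state();
        mutex_unlock(&shared_lock);
    }
}
```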
In real-time systems, the remainder section takes on additional significance because it affects schedulability—the system's ability to meet all timing deadlines.
The Real-Time Perspective:
Real-time systems often model tasks as periodic: each task is released every Ti milliseconds (its period), performs Ci milliseconds of computation (much of it remainder-section work), and must complete before its deadline.
The remainder section contributes to the task's period—it's the gap between one critical section access and the next.
Blocking Time and Schedulability:
When a high-priority task must wait in the entry section (blocked by a low-priority task holding the lock), this blocking time adds to the task's worst-case execution time. The remainder section affects this indirectly: the longer tasks spend in their remainder sections, the less often the lock is held, and the lower the chance that a high-priority task arrives while a lower-priority task is inside its critical section.
Schedulability Analysis in Real-Time:
For a system with n tasks and blocking time Bi for task i:
U = Σ(Ci + Bi) / Ti ≤ n(2^(1/n) - 1) ≈ ln(2) for large n
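For instance, with two hypothetical tasks (C1 = 10 ms, B1 = 5 ms, T1 = 50 ms; C2 = 30 ms, B2 = 0, T2 = 100 ms):

U = (10 + 5)/50 + (30 + 0)/100 = 0.30 + 0.30 = 0.60 ≤ 2(2^(1/2) - 1) ≈ 0.828

so this made-up task set passes the bound and is schedulable under rate-monotonic scheduling.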
The blocking time Bi depends on the duration of the critical sections of lower-priority tasks that share locks with task i, and on the locking protocol in use; with protocols like priority inheritance or priority ceiling, Bi can be bounded (in the simplest case, by the longest critical section of any lower-priority task sharing the lock).
```c
// Real-time task structure example
typedef struct {
    int period_ms;            // How often task runs
    int computation_ms;       // Time when not in CS (remainder)
    int critical_section_ms;  // Time in CS
    int deadline_ms;          // When task must complete
    int priority;             // Scheduling priority (higher = more important)
    mutex_t* required_lock;   // Lock this task needs
} RealtimeTask;

// Example real-time system
RealtimeTask tasks[] = {
    // High priority: Quick sensor read, long remainder (analysis)
    {.period_ms = 100, .computation_ms = 80, .critical_section_ms = 2,
     .deadline_ms = 100, .priority = 10, .required_lock = &sensor_lock},

    // Medium priority: Control loop, balanced remainder/CS
    {.period_ms = 50, .computation_ms = 20, .critical_section_ms = 10,
     .deadline_ms = 50, .priority = 5, .required_lock = &actuator_lock},

    // Low priority: Logging, long CS (file I/O), long period
    {.period_ms = 1000, .computation_ms = 50, .critical_section_ms = 100,
     .deadline_ms = 1000, .priority = 1, .required_lock = &log_lock},
};

// Blocking time analysis:
// - High priority task uses sensor_lock: Not shared by others, no blocking
// - Medium priority uses actuator_lock: Not shared by others, no blocking
// - BUT if they shared a lock, high priority could be blocked by low
//   for up to 100ms (low's CS duration)!

// With priority inheritance mutex:
void high_priority_task_with_inheritance(void) {
    // When we block on a lock held by a low-priority task,
    // the low-priority task's priority is boosted to ours.
    // This ensures low finishes quickly, minimizing our blocking time.
    pthread_mutex_lock(&shared_lock);  // Uses PTHREAD_PRIO_INHERIT
    // ... critical section ...
    pthread_mutex_unlock(&shared_lock);
}
```

In hard real-time systems, unbounded blocking is unacceptable—it can cause missed deadlines and system failure. Protocols like Priority Inheritance and Priority Ceiling ensure that blocking time is bounded and can be factored into schedulability analysis. The remainder section patterns of all tasks must be analyzed to compute worst-case blocking.
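On POSIX systems that support priority inheritance, the mutex used above would be initialized with the PTHREAD_PRIO_INHERIT protocol; a minimal setup sketch:

```c
#include <pthread.h>

pthread_mutex_t shared_lock;

// One-time setup (error handling omitted)
void init_priority_inheritance_mutex(void) {
    pthread_mutexattr_t attr;
    pthread_mutexattr_init(&attr);
    // While a thread holds this mutex, its priority is boosted to that
    // of the highest-priority thread blocked on it
    pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT);
    pthread_mutex_init(&shared_lock, &attr);
    pthread_mutexattr_destroy(&attr);
}
```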
Since maximizing remainder section work improves concurrency, let's explore strategies for moving work out of critical sections and into the remainder section.
Strategy 1: Copy-on-Entry, Merge-on-Exit
Copy data at the start of the CS, release the lock, process the copy in the remainder section, then re-acquire and update:
```c
// Copy-on-entry pattern
void process_with_copy(void) {
    DataCopy local_copy;

    // Brief CS: just copy the data
    mutex_lock(&data_lock);
    local_copy = shared_data;  // Shallow or deep copy
    mutex_unlock(&data_lock);

    // Long remainder: work on the copy
    Result result = expensive_computation(&local_copy);

    // Brief CS: merge results back
    mutex_lock(&data_lock);
    merge_result(&shared_data, &result);
    mutex_unlock(&data_lock);
}

// Trade-off: Works well for read-heavy workloads
// Caveat: May need conflict detection if data changed while computing
```

Strategy 2: Thread-Local Accumulation
Each thread maintains local state in its remainder section, periodically merging into shared state:
```c
// Thread-local accumulation pattern
__thread int local_counter = 0;  // Each thread has its own
int global_counter = 0;
mutex_t counter_lock;

void increment(void) {
    // REMAINDER: Update thread-local (no synchronization!)
    local_counter++;

    // Only occasionally merge to global
    if (local_counter >= BATCH_SIZE) {
        mutex_lock(&counter_lock);
        global_counter += local_counter;  // CS: quick bulk update
        mutex_unlock(&counter_lock);
        local_counter = 0;
    }
}

// Results:
// - 99% of increments are lock-free
// - CS occurs only every BATCH_SIZE operations
// - Dramatically reduces contention
```

Strategy 3: Read-Copy-Update (RCU)
Readers never block; writers create a new version and atomically swap pointers. Readers in their remainder section may see old or new data—both are valid:
```c
// RCU-style optimization (simplified concept)
struct Data {
    // ... lots of fields ...
};

_Atomic(struct Data*) shared_ptr;  // Atomic pointer to current data

// Readers: No lock, no waiting! (all in "remainder")
struct Data* read_data(void) {
    struct Data* ptr = atomic_load(&shared_ptr);  // Atomic read
    // Use ptr freely - the data won't be freed while we're using it
    // (Requires a grace period mechanism)
    return ptr;
}

// Writer: Does heavy work in remainder, brief atomic swap
void update_data(UpdateInfo* info) {
    // REMAINDER: Prepare new version (no lock)
    struct Data* old_ptr = atomic_load(&shared_ptr);
    struct Data* new_ptr = malloc(sizeof(struct Data));
    *new_ptr = *old_ptr;          // Copy old data
    apply_update(new_ptr, info);  // Modify the copy

    // CS: Just an atomic pointer swap!
    atomic_store(&shared_ptr, new_ptr);

    // REMAINDER: Reclaim old data after grace period
    // (Wait until all readers using old_ptr have finished)
    synchronize_rcu();
    free(old_ptr);
}

// RCU is used extensively in the Linux kernel for read-mostly structures
```

The remainder section completes our understanding of the four-part process structure for accessing shared resources. Let's consolidate the key insights:

- The remainder section is everything outside the entry, critical, and exit sections: independent work that needs no synchronization and runs fully in parallel.
- By Amdahl's Law, the serialized fraction s caps achievable speedup at 1/s, so shifting work from the critical section into the remainder section is the primary lever for scalability.
- The progress requirement deliberately ignores processes in their remainder sections: not wanting the CS must never block those who do.
- Remainder section duration shapes mechanism choice: short remainders mean high contention (favoring fair, blocking locks), while long remainders mean rare contention (where simple locks suffice).
- Practical strategies such as copy-on-entry, thread-local accumulation, and RCU deliberately move work out of the critical section and into the remainder section.
What's Next:
Now that we fully understand the structure of the critical section problem—the critical section itself, the entry and exit sections that protect it, and the remainder section that runs freely—we're ready to formalize the problem requirements. The next page presents the three requirements (mutual exclusion, progress, and bounded waiting) in rigorous detail, enabling us to evaluate any proposed solution.
You now understand the remainder section and its crucial role in concurrent system performance. Combined with your knowledge of the entry, critical, and exit sections, you have a complete conceptual model of how processes safely share resources. Next, we formalize the requirements that any correct solution must satisfy.