Imagine a situation: five threads compete for a semaphore. Four of them acquire and release it repeatedly, while the fifth waits endlessly. Hours pass. The system is "working"—resources are being used, throughput is maintained—but one thread has been starved completely.
This is the fairness problem in synchronization. A semaphore that allows some waiters to be indefinitely bypassed violates a fundamental expectation: that waiting should eventually lead to progress.
Fairness isn't free, however. Strict fairness guarantees impose overhead and can reduce throughput in some scenarios. This page explores the spectrum of fairness properties, the mechanisms that provide them, and the engineering tradeoffs involved in choosing a fairness policy.
By the end of this page, you will understand:

- the definition and importance of fairness in synchronization
- how starvation occurs in unfair implementations
- queue disciplines and their fairness properties
- priority-based wakeup and its implications
- the tradeoff between fairness and performance
- real-world fairness approaches in major operating systems
Fairness in synchronization is surprisingly nuanced. There are multiple definitions, each with different guarantees and implementation costs.
Starvation-Freedom (Weak Fairness):
A synchronization primitive is starvation-free if every thread that attempts to acquire will eventually succeed, assuming the primitive is eventually released. This is the weakest useful fairness guarantee.
Formally: If thread T waits for the semaphore and other threads repeatedly release it, T will eventually acquire.
Bounded Waiting:
A stronger property: there exists a bound B such that no thread waits for more than B other threads to go ahead of it. This provides predictable wait times.
Formally: If T starts waiting, at most B threads that start waiting after T will acquire before T.
FIFO Fairness (Strong Fairness):
The strongest common guarantee: threads acquire in the order they requested. First-come, first-served. No one cuts in line.
Formally: If T₁ starts waiting before T₂, then T₁ acquires before T₂.
Priority-Respecting:
An alternative fairness model: higher-priority threads should acquire before lower-priority threads, regardless of arrival order. This trades FIFO fairness for priority responsiveness.
| Property | Guarantee | Cost |
|---|---|---|
| No guarantee | None (starvation possible) | Minimal overhead |
| Starvation-free | Eventually acquire | Requires queue structure |
| Bounded waiting | Acquire within B others | Requires counting or ordering |
| FIFO | First-come, first-served | Strict queue ordering, O(1) tracking |
| Priority-respecting | Highest priority first | Priority queue, O(log n) operations |
Perfect FIFO fairness can reduce throughput. Consider a thread that just released a semaphore but wants it again immediately. If other waiters are on different CPUs (cold caches), forcing FIFO order means a context switch and cache migration. Allowing the local thread to reacquire can be significantly faster—but unfair.
Starvation occurs when a thread waits indefinitely for a resource that is repeatedly acquired by others. Unlike deadlock (where all threads are stuck), starvation is insidious: the system makes progress but individual threads don't.
How Starvation Happens:
Consider a naive spinlock-based semaphore:
```c
while (!try_acquire()) { /* spin */ }
```
When the lock is released, multiple spinners race to acquire. The winner is essentially random—whichever CPU happens to claim the cache line first. A pathological pattern can emerge:
Thread D can spin forever while A, B, C rotate through the lock.
Contributing Factors:
Which spinner wins is biased by hardware and scheduling: cache locality (the releasing CPU and its neighbors see the new value first), NUMA distance between cores, scheduler placement, and wakeup timing. None of these factors favor the thread that has waited longest.
The Barging Problem:
Even with FIFO wait queues, barging can occur. A new arrival might acquire the semaphore before a waiter is fully awakened:
```
Thread A: signal() -> wakes Thread B
Thread C: arrives, tests semaphore, finds it available, acquires!
Thread B: wakes up, finds the semaphore taken, blocks again
```
Thread C "barged" ahead of Thread B who had been waiting longer.
```c
// Demonstration of a starvation-prone semaphore
#include <pthread.h>
#include <stdio.h>
#include <stdatomic.h>
#include <unistd.h>

// Simple unfair semaphore (demonstrates starvation)
typedef struct {
    atomic_int value;
} unfair_sem_t;

void unfair_sem_init(unfair_sem_t *sem, int initial) {
    atomic_store(&sem->value, initial);
}

void unfair_sem_wait(unfair_sem_t *sem) {
    while (1) {
        int val = atomic_load(&sem->value);
        if (val > 0) {
            // Try to decrement
            if (atomic_compare_exchange_weak(&sem->value, &val, val - 1)) {
                return;  // Acquired!
            }
            // CAS failed, retry
        }
        // Spin - no fairness, no queue
    }
}

void unfair_sem_signal(unfair_sem_t *sem) {
    atomic_fetch_add(&sem->value, 1);
}

// Test case showing potential starvation: let all threads compete for a
// fixed time window, then compare how many acquisitions each managed.
unfair_sem_t test_sem;
atomic_int acquisitions[4];  // Per-thread acquisition count
atomic_int stop;             // Set to end the test window

void *aggressive_worker(void *arg) {
    int id = *(int *)arg;
    while (!atomic_load(&stop)) {
        unfair_sem_wait(&test_sem);
        atomic_fetch_add(&acquisitions[id], 1);
        // Minimal critical section
        unfair_sem_signal(&test_sem);
    }
    return NULL;
}

int main(void) {
    pthread_t threads[4];
    int ids[4] = {0, 1, 2, 3};

    unfair_sem_init(&test_sem, 1);
    for (int i = 0; i < 4; i++) {
        atomic_store(&acquisitions[i], 0);
        pthread_create(&threads[i], NULL, aggressive_worker, &ids[i]);
    }

    sleep(5);  // Let them compete for 5 seconds
    atomic_store(&stop, 1);
    for (int i = 0; i < 4; i++) {
        pthread_join(threads[i], NULL);
    }

    // Check distribution - unfair implementations show skew
    printf("Acquisitions per thread:\n");
    for (int i = 0; i < 4; i++) {
        printf("  Thread %d: %d\n", i, atomic_load(&acquisitions[i]));
    }
    // In an unfair implementation, you might see:
    //   Thread 0: 250000
    //   Thread 1: 120000
    //   Thread 2:  28000
    //   Thread 3:   2000   <-- Severe starvation!
    return 0;
}
```

Starvation has caused real production incidents. A web server thread starved for a database lock appears 'hung' from the user's perspective. Timeouts trigger, requests fail, and users see errors—even though the database is processing requests normally for other threads.
The wait queue discipline determines the order in which blocked threads are awakened. Different disciplines provide different fairness and performance characteristics.
FIFO Queue:
The most intuitive and commonly used discipline. Threads are added to the tail of the queue and removed from the head, which guarantees FIFO ordering—and therefore bounded waiting and starvation-freedom.
LIFO Stack:
Last-in, first-out: new arrivals are favored over long waiters. This seems unfair, but it has a real advantage—the most recent arrival is the most likely to still have a warm cache and be cheap to resume.
Rarely used for general synchronization but common in memory allocators (recently freed memory is still cached).
Priority Queue:
Orders by thread priority rather than arrival time, so high-priority threads jump the queue. Low-priority waiters can starve if high-priority arrivals are frequent.
Random Selection:
Wake a random waiter. This provides probabilistic fairness: every waiter has an equal chance of being chosen on each wakeup, so expected wait is bounded even though the worst case is not.
```c
// Different queue disciplines for semaphore wait queues

// ==================================================
// FIFO Queue (Doubly-Linked List)
// ==================================================
typedef struct fifo_node {
    struct task_struct *task;
    struct fifo_node *next;
    struct fifo_node *prev;
} fifo_node_t;

typedef struct {
    fifo_node_t *head;  // Dequeue from here
    fifo_node_t *tail;  // Enqueue here
    int count;
} fifo_queue_t;

void fifo_enqueue(fifo_queue_t *q, fifo_node_t *node) {
    node->next = NULL;
    node->prev = q->tail;
    if (q->tail) {
        q->tail->next = node;
    } else {
        q->head = node;
    }
    q->tail = node;
    q->count++;
}

fifo_node_t *fifo_dequeue(fifo_queue_t *q) {
    fifo_node_t *node = q->head;
    if (!node) return NULL;
    q->head = node->next;
    if (q->head) {
        q->head->prev = NULL;
    } else {
        q->tail = NULL;
    }
    q->count--;
    return node;
}

// ==================================================
// LIFO Stack (Singly-Linked)
// ==================================================
typedef struct lifo_node {
    struct task_struct *task;
    struct lifo_node *next;
} lifo_node_t;

typedef struct {
    lifo_node_t *top;
    int count;
} lifo_stack_t;

void lifo_push(lifo_stack_t *s, lifo_node_t *node) {
    node->next = s->top;
    s->top = node;
    s->count++;
}

lifo_node_t *lifo_pop(lifo_stack_t *s) {
    lifo_node_t *node = s->top;
    if (!node) return NULL;
    s->top = node->next;
    s->count--;
    return node;
}

// ==================================================
// Priority Queue (Min-Heap by priority)
// ==================================================
typedef struct {
    struct task_struct *task;
    int priority;  // Lower number = higher priority
} prio_node_t;

typedef struct {
    prio_node_t *nodes;
    int count;
    int capacity;
} priority_queue_t;

void prio_insert(priority_queue_t *pq, prio_node_t node) {
    // Add at end
    int i = pq->count++;
    pq->nodes[i] = node;
    // Bubble up
    while (i > 0) {
        int parent = (i - 1) / 2;
        if (pq->nodes[i].priority >= pq->nodes[parent].priority) break;
        // Swap with parent
        prio_node_t tmp = pq->nodes[i];
        pq->nodes[i] = pq->nodes[parent];
        pq->nodes[parent] = tmp;
        i = parent;
    }
}

prio_node_t prio_extract_min(priority_queue_t *pq) {
    prio_node_t min = pq->nodes[0];
    pq->nodes[0] = pq->nodes[--pq->count];
    // Bubble down
    int i = 0;
    while (2*i + 1 < pq->count) {
        int left = 2*i + 1;
        int right = 2*i + 2;
        int smallest = i;
        if (pq->nodes[left].priority < pq->nodes[smallest].priority)
            smallest = left;
        if (right < pq->count &&
            pq->nodes[right].priority < pq->nodes[smallest].priority)
            smallest = right;
        if (smallest == i) break;
        prio_node_t tmp = pq->nodes[i];
        pq->nodes[i] = pq->nodes[smallest];
        pq->nodes[smallest] = tmp;
        i = smallest;
    }
    return min;
}

// ==================================================
// Fair Semaphore using FIFO Queue
// ==================================================
typedef struct {
    int value;
    spinlock_t lock;
    fifo_queue_t waiters;
} fair_semaphore_t;

void fair_sem_wait(fair_semaphore_t *sem) {
    spin_lock(&sem->lock);
    if (sem->value > 0) {
        sem->value--;
        spin_unlock(&sem->lock);
        return;
    }
    // Must wait - add to FIFO queue
    fifo_node_t wait_node;
    wait_node.task = current;
    fifo_enqueue(&sem->waiters, &wait_node);
    set_current_state(TASK_INTERRUPTIBLE);
    spin_unlock(&sem->lock);
    schedule();  // Sleep until woken
    // When we wake, we're guaranteed to have the semaphore (FIFO)
}

void fair_sem_signal(fair_semaphore_t *sem) {
    spin_lock(&sem->lock);
    fifo_node_t *waiter = fifo_dequeue(&sem->waiters);
    if (waiter) {
        // Wake the longest-waiting thread.
        // Don't increment value - ownership transfers directly.
        wake_up_process(waiter->task);
    } else {
        // No waiters - increment for a future acquirer
        sem->value++;
    }
    spin_unlock(&sem->lock);
}
```

In priority-scheduled systems, the question arises: should semaphore wakeup respect thread priorities? If a high-priority thread is waiting, should it be woken before a low-priority thread that arrived earlier?
The Tension:
Priority scheduling promises that high-priority threads run before low-priority ones. If semaphores always use FIFO wakeup, a low-priority thread could block a high-priority one for extended periods, violating priority guarantees.
Conversely, always waking the highest-priority waiter can starve low-priority threads indefinitely—especially if high-priority threads are frequent.
Priority Queue Implementation:
The most direct approach uses a priority queue (as shown previously). Waiters are inserted with their priority, and the highest-priority waiter is always extracted first.
Priority Inheritance:
A related mechanism: if a low-priority thread holds a semaphore that a high-priority thread needs, temporarily boost the holder's priority to match the waiter. This ensures the holder runs and releases the resource quickly.
Hybrid Approaches:
Some systems use priority-based wakeup with aging: a waiter's effective priority rises the longer it waits, so high-priority threads are usually served first, but no waiter can be bypassed forever.
```c
// Priority-based wakeup with inheritance

// ==================================================
// Priority Semaphore Structure
// ==================================================
typedef struct {
    int value;
    spinlock_t lock;
    // Priority queue of waiters
    struct prio_waiter *waiters;  // Heap organized by priority
    int waiter_count;
    // For priority inheritance
    struct task_struct *owner;    // Current holder (for binary sems)
    int original_owner_priority;
} priority_semaphore_t;

struct prio_waiter {
    struct task_struct *task;
    int base_priority;         // Original priority
    int effective_priority;    // After aging/boosting
    unsigned long wait_start;  // For aging calculation
};

// ==================================================
// Priority-aware wait
// ==================================================
void prio_sem_wait(priority_semaphore_t *sem) {
    spin_lock(&sem->lock);
    if (sem->value > 0) {
        sem->value--;
        sem->owner = current;
        sem->original_owner_priority = current->priority;
        spin_unlock(&sem->lock);
        return;
    }
    // Must wait
    struct prio_waiter waiter = {
        .task = current,
        .base_priority = current->priority,
        .effective_priority = current->priority,
        .wait_start = jiffies,
    };
    // Insert into priority queue
    prio_heap_insert(&sem->waiters, &waiter, &sem->waiter_count);
    // Priority inheritance: boost owner if we're higher priority
    if (sem->owner && current->priority < sem->owner->priority) {
        // We're higher priority than the owner - boost them
        sem->owner->priority = current->priority;
    }
    set_current_state(TASK_INTERRUPTIBLE);
    spin_unlock(&sem->lock);
    schedule();
    // Woken - we now own the semaphore
}

// ==================================================
// Priority-aware signal
// ==================================================
void prio_sem_signal(priority_semaphore_t *sem) {
    spin_lock(&sem->lock);
    // Restore our original priority if it was inherited
    if (current == sem->owner) {
        current->priority = sem->original_owner_priority;
        sem->owner = NULL;
    }
    if (sem->waiter_count > 0) {
        // Wake highest priority waiter
        struct prio_waiter *waiter = prio_heap_extract_min(
            &sem->waiters, &sem->waiter_count);
        // Transfer ownership directly
        sem->owner = waiter->task;
        sem->original_owner_priority = waiter->task->priority;
        wake_up_process(waiter->task);
    } else {
        sem->value++;
    }
    spin_unlock(&sem->lock);
}

// ==================================================
// Priority aging (call periodically or on wakeup check)
// ==================================================
void age_waiters(priority_semaphore_t *sem) {
    unsigned long now = jiffies;
    for (int i = 0; i < sem->waiter_count; i++) {
        struct prio_waiter *w = &sem->waiters[i];
        unsigned long wait_time = now - w->wait_start;
        // Boost by one priority level per 100ms waited
        int boost = wait_time / msecs_to_jiffies(100);
        // Apply boost (lower number = higher priority)
        w->effective_priority = max(0, w->base_priority - boost);
    }
    // Re-heapify with the new priorities
    rebuild_heap(&sem->waiters, sem->waiter_count);
}

// ==================================================
// Priority Ceiling Protocol (alternative)
// ==================================================
typedef struct {
    int value;
    spinlock_t lock;
    int ceiling_priority;  // Highest priority of any potential user
    int saved_priority;
} pcp_semaphore_t;

void pcp_sem_wait(pcp_semaphore_t *sem) {
    spin_lock(&sem->lock);
    while (sem->value <= 0) {
        spin_unlock(&sem->lock);
        schedule();
        spin_lock(&sem->lock);
    }
    sem->value--;
    // Immediately raise to the ceiling priority
    sem->saved_priority = current->priority;
    current->priority = sem->ceiling_priority;
    spin_unlock(&sem->lock);
}

void pcp_sem_signal(pcp_semaphore_t *sem) {
    spin_lock(&sem->lock);
    // Restore priority
    current->priority = sem->saved_priority;
    sem->value++;
    // Wake a waiter if any
    // (Standard wake logic here)
    spin_unlock(&sem->lock);
}
```

Linux provides rt_mutex for real-time scenarios. It implements full priority inheritance: when a high-priority task blocks on an rt_mutex, the holder's priority is boosted, and the boost propagates through chains of dependencies. RT-mutexes are used when predictable priority behavior is essential.
The convoy problem is a fairness-related performance pathology. It occurs when a slow process holds a semaphore and many fast processes queue up behind it, creating a "convoy" that serializes through the resource even after the slow process finishes.
The Scenario:
A slow thread (perhaps preempted, or performing I/O inside the critical section) holds the semaphore. Fast threads pile up behind it and go to sleep. When the slow thread finally releases, the semaphore is handed from sleeper to sleeper, one wakeup and context switch at a time.
Why Convoys Persist:
Once a convoy forms, it tends to persist: every release hands the semaphore to a sleeping waiter, so each handoff costs a context switch, and the lock changes hands at scheduler speed rather than at the speed of the (short) critical section. New arrivals can join the queue faster than it drains.
A Traffic Analogy:
Think of a traffic jam caused by one slow driver. Even after the slow driver exits, the accordion effect keeps cars braking and accelerating for miles.
Mitigation Strategies:
Common mitigations include keeping critical sections short, avoiding blocking operations while holding the semaphore, allowing limited barging so a running thread can acquire without a context switch, and using adaptive spin-then-block waiting.
```c
// Convoy problem demonstration
#include <pthread.h>
#include <stdio.h>
#include <stdatomic.h>
#include <time.h>
#include <unistd.h>

pthread_mutex_t convoy_lock = PTHREAD_MUTEX_INITIALIZER;
volatile int convoy_formed = 0;

// Measure throughput before and after the convoy
// (approximate: shared counters, last writer wins)
long operations_before_convoy = 0;
long operations_after_convoy = 0;

void *fast_worker(void *arg) {
    (void)arg;
    struct timespec start, end;
    long ops = 0;

    clock_gettime(CLOCK_MONOTONIC, &start);
    while (1) {
        pthread_mutex_lock(&convoy_lock);
        ops++;
        // Very short critical section
        __asm__ volatile("" ::: "memory");
        pthread_mutex_unlock(&convoy_lock);

        clock_gettime(CLOCK_MONOTONIC, &end);
        double elapsed = (end.tv_sec - start.tv_sec) +
                         (end.tv_nsec - start.tv_nsec) / 1e9;
        if (elapsed > 10.0) break;  // 10 second test

        // Track throughput on either side of convoy formation
        if (!convoy_formed && elapsed < 5.0) {
            operations_before_convoy = ops;
        }
        if (convoy_formed && elapsed >= 5.0) {
            operations_after_convoy = ops - operations_before_convoy;
        }
    }
    return NULL;
}

void *slow_worker(void *arg) {
    (void)arg;
    // Wait a bit, then hold the lock for a long time
    sleep(2);
    pthread_mutex_lock(&convoy_lock);
    convoy_formed = 1;
    // Hold for 100ms - causes a convoy to form
    usleep(100000);
    pthread_mutex_unlock(&convoy_lock);
    return NULL;
}

int main(void) {
    pthread_t fast_threads[10];
    pthread_t slow_thread;

    // Start fast workers
    for (int i = 0; i < 10; i++) {
        pthread_create(&fast_threads[i], NULL, fast_worker, NULL);
    }
    // Start the convoy-inducing slow worker
    pthread_create(&slow_thread, NULL, slow_worker, NULL);

    // Wait for all threads
    for (int i = 0; i < 10; i++) {
        pthread_join(fast_threads[i], NULL);
    }
    pthread_join(slow_thread, NULL);

    printf("Operations in first 5 seconds: %ld\n", operations_before_convoy);
    printf("Operations in last 5 seconds:  %ld\n", operations_after_convoy);
    printf("Throughput ratio: %.2fx\n",
           (double)operations_after_convoy / operations_before_convoy);
    // You'll likely see significantly reduced throughput after the convoy forms!
    return 0;
}

// ==================================================
// Convoy mitigation: barging semaphore
// ==================================================
typedef struct {
    atomic_int value;
    pthread_mutex_t wait_lock;
    pthread_cond_t wait_cond;
    int waiters;
} barging_sem_t;

void barging_sem_wait(barging_sem_t *sem) {
    // Try to acquire without waiting (barge attempt)
    int val = atomic_load(&sem->value);
    while (val > 0) {
        if (atomic_compare_exchange_weak(&sem->value, &val, val - 1)) {
            return;  // Acquired by barging!
        }
    }
    // Failed to barge - must wait properly
    pthread_mutex_lock(&sem->wait_lock);
    sem->waiters++;
    while (atomic_load(&sem->value) <= 0) {
        pthread_cond_wait(&sem->wait_cond, &sem->wait_lock);
    }
    atomic_fetch_sub(&sem->value, 1);
    sem->waiters--;
    pthread_mutex_unlock(&sem->wait_lock);
}

void barging_sem_signal(barging_sem_t *sem) {
    atomic_fetch_add(&sem->value, 1);
    // Wake a waiter if any
    pthread_mutex_lock(&sem->wait_lock);
    if (sem->waiters > 0) {
        pthread_cond_signal(&sem->wait_cond);
    }
    pthread_mutex_unlock(&sem->wait_lock);
}
```

Strict FIFO fairness exacerbates convoys by preventing barging. Allowing some unfairness—letting new arrivals acquire before waiting threads are woken—can break convoy patterns, but risks starvation. The right balance depends on workload characteristics.
Let's examine how major operating systems and runtimes approach fairness in their semaphore and lock implementations.
Linux Kernel Semaphores:
Linux kernel semaphores use a FIFO wait list. Waiters are added to the tail and woken from the head. This provides strict fairness within the kernel.
Linux Kernel Mutexes:
Linux mutexes are more complex: a lock-free fast path via compare-and-swap, an optimistic-spinning midpath (spin while the current owner is still running on another CPU), and a slow path that parks the thread on a FIFO wait list.
Linux RT-Mutexes:
For real-time workloads, Linux provides rt_mutex with priority inheritance and priority-ordered wakeup: waiters are kept sorted by priority, and a blocked high-priority task boosts the priority of the current holder.
POSIX Pthread Mutexes:
POSIX doesn't mandate a specific fairness policy, and implementations vary: glibc builds mutexes on futexes, whose wakeup order is roughly FIFO in the kernel, but a running thread can still barge past a waiter that hasn't been rescheduled yet. Priority protocols are opt-in via PTHREAD_PRIO_INHERIT and PTHREAD_PRIO_PROTECT mutex attributes.
Java Locks:
Java's ReentrantLock has an explicit fairness parameter: new ReentrantLock(true) gives strict FIFO ordering through its internal wait queue, while the default (false) permits barging.
Fair locks have 2-10x overhead in the uncontended case.
Go Mutexes:
Go's sync.Mutex uses a hybrid model: in normal mode, arriving goroutines may barge for throughput; if a waiter has been blocked for more than 1 ms, the mutex switches to starvation mode and hands off ownership FIFO, flipping back once the queue drains.
| System | Default Policy | Priority Support | Notes |
|---|---|---|---|
| Linux semaphore | FIFO | No | Simple, fair within kernel |
| Linux mutex | FIFO with spin | Via rt_mutex | Optimistic spinning, then FIFO |
| Linux rt_mutex | Priority order | Full PI | For real-time requirements |
| pthread_mutex (glibc) | FIFO-ish | Via PTHREAD_PRIO_INHERIT | Some barging possible |
| macOS os_unfair_lock | Unfair | No | Explicitly trades fairness for speed |
| Windows CRITICAL_SECTION | Approximately FIFO | No | Spin count tunable |
| Java ReentrantLock | Configurable | No | fair=true for FIFO |
| Go sync.Mutex | Hybrid | No | Barge for 1ms, then FIFO |
```java
import java.util.concurrent.locks.ReentrantLock;
import java.util.concurrent.Semaphore;

public class FairnessDemo {

    // ==================================================
    // Unfair lock (default) - allows barging
    // ==================================================
    private final ReentrantLock unfairLock = new ReentrantLock(false);

    public void unfairOperation() {
        unfairLock.lock();
        try {
            // Critical section
            // New arrivals can jump the queue!
        } finally {
            unfairLock.unlock();
        }
    }

    // ==================================================
    // Fair lock - strict FIFO ordering
    // ==================================================
    private final ReentrantLock fairLock = new ReentrantLock(true);

    public void fairOperation() {
        fairLock.lock();
        try {
            // Critical section
            // Strict FIFO - longer waits if contended
        } finally {
            fairLock.unlock();
        }
    }

    // ==================================================
    // Fair vs Unfair Semaphore
    // ==================================================
    private final Semaphore unfairSem = new Semaphore(10, false);
    private final Semaphore fairSem = new Semaphore(10, true);

    public void demonstrateFairness() throws InterruptedException {
        // Fair semaphore guarantees FIFO
        fairSem.acquire();
        try {
            // Threads acquire in order of request
        } finally {
            fairSem.release();
        }

        // Unfair semaphore may reorder
        unfairSem.acquire();
        try {
            // Threads may acquire out of order
            // (Typically faster when uncontended)
        } finally {
            unfairSem.release();
        }
    }

    // ==================================================
    // Benchmarking fair vs unfair
    // ==================================================
    public static void main(String[] args) throws Exception {
        FairnessDemo demo = new FairnessDemo();

        // Benchmark uncontended
        long start = System.nanoTime();
        for (int i = 0; i < 1_000_000; i++) {
            demo.unfairLock.lock();
            demo.unfairLock.unlock();
        }
        long unfairTime = System.nanoTime() - start;

        start = System.nanoTime();
        for (int i = 0; i < 1_000_000; i++) {
            demo.fairLock.lock();
            demo.fairLock.unlock();
        }
        long fairTime = System.nanoTime() - start;

        System.out.printf("Unfair uncontended: %d ns/op%n", unfairTime / 1_000_000);
        System.out.printf("Fair uncontended:   %d ns/op%n", fairTime / 1_000_000);
        System.out.printf("Fair overhead: %.2fx%n", (double) fairTime / unfairTime);
        // Typically fair has 2-10x overhead!
    }
}
```

Apple explicitly named their lock 'os_unfair_lock' to communicate its policy. Their documentation states: 'This lock must be unlocked from the same thread that locked it. A mismatch is undefined behavior.' The unfair policy intentionally allows reacquisition by the releasing thread for cache locality benefits.
How do you verify that your synchronization is fair? Fairness violations can be subtle and load-dependent. Here are techniques for measurement and enforcement.
Metrics for Fairness:
Useful metrics include per-thread acquisition counts, the distribution of wait times (mean, maximum, and tail percentiles), and a single summary statistic such as Jain's fairness index.
Jain's Fairness Index:
$$FI = \frac{(\sum_{i=1}^{n} x_i)^2}{n \cdot \sum_{i=1}^{n} x_i^2}$$
Where $x_i$ is the throughput (or wait time) for thread $i$. Perfect fairness (all equal) gives FI = 1.0. Complete unfairness (one gets everything) gives FI = 1/n.
Testing for Fairness:
Run many identical threads under heavy contention and compare per-thread acquisition counts and wait-time tails; repeat while varying thread count, critical-section length, and CPU affinity, since fairness violations are often load-dependent.
Runtime Enforcement:
For critical systems, you might add runtime checks—for example, recording when each thread starts waiting and flagging any wait that exceeds a bound:
```c
// Fairness measurement utilities
#include <stdio.h>
#include <string.h>
#include <math.h>
#include <pthread.h>
#include <stdatomic.h>
#include <time.h>
#include <unistd.h>

#define NUM_THREADS 8

// Per-thread statistics
typedef struct {
    long acquisitions;      // Number of times acquired
    double total_wait_ns;   // Total wait time
    double max_wait_ns;     // Maximum single wait
    double min_wait_ns;     // Minimum single wait
} thread_stats_t;

thread_stats_t stats[NUM_THREADS];
pthread_mutex_t test_lock = PTHREAD_MUTEX_INITIALIZER;
atomic_int stop_flag;  // Ends the measurement window

// ==================================================
// Jain's Fairness Index calculation
// ==================================================
double jains_fairness_index(long *values, int n) {
    double sum = 0;
    double sum_squares = 0;
    for (int i = 0; i < n; i++) {
        sum += values[i];
        sum_squares += (double)values[i] * values[i];
    }
    if (sum_squares == 0) return 1.0;  // All zero = fair
    return (sum * sum) / (n * sum_squares);
}

// ==================================================
// Test worker with timing
// ==================================================
void *timed_worker(void *arg) {
    int id = *(int *)arg;
    struct timespec start, end;
    stats[id].min_wait_ns = 1e18;  // Initialize high

    while (!atomic_load(&stop_flag)) {
        clock_gettime(CLOCK_MONOTONIC, &start);
        pthread_mutex_lock(&test_lock);
        clock_gettime(CLOCK_MONOTONIC, &end);

        double wait_ns = (end.tv_sec - start.tv_sec) * 1e9 +
                         (end.tv_nsec - start.tv_nsec);
        stats[id].acquisitions++;
        stats[id].total_wait_ns += wait_ns;
        if (wait_ns > stats[id].max_wait_ns) stats[id].max_wait_ns = wait_ns;
        if (wait_ns < stats[id].min_wait_ns) stats[id].min_wait_ns = wait_ns;

        // Brief critical section
        __asm__ volatile("" ::: "memory");
        pthread_mutex_unlock(&test_lock);
    }
    return NULL;
}

// ==================================================
// Main fairness test
// ==================================================
int main(void) {
    pthread_t threads[NUM_THREADS];
    int ids[NUM_THREADS];
    long acquisitions[NUM_THREADS];

    for (int i = 0; i < NUM_THREADS; i++) {
        ids[i] = i;
        memset(&stats[i], 0, sizeof(thread_stats_t));
        pthread_create(&threads[i], NULL, timed_worker, &ids[i]);
    }

    sleep(5);  // Fixed measurement window so counts can diverge
    atomic_store(&stop_flag, 1);
    for (int i = 0; i < NUM_THREADS; i++) {
        pthread_join(threads[i], NULL);
        acquisitions[i] = stats[i].acquisitions;
    }

    // Calculate and display fairness metrics
    printf("\n=== Fairness Analysis ===\n\n");
    printf("Per-thread statistics:\n");
    printf("%-8s %12s %12s %12s %12s\n", "Thread", "Acquisitions",
           "Avg Wait(μs)", "Max Wait(μs)", "Min Wait(μs)");

    long total_acquisitions = 0;
    for (int i = 0; i < NUM_THREADS; i++) {
        double avg_wait = stats[i].total_wait_ns / stats[i].acquisitions / 1000;
        printf("%-8d %12ld %12.2f %12.2f %12.2f\n", i,
               stats[i].acquisitions, avg_wait,
               stats[i].max_wait_ns / 1000, stats[i].min_wait_ns / 1000);
        total_acquisitions += stats[i].acquisitions;
    }

    double expected = total_acquisitions / (double)NUM_THREADS;
    printf("\nExpected per thread: %.0f\n", expected);

    double fi = jains_fairness_index(acquisitions, NUM_THREADS);
    printf("Jain's Fairness Index: %.4f (1.0 = perfect)\n", fi);

    // Variance analysis
    double variance = 0;
    for (int i = 0; i < NUM_THREADS; i++) {
        double diff = acquisitions[i] - expected;
        variance += diff * diff;
    }
    variance /= NUM_THREADS;
    printf("Variance: %.2f\n", variance);
    printf("Std Dev:  %.2f (%.1f%% of expected)\n",
           sqrt(variance), sqrt(variance) / expected * 100);

    // Interpretation
    printf("\n");
    if (fi > 0.99) {
        printf("Result: VERY FAIR - Distribution nearly equal\n");
    } else if (fi > 0.95) {
        printf("Result: FAIR - Minor variance, acceptable\n");
    } else if (fi > 0.90) {
        printf("Result: SOMEWHAT FAIR - Noticeable skew\n");
    } else {
        printf("Result: UNFAIR - Significant starvation risk\n");
    }
    return 0;
}
```

We've explored the nuanced world of fairness in synchronization—from definitions and failure modes through implementation mechanisms to measurement techniques. Let's consolidate the key insights:

- Fairness is a spectrum: starvation-freedom, bounded waiting, FIFO, and priority-respecting, each with rising implementation cost.
- Unfair primitives can starve individual threads even while the system as a whole makes progress.
- The wait-queue discipline (FIFO, LIFO, priority, random) determines who wakes next and what fairness you actually get.
- Strict FIFO prevents starvation but exacerbates convoys; limited barging improves throughput at fairness's expense.
- Real systems (Linux, macOS, Java, Go) each pick a different point on the fairness/performance tradeoff.
- Fairness can be measured: per-thread counts, wait-time tails, and Jain's fairness index make violations visible.
What's Next:
With fairness understood, we turn to a related but distinct challenge: priority inversion. When synchronization primitives interact with priority scheduling, unexpected behaviors emerge—a low-priority thread can effectively block a high-priority one. We'll explore this phenomenon and its solutions in the next page.
You now understand fairness considerations in semaphore implementation—from theoretical definitions through practical queue disciplines to real-system policies and measurement techniques. This knowledge enables informed decisions when selecting or configuring synchronization primitives for specific workloads.