User-space applications don't implement semaphores from scratch—they request semaphore services from the operating system kernel. The kernel provides semaphores as a protected abstraction, managing the state, enforcing access control, and integrating with process scheduling.
This relationship is fundamental: the kernel is the ultimate arbiter of resource access. It can atomically modify semaphore state while simultaneously manipulating scheduler data structures, putting processes to sleep, and waking them up. User-space code cannot do this safely without kernel assistance.
This page explores how operating system kernels implement semaphores—the data structures stored in kernel memory, the system call interfaces exposed to applications, and the deep integration with process lifecycle management. We'll examine both the venerable System V semaphores and the more modern POSIX semaphore implementation.
By the end of this page, you will understand: the system call interface for semaphore operations, kernel data structures for semaphore management, the integration with process scheduler and wait queues, differences between System V and POSIX semaphores, and how the kernel handles semaphore lifecycle from creation to destruction.
Before diving into implementation, let's understand why semaphores are typically kernel-provided rather than purely user-space constructs.
The Problem with User-Space Only:
Imagine trying to implement a blocking semaphore entirely in user-space:
User-space code cannot modify scheduler data structures or put itself on wait queues. Only the kernel can manipulate process state (RUNNING → SLEEPING) and scheduler run queues.
The Atomicity Problem:
Even if you could sleep in user-space, the operation "check value and decide to sleep" must be atomic with the sleep itself. Otherwise:
Thread A: if (sem.value == 0) {
Thread B: // signals the semaphore
Thread B: // finds no one waiting
Thread A: sleep(); // LOST WAKEUP!
}
The kernel can make this atomic because it controls when interrupts/preemption occur and when other processes run.
Cross-Process Coordination:
Semaphores often coordinate between different processes, not just threads. User-space memory is private to each process. The kernel can provide shared semaphore state visible to multiple processes, stored in kernel memory or in shared memory regions.
Security and Access Control:
The kernel can enforce permissions on semaphores—which users/processes can create, use, or destroy them. User-space cannot enforce such access control.
Modern systems use hybrid approaches like Linux's futex (fast userspace mutex). The fast path (uncontended acquisition) happens entirely in user-space with atomic operations. The slow path (blocking) makes a system call to the kernel. This combines user-space speed with kernel-level blocking.
Operating system kernels maintain extensive data structures to track semaphores. These structures live in kernel memory—protected from user-space corruption and shared across all processes.
Core Semaphore Structure (Linux-style):
The kernel semaphore structure contains:
Per-CPU Considerations:
On multiprocessor systems, semaphore operations must be SMP-safe. The internal spinlock serializes access from different CPUs. Some implementations use per-CPU wait queues or lock striping for reduced contention.
Lifecycle Management:
Kernel semaphores have clear lifecycle states:
```c
// Linux kernel semaphore structure (include/linux/semaphore.h)
struct semaphore {
    raw_spinlock_t   lock;       // Protects count and wait_list
    unsigned int     count;      // Current value (resources available)
    struct list_head wait_list;  // Threads waiting to acquire
};

// Initialization macro
#define __SEMAPHORE_INITIALIZER(name, n) { \
    .lock      = __RAW_SPIN_LOCK_UNLOCKED((name).lock), \
    .count     = n, \
    .wait_list = LIST_HEAD_INIT((name).wait_list), \
}

// Static declaration
#define DEFINE_SEMAPHORE(name) \
    struct semaphore name = __SEMAPHORE_INITIALIZER(name, 1)

// Runtime initialization
static inline void sema_init(struct semaphore *sem, int val)
{
    static struct lock_class_key __key;
    *sem = (struct semaphore) __SEMAPHORE_INITIALIZER(*sem, val);
    lockdep_init_map(&sem->lock.dep_map, "semaphore->lock", &__key, 0);
}

// Wait queue entry for semaphores
struct semaphore_waiter {
    struct list_head   list;  // Links in semaphore's wait_list
    struct task_struct *task; // The waiting process/thread
    bool               up;    // Set to true when woken
};

// ==================================================
// System V IPC Semaphore Structures
// ==================================================

// System V semaphore set (can contain multiple semaphores)
struct sem_array {
    struct kern_ipc_perm sem_perm;      // Permissions and IPC key
    time64_t             sem_ctime;     // Last change time
    time64_t             sem_otime;     // Last semop time
    struct list_head     pending_alter; // Pending operations
    struct list_head     pending_const; // Pending non-alter ops
    struct list_head     *pending;      // Array of pending lists
    struct sem           *sems;         // Array of semaphores
    int                  sem_nsems;     // Number of semaphores in set
    int                  complex_count; // Operations affecting multiple sems
    unsigned int         use_global_lock;
    spinlock_t           global_lock;   // Sometimes needed for atomicity
};

// Individual semaphore in a System V set
struct sem {
    int              semval;    // Current value
    int              sempid;    // PID of last operation
    spinlock_t       lock;      // Per-semaphore lock
    struct list_head pending_alter;
    struct list_head pending_const;
    time64_t         sem_otime; // Last operation time
};

// ==================================================
// POSIX Semaphore Structures (user-space backed)
// ==================================================

// POSIX named semaphore (simplified)
typedef struct {
    unsigned int value;   // Current count
    unsigned int waiters; // Number of waiters (for futex)
    // Named semaphores also have:
    // - Reference count
    // - Name hash for lookup
    // - File descriptor (if using fd-based implementation)
} posix_sem_t;
```

| Component | Purpose | Critical For |
|---|---|---|
| Value/Count | Current number of available resources | Determining if caller can proceed |
| Wait list | Queue of blocked processes/threads | Waking waiters on signal |
| Spinlock | Mutual exclusion for structure access | SMP safety, atomicity |
| Owner/PID | Which process last operated | Debugging, priority inheritance |
| Permissions | Access control (System V) | Security enforcement |
User-space programs interact with kernel semaphores through system calls. These calls cross the user-kernel boundary, entering kernel mode to manipulate kernel data structures.
The System Call Mechanism:
1. The application calls a library wrapper (e.g., sem_wait())
2. The wrapper traps into kernel mode (the syscall instruction on x86)
3. The kernel validates arguments, performs the operation on its data structures, and returns to user mode

POSIX Semaphore System Calls:
POSIX semaphores may use different underlying implementations:
- Futexes (modern Linux), fcntl locks, or dedicated syscalls

System V Semaphore System Calls:
System V provides explicit syscalls:
- semget(): Create or get a semaphore set
- semop(): Perform operations on semaphores
- semctl(): Control operations (get value, set value, destroy)

Linux Futex System Call:
Modern POSIX semaphores often use futex() as the building block:
- futex(FUTEX_WAIT): Block until value changes
- futex(FUTEX_WAKE): Wake blocked waiters
```c
// System call implementations for semaphores

// ==================================================
// POSIX sem_wait() - kernel side (simplified)
// ==================================================
SYSCALL_DEFINE2(sem_wait, sem_t __user *, sem,
                const struct timespec __user *, timeout)
{
    struct semaphore *ksem;
    int result;

    // Validate user pointer
    if (!access_ok(sem, sizeof(sem_t)))
        return -EFAULT;

    // Get kernel semaphore from user handle
    ksem = lookup_semaphore(sem);
    if (IS_ERR(ksem))
        return PTR_ERR(ksem);

    // Perform the down operation
    if (timeout)
        result = down_timeout(ksem, timespec_to_jiffies(timeout));
    else
        result = down_interruptible(ksem);

    return result;
}

// ==================================================
// Linux kernel down_interruptible() implementation
// ==================================================
int down_interruptible(struct semaphore *sem)
{
    unsigned long flags;
    int result = 0;

    raw_spin_lock_irqsave(&sem->lock, flags);
    if (likely(sem->count > 0)) {
        // Fast path: semaphore available
        sem->count--;
    } else {
        // Slow path: must block
        result = __down_interruptible(sem);
    }
    raw_spin_unlock_irqrestore(&sem->lock, flags);

    return result;
}

// Slow path: block until resource available or signal
static noinline int __down_interruptible(struct semaphore *sem)
{
    struct semaphore_waiter waiter;
    int result = 0;

    // Add ourselves to the wait list
    waiter.task = current;
    waiter.up = false;
    list_add_tail(&waiter.list, &sem->wait_list);

    while (!waiter.up) {
        // Check for pending signals
        if (signal_pending(current)) {
            // Interrupted by signal
            result = -EINTR;
            break;
        }

        // Set state and release lock atomically with sleep
        set_current_state(TASK_INTERRUPTIBLE);
        raw_spin_unlock_irq(&sem->lock);

        // Sleep until signaled or interrupted
        schedule();

        raw_spin_lock_irq(&sem->lock);
    }

    // Cleanup
    if (!waiter.up)
        list_del(&waiter.list);
    set_current_state(TASK_RUNNING);

    return result;
}

// ==================================================
// System V semop() system call (simplified)
// ==================================================
SYSCALL_DEFINE3(semop, int, semid, struct sembuf __user *, sops,
                unsigned, nsops)
{
    struct sem_array *sma;
    struct sembuf *operations;
    int error;

    // Validate and copy operations from user space
    if (nsops < 1 || nsops > SEMOPM)
        return -EINVAL;

    operations = kmalloc(sizeof(*operations) * nsops, GFP_KERNEL);
    if (!operations)
        return -ENOMEM;

    if (copy_from_user(operations, sops, sizeof(*operations) * nsops)) {
        kfree(operations);
        return -EFAULT;
    }

    // Find the semaphore set by ID
    sma = sem_obtain_object_check(semid);
    if (IS_ERR(sma)) {
        kfree(operations);
        return PTR_ERR(sma);
    }

    // Check permissions
    error = ipcperms(&sma->sem_perm, SEM_OPER);
    if (error) {
        kfree(operations);
        return error;
    }

    // Perform the atomic semaphore operations
    error = perform_atomic_semop(sma, operations, nsops);

    kfree(operations);
    return error;
}

// ==================================================
// Futex-based implementation (modern POSIX semaphores)
// ==================================================
int sem_wait_futex(sem_t *sem)
{
    while (1) {
        // Try to decrement atomically in user space
        int val = atomic_load(&sem->value);
        if (val > 0) {
            if (atomic_compare_exchange(&sem->value, &val, val - 1))
                return 0;  // Success, no syscall!
            continue;
        }

        // Value is 0 - must block
        // System call to kernel
        syscall(SYS_futex, &sem->value, FUTEX_WAIT, 0, NULL, NULL, 0);
        // Retry after waking
    }
}

int sem_post_futex(sem_t *sem)
{
    // Increment in user space
    int old = atomic_fetch_add(&sem->value, 1);

    // If there might be waiters, wake one
    // (Optimization: track waiters separately to avoid syscall)
    syscall(SYS_futex, &sem->value, FUTEX_WAKE, 1, NULL, NULL, 0);
    return 0;
}
```

Modern semaphore implementations aggressively optimize the uncontended case. The `if (sem->count > 0)` check followed by decrement is the fast path—it stays entirely in kernel mode with no sleeping. Only contended operations pay the cost of wait queue manipulation.
The magic of blocking semaphores lies in their deep integration with the process scheduler. When a process blocks on a semaphore, it must be removed from the run queue; when signaled, it must be added back. This requires careful coordination.
The Blocking Path:

1. The task acquires the semaphore's spinlock and finds count == 0
2. It adds itself to the semaphore's wait list and sets its state to TASK_INTERRUPTIBLE
3. It releases the spinlock and calls schedule()
4. schedule() removes the task from the run queue and context-switches to another task
5. The task does not run again until something wakes it

The Wake-Up Path:

1. A signaling task takes the spinlock and dequeues the first waiter
2. It marks the waiter as woken and calls wake_up_process()
3. wake_up_process() sets the task back to TASK_RUNNING, selects a CPU, and enqueues it on that run queue
4. If the woken task has higher priority than the running one, the scheduler flags a preemption
Critical Section Requirement:
The state change, wait queue manipulation, and run queue manipulation must all happen atomically under appropriate locks. The kernel uses carefully ordered lock acquisitions:

1. The semaphore's internal spinlock
2. The task's pi_lock
3. The target run queue's lock
Lock ordering prevents deadlock while ensuring atomicity.
```c
// Deep integration between semaphores and scheduler

// ==================================================
// Blocking: Removing from run queue
// ==================================================
static void __down_common(struct semaphore *sem)
{
    struct semaphore_waiter waiter;

    // Initialize waiter
    waiter.task = current;
    waiter.up = false;

    // Add to semaphore's wait list
    list_add_tail(&waiter.list, &sem->wait_list);

    for (;;) {
        // Set our state to interruptible
        set_current_state(TASK_INTERRUPTIBLE);

        // Release semaphore lock (but keep irqs disabled)
        raw_spin_unlock_irq(&sem->lock);

        // schedule() will:
        // 1. See our state is not RUNNING
        // 2. Remove us from run queue (deactivate_task)
        // 3. Context switch to another process
        // 4. We won't return here until woken up!
        schedule();

        raw_spin_lock_irq(&sem->lock);

        // We've been woken - check why
        if (waiter.up)
            break;  // Semaphore released, we got it

        // Spurious wake or signal - check signal
        if (signal_pending(current)) {
            list_del(&waiter.list);
            __set_current_state(TASK_RUNNING);
            return;  // -EINTR
        }

        // Loop and sleep again
    }

    __set_current_state(TASK_RUNNING);
}

// ==================================================
// Wake-up: Adding back to run queue
// ==================================================
static void __up(struct semaphore *sem)
{
    struct semaphore_waiter *waiter;

    // Get the first waiter
    waiter = list_first_entry(&sem->wait_list,
                              struct semaphore_waiter, list);

    // Remove from semaphore's wait list
    list_del(&waiter->list);

    // Mark as woken
    waiter->up = true;

    // Wake the process
    // This adds it back to a run queue
    wake_up_process(waiter->task);
}

// ==================================================
// Core wake_up_process implementation
// ==================================================
int wake_up_process(struct task_struct *p)
{
    return try_to_wake_up(p, TASK_NORMAL, 0);
}

int try_to_wake_up(struct task_struct *p, unsigned int state,
                   int wake_flags)
{
    unsigned long flags;
    int cpu, success = 0;

    // Lock the task's pi_lock
    raw_spin_lock_irqsave(&p->pi_lock, flags);

    // Check if task is in expected blocked state
    if (!(p->state & state))
        goto unlock;

    // Transition to runnable
    p->state = TASK_RUNNING;

    // Select CPU for the task
    cpu = select_task_rq(p, p->wake_cpu, wake_flags);

    // Need to lock the run queue
    // (double-lock: task's lock, then rq's lock)
    struct rq *rq = cpu_rq(cpu);
    raw_spin_lock(&rq->lock);

    // Activate: add to run queue
    activate_task(rq, p, ENQUEUE_WAKEUP);

    // Check if we should preempt current task
    if (p->prio < rq->curr->prio) {
        resched_curr(rq);
    }

    raw_spin_unlock(&rq->lock);
    success = 1;

unlock:
    raw_spin_unlock_irqrestore(&p->pi_lock, flags);
    return success;
}

// ==================================================
// State transition diagram (conceptual)
// ==================================================
/*
        ┌─────────────┐
        │   RUNNING   │←──────────────────┐
        └──────┬──────┘                   │
               │                          │
               │ sem_wait()               │
               │ (count == 0)             │
               ▼                          │
┌───────────────────────────────┐         │
│        INTERRUPTIBLE          │         │
│  (on semaphore wait list)     │         │
│  (off run queue)              │         │
└───────────────┬───────────────┘         │
                │                         │
                │ sem_post()              │
                │ (from another process)  │
                ▼                         │
        wake_up_process()                 │
                │                         │
                └─────────────────────────┘
*/
```

The kernel uses strict lock ordering to prevent deadlock: semaphore lock → task lock → run queue lock. Violating this order (e.g., holding run queue lock, then trying to acquire semaphore lock) causes deadlock. Lock ordering documentation is essential reading for kernel developers.
System V semaphores are one of the oldest IPC (Inter-Process Communication) mechanisms, dating back to AT&T Unix in the early 1980s. Despite their age, they remain widely available and have unique capabilities.
Semaphore Sets:
Unlike POSIX semaphores (individual), System V semaphores are allocated in sets. A single semget() call creates an array of semaphores. This allows atomic operations across multiple semaphores—useful for complex multi-resource locking.
Key-Based Identification:
System V semaphores are identified by a numeric key (like a shared secret). Processes that know the key can access the same semaphore set. Keys can be:
- Generated from a file path and project ID with ftok()
- IPC_PRIVATE for parent-child relationships only

The semop() Operation:
The semop() system call performs operations atomically on one or more semaphores:
All operations in a single semop() call are atomic—either all succeed or none do.
Undo Semantics (SEM_UNDO):
If a process terminates while holding semaphore resources, the kernel can automatically undo its operations. This prevents resource leaks from crashed processes.
```c
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <stdio.h>
#include <stdlib.h>

// ==================================================
// System V Semaphore Example
// ==================================================

// Union for semctl() operations
union semun {
    int val;                 // Value for SETVAL
    struct semid_ds *buf;    // Buffer for IPC_STAT, IPC_SET
    unsigned short *array;   // Array for GETALL, SETALL
};

int main() {
    key_t key;
    int semid;
    struct sembuf sop;
    union semun arg;

    // Generate a key from a file path and project ID
    key = ftok("/tmp/my_semaphore", 'S');
    if (key == -1) {
        perror("ftok");
        exit(1);
    }

    // Create or get a semaphore set with 2 semaphores
    semid = semget(key, 2, IPC_CREAT | 0666);
    if (semid == -1) {
        perror("semget");
        exit(1);
    }

    // Initialize semaphore 0 to value 1 (binary semaphore/mutex)
    arg.val = 1;
    if (semctl(semid, 0, SETVAL, arg) == -1) {
        perror("semctl SETVAL");
        exit(1);
    }

    // Initialize semaphore 1 to value 5 (counting semaphore)
    arg.val = 5;
    if (semctl(semid, 1, SETVAL, arg) == -1) {
        perror("semctl SETVAL");
        exit(1);
    }

    // ==================================================
    // Wait (P/down) on semaphore 0
    // ==================================================
    sop.sem_num = 0;        // Which semaphore in the set
    sop.sem_op = -1;        // Decrement by 1
    sop.sem_flg = SEM_UNDO; // Undo on process exit

    printf("Waiting on semaphore 0...\n");
    if (semop(semid, &sop, 1) == -1) {
        perror("semop wait");
        exit(1);
    }
    printf("Acquired semaphore 0\n");

    // ... critical section ...

    // ==================================================
    // Signal (V/up) on semaphore 0
    // ==================================================
    sop.sem_num = 0;
    sop.sem_op = 1;         // Increment by 1
    sop.sem_flg = SEM_UNDO;

    if (semop(semid, &sop, 1) == -1) {
        perror("semop signal");
        exit(1);
    }
    printf("Released semaphore 0\n");

    // ==================================================
    // Atomic operation on multiple semaphores
    // ==================================================
    struct sembuf sops[2];

    // Acquire semaphore 0 AND decrement semaphore 1
    sops[0].sem_num = 0;
    sops[0].sem_op = -1;
    sops[0].sem_flg = SEM_UNDO;

    sops[1].sem_num = 1;
    sops[1].sem_op = -1;
    sops[1].sem_flg = SEM_UNDO;

    printf("Atomically acquiring both semaphores...\n");
    if (semop(semid, sops, 2) == -1) {
        perror("semop multi");
        exit(1);
    }
    printf("Acquired both\n");

    // Release both atomically
    sops[0].sem_op = 1;
    sops[1].sem_op = 1;
    if (semop(semid, sops, 2) == -1) {
        perror("semop multi release");
        exit(1);
    }
    printf("Released both\n");

    // ==================================================
    // Get current semaphore value
    // ==================================================
    int val = semctl(semid, 0, GETVAL);
    printf("Semaphore 0 current value: %d\n", val);

    // ==================================================
    // Delete the semaphore set (when done)
    // ==================================================
    if (semctl(semid, 0, IPC_RMID) == -1) {
        perror("semctl IPC_RMID");
        exit(1);
    }
    printf("Semaphore set removed\n");

    return 0;
}
```

| Operation | sem_op Value | Behavior |
|---|---|---|
| Wait (P/down) | Negative (-n) | Decrement by n; block if would go negative |
| Signal (V/up) | Positive (+n) | Increment by n; may wake waiters |
| Wait for zero | Zero (0) | Block until value becomes 0 |
POSIX semaphores provide a cleaner, more modern API than System V. They come in two varieties: named semaphores (for inter-process) and unnamed semaphores (for threads or related processes).
Named Semaphores:
Named semaphores exist in the filesystem namespace (typically /dev/shm/ on Linux). Any process that knows the name can open the same semaphore. They persist until explicitly unlinked or the system reboots.
- sem_open(): Create or open a named semaphore
- sem_close(): Close (but not delete) a named semaphore
- sem_unlink(): Remove the semaphore from the namespace

Unnamed Semaphores:
Unnamed semaphores live in memory (user-allocated). They can be used between threads of the same process, or between processes if placed in shared memory.
- sem_init(): Initialize an unnamed semaphore
- sem_destroy(): Destroy an unnamed semaphore

Common Operations:
- sem_wait(): Blocking wait (P operation)
- sem_trywait(): Non-blocking wait (returns error if would block)
- sem_timedwait(): Wait with timeout
- sem_post(): Signal (V operation)
- sem_getvalue(): Get current value

Implementation on Linux:
Modern Linux implements POSIX semaphores using futex under the hood. This makes uncontended operations extremely fast (no syscall), while still providing proper blocking when needed.
```c
#include <semaphore.h>
#include <pthread.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/wait.h>   // wait()
#include <errno.h>      // errno, EAGAIN, ETIMEDOUT
#include <stdio.h>
#include <stdlib.h>
#include <time.h>       // clock_gettime()
#include <unistd.h>

// ==================================================
// Named Semaphore Example (inter-process)
// ==================================================
void named_semaphore_example() {
    sem_t *sem;

    // Create or open named semaphore
    // O_CREAT: create if doesn't exist
    // 0644: permissions (owner rw, group r, other r)
    // 1: initial value
    sem = sem_open("/my_named_sem", O_CREAT, 0644, 1);
    if (sem == SEM_FAILED) {
        perror("sem_open");
        exit(1);
    }

    printf("Waiting on named semaphore...\n");
    if (sem_wait(sem) == -1) {
        perror("sem_wait");
        exit(1);
    }
    printf("Acquired named semaphore\n");

    // ... critical section ...
    sleep(2);  // Simulate work

    printf("Releasing named semaphore...\n");
    if (sem_post(sem) == -1) {
        perror("sem_post");
        exit(1);
    }

    // Close our reference
    sem_close(sem);

    // Unlink removes from namespace (when refcount hits 0)
    // Only do this when truly done with the semaphore
    // sem_unlink("/my_named_sem");
}

// ==================================================
// Unnamed Semaphore Example (thread synchronization)
// ==================================================
sem_t thread_sem;
int shared_counter = 0;

void *thread_func(void *arg) {
    int id = *(int *)arg;

    for (int i = 0; i < 1000; i++) {
        sem_wait(&thread_sem);   // P operation
        shared_counter++;        // Critical section
        sem_post(&thread_sem);   // V operation
    }

    printf("Thread %d done\n", id);
    return NULL;
}

void unnamed_semaphore_example() {
    pthread_t threads[4];
    int ids[4] = {0, 1, 2, 3};

    // Initialize unnamed semaphore
    // 0: shared between threads (not processes)
    // 1: initial value (binary semaphore/mutex)
    if (sem_init(&thread_sem, 0, 1) == -1) {
        perror("sem_init");
        exit(1);
    }

    // Create threads
    for (int i = 0; i < 4; i++) {
        pthread_create(&threads[i], NULL, thread_func, &ids[i]);
    }

    // Wait for all threads
    for (int i = 0; i < 4; i++) {
        pthread_join(threads[i], NULL);
    }

    printf("Final counter: %d (expected 4000)\n", shared_counter);

    // Destroy the semaphore
    sem_destroy(&thread_sem);
}

// ==================================================
// Process-shared Unnamed Semaphore (in shared memory)
// ==================================================
void shared_memory_semaphore_example() {
    // Create shared memory region
    int fd = shm_open("/my_shm", O_CREAT | O_RDWR, 0644);
    ftruncate(fd, sizeof(sem_t));

    sem_t *sem = mmap(NULL, sizeof(sem_t),
                      PROT_READ | PROT_WRITE,
                      MAP_SHARED, fd, 0);

    pid_t pid = fork();
    if (pid == 0) {
        // Child process
        // Note: 1 means process-shared!
        sem_init(sem, 1, 1);

        sem_wait(sem);
        printf("Child acquired semaphore\n");
        sleep(2);
        printf("Child releasing semaphore\n");
        sem_post(sem);

        _exit(0);
    } else {
        // Parent process
        sleep(1);  // Let child init and acquire first

        printf("Parent waiting for semaphore...\n");
        sem_wait(sem);
        printf("Parent acquired semaphore\n");
        sem_post(sem);

        wait(NULL);  // Wait for child
        sem_destroy(sem);
        shm_unlink("/my_shm");
    }
}

// ==================================================
// Non-blocking and timed wait
// ==================================================
void advanced_wait_example(sem_t *sem) {
    // Non-blocking try
    if (sem_trywait(sem) == 0) {
        printf("Acquired immediately\n");
        sem_post(sem);
    } else {
        if (errno == EAGAIN) {
            printf("Would block - semaphore busy\n");
        }
    }

    // Timed wait
    struct timespec ts;
    clock_gettime(CLOCK_REALTIME, &ts);
    ts.tv_sec += 5;  // 5 second timeout

    if (sem_timedwait(sem, &ts) == 0) {
        printf("Acquired within timeout\n");
        sem_post(sem);
    } else {
        if (errno == ETIMEDOUT) {
            printf("Timeout waiting for semaphore\n");
        }
    }

    // Get current value
    int value;
    sem_getvalue(sem, &value);
    printf("Current value: %d\n", value);
}
```

Kernel semaphores have a well-defined lifecycle from creation to destruction. The kernel must handle each stage correctly to prevent resource leaks and ensure correctness.
Creation:
Semaphores can be created in several ways:
- Statically at compile time (e.g., DEFINE_SEMAPHORE in kernel code)
- Dynamically, allocated with kmalloc() and initialized at runtime
- Via IPC system calls from user space (semget, sem_open, etc.)

Usage:
During active use, the semaphore may have:

- A positive value with no waiters (resources available)
- A zero value with blocked processes queued on its wait list
- Pending SEM_UNDO adjustments recorded for running processes
Destruction:
Destruction is tricky when waiters might exist:

- Waiters must be woken with an error code (e.g., -EIDRM for System V) before the structure is freed
- The structure cannot be freed while any process still holds a pointer to its wait queue
- Reference counting can delay the actual free until the last user is done
Process Death Handling:
When a process holding a semaphore dies:

- With SEM_UNDO, the kernel applies the recorded adjustments, releasing the resources automatically
- Without undo semantics, the semaphore remains acquired and other processes may block forever
Orphaned semaphore resources are a major source of bugs in poorly designed systems.
```c
// Kernel semaphore lifecycle management

// ==================================================
// Static allocation and initialization
// ==================================================

// Compile-time initialization (kernel code)
static DEFINE_SEMAPHORE(my_driver_sem);

// Boot-time initialization
static struct semaphore late_init_sem;

static int __init my_module_init(void)
{
    sema_init(&late_init_sem, 1);
    return 0;
}

// ==================================================
// Dynamic allocation
// ==================================================
struct my_resource {
    struct semaphore lock;
    void *data;
    // ...
};

struct my_resource *create_resource(void)
{
    struct my_resource *res;

    res = kmalloc(sizeof(*res), GFP_KERNEL);
    if (!res)
        return NULL;

    // Initialize the embedded semaphore
    sema_init(&res->lock, 1);
    res->data = NULL;

    return res;
}

void destroy_resource(struct my_resource *res)
{
    // Semaphore has no explicit destroy in Linux kernel
    // but we must ensure no one is using it
    down(&res->lock);  // Acquire to ensure no one is in critical section
    // No up() needed - we're destroying
    kfree(res);
}

// ==================================================
// System V semaphore creation and destruction
// ==================================================
static void sysv_sem_create(void)
{
    struct sem_array *sma;

    // Allocate semaphore set structure
    sma = kvzalloc(sizeof(*sma), GFP_KERNEL);
    if (!sma)
        return;

    // Allocate array of individual semaphores
    sma->sems = kvzalloc(sizeof(struct sem) * nsems, GFP_KERNEL);

    // Initialize each semaphore
    for (int i = 0; i < nsems; i++) {
        spin_lock_init(&sma->sems[i].lock);
        INIT_LIST_HEAD(&sma->sems[i].pending_alter);
        INIT_LIST_HEAD(&sma->sems[i].pending_const);
        sma->sems[i].semval = 0;
    }

    // Add to global IPC namespace
    // ...
}

static void sysv_sem_destroy(struct sem_array *sma)
{
    // Wake all waiting processes with error
    struct sem_queue *q, *tmp;

    list_for_each_entry_safe(q, tmp, &sma->pending_alter, list) {
        q->status = -EIDRM;  // Identifier removed
        wake_up_process(q->sleeper);
        list_del(&q->list);
    }

    // Free semaphore array
    kvfree(sma->sems);

    // Free the set structure
    kvfree(sma);
}

// ==================================================
// SEM_UNDO: Automatic cleanup on process death
// ==================================================
struct sem_undo {
    struct list_head list_proc;  // Per-process undo list
    struct sem_array *semary;    // Semaphore set
    short *semadj;               // Adjustment values per semaphore
};

void exit_sem(struct task_struct *tsk)
{
    struct sem_undo *un, *tmp;

    if (!tsk->sysvsem.undo_list)
        return;

    // Process all undo entries
    list_for_each_entry_safe(un, tmp,
                             &tsk->sysvsem.undo_list, list_proc) {
        struct sem_array *sma = un->semary;

        // Apply the undo adjustments
        for (int i = 0; i < sma->sem_nsems; i++) {
            if (un->semadj[i]) {
                sma->sems[i].semval += un->semadj[i];
                // Wake any waiters that can now proceed
                update_queue(sma, i);
            }
        }

        // Free the undo structure
        kfree(un);
    }
}

// ==================================================
// Reference counting for safe destruction
// ==================================================
struct refcounted_sem {
    struct semaphore sem;
    atomic_t refcount;
};

void sem_get(struct refcounted_sem *s)
{
    atomic_inc(&s->refcount);
}

void sem_put(struct refcounted_sem *s)
{
    if (atomic_dec_and_test(&s->refcount)) {
        // Last reference - safe to free
        kfree(s);
    }
}
```

Never free a semaphore while processes are waiting on it! The waiters hold pointers to the semaphore's wait queue. If freed, they'll access freed memory (use-after-free bug). Always wake waiters with an error code first, or use reference counting to delay freeing until all users are done.
We've explored how operating system kernels implement semaphores—from the rationale for kernel involvement through data structures, system calls, and lifecycle management. The key insights: only the kernel can atomically combine a value check with scheduler manipulation, which is what makes true blocking possible; every kernel semaphore pairs a count, a wait list, and a spinlock; System V and POSIX expose different interfaces (sets and keys versus names and futexes) over similar underlying mechanics; and lifecycle management must wake all waiters before freeing any structure they reference.
What's Next:
With implementation mechanics understood, we turn to fairness considerations. How do semaphore implementations ensure that waiting threads are treated equitably? What happens when naive implementations allow starvation? We'll explore queue disciplines, priority-based wakeup, and the tradeoffs involved in providing fairness guarantees.
You now understand how operating system kernels implement semaphores—from kernel data structures through system call interfaces to the deep scheduler integration that enables true blocking semantics. This knowledge is essential for understanding any blocking synchronization primitive in modern operating systems.