User-space applications don't implement semaphores from scratch—they request semaphore services from the operating system kernel. The kernel provides semaphores as a protected abstraction, managing the state, enforcing access control, and integrating with process scheduling.
This relationship is fundamental: the kernel is the ultimate arbiter of resource access. It can atomically modify semaphore state while simultaneously manipulating scheduler data structures, putting processes to sleep, and waking them up. User-space code cannot do this safely without kernel assistance.
This page explores how operating system kernels implement semaphores—the data structures stored in kernel memory, the system call interfaces exposed to applications, and the deep integration with process lifecycle management. We'll examine both the venerable System V semaphores and the more modern POSIX semaphore implementation.
By the end of this page, you will understand: the system call interface for semaphore operations, kernel data structures for semaphore management, the integration with process scheduler and wait queues, differences between System V and POSIX semaphores, and how the kernel handles semaphore lifecycle from creation to destruction.
Before diving into implementation, let's understand why semaphores are typically kernel-provided rather than purely user-space constructs.
The Problem with User-Space Only:
Imagine trying to implement a blocking semaphore entirely in user-space:
User-space code cannot modify scheduler data structures or put itself on wait queues. Only the kernel can manipulate process state (RUNNING → SLEEPING) and scheduler run queues.
The Atomicity Problem:
Even if you could sleep in user-space, the operation "check value and decide to sleep" must be atomic with the sleep itself. Otherwise:
Thread A: if (sem.value == 0) {
Thread B: // signals the semaphore
Thread B: // finds no one waiting
Thread A: sleep(); // LOST WAKEUP!
}
The kernel can make this atomic because it controls when interrupts/preemption occur and when other processes run.
Cross-Process Coordination:
Semaphores often coordinate between different processes, not just threads. User-space memory is private to each process. The kernel can provide shared semaphore state visible to multiple processes, stored in kernel memory or in shared memory regions.
Security and Access Control:
The kernel can enforce permissions on semaphores—which users/processes can create, use, or destroy them. User-space cannot enforce such access control.
Modern systems use hybrid approaches like Linux's futex (fast userspace mutex). The fast path (uncontended acquisition) happens entirely in user-space with atomic operations. The slow path (blocking) makes a system call to the kernel. This combines user-space speed with kernel-level blocking.
Operating system kernels maintain extensive data structures to track semaphores. These structures live in kernel memory—protected from user-space corruption and shared across all processes.
Core Semaphore Structure (Linux-style):
The kernel semaphore structure contains:
Per-CPU Considerations:
On multiprocessor systems, semaphore operations must be SMP-safe. The internal spinlock serializes access from different CPUs. Some implementations use per-CPU wait queues or lock striping for reduced contention.
Lifecycle Management:
Kernel semaphores have clear lifecycle states:
```c
// Linux kernel semaphore structure (include/linux/semaphore.h)
struct semaphore {
    raw_spinlock_t   lock;       // Protects count and wait_list
    unsigned int     count;      // Current value (resources available)
    struct list_head wait_list;  // Threads waiting to acquire
};

// Initialization macro
#define __SEMAPHORE_INITIALIZER(name, n) { \
    .lock      = __RAW_SPIN_LOCK_UNLOCKED((name).lock), \
    .count     = n, \
    .wait_list = LIST_HEAD_INIT((name).wait_list), \
}

// Static declaration
#define DEFINE_SEMAPHORE(name) \
    struct semaphore name = __SEMAPHORE_INITIALIZER(name, 1)

// Runtime initialization
static inline void sema_init(struct semaphore *sem, int val)
{
    static struct lock_class_key __key;
    *sem = (struct semaphore) __SEMAPHORE_INITIALIZER(*sem, val);
    lockdep_init_map(&sem->lock.dep_map, "semaphore->lock", &__key, 0);
}

// Wait queue entry for semaphores
struct semaphore_waiter {
    struct list_head   list;  // Links in semaphore's wait_list
    struct task_struct *task; // The waiting process/thread
    bool               up;    // Set to true when woken
};

// ==================================================
// System V IPC Semaphore Structures
// ==================================================

// System V semaphore set (can contain multiple semaphores)
struct sem_array {
    struct kern_ipc_perm sem_perm;      // Permissions and IPC key
    time64_t             sem_ctime;     // Last change time
    time64_t             sem_otime;     // Last semop time
    struct list_head     pending_alter; // Pending operations
    struct list_head     pending_const; // Pending non-alter ops
    struct list_head     *pending;      // Array of pending lists
    struct sem           *sems;         // Array of semaphores
    int                  sem_nsems;     // Number of semaphores in set
    int                  complex_count; // Operations affecting multiple sems
    unsigned int         use_global_lock;
    spinlock_t           global_lock;   // Sometimes needed for atomicity
};

// Individual semaphore in a System V set
struct sem {
    int              semval;    // Current value
    int              sempid;    // PID of last operation
    spinlock_t       lock;      // Per-semaphore lock
    struct list_head pending_alter;
    struct list_head pending_const;
    time64_t         sem_otime; // Last operation time
};

// ==================================================
// POSIX Semaphore Structures (user-space backed)
// ==================================================

// POSIX named semaphore (simplified)
typedef struct {
    unsigned int value;   // Current count
    unsigned int waiters; // Number of waiters (for futex)
    // Named semaphores also have:
    // - Reference count
    // - Name hash for lookup
    // - File descriptor (if using fd-based implementation)
} posix_sem_t;
```

| Component | Purpose | Critical For |
|---|---|---|
| Value/Count | Current number of available resources | Determining if caller can proceed |
| Wait list | Queue of blocked processes/threads | Waking waiters on signal |
| Spinlock | Mutual exclusion for structure access | SMP safety, atomicity |
| Owner/PID | Which process last operated | Debugging, priority inheritance |
| Permissions | Access control (System V) | Security enforcement |
User-space programs interact with kernel semaphores through system calls. These calls cross the user-kernel boundary, entering kernel mode to manipulate kernel data structures.
The System Call Mechanism:
1. The application calls a library wrapper (e.g., sem_wait())
2. The wrapper traps into kernel mode (the syscall instruction on x86)
3. The kernel validates arguments, performs the operation on its data structures, and returns to user mode

POSIX Semaphore System Calls:
POSIX semaphores may use different underlying implementations:
- Futexes (modern Linux), fcntl locks, or dedicated syscalls

System V Semaphore System Calls:
System V provides explicit syscalls:
- semget(): Create or get a semaphore set
- semop(): Perform operations on semaphores
- semctl(): Control operations (get value, set value, destroy)

Linux Futex System Call:
Modern POSIX semaphores often use futex() as the building block:
- futex(FUTEX_WAIT): Block until value changes
- futex(FUTEX_WAKE): Wake blocked waiters
```c
// System call implementations for semaphores

// ==================================================
// POSIX sem_wait() - kernel side (simplified)
// ==================================================
SYSCALL_DEFINE2(sem_wait, sem_t __user *, sem,
                const struct timespec __user *, timeout)
{
    struct semaphore *ksem;
    int result;

    // Validate user pointer
    if (!access_ok(sem, sizeof(sem_t)))
        return -EFAULT;

    // Get kernel semaphore from user handle
    ksem = lookup_semaphore(sem);
    if (IS_ERR(ksem))
        return PTR_ERR(ksem);

    // Perform the down operation
    if (timeout)
        result = down_timeout(ksem, timespec_to_jiffies(timeout));
    else
        result = down_interruptible(ksem);

    return result;
}

// ==================================================
// Linux kernel down_interruptible() implementation
// ==================================================
int down_interruptible(struct semaphore *sem)
{
    unsigned long flags;
    int result = 0;

    raw_spin_lock_irqsave(&sem->lock, flags);
    if (likely(sem->count > 0)) {
        // Fast path: semaphore available
        sem->count--;
    } else {
        // Slow path: must block
        result = __down_interruptible(sem);
    }
    raw_spin_unlock_irqrestore(&sem->lock, flags);

    return result;
}

// Slow path: block until resource available or signal
static noinline int __down_interruptible(struct semaphore *sem)
{
    struct semaphore_waiter waiter;
    int result = 0;

    // Add ourselves to the wait list
    waiter.task = current;
    waiter.up = false;
    list_add_tail(&waiter.list, &sem->wait_list);

    while (!waiter.up) {
        // Check for pending signals
        if (signal_pending(current)) {
            // Interrupted by signal
            result = -EINTR;
            break;
        }

        // Set state and release lock atomically with sleep
        set_current_state(TASK_INTERRUPTIBLE);
        raw_spin_unlock_irq(&sem->lock);

        // Sleep until signaled or interrupted
        schedule();

        raw_spin_lock_irq(&sem->lock);
    }

    // Cleanup
    if (!waiter.up)
        list_del(&waiter.list);
    set_current_state(TASK_RUNNING);

    return result;
}

// ==================================================
// System V semop() system call (simplified)
// ==================================================
SYSCALL_DEFINE3(semop, int, semid, struct sembuf __user *, sops,
                unsigned, nsops)
{
    struct sem_array *sma;
    struct sembuf *operations;
    int error;

    // Validate and copy operations from user space
    if (nsops < 1 || nsops > SEMOPM)
        return -EINVAL;

    operations = kmalloc(sizeof(*operations) * nsops, GFP_KERNEL);
    if (!operations)
        return -ENOMEM;

    if (copy_from_user(operations, sops, sizeof(*operations) * nsops)) {
        kfree(operations);
        return -EFAULT;
    }

    // Find the semaphore set by ID
    sma = sem_obtain_object_check(semid);
    if (IS_ERR(sma)) {
        kfree(operations);
        return PTR_ERR(sma);
    }

    // Check permissions
    error = ipcperms(&sma->sem_perm, SEM_OPER);
    if (error) {
        kfree(operations);
        return error;
    }

    // Perform the atomic semaphore operations
    error = perform_atomic_semop(sma, operations, nsops);

    kfree(operations);
    return error;
}

// ==================================================
// Futex-based implementation (modern POSIX semaphores)
// ==================================================
int sem_wait_futex(sem_t *sem)
{
    while (1) {
        // Try to decrement atomically in user space
        int val = atomic_load(&sem->value);
        if (val > 0) {
            if (atomic_compare_exchange(&sem->value, &val, val - 1))
                return 0;  // Success, no syscall!
            continue;
        }

        // Value is 0 - must block
        // System call to kernel
        syscall(SYS_futex, &sem->value, FUTEX_WAIT, 0, NULL, NULL, 0);
        // Retry after waking
    }
}

int sem_post_futex(sem_t *sem)
{
    // Increment in user space
    int old = atomic_fetch_add(&sem->value, 1);

    // If there might be waiters, wake one
    // (Optimization: track waiters separately to avoid syscall)
    syscall(SYS_futex, &sem->value, FUTEX_WAKE, 1, NULL, NULL, 0);
    return 0;
}
```

Modern semaphore implementations aggressively optimize the uncontended case. The `if (sem->count > 0)` check followed by decrement is the fast path—it stays entirely in kernel mode with no sleeping. Only contended operations pay the cost of wait queue manipulation.
The magic of blocking semaphores lies in their deep integration with the process scheduler. When a process blocks on a semaphore, it must be removed from the run queue; when signaled, it must be added back. This requires careful coordination.
The Blocking Path:

1. The task acquires the semaphore's spinlock and finds count == 0
2. It adds itself to the semaphore's wait list and sets its state to TASK_INTERRUPTIBLE
3. It releases the spinlock and calls schedule()
4. schedule() removes the task from the run queue and context-switches to another task
5. The task does not run again until something wakes it

The Wake-Up Path:

1. A signaling task takes the spinlock and dequeues the first waiter
2. It marks the waiter as woken and calls wake_up_process()
3. wake_up_process() sets the task back to TASK_RUNNING, selects a CPU, and enqueues it on that run queue
4. If the woken task has higher priority than the running one, the scheduler flags a preemption
Critical Section Requirement:
The state change, wait queue manipulation, and run queue manipulation must all happen atomically under appropriate locks. The kernel uses carefully ordered lock acquisitions:

1. The semaphore's internal spinlock
2. The task's pi_lock
3. The target run queue's lock
Lock ordering prevents deadlock while ensuring atomicity.
```c
// Deep integration between semaphores and scheduler

// ==================================================
// Blocking: Removing from run queue
// ==================================================
static void __down_common(struct semaphore *sem)
{
    struct semaphore_waiter waiter;

    // Initialize waiter
    waiter.task = current;
    waiter.up = false;

    // Add to semaphore's wait list
    list_add_tail(&waiter.list, &sem->wait_list);

    for (;;) {
        // Set our state to interruptible
        set_current_state(TASK_INTERRUPTIBLE);

        // Release semaphore lock (but keep irqs disabled)
        raw_spin_unlock_irq(&sem->lock);

        // schedule() will:
        // 1. See our state is not RUNNING
        // 2. Remove us from run queue (deactivate_task)
        // 3. Context switch to another process
        // 4. We won't return here until woken up!
        schedule();

        raw_spin_lock_irq(&sem->lock);

        // We've been woken - check why
        if (waiter.up)
            break;  // Semaphore released, we got it

        // Spurious wake or signal - check signal
        if (signal_pending(current)) {
            list_del(&waiter.list);
            __set_current_state(TASK_RUNNING);
            return;  // -EINTR
        }

        // Loop and sleep again
    }

    __set_current_state(TASK_RUNNING);
}

// ==================================================
// Wake-up: Adding back to run queue
// ==================================================
static void __up(struct semaphore *sem)
{
    struct semaphore_waiter *waiter;

    // Get the first waiter
    waiter = list_first_entry(&sem->wait_list,
                              struct semaphore_waiter, list);

    // Remove from semaphore's wait list
    list_del(&waiter->list);

    // Mark as woken
    waiter->up = true;

    // Wake the process
    // This adds it back to a run queue
    wake_up_process(waiter->task);
}

// ==================================================
// Core wake_up_process implementation
// ==================================================
int wake_up_process(struct task_struct *p)
{
    return try_to_wake_up(p, TASK_NORMAL, 0);
}

int try_to_wake_up(struct task_struct *p, unsigned int state,
                   int wake_flags)
{
    unsigned long flags;
    int cpu, success = 0;

    // Lock the task's pi_lock
    raw_spin_lock_irqsave(&p->pi_lock, flags);

    // Check if task is in expected blocked state
    if (!(p->state & state))
        goto unlock;

    // Transition to runnable
    p->state = TASK_RUNNING;

    // Select CPU for the task
    cpu = select_task_rq(p, p->wake_cpu, wake_flags);

    // Need to lock the run queue
    // (double-lock: task's lock, then rq's lock)
    struct rq *rq = cpu_rq(cpu);
    raw_spin_lock(&rq->lock);

    // Activate: add to run queue
    activate_task(rq, p, ENQUEUE_WAKEUP);

    // Check if we should preempt current task
    if (p->prio < rq->curr->prio) {
        resched_curr(rq);
    }

    raw_spin_unlock(&rq->lock);
    success = 1;

unlock:
    raw_spin_unlock_irqrestore(&p->pi_lock, flags);
    return success;
}

// ==================================================
// State transition diagram (conceptual)
// ==================================================
/*
        ┌─────────────┐
        │   RUNNING   │←──────────────────┐
        └──────┬──────┘                   │
               │                          │
               │ sem_wait()               │
               │ (count == 0)             │
               ▼                          │
┌───────────────────────────────┐         │
│        INTERRUPTIBLE          │         │
│  (on semaphore wait list)     │         │
│  (off run queue)              │         │
└───────────────┬───────────────┘         │
                │                         │
                │ sem_post()              │
                │ (from another process)  │
                ▼                         │
        wake_up_process()                 │
                │                         │
                └─────────────────────────┘
*/
```

The kernel uses strict lock ordering to prevent deadlock: semaphore lock → task lock → run queue lock. Violating this order (e.g., holding run queue lock, then trying to acquire semaphore lock) causes deadlock. Lock ordering documentation is essential reading for kernel developers.
System V semaphores are one of the oldest IPC (Inter-Process Communication) mechanisms, dating back to AT&T Unix in the early 1980s. Despite their age, they remain widely available and have unique capabilities.
Semaphore Sets:
Unlike POSIX semaphores (individual), System V semaphores are allocated in sets. A single semget() call creates an array of semaphores. This allows atomic operations across multiple semaphores—useful for complex multi-resource locking.
Key-Based Identification:
System V semaphores are identified by a numeric key (like a shared secret). Processes that know the key can access the same semaphore set. Keys can be:
- Generated from a file path and project ID with ftok()
- IPC_PRIVATE for parent-child relationships only

The semop() Operation:
The semop() system call performs operations atomically on one or more semaphores:
All operations in a single semop() call are atomic—either all succeed or none do.
Undo Semantics (SEM_UNDO):
If a process terminates while holding semaphore resources, the kernel can automatically undo its operations. This prevents resource leaks from crashed processes.
```c
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <stdio.h>
#include <stdlib.h>

// ==================================================
// System V Semaphore Example
// ==================================================

// Union for semctl() operations
union semun {
    int val;                 // Value for SETVAL
    struct semid_ds *buf;    // Buffer for IPC_STAT, IPC_SET
    unsigned short *array;   // Array for GETALL, SETALL
};

int main() {
    key_t key;
    int semid;
    struct sembuf sop;
    union semun arg;

    // Generate a key from a file path and project ID
    key = ftok("/tmp/my_semaphore", 'S');
    if (key == -1) {
        perror("ftok");
        exit(1);
    }

    // Create or get a semaphore set with 2 semaphores
    semid = semget(key, 2, IPC_CREAT | 0666);
    if (semid == -1) {
        perror("semget");
        exit(1);
    }

    // Initialize semaphore 0 to value 1 (binary semaphore/mutex)
    arg.val = 1;
    if (semctl(semid, 0, SETVAL, arg) == -1) {
        perror("semctl SETVAL");
        exit(1);
    }

    // Initialize semaphore 1 to value 5 (counting semaphore)
    arg.val = 5;
    if (semctl(semid, 1, SETVAL, arg) == -1) {
        perror("semctl SETVAL");
        exit(1);
    }

    // ==================================================
    // Wait (P/down) on semaphore 0
    // ==================================================
    sop.sem_num = 0;        // Which semaphore in the set
    sop.sem_op = -1;        // Decrement by 1
    sop.sem_flg = SEM_UNDO; // Undo on process exit

    printf("Waiting on semaphore 0...\n");
    if (semop(semid, &sop, 1) == -1) {
        perror("semop wait");
        exit(1);
    }
    printf("Acquired semaphore 0\n");

    // ... critical section ...

    // ==================================================
    // Signal (V/up) on semaphore 0
    // ==================================================
    sop.sem_num = 0;
    sop.sem_op = 1;         // Increment by 1
    sop.sem_flg = SEM_UNDO;

    if (semop(semid, &sop, 1) == -1) {
        perror("semop signal");
        exit(1);
    }
    printf("Released semaphore 0\n");

    // ==================================================
    // Atomic operation on multiple semaphores
    // ==================================================
    struct sembuf sops[2];

    // Acquire semaphore 0 AND decrement semaphore 1
    sops[0].sem_num = 0;
    sops[0].sem_op = -1;
    sops[0].sem_flg = SEM_UNDO;

    sops[1].sem_num = 1;
    sops[1].sem_op = -1;
    sops[1].sem_flg = SEM_UNDO;

    printf("Atomically acquiring both semaphores...\n");
    if (semop(semid, sops, 2) == -1) {
        perror("semop multi");
        exit(1);
    }
    printf("Acquired both\n");

    // Release both atomically
    sops[0].sem_op = 1;
    sops[1].sem_op = 1;
    if (semop(semid, sops, 2) == -1) {
        perror("semop multi release");
        exit(1);
    }
    printf("Released both\n");

    // ==================================================
    // Get current semaphore value
    // ==================================================
    int val = semctl(semid, 0, GETVAL);
    printf("Semaphore 0 current value: %d\n", val);

    // ==================================================
    // Delete the semaphore set (when done)
    // ==================================================
    if (semctl(semid, 0, IPC_RMID) == -1) {
        perror("semctl IPC_RMID");
        exit(1);
    }
    printf("Semaphore set removed\n");

    return 0;
}
```

| Operation | sem_op Value | Behavior |
|---|---|---|
| Wait (P/down) | Negative (-n) | Decrement by n; block if would go negative |
| Signal (V/up) | Positive (+n) | Increment by n; may wake waiters |
| Wait for zero | Zero (0) | Block until value becomes 0 |
POSIX semaphores provide a cleaner, more modern API than System V. They come in two varieties: named semaphores (for inter-process) and unnamed semaphores (for threads or related processes).
Named Semaphores:
Named semaphores exist in the filesystem namespace (typically /dev/shm/ on Linux). Any process that knows the name can open the same semaphore. They persist until explicitly unlinked or the system reboots.
- sem_open(): Create or open a named semaphore
- sem_close(): Close (but not delete) a named semaphore
- sem_unlink(): Remove the semaphore from the namespace

Unnamed Semaphores:
Unnamed semaphores live in memory (user-allocated). They can be used between threads of the same process, or between processes if placed in shared memory.
- sem_init(): Initialize an unnamed semaphore
- sem_destroy(): Destroy an unnamed semaphore

Common Operations:
- sem_wait(): Blocking wait (P operation)
- sem_trywait(): Non-blocking wait (returns error if would block)
- sem_timedwait(): Wait with timeout
- sem_post(): Signal (V operation)
- sem_getvalue(): Get current value

Implementation on Linux:
Modern Linux implements POSIX semaphores using futex under the hood. This makes uncontended operations extremely fast (no syscall), while still providing proper blocking when needed.
```c
#include <semaphore.h>
#include <pthread.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/wait.h>   // wait()
#include <errno.h>      // errno, EAGAIN, ETIMEDOUT
#include <stdio.h>
#include <stdlib.h>
#include <time.h>       // clock_gettime()
#include <unistd.h>

// ==================================================
// Named Semaphore Example (inter-process)
// ==================================================
void named_semaphore_example() {
    sem_t *sem;

    // Create or open named semaphore
    // O_CREAT: create if doesn't exist
    // 0644: permissions (owner rw, group r, other r)
    // 1: initial value
    sem = sem_open("/my_named_sem", O_CREAT, 0644, 1);
    if (sem == SEM_FAILED) {
        perror("sem_open");
        exit(1);
    }

    printf("Waiting on named semaphore...\n");
    if (sem_wait(sem) == -1) {
        perror("sem_wait");
        exit(1);
    }
    printf("Acquired named semaphore\n");

    // ... critical section ...
    sleep(2);  // Simulate work

    printf("Releasing named semaphore...\n");
    if (sem_post(sem) == -1) {
        perror("sem_post");
        exit(1);
    }

    // Close our reference
    sem_close(sem);

    // Unlink removes from namespace (when refcount hits 0)
    // Only do this when truly done with the semaphore
    // sem_unlink("/my_named_sem");
}

// ==================================================
// Unnamed Semaphore Example (thread synchronization)
// ==================================================
sem_t thread_sem;
int shared_counter = 0;

void *thread_func(void *arg) {
    int id = *(int *)arg;

    for (int i = 0; i < 1000; i++) {
        sem_wait(&thread_sem);   // P operation
        shared_counter++;        // Critical section
        sem_post(&thread_sem);   // V operation
    }

    printf("Thread %d done\n", id);
    return NULL;
}

void unnamed_semaphore_example() {
    pthread_t threads[4];
    int ids[4] = {0, 1, 2, 3};

    // Initialize unnamed semaphore
    // 0: shared between threads (not processes)
    // 1: initial value (binary semaphore/mutex)
    if (sem_init(&thread_sem, 0, 1) == -1) {
        perror("sem_init");
        exit(1);
    }

    // Create threads
    for (int i = 0; i < 4; i++) {
        pthread_create(&threads[i], NULL, thread_func, &ids[i]);
    }

    // Wait for all threads
    for (int i = 0; i < 4; i++) {
        pthread_join(threads[i], NULL);
    }

    printf("Final counter: %d (expected 4000)\n", shared_counter);

    // Destroy the semaphore
    sem_destroy(&thread_sem);
}

// ==================================================
// Process-shared Unnamed Semaphore (in shared memory)
// ==================================================
void shared_memory_semaphore_example() {
    // Create shared memory region
    int fd = shm_open("/my_shm", O_CREAT | O_RDWR, 0644);
    ftruncate(fd, sizeof(sem_t));

    sem_t *sem = mmap(NULL, sizeof(sem_t),
                      PROT_READ | PROT_WRITE,
                      MAP_SHARED, fd, 0);

    pid_t pid = fork();
    if (pid == 0) {
        // Child process
        // Note: 1 means process-shared!
        sem_init(sem, 1, 1);

        sem_wait(sem);
        printf("Child acquired semaphore\n");
        sleep(2);
        printf("Child releasing semaphore\n");
        sem_post(sem);

        _exit(0);
    } else {
        // Parent process
        sleep(1);  // Let child init and acquire first

        printf("Parent waiting for semaphore...\n");
        sem_wait(sem);
        printf("Parent acquired semaphore\n");
        sem_post(sem);

        wait(NULL);  // Wait for child
        sem_destroy(sem);
        shm_unlink("/my_shm");
    }
}

// ==================================================
// Non-blocking and timed wait
// ==================================================
void advanced_wait_example(sem_t *sem) {
    // Non-blocking try
    if (sem_trywait(sem) == 0) {
        printf("Acquired immediately\n");
        sem_post(sem);
    } else {
        if (errno == EAGAIN) {
            printf("Would block - semaphore busy\n");
        }
    }

    // Timed wait
    struct timespec ts;
    clock_gettime(CLOCK_REALTIME, &ts);
    ts.tv_sec += 5;  // 5 second timeout

    if (sem_timedwait(sem, &ts) == 0) {
        printf("Acquired within timeout\n");
        sem_post(sem);
    } else {
        if (errno == ETIMEDOUT) {
            printf("Timeout waiting for semaphore\n");
        }
    }

    // Get current value
    int value;
    sem_getvalue(sem, &value);
    printf("Current value: %d\n", value);
}
```

Kernel semaphores have a well-defined lifecycle from creation to destruction. The kernel must handle each stage correctly to prevent resource leaks and ensure correctness.
Creation:
Semaphores can be created in several ways:
- Statically at compile time (e.g., DEFINE_SEMAPHORE in kernel code)
- Dynamically, allocated with kmalloc() and initialized at runtime
- Via IPC system calls from user space (semget, sem_open, etc.)

Usage:
During active use, the semaphore may have:

- A positive value with no waiters (resources available)
- A zero value with blocked processes queued on its wait list
- Pending SEM_UNDO adjustments recorded for running processes
Destruction:
Destruction is tricky when waiters might exist:

- Waiters must be woken with an error code (e.g., -EIDRM for System V) before the structure is freed
- The structure cannot be freed while any process still holds a pointer to its wait queue
- Reference counting can delay the actual free until the last user is done
Process Death Handling:
When a process holding a semaphore dies:

- With SEM_UNDO, the kernel applies the recorded adjustments, releasing the resources automatically
- Without undo semantics, the semaphore remains acquired and other processes may block forever
Orphaned semaphore resources are a major source of bugs in poorly designed systems.
```c
// Kernel semaphore lifecycle management

// ==================================================
// Static allocation and initialization
// ==================================================

// Compile-time initialization (kernel code)
static DEFINE_SEMAPHORE(my_driver_sem);

// Boot-time initialization
static struct semaphore late_init_sem;

static int __init my_module_init(void)
{
    sema_init(&late_init_sem, 1);
    return 0;
}

// ==================================================
// Dynamic allocation
// ==================================================
struct my_resource {
    struct semaphore lock;
    void *data;
    // ...
};

struct my_resource *create_resource(void)
{
    struct my_resource *res;

    res = kmalloc(sizeof(*res), GFP_KERNEL);
    if (!res)
        return NULL;

    // Initialize the embedded semaphore
    sema_init(&res->lock, 1);
    res->data = NULL;

    return res;
}

void destroy_resource(struct my_resource *res)
{
    // Semaphore has no explicit destroy in Linux kernel
    // but we must ensure no one is using it
    down(&res->lock);  // Acquire to ensure no one is in critical section
    // No up() needed - we're destroying
    kfree(res);
}

// ==================================================
// System V semaphore creation and destruction
// ==================================================
static void sysv_sem_create(void)
{
    struct sem_array *sma;

    // Allocate semaphore set structure
    sma = kvzalloc(sizeof(*sma), GFP_KERNEL);
    if (!sma)
        return;

    // Allocate array of individual semaphores
    sma->sems = kvzalloc(sizeof(struct sem) * nsems, GFP_KERNEL);

    // Initialize each semaphore
    for (int i = 0; i < nsems; i++) {
        spin_lock_init(&sma->sems[i].lock);
        INIT_LIST_HEAD(&sma->sems[i].pending_alter);
        INIT_LIST_HEAD(&sma->sems[i].pending_const);
        sma->sems[i].semval = 0;
    }

    // Add to global IPC namespace
    // ...
}

static void sysv_sem_destroy(struct sem_array *sma)
{
    // Wake all waiting processes with error
    struct sem_queue *q, *tmp;

    list_for_each_entry_safe(q, tmp, &sma->pending_alter, list) {
        q->status = -EIDRM;  // Identifier removed
        wake_up_process(q->sleeper);
        list_del(&q->list);
    }

    // Free semaphore array
    kvfree(sma->sems);

    // Free the set structure
    kvfree(sma);
}

// ==================================================
// SEM_UNDO: Automatic cleanup on process death
// ==================================================
struct sem_undo {
    struct list_head list_proc;  // Per-process undo list
    struct sem_array *semary;    // Semaphore set
    short *semadj;               // Adjustment values per semaphore
};

void exit_sem(struct task_struct *tsk)
{
    struct sem_undo *un, *tmp;

    if (!tsk->sysvsem.undo_list)
        return;

    // Process all undo entries
    list_for_each_entry_safe(un, tmp,
                             &tsk->sysvsem.undo_list, list_proc) {
        struct sem_array *sma = un->semary;

        // Apply the undo adjustments
        for (int i = 0; i < sma->sem_nsems; i++) {
            if (un->semadj[i]) {
                sma->sems[i].semval += un->semadj[i];
                // Wake any waiters that can now proceed
                update_queue(sma, i);
            }
        }

        // Free the undo structure
        kfree(un);
    }
}

// ==================================================
// Reference counting for safe destruction
// ==================================================
struct refcounted_sem {
    struct semaphore sem;
    atomic_t refcount;
};

void sem_get(struct refcounted_sem *s)
{
    atomic_inc(&s->refcount);
}

void sem_put(struct refcounted_sem *s)
{
    if (atomic_dec_and_test(&s->refcount)) {
        // Last reference - safe to free
        kfree(s);
    }
}
```

Never free a semaphore while processes are waiting on it! The waiters hold pointers to the semaphore's wait queue. If freed, they'll access freed memory (use-after-free bug). Always wake waiters with an error code first, or use reference counting to delay freeing until all users are done.
We've explored how operating system kernels implement semaphores—from the rationale for kernel involvement through data structures, system calls, and lifecycle management. The key insights: only the kernel can atomically combine a value check with scheduler manipulation, which is what makes true blocking possible; every kernel semaphore pairs a count, a wait list, and a spinlock; System V and POSIX expose different interfaces (sets and keys versus names and futexes) over similar underlying mechanics; and lifecycle management must wake all waiters before freeing any structure they reference.
What's Next:
With implementation mechanics understood, we turn to fairness considerations. How do semaphore implementations ensure that waiting threads are treated equitably? What happens when naive implementations allow starvation? We'll explore queue disciplines, priority-based wakeup, and the tradeoffs involved in providing fairness guarantees.
You now understand how operating system kernels implement semaphores—from kernel data structures through system call interfaces to the deep scheduler integration that enables true blocking semantics. This knowledge is essential for understanding any blocking synchronization primitive in modern operating systems.