We've explored what processor affinity is, why it matters for cache locality, and how it interacts with the scheduler. Now it's time to put this knowledge into practice. How do we actually configure processor affinity for our applications?
This page is a comprehensive practical guide. By the end of it, you will be able to: set affinity using system calls in C/C++, configure threads with pthreads and Windows APIs, use taskset and numactl from the command line, handle affinity in Java, Go, and Python, configure affinity in containers and VMs, and apply best practices for production deployments. Together, these techniques give you a complete toolkit for configuring affinity in virtually any environment.
Linux provides sched_setaffinity() and sched_getaffinity() for process-level affinity control. Let's examine these in detail, including error handling, capabilities, and edge cases.
Complete API Reference:
```c
#define _GNU_SOURCE
#include <sched.h>
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>

/*
 * sched_setaffinity() - Set the CPU affinity mask
 *
 * @pid: Process ID. Use 0 for calling process, or specific PID.
 *       For threads, use gettid() not pthread_self().
 * @cpusetsize: Size in bytes of the mask buffer.
 * @mask: Pointer to cpu_set_t containing allowed CPUs.
 *
 * Returns 0 on success, -1 on error with errno set.
 *
 * Errors:
 *   EFAULT: mask points to invalid address
 *   EINVAL: No CPUs in mask are online, or cpusetsize is wrong
 *   EPERM:  Caller lacks CAP_SYS_NICE and isn't process owner
 *   ESRCH:  No process with given PID exists
 */
int sched_setaffinity(pid_t pid, size_t cpusetsize, const cpu_set_t *mask);

/*
 * sched_getaffinity() - Get the CPU affinity mask
 *
 * Same parameters as sched_setaffinity(), but 'mask' is output.
 * Kernel writes current affinity to *mask.
 *
 * Note: cpusetsize must be large enough for all possible CPUs.
 * Use sizeof(cpu_set_t) or CPU_ALLOC_SIZE(num_cpus).
 */
int sched_getaffinity(pid_t pid, size_t cpusetsize, cpu_set_t *mask);

/*
 * sched_getcpu() - Get the CPU currently executing this thread
 *
 * Returns CPU number (0-based), or -1 on error.
 * This is extremely fast (uses VDSO, no syscall in practice).
 */
int sched_getcpu(void);
```

Robust Implementation with Error Handling:
```c
#define _GNU_SOURCE
#include <sched.h>
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <sys/syscall.h>

/* Get Thread ID (for per-thread affinity) */
pid_t gettid(void) {
    return syscall(SYS_gettid);
}

/* Get number of configured (not necessarily online) CPUs */
int get_num_cpus(void) {
    return sysconf(_SC_NPROCESSORS_CONF);
}

/* Get number of currently online CPUs */
int get_online_cpus(void) {
    return sysconf(_SC_NPROCESSORS_ONLN);
}

/* Print current affinity mask in human-readable form */
void print_affinity(pid_t pid, const char *label) {
    cpu_set_t mask;
    CPU_ZERO(&mask);

    if (sched_getaffinity(pid, sizeof(mask), &mask) == -1) {
        fprintf(stderr, "sched_getaffinity(%d): %s\n", pid, strerror(errno));
        return;
    }

    printf("%s: ", label);
    int first = 1;
    for (int i = 0; i < CPU_SETSIZE; i++) {
        if (CPU_ISSET(i, &mask)) {
            printf("%s%d", first ? "" : ",", i);
            first = 0;
        }
    }
    printf(" (count: %d)\n", CPU_COUNT(&mask));
}

/* Set affinity with full error handling */
int set_affinity_safe(pid_t pid, const cpu_set_t *mask) {
    int retries = 3;

    while (retries-- > 0) {
        if (sched_setaffinity(pid, sizeof(cpu_set_t), mask) == 0) {
            return 0;  /* Success */
        }

        switch (errno) {
        case EINTR:
            /* Interrupted, retry */
            continue;
        case EINVAL:
            fprintf(stderr, "set_affinity: No valid CPUs in mask\n");
            return -1;
        case EPERM:
            fprintf(stderr, "set_affinity: Permission denied. "
                            "Need CAP_SYS_NICE or ownership.\n");
            return -1;
        case ESRCH:
            fprintf(stderr, "set_affinity: Process %d not found\n", pid);
            return -1;
        default:
            fprintf(stderr, "set_affinity: Unexpected error: %s\n",
                    strerror(errno));
            return -1;
        }
    }
    return -1;
}

/* Pin to a single CPU */
int pin_to_cpu(pid_t pid, int cpu) {
    cpu_set_t mask;
    CPU_ZERO(&mask);

    /* Validate CPU number */
    if (cpu < 0 || cpu >= get_num_cpus()) {
        fprintf(stderr, "Invalid CPU: %d (system has %d CPUs)\n",
                cpu, get_num_cpus());
        return -1;
    }

    CPU_SET(cpu, &mask);
    return set_affinity_safe(pid, &mask);
}

/* Pin to a range of CPUs */
int pin_to_range(pid_t pid, int start, int end) {
    cpu_set_t mask;
    CPU_ZERO(&mask);

    for (int i = start; i <= end && i < get_num_cpus(); i++) {
        CPU_SET(i, &mask);
    }

    if (CPU_COUNT(&mask) == 0) {
        fprintf(stderr, "No valid CPUs in range %d-%d\n", start, end);
        return -1;
    }

    return set_affinity_safe(pid, &mask);
}

/* Example usage */
int main() {
    printf("System has %d CPUs configured, %d online\n",
           get_num_cpus(), get_online_cpus());

    print_affinity(0, "Initial");

    /* Pin to CPU 0 */
    if (pin_to_cpu(0, 0) == 0) {
        print_affinity(0, "After pin_to_cpu(0)");
        printf("Currently on CPU %d\n", sched_getcpu());
    }

    /* Pin to CPUs 0-3 */
    if (pin_to_range(0, 0, 3) == 0) {
        print_affinity(0, "After pin_to_range(0,3)");
    }

    return 0;
}
```

For systems with more than 1024 CPUs, use CPU_ALLOC(), CPU_ALLOC_SIZE(), and CPU_FREE() to dynamically allocate appropriately sized masks. The static cpu_set_t is limited to CPU_SETSIZE (typically 1024). Example: cpu_set_t *mask = CPU_ALLOC(num_cpus);
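To make that note concrete, here is a minimal sketch of the dynamically sized variant using the glibc CPU_ALLOC family; error handling is kept deliberately brief, and pinning to CPU 0 is purely illustrative.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <unistd.h>

int main(void) {
    int num_cpus = sysconf(_SC_NPROCESSORS_CONF);

    /* Allocate a mask sized for however many CPUs this system reports */
    cpu_set_t *mask = CPU_ALLOC(num_cpus);
    size_t size = CPU_ALLOC_SIZE(num_cpus);
    if (mask == NULL) {
        perror("CPU_ALLOC");
        return 1;
    }

    CPU_ZERO_S(size, mask);
    CPU_SET_S(0, size, mask);            /* allow CPU 0 only */

    /* Pass the dynamic size, not sizeof(cpu_set_t) */
    if (sched_setaffinity(0, size, mask) == -1)
        perror("sched_setaffinity");

    CPU_FREE(mask);
    return 0;
}
```

The _S variants (CPU_ISSET_S, CPU_COUNT_S, and so on) replace their fixed-size counterparts when reading masks back with sched_getaffinity().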
For multithreaded applications, the pthreads library provides thread-specific affinity control. This is essential for NUMA-aware thread pools and per-core processing.
Setting Affinity for Existing Threads:
```c
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define NUM_THREADS 4

typedef struct {
    int thread_id;
    int target_cpu;
} thread_arg_t;

void* worker(void* arg) {
    thread_arg_t* targ = (thread_arg_t*)arg;
    cpu_set_t cpuset;

    /* Pin this thread to its designated CPU */
    CPU_ZERO(&cpuset);
    CPU_SET(targ->target_cpu, &cpuset);

    int rc = pthread_setaffinity_np(pthread_self(), sizeof(cpuset), &cpuset);
    if (rc != 0) {
        fprintf(stderr, "Thread %d: pthread_setaffinity_np failed: %d\n",
                targ->thread_id, rc);
        return NULL;
    }

    /* Verify placement */
    CPU_ZERO(&cpuset);
    pthread_getaffinity_np(pthread_self(), sizeof(cpuset), &cpuset);

    printf("Thread %d: requested CPU %d, running on CPU %d\n",
           targ->thread_id, targ->target_cpu, sched_getcpu());

    /* Worker loop - always runs on target CPU */
    volatile long sum = 0;
    for (long i = 0; i < 1000000000L; i++) {
        sum += i;
    }

    printf("Thread %d finished on CPU %d\n", targ->thread_id, sched_getcpu());
    return NULL;
}

int main() {
    pthread_t threads[NUM_THREADS];
    thread_arg_t args[NUM_THREADS];
    int num_cpus = sysconf(_SC_NPROCESSORS_ONLN);

    printf("System has %d CPUs, creating %d threads\n", num_cpus, NUM_THREADS);

    for (int i = 0; i < NUM_THREADS; i++) {
        args[i].thread_id = i;
        args[i].target_cpu = i % num_cpus;  /* Round-robin CPUs */

        int rc = pthread_create(&threads[i], NULL, worker, &args[i]);
        if (rc != 0) {
            fprintf(stderr, "pthread_create failed: %d\n", rc);
            exit(1);
        }
    }

    for (int i = 0; i < NUM_THREADS; i++) {
        pthread_join(threads[i], NULL);
    }

    return 0;
}
```

Setting Affinity at Thread Creation:
For optimal cache behavior, set affinity before the thread starts executing:
```c
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>

void* worker(void* arg) {
    int thread_id = *(int*)arg;

    /* Thread starts on correct CPU from the beginning */
    printf("Thread %d: started on CPU %d\n", thread_id, sched_getcpu());

    /* ... work ... */
    return NULL;
}

pthread_t create_pinned_thread(int target_cpu, void* (*func)(void*), void* arg) {
    pthread_t thread;
    pthread_attr_t attr;
    cpu_set_t cpuset;
    int rc;

    /* Initialize attributes */
    rc = pthread_attr_init(&attr);
    if (rc != 0) {
        fprintf(stderr, "pthread_attr_init failed\n");
        return 0;
    }

    /* Set affinity in attributes */
    CPU_ZERO(&cpuset);
    CPU_SET(target_cpu, &cpuset);

    rc = pthread_attr_setaffinity_np(&attr, sizeof(cpuset), &cpuset);
    if (rc != 0) {
        fprintf(stderr, "pthread_attr_setaffinity_np failed: %d\n", rc);
        pthread_attr_destroy(&attr);
        return 0;
    }

    /* Create thread with these attributes */
    rc = pthread_create(&thread, &attr, func, arg);
    if (rc != 0) {
        fprintf(stderr, "pthread_create failed: %d\n", rc);
        pthread_attr_destroy(&attr);
        return 0;
    }

    pthread_attr_destroy(&attr);
    return thread;
}

int main() {
    pthread_t threads[4];
    int ids[4] = {0, 1, 2, 3};

    for (int i = 0; i < 4; i++) {
        threads[i] = create_pinned_thread(i, worker, &ids[i]);
    }

    for (int i = 0; i < 4; i++) {
        pthread_join(threads[i], NULL);
    }

    return 0;
}
```

When you set affinity in pthread_attr, the thread starts on the correct CPU immediately. If you set it after pthread_create, the thread may execute briefly on a different CPU, warm that cache, then migrate—wasting the cache warmup. Pre-creation affinity ensures optimal initial placement.
Command-line tools enable affinity control without code modifications—invaluable for testing, deployment scripts, and production tuning.
taskset: The Essential Tool
```bash
# ===== LAUNCHING PROCESSES WITH AFFINITY =====

# Single CPU using hex mask
taskset 0x1 ./app              # CPU 0 only
taskset 0x2 ./app              # CPU 1 only
taskset 0x4 ./app              # CPU 2 only

# Using -c for human-readable CPU list
taskset -c 0 ./app             # CPU 0 only
taskset -c 0,2,4 ./app         # CPUs 0, 2, 4
taskset -c 0-7 ./app           # CPUs 0 through 7
taskset -c 0-3,8-11 ./app      # CPUs 0-3 and 8-11

# ===== QUERYING AFFINITY =====

# Get affinity as hex mask
taskset -p 1234
# Output: pid 1234's current affinity mask: ff

# Get affinity as CPU list (easier to read)
taskset -cp 1234
# Output: pid 1234's current affinity list: 0-7

# ===== MODIFYING RUNNING PROCESSES =====

# Change affinity of running process (hex mask)
taskset -p 0x3 1234            # Set to CPUs 0,1

# Change affinity of running process (CPU list)
taskset -cp 0-3 1234           # Set to CPUs 0-3

# Change all threads of a process
taskset -acp 0-3 1234          # -a = all tasks/threads

# ===== COMBINING WITH OTHER TOOLS =====

# With nice (priority)
nice -n -10 taskset -c 0-3 ./high_priority_app

# With numactl (NUMA binding)
numactl --membind=0 taskset -c 0-7 ./app

# With cgroups (resource limits)
cgexec -g cpu:mygroup taskset -c 0-7 ./app

# In a script with logging
log_and_run() {
    echo "$(date): Starting $1 on CPUs $2"
    taskset -c "$2" "$1" &
    echo "PID: $!"
}

log_and_run ./database 0-3
log_and_run ./webserver 4-7
```

numactl: NUMA-Aware Affinity
For NUMA systems, numactl provides both CPU and memory placement:
```bash
# ===== DISPLAYING NUMA TOPOLOGY =====

numactl --hardware
# Shows: nodes, CPUs per node, memory per node, inter-node distances

numactl --show
# Shows: current policy and bindings

# ===== CPU BINDING =====

# Run on all CPUs of NUMA node 0
numactl --cpunodebind=0 ./app

# Run on multiple NUMA nodes
numactl --cpunodebind=0,1 ./app

# Run on specific physical CPUs
numactl --physcpubind=0,2,4,6 ./app

# ===== MEMORY BINDING =====

# Allocate memory only from node 0
numactl --membind=0 ./app

# Preferred allocation from node 0 (fallback to others if full)
numactl --preferred=0 ./app

# Interleave memory across all nodes (good for shared data)
numactl --interleave=all ./app

# ===== COMBINED CPU + MEMORY BINDING =====

# Bind both CPU and memory to node 0 (optimal locality)
numactl --cpunodebind=0 --membind=0 ./app

# Run on node 0 CPUs, interleave memory (for multi-node sharing)
numactl --cpunodebind=0 --interleave=all ./app

# ===== MONITORING NUMA BEHAVIOR =====

# Check NUMA statistics for a process
numastat -p <pid>
# Shows: pages per NUMA node for this process

# System-wide NUMA statistics
numastat -m
# Shows: memory per node, numa_hit, numa_miss, etc.

# Watch NUMA misses in real-time
watch -n1 'numastat | grep numa_miss'
```

For production deployments, put affinity settings in systemd unit files rather than shell scripts. This ensures affinity is applied consistently on every service restart. Use the CPUAffinity= directive in the [Service] section.
Windows provides comprehensive affinity control through Win32 APIs. The concepts are similar to Linux but with different naming.
Process Affinity:
```c
#include <windows.h>
#include <stdio.h>

int main() {
    HANDLE hProcess = GetCurrentProcess();
    DWORD_PTR processAffinityMask;
    DWORD_PTR systemAffinityMask;

    /* Get current affinity masks */
    if (!GetProcessAffinityMask(hProcess, &processAffinityMask,
                                &systemAffinityMask)) {
        printf("GetProcessAffinityMask failed: %lu\n", GetLastError());
        return 1;
    }

    printf("System mask: 0x%llx\n", (unsigned long long)systemAffinityMask);
    printf("Process mask: 0x%llx\n", (unsigned long long)processAffinityMask);

    /* Set process to run on CPUs 0 and 1 only */
    DWORD_PTR newMask = 0x3;  /* Binary: 11 = CPUs 0 and 1 */

    if (!SetProcessAffinityMask(hProcess, newMask)) {
        printf("SetProcessAffinityMask failed: %lu\n", GetLastError());
        return 1;
    }

    printf("Process affinity set to: 0x%llx\n", (unsigned long long)newMask);

    /* Verify change */
    GetProcessAffinityMask(hProcess, &processAffinityMask, &systemAffinityMask);
    printf("New process mask: 0x%llx\n", (unsigned long long)processAffinityMask);

    return 0;
}
```

Thread Affinity:
```c
#include <windows.h>
#include <stdio.h>

DWORD WINAPI WorkerThread(LPVOID lpParam) {
    int threadId = (int)(intptr_t)lpParam;
    HANDLE hThread = GetCurrentThread();

    /* Pin this thread to a specific CPU */
    DWORD_PTR mask = 1ULL << threadId;  /* CPU = thread ID */
    DWORD_PTR previousMask = SetThreadAffinityMask(hThread, mask);

    if (previousMask == 0) {
        printf("Thread %d: SetThreadAffinityMask failed\n", threadId);
        return 1;
    }

    /* For soft affinity hint (preferred CPU) */
    DWORD previousIdeal = SetThreadIdealProcessor(hThread, threadId);

    /* Verify current processor */
    printf("Thread %d: running, ideal processor set\n", threadId);

    /* Do work */
    volatile long sum = 0;
    for (long i = 0; i < 1000000000L; i++) {
        sum += i;
    }

    return 0;
}

int main() {
    HANDLE threads[4];
    SYSTEM_INFO sysInfo;

    GetSystemInfo(&sysInfo);
    printf("System has %lu processors\n", sysInfo.dwNumberOfProcessors);

    /* Create worker threads */
    for (int i = 0; i < 4; i++) {
        threads[i] = CreateThread(
            NULL,                  /* Default security */
            0,                     /* Default stack size */
            WorkerThread,          /* Thread function */
            (LPVOID)(intptr_t)i,   /* Thread argument */
            0,                     /* Start immediately */
            NULL                   /* Don't need thread ID */
        );

        if (threads[i] == NULL) {
            printf("CreateThread failed: %lu\n", GetLastError());
            return 1;
        }
    }

    /* Wait for threads */
    WaitForMultipleObjects(4, threads, TRUE, INFINITE);

    /* Clean up */
    for (int i = 0; i < 4; i++) {
        CloseHandle(threads[i]);
    }

    return 0;
}
```

PowerShell Affinity Management:
```powershell
# Get all processes and their affinity
Get-Process | Select-Object Name, Id, ProcessorAffinity

# Set affinity for a specific process
$process = Get-Process -Id 1234
$process.ProcessorAffinity = 0x0F   # CPUs 0-3

# Set affinity when starting a new process
$proc = Start-Process -FilePath "C:\app.exe" -PassThru
$proc.ProcessorAffinity = 0x03      # CPUs 0-1

# Function to set affinity by name
function Set-ProcessAffinity {
    param(
        [string]$ProcessName,
        [int]$AffinityMask
    )
    Get-Process -Name $ProcessName -ErrorAction SilentlyContinue | ForEach-Object {
        $_.ProcessorAffinity = $AffinityMask
        Write-Host "Set $($_.Name) (PID: $($_.Id)) to mask: $AffinityMask"
    }
}

# Usage
Set-ProcessAffinity -ProcessName "notepad" -AffinityMask 0x01
```

On Windows systems with more than 64 processors, CPUs are organized into 'processor groups.' The basic affinity APIs work within a single group. For cross-group affinity, use SetThreadGroupAffinity() and GetThreadGroupAffinity() with GROUP_AFFINITY structures.
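As a rough illustration of those processor-group APIs, the sketch below pins the calling thread to CPU 2 of processor group 1. It assumes a machine that actually has more than 64 logical processors (otherwise group 1 does not exist and the call fails), and it omits most error handling.

```c
#include <windows.h>
#include <stdio.h>

int main(void) {
    GROUP_AFFINITY ga = {0};
    GROUP_AFFINITY previous = {0};

    ga.Group = 1;            /* processor group 1 */
    ga.Mask  = 1ULL << 2;    /* CPU 2 within that group */

    /* Move the calling thread into group 1, CPU 2; the previous
       group affinity is returned so it can be restored later */
    if (!SetThreadGroupAffinity(GetCurrentThread(), &ga, &previous)) {
        printf("SetThreadGroupAffinity failed: %lu\n", GetLastError());
        return 1;
    }

    printf("Now restricted to group %u, mask 0x%llx\n",
           (unsigned)ga.Group, (unsigned long long)ga.Mask);
    return 0;
}
```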
Different programming languages provide varying levels of affinity control. Let's examine common approaches.
Java: JNI or External Configuration
Java doesn't provide built-in affinity APIs (the JVM manages threads). Options include:
```java
// Option 1: Launch JVM with taskset (simplest)
// taskset -c 0-7 java -jar myapp.jar

// Option 2: Use Java-Thread-Affinity library (OpenHFT)
// Maven: net.openhft:affinity

import net.openhft.affinity.AffinityLock;

public class AffinityExample {

    public void pinnedWorker() {
        // Acquire lock on a CPU (thread pinned while lock held)
        try (AffinityLock lock = AffinityLock.acquireLock()) {
            System.out.println("Thread on CPU: " + lock.cpuId());
            // Do latency-sensitive work
            doWork();
        }
        // Lock released, thread can migrate again
    }

    public void specificCpu() {
        // Pin to specific CPU
        try (AffinityLock lock = AffinityLock.acquireLock(3)) {
            System.out.println("Pinned to CPU 3");
            doWork();
        }
    }

    // Option 3: JNI for direct sched_setaffinity access
    // (Requires native library)
    public native void setAffinity(long mask);
}
```

Go: Runtime and CGO
Go's scheduler (goroutines on OS threads) complicates affinity. The runtime may migrate goroutines between OS threads. For strict affinity:
```go
package main

/*
#define _GNU_SOURCE
#include <sched.h>
#include <pthread.h>

void pin_thread(int cpu) {
    cpu_set_t cpuset;
    CPU_ZERO(&cpuset);
    CPU_SET(cpu, &cpuset);
    pthread_setaffinity_np(pthread_self(), sizeof(cpuset), &cpuset);
}
*/
import "C"

import (
	"fmt"
	"runtime"
)

func pinnedWorker(cpu int, done chan bool) {
	// Lock this goroutine to its current OS thread
	runtime.LockOSThread()
	defer runtime.UnlockOSThread()

	// Pin the OS thread to specific CPU
	C.pin_thread(C.int(cpu))

	fmt.Printf("Goroutine pinned to CPU %d\n", cpu)

	// Do work (goroutine won't migrate to different OS thread)
	for i := 0; i < 1000000000; i++ {
		_ = i * 2
	}

	done <- true
}

func main() {
	numCPU := runtime.NumCPU()
	fmt.Printf("NumCPU: %d\n", numCPU)

	// Set GOMAXPROCS to use all CPUs
	runtime.GOMAXPROCS(numCPU)

	done := make(chan bool, 4)

	// Create pinned workers
	for i := 0; i < 4; i++ {
		go pinnedWorker(i, done)
	}

	// Wait for completion
	for i := 0; i < 4; i++ {
		<-done
	}
}
```

Python: os module and ctypes
```python
import os
import ctypes
import ctypes.util

# Method 1: Using os.sched_setaffinity (Python 3.3+)
def set_affinity_simple(cpus):
    """
    Set affinity to a set of CPUs.
    cpus: set or list of CPU numbers
    """
    pid = 0  # 0 = current process
    os.sched_setaffinity(pid, set(cpus))

def get_affinity():
    """Get current affinity as a set of CPU numbers."""
    return os.sched_getaffinity(0)

# Example
print(f"Initial affinity: {get_affinity()}")
set_affinity_simple([0, 2, 4])
print(f"After setting: {get_affinity()}")

# Method 2: For threads, using threading + ctypes
import threading
import multiprocessing

def pinned_worker(cpu_id):
    """Worker that pins itself to a specific CPU."""
    # Get thread ID (Linux-specific)
    libc = ctypes.CDLL(ctypes.util.find_library('c'), use_errno=True)
    SYS_gettid = 186  # syscall number for gettid
    tid = libc.syscall(SYS_gettid)

    # Set affinity using sched_setaffinity
    # Note: This affects the current thread
    os.sched_setaffinity(0, {cpu_id})

    print(f"Thread {threading.current_thread().name} (TID {tid}) "
          f"pinned to CPU {cpu_id}")

    # Do work
    total = sum(range(100000000))
    print(f"Thread finished, sum={total}")

# Create pinned threads
threads = []
num_cpus = multiprocessing.cpu_count()
for i in range(4):
    t = threading.Thread(target=pinned_worker, args=(i % num_cpus,))
    threads.append(t)
    t.start()

for t in threads:
    t.join()
```

Python's Global Interpreter Lock (GIL) means only one thread executes Python bytecode at a time. Affinity helps with CPU-bound C extensions or when using multiprocessing instead of threading. For pure Python threading, affinity has limited benefit due to the GIL.
In containerized and virtualized environments, affinity has additional layers and considerations.
Docker CPU Affinity:
```
# ===== DOCKER CPU AFFINITY =====

# Pin container to specific CPUs
docker run --cpuset-cpus="0,1,2,3" myimage

# Pin to CPU range
docker run --cpuset-cpus="0-3" myimage

# Limit number of CPUs (not specific binding)
docker run --cpus="2.5" myimage

# In docker-compose.yml
services:
  database:
    image: postgres:latest
    cpuset: "0-3"        # Bind to CPUs 0-3
    cpu_count: 4
  webserver:
    image: nginx:latest
    cpuset: "4-7"        # Bind to CPUs 4-7

# ===== KUBERNETES CPU PINNING =====

# In pod spec (requires CPUManager policy = static)
# kubelet flag: --cpu-manager-policy=static

apiVersion: v1
kind: Pod
metadata:
  name: mypod
spec:
  containers:
  - name: myapp
    resources:
      requests:
        cpu: "4"          # Request whole CPUs
        memory: "8Gi"
      limits:
        cpu: "4"          # Same as request for guaranteed QoS
        memory: "8Gi"

# The kubelet's CPU Manager will assign exclusive CPUs.
# View the assignments on the node:
# cat /var/lib/kubelet/cpu_manager_state
```

Virtual Machine CPU Pinning:
VM hypervisors allow pinning virtual CPUs (vCPUs) to physical CPUs:
```
# ===== LIBVIRT/KVM =====

# In VM XML definition
<vcpu placement='static'>4</vcpu>
<cputune>
  <vcpupin vcpu='0' cpuset='0'/>
  <vcpupin vcpu='1' cpuset='1'/>
  <vcpupin vcpu='2' cpuset='2'/>
  <vcpupin vcpu='3' cpuset='3'/>
  <emulatorpin cpuset='4'/>   <!-- Emulator on separate CPU -->
</cputune>

# Apply at runtime
virsh vcpupin myvm 0 0   # vCPU 0 -> pCPU 0
virsh vcpupin myvm 1 1   # vCPU 1 -> pCPU 1

# View current pinning
virsh vcpuinfo myvm

# ===== VMWARE VSPHERE =====

# In VM settings:
# CPU -> Scheduling Affinity -> Set processor mask
# Or via PowerCLI:
$vm = Get-VM -Name "MyVM"
$spec = New-Object VMware.Vim.VirtualMachineConfigSpec
$spec.CpuAffinity = New-Object VMware.Vim.VirtualMachineAffinityInfo
$spec.CpuAffinity.AffinitySet = @(0, 1, 2, 3)   # pCPUs to use
$vm.ExtensionData.ReconfigVM($spec)

# ===== HYPER-V =====

# PowerShell:
Set-VMProcessor -VMName "MyVM" -CompatibilityForMigrationEnabled $false
Set-VMProcessor -VMName "MyVM" -Count 4
# Note: Hyper-V uses dynamic scheduling by default
# For strict pinning, use processor groups and NUMA configuration
```

When pinning VMs, consider NUMA topology. Pin vCPUs to physical CPUs on the same NUMA node as the VM's memory. Many hypervisors have NUMA-aware scheduling that does this automatically, but explicit pinning provides guarantees for latency-sensitive workloads.
Affinity configuration in production requires careful planning. Here are key best practices:
1. Understand Your Workload First

Before pinning anything, profile the application: determine whether it is latency-sensitive or throughput-oriented, how much it benefits from cache locality, and whether it is bound by compute or by memory bandwidth. Affinity decisions should follow from measurements, not assumptions.
2. Leave Headroom for System Processes
Never pin application threads to all CPUs. Reserve a few for the kernel's housekeeping threads, interrupt handling, and system daemons such as monitoring and logging agents, as shown below.
```bash
# Example: 16 CPU system

# Reserve CPUs 0-1 for system
# This is also where most interrupts are handled by default

# Application CPUs: 2-15
taskset -c 2-15 ./my_application

# For maximum isolation, use isolcpus boot parameter
# (in /etc/default/grub: GRUB_CMDLINE_LINUX="isolcpus=2-15")
# Then only explicitly pinned processes run on 2-15

# Alternatively, use cgroups for system reservation
# Create cgroup for system processes limited to CPUs 0-1
# Application gets CPUs 2-15 via cpuset cgroup
```

3. Align with NUMA Topology
```bash
# Discover NUMA topology
numactl --hardware

# Example output (2-node system):
# node 0 cpus: 0 1 2 3 4 5 6 7
# node 1 cpus: 8 9 10 11 12 13 14 15
# node distances:
# node   0   1
#   0:  10  21
#   1:  21  10

# GOOD: Align CPU and memory to same node
numactl --cpunodebind=0 --membind=0 ./database

# BAD: CPUs on node 0, but memory on node 1
numactl --cpunodebind=0 --membind=1 ./database   # Don't do this!

# For multi-instance deployments:
# Instance 1: NUMA node 0
numactl --cpunodebind=0 --membind=0 ./instance1 &

# Instance 2: NUMA node 1
numactl --cpunodebind=1 --membind=1 ./instance2 &
```

4. Use Systemd for Persistent Configuration
```ini
# /etc/systemd/system/myapp.service
[Unit]
Description=My Application
After=network.target

[Service]
Type=simple
ExecStart=/opt/myapp/bin/myapp
Restart=always

# CPU Affinity - list of CPUs or ranges
CPUAffinity=2-7

# NUMA binding (alternative to numactl)
NUMAPolicy=bind
NUMAMask=0

# Nice priority (for additional control)
Nice=-10

# Memory locking (for RT workloads)
LimitMEMLOCK=infinity

[Install]
WantedBy=multi-user.target
```

Never deploy affinity changes without benchmarking. Affinity can help OR hurt depending on workload. Test with and without affinity, measuring both performance metrics (throughput, latency) and system metrics (CPU utilization, cache misses, NUMA statistics). What works in one environment may not work in another.
You now have a comprehensive toolkit for setting processor affinity across platforms and contexts. Let's consolidate:
- Linux system calls (sched_setaffinity, sched_getaffinity) provide low-level control with full error handling.
- Pthreads functions (pthread_setaffinity_np) enable per-thread affinity for multithreaded applications.
- Command-line tools (taskset, numactl) allow affinity control without code modifications.
- Win32 APIs (SetProcessAffinityMask, SetThreadAffinityMask) provide equivalent control on Windows.

What's Next:
We've covered the theory (soft/hard affinity, cache effects) and practice (setting affinity APIs and tools). In our final page, we'll examine performance implications—when affinity helps, when it hurts, how to measure its impact, and case studies of affinity optimization in real-world systems.
You now know how to set processor affinity using system calls, threading libraries, command-line tools, and platform-specific APIs. You can configure affinity for processes, threads, containers, and VMs. Next, we'll explore the performance implications of these decisions.