Shared Memory Via Virtual Memory - Learning Module

Protection Considerations

The Security Imperative

Shared memory is a double-edged sword. The same mechanism that enables zero-copy, high-performance IPC also creates potential security vulnerabilities. When two processes share memory:

What prevents a malicious process from corrupting the shared data?
What stops a process from accessing memory it shouldn't see?
How do we ensure read-only mappings are truly immutable?
What happens when a privileged process shares memory with an unprivileged one?

These questions aren't academic—they're the difference between a secure system and one vulnerable to exploitation. Shared memory vulnerabilities have been at the heart of major security incidents, from privilege escalation attacks to data leakage between containers.

In this page, we'll systematically examine the protection mechanisms that make shared memory safe: the hardware foundations, the kernel policies, security vulnerabilities and mitigations, and best practices for production systems.

What You Will Learn

By the end of this page, you will understand: how page table protection bits enforce access permissions, kernel-level access control for shared memory objects, the principle of least privilege in shared memory design, common vulnerabilities and their mitigations, and secure patterns for building shared memory systems.

Hardware Protection Mechanisms

The foundation of memory protection is hardware-enforced. The CPU's Memory Management Unit (MMU) checks every memory access against protection bits in the page table entry (PTE). The check is part of address translation itself, so it adds no extra cost to an ordinary access; the cost appears only when permissions change, because page tables must be updated and stale TLB entries invalidated.

Page Table Entry Protection Bits (x86-64):

Page Table Entry Protection Fields
Bit	Name	Meaning When Set	Protection Effect
Bit 0	Present	Page is in physical memory	Access to non-present page triggers page fault
Bit 1	Read/Write	Page is writable	Write to read-only page triggers protection fault
Bit 2	User/Supervisor	Page accessible from user mode	User access to supervisor page triggers protection fault
Bit 63	NX (No Execute)	Execution disabled (when EFER.NXE=1)	Execute from non-executable page triggers protection fault
Bit 5	Accessed	Page has been read or written	No protection; used for page replacement algorithms
Bit 6	Dirty	Page has been written	No protection; used for write-back decisions

How Hardware Protection Works:

CPU executes: mov rax, [0x7f00001000]    ; Read from user virtual address

1. TLB lookup:
   - If TLB hit with matching PCID (x86's address-space tag), check protection
   - If TLB miss, walk page table

2. Page table walk (if needed):
   - Traverse PML4 → PDP → PD → PT
   - Each level must have Present bit set

3. Protection check:
   - If CPL (Current Privilege Level) = 3 (user mode):
     - Check User/Supervisor bit (must be 1)
   - If instruction is write:
     - Check Read/Write bit (must be 1)
   - If fetching instruction:
     - Check NX bit (must be 0)

4. Outcome:
   - All checks pass: Translate and access memory
   - Any check fails: Raise exception (#PF page fault)

Key insight: This enforcement is non-bypassable for user-space code. There's no syscall, no API, no privilege level that allows user code to skip page table checks.
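The protection check in step 3 can be modeled in a few lines of C. This is only an illustrative sketch of the decision logic for a single leaf PTE, assuming the bit positions from the table above. The names `pte_allows`, `access_kind`, and the `PTE_*` macros are invented here, and the sketch ignores details a real MMU also considers, such as the permission bits in upper-level entries and control-register settings.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit positions taken from the x86-64 PTE table above. */
#define PTE_PRESENT  (1ULL << 0)
#define PTE_WRITABLE (1ULL << 1)
#define PTE_USER     (1ULL << 2)
#define PTE_NX       (1ULL << 63)

typedef enum { ACCESS_READ, ACCESS_WRITE, ACCESS_FETCH } access_kind;

/* Hypothetical helper: returns true if a user-mode (CPL 3) access of the
 * given kind passes the checks against this leaf PTE, and false where the
 * hardware would raise #PF. Upper-level entries are not modeled. */
bool pte_allows(uint64_t pte, access_kind kind) {
    assert(kind == ACCESS_READ || kind == ACCESS_WRITE || kind == ACCESS_FETCH);

    if (!(pte & PTE_PRESENT)) return false;   /* not mapped at all */
    if (!(pte & PTE_USER)) return false;      /* supervisor-only page */
    if (kind == ACCESS_WRITE && !(pte & PTE_WRITABLE)) return false;
    if (kind == ACCESS_FETCH && (pte & PTE_NX)) return false;
    return true;
}
```

For example, a PTE built as `PTE_PRESENT | PTE_USER | PTE_NX` allows `ACCESS_READ` but rejects both `ACCESS_WRITE` and `ACCESS_FETCH`, which is the combination a read-only data mapping typically ends up with.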

Protection Keys (PKU) on Modern CPUs

Intel's Memory Protection Keys for Userspace (PKU, first shipped on Skylake server processors) add another protection layer. Each page is tagged with a 4-bit key (16 domains), and a user-accessible register (PKRU) holds an access-disable bit and a write-disable bit for each key. Because PKRU is written with the unprivileged WRPKRU instruction, access rights can change quickly without modifying page tables. Protection keys restrict data accesses only, not instruction fetches. Use case: temporarily disable write access to a region during security-sensitive operations, then re-enable with a single register write.
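On Linux, protection keys are exposed through the glibc wrappers `pkey_alloc()`, `pkey_mprotect()`, `pkey_set()`, and `pkey_free()`. The sketch below shows the write-disable and re-enable cycle described above. It assumes glibc 2.27 or newer; the function name `pkey_write_toggle_demo` and its return convention are made up for this example. On CPUs or kernels without PKU support, `pkey_alloc()` fails and the function reports that instead of crashing.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical demo. Returns 1 if write access was toggled through a
 * protection key, 0 if protection keys are unavailable here, -1 on error. */
int pkey_write_toggle_demo(void) {
    size_t len = 4096;
    char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED) return -1;

    /* Allocate a key with no initial restrictions. */
    int pkey = pkey_alloc(0, 0);
    if (pkey == -1) {            /* no PKU support on this CPU or kernel */
        munmap(buf, len);
        return 0;
    }

    int result = -1;
    /* Tag the page with the key; page-table permissions stay read-write. */
    if (pkey_mprotect(buf, len, PROT_READ | PROT_WRITE, pkey) == 0) {
        buf[0] = 'A';                                    /* allowed */
        if (pkey_set(pkey, PKEY_DISABLE_WRITE) == 0) {   /* PKRU update only */
            char seen = buf[0];                          /* reads still work */
            /* A write to buf[0] at this point would raise SIGSEGV. */
            if (pkey_set(pkey, 0) == 0) {                /* writes allowed again */
                buf[0] = 'B';
                result = (seen == 'A' && buf[0] == 'B') ? 1 : -1;
            }
        }
    }

    pkey_free(pkey);
    munmap(buf, len);
    return result;
}
```

The page table entry is touched once, by `pkey_mprotect()`. Every later permission change goes through `pkey_set()`, which is why this approach suits regions whose writability flips frequently.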

hardware_protection_demo.c
#include <stdio.h>
#include <signal.h>
#include <sys/mman.h>
#include <unistd.h>

// Note: printf() is not async-signal-safe. It is tolerated here only because
// this demo handler terminates the process immediately afterwards.
void segfault_handler(int sig, siginfo_t *info, void *context) {
    (void)sig;
    (void)context;
    printf("Protection fault at address: %p\n", info->si_addr);
    printf("Fault type: %s\n",
           (info->si_code == SEGV_MAPERR) ? "No mapping" :
           (info->si_code == SEGV_ACCERR) ? "Permission denied" : "Unknown");
    fflush(stdout);  // _exit() does not flush stdio buffers
    _exit(1);
}

int main(void) {
    // Set up signal handler for protection faults
    struct sigaction sa = {
        .sa_sigaction = segfault_handler,
        .sa_flags = SA_SIGINFO
    };
    sigaction(SIGSEGV, &sa, NULL);

    // Allocate read-only memory
    char *readonly = mmap(NULL, 4096,
                          PROT_READ,        // Read-only permission
                          MAP_PRIVATE | MAP_ANONYMOUS,
                          -1, 0);
    if (readonly == MAP_FAILED) {
        perror("mmap");
        return 1;
    }

    // Anonymous pages start zero-filled, so this prints 0
    printf("Read succeeds: %d\n", readonly[0]);  // OK
    printf("Attempting write to read-only page...\n");
    readonly[0] = 'X';  // Will trigger SIGSEGV (SEGV_ACCERR)

    return 0;  // Never reached
}

Per-Mapping Protection

A critical feature of virtual memory is that the same physical page can have different permissions in different mappings. This enables powerful protection patterns for shared memory.

Asymmetric Permission Example:

Process A (Producer): Maps shared region as Read/Write
Process B (Consumer): Maps same physical pages as Read-Only

Producer can modify the data; consumer can only observe. If consumer tries to write, hardware triggers a fault.

                Physical Frame 0x1234
                ┌──────────────────┐
                │   Shared Data    │
                │                  │
                └──────────────────┘
                        ▲
                        │
        ┌───────────────┴───────────────┐
        │                               │
   Process A PTE                   Process B PTE
   Frame: 0x1234                   Frame: 0x1234
   R/W: 1 (writable)               R/W: 0 (read-only)
   User: 1                         User: 1
   NX: 1 (no exec)                 NX: 1 (no exec)

asymmetric_permissions.c
#include <sys/mman.h>
#include <fcntl.h>
#include <unistd.h>
#include <stdio.h>
#include <string.h>

#define SHM_NAME "/asymmetric_demo"
#define SHM_SIZE 4096

// Process A: Creates shared memory with read-write access
void producer(void) {
    // 0644: owner may read and write, everyone else may only read
    int fd = shm_open(SHM_NAME, O_CREAT | O_RDWR, 0644);
    if (fd == -1) {
        perror("shm_open");
        return;
    }
    if (ftruncate(fd, SHM_SIZE) == -1) {
        perror("ftruncate");
        close(fd);
        return;
    }

    char *data = mmap(NULL, SHM_SIZE,
                      PROT_READ | PROT_WRITE,  // Producer has write access
                      MAP_SHARED, fd, 0);
    close(fd);  // The mapping stays valid after the descriptor is closed
    if (data == MAP_FAILED) {
        perror("mmap");
        return;
    }

    // Producer can write
    strcpy(data, "Hello from producer");
    printf("Producer wrote data\n");

    munmap(data, SHM_SIZE);
}

// Process B: Opens shared memory read-only
void consumer(void) {
    int fd = shm_open(SHM_NAME, O_RDONLY, 0);  // Open read-only
    if (fd == -1) {
        perror("shm_open");
        return;
    }

    char *data = mmap(NULL, SHM_SIZE,
                      PROT_READ,  // Consumer has read-only access
                      MAP_SHARED, fd, 0);
    close(fd);

    if (data == MAP_FAILED) {
        perror("mmap");
        return;
    }

    // Consumer can read
    printf("Consumer reads: %s\n", data);

    // Consumer cannot write - this would cause SIGSEGV!
    // data[0] = 'X';  // CRASH: Hardware protection fault

    munmap(data, SHM_SIZE);
}

// Key point: The kernel refuses PROT_WRITE for a MAP_SHARED mapping of an
// O_RDONLY file descriptor (mmap fails with EACCES), so Process B can never
// obtain a writable mapping. The read-only mapping it does get is then
// enforced by the hardware on every access.
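The per-mapping nature of permissions can also be observed inside a single process by mapping the same object twice. The following sketch assumes Linux with `memfd_create()` (glibc 2.27 or newer) as a convenient stand-in for a named shared memory object; `dual_mapping_demo` is a name chosen for this illustration.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical demo: maps one memfd-backed page twice, once read-write and
 * once read-only. Returns 0 if a write through the first mapping is visible
 * through the second, -1 on any failure. */
int dual_mapping_demo(void) {
    size_t page = (size_t)sysconf(_SC_PAGESIZE);

    int fd = memfd_create("dual_mapping_demo", 0);
    if (fd == -1) return -1;
    if (ftruncate(fd, (off_t)page) == -1) {
        close(fd);
        return -1;
    }

    /* Two mappings of the same physical page with different permissions. */
    char *rw = mmap(NULL, page, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    char *ro = mmap(NULL, page, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);

    int result = -1;
    if (rw != MAP_FAILED && ro != MAP_FAILED) {
        assert(rw != ro);                 /* distinct virtual addresses */
        strcpy(rw, "written via rw");     /* permitted by the first PTE */
        /* Writing through ro would fault; reading through it sees the data. */
        result = (strcmp(ro, "written via rw") == 0) ? 0 : -1;
    }

    if (rw != MAP_FAILED) munmap(rw, page);
    if (ro != MAP_FAILED) munmap(ro, page);
    return result;
}
```

Both pointers refer to the same physical frame, yet only `rw` may be used for stores. In the producer/consumer setup the two mappings simply live in different processes.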

Protection Permission Combinations

•PROT_NONE — No access allowed. Any read, write, or execute triggers a fault. Useful for guard pages.
•PROT_READ — Read-only access. Common for consumers of shared data.
•PROT_WRITE — Write access. Typically combined with PROT_READ.
•PROT_EXEC — Execute access. Usually with PROT_READ for loading code.
•PROT_READ | PROT_WRITE — Standard read-write access for data.
•PROT_READ | PROT_EXEC — Executable code (no write; W^X policy).
•PROT_READ | PROT_WRITE | PROT_EXEC — Full access. Dangerous! Only for JIT compilers, etc.

W^X Policy (Write XOR Execute)

Modern security practice mandates that no memory region should be both writable AND executable simultaneously. This prevents attackers from injecting code (requires write) and then executing it (requires execute). Enforce this by never using PROT_WRITE | PROT_EXEC together. For JIT compilers, use mprotect() to switch between write-mode (for compilation) and execute-mode (for running).

Kernel Access Control

Beyond hardware protection, the kernel enforces access control policies that determine which processes can create, open, and map shared memory objects. This is the first line of defense—before a process can even attempt to access shared memory, the kernel verifies authorization.

POSIX Shared Memory Access Control

POSIX shared memory objects use filesystem-like permissions. On Linux, they exist in /dev/shm and have standard Unix permission bits.

// Create shared memory with permissions 0640:
//   Owner: read + write (6)
//   Group: read only (4)  
//   Others: no access (0)
int fd = shm_open("/my_data", O_CREAT | O_RDWR, 0640);

// In /dev/shm:
// -rw-r----- 1 user group 4096 Jan 15 10:00 my_data

Permission Checks:

shm_open(): Kernel checks if calling process's UID/GID allows requested access (read/write) according to permission bits.
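mmap(): Kernel verifies that the file descriptor's open mode (O_RDONLY, O_RDWR) is compatible with requested protection (PROT_* flags). For a MAP_SHARED mapping, PROT_WRITE requires a descriptor opened with O_RDWR; otherwise mmap() fails with EACCES.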
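The mmap() check can be exercised directly. The sketch below uses a temporary file under /tmp as a stand-in for a shared memory object, on the assumption that the open-mode rule is the same for any MAP_SHARED file mapping; the function name `writable_map_of_readonly_fd` is invented for this example.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical demo. Tries a MAP_SHARED + PROT_WRITE mapping on a descriptor
 * opened O_RDONLY. Returns the errno of that attempt, 0 if the mapping
 * unexpectedly succeeded, or -1 if the setup itself failed. */
int writable_map_of_readonly_fd(void) {
    char path[] = "/tmp/mmap_mode_demo_XXXXXX";
    int rw_fd = mkstemp(path);
    if (rw_fd == -1) return -1;
    if (ftruncate(rw_fd, 4096) == -1) {
        close(rw_fd);
        unlink(path);
        return -1;
    }

    int ro_fd = open(path, O_RDONLY);   /* second descriptor, read-only */
    unlink(path);
    close(rw_fd);
    if (ro_fd == -1) return -1;

    /* A read-only shared mapping is compatible with O_RDONLY. */
    void *ro_map = mmap(NULL, 4096, PROT_READ, MAP_SHARED, ro_fd, 0);
    if (ro_map == MAP_FAILED) {
        close(ro_fd);
        return -1;
    }
    munmap(ro_map, 4096);

    /* A writable shared mapping is not: the kernel rejects it up front. */
    int result = 0;
    void *rw_map = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, ro_fd, 0);
    if (rw_map == MAP_FAILED) {
        result = errno;
    } else {
        munmap(rw_map, 4096);
    }

    close(ro_fd);
    return result;
}
```

On Linux the function is expected to return `EACCES`. The check depends on how the descriptor was opened, not on the file's permission bits, which is what makes a read-only descriptor useful as a restricted capability.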

Common patterns:

Scenario	Permissions	Rationale
Single-user app	0600	Only owner can access
System daemon + clients	0660	Daemon and group members
World-readable config	0644	Anyone can read, owner writes
Exclusive IPC	0600 + flock	Owner only, with locking

Principle of Least Privilege

When designing shared memory systems: (1) Grant the minimum permissions needed—if a process only reads, use 0444 or PROT_READ. (2) Prefer capability passing (FD) over named objects. (3) Avoid world-readable shared memory for sensitive data. (4) Consider separate shared regions for different trust levels.
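Capability passing, as mentioned in point (2), usually means sending an open file descriptor over a Unix domain socket with `SCM_RIGHTS`. The helpers below are a minimal sketch of that mechanism; `send_fd` and `recv_fd` are names chosen here, and production code would add handling for partial reads, `MSG_CTRUNC`, and multiple descriptors.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Sends one file descriptor over a connected Unix domain socket.
 * Returns 0 on success, -1 on failure. */
int send_fd(int sock, int fd) {
    assert(fd >= 0);
    char byte = 0;                       /* at least one data byte is required */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {                              /* union guarantees cmsghdr alignment */
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } ctrl;
    memset(&ctrl, 0, sizeof(ctrl));

    struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1,
                          .msg_control = ctrl.buf,
                          .msg_controllen = sizeof(ctrl.buf) };
    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receives one file descriptor. Returns the new descriptor or -1. */
int recv_fd(int sock) {
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } ctrl;
    memset(&ctrl, 0, sizeof(ctrl));

    struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1,
                          .msg_control = ctrl.buf,
                          .msg_controllen = sizeof(ctrl.buf) };
    if (recvmsg(sock, &msg, 0) != 1) return -1;

    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS ||
        cmsg->cmsg_len != CMSG_LEN(sizeof(int))) {
        return -1;
    }

    int fd;
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```

A typical use is for a trusted process to create the shared memory object, open or reopen it with only the access the peer needs, and hand over that descriptor with `send_fd()`. The receiver then calls `mmap()` on whatever `recv_fd()` returns, and no name ever appears in a shared namespace.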

Security Vulnerabilities and Mitigations

Shared memory introduces unique security challenges that have been exploited in real-world attacks. Understanding these vulnerabilities is essential for building secure systems.

Vulnerability: Time-of-Check to Time-of-Use (TOCTOU)

•Attack: Process A validates data in shared memory, then uses it. But between check and use, Process B modifies the data.
•Example: A checks that shared buffer length ≤ max, then copies that many bytes. B increases length after check → buffer overflow.
•Mitigation: Copy data to private memory before validation, then use the private copy. Or use sealing (memfd) to guarantee immutability.

toctou_vulnerability.c
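#include <stddef.h>
#include <string.h>

// VULNERABLE CODE
typedef struct {
    size_t length;
    char data[1024];
} SharedBuffer;

// Caller must supply an output buffer of at least sizeof(shared->data) bytes
void process_message(SharedBuffer *shared, char *output) {
    // CHECK: Validate length
    if (shared->length > sizeof(shared->data)) {
        return;  // Invalid
    }

    // VULNERABILITY: Between check and use, attacker modifies shared->length
    // USE: Copy data based on (now modified) length
    memcpy(output, shared->data, shared->length);  // Buffer overflow!
}

// SECURE CODE
void process_message_safe(SharedBuffer *shared, char *output) {
    // Copy length to LOCAL variable, reading shared memory exactly once.
    // The volatile access stops the compiler from re-reading shared->length
    // after the check.
    size_t len = *(volatile size_t *)&shared->length;

    // CHECK: Validate local copy
    if (len > sizeof(shared->data)) {
        return;
    }

    // USE: Use local variable (attacker cannot modify)
    memcpy(output, shared->data, len);  // Length is safe!

    // The bytes themselves can still change while being copied, so any
    // content validation must run on the private copy in output.

    // Even better: copy entire structure to private memory first
    // SharedBuffer local_copy = *shared;  // Then validate and use local_copy
}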

Vulnerability: Information Disclosure

•Attack: A privileged process (e.g., setuid, system service) shares memory with an unprivileged process. The shared region contains sensitive data.
•Example: A sudo-like process caches credentials in shared memory that a local attacker maps.
•Mitigation: Never share memory between different privilege levels unless absolutely necessary. Use separate shared regions for different security domains. Clear sensitive data before sharing or process exit.
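Clearing sensitive fields is easy to get wrong, because a compiler may drop a plain `memset()` whose result is never read. A minimal sketch using `explicit_bzero()` (available in glibc 2.25 and newer) follows; the `SharedSession` layout and the `scrub_session` helper are hypothetical and only illustrate the idea.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical layout of a region that a privileged process shares. */
typedef struct {
    char session_token[64];   /* sensitive: must not outlive its use */
    char public_status[64];   /* safe to leave visible to the peer */
} SharedSession;

/* Wipes the sensitive field. Unlike memset(), explicit_bzero() is not
 * removed by the optimizer even if the buffer is never read again. */
void scrub_session(SharedSession *s) {
    assert(s != NULL);
    explicit_bzero(s->session_token, sizeof(s->session_token));
}
```

A privileged process would call `scrub_session()` before handing the region to a less trusted peer and again before exiting. Keeping secrets out of the shared structure altogether remains the stronger option.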

Vulnerability: Symlink/Race Attacks on Named Shared Memory

•Attack: Attacker creates symlink or pre-creates shared memory object before victim, controlling its permissions or content.
•Example: Attacker races shm_open() to create /dev/shm/victim_app with attacker-controlled content before the real app.
•Mitigation: Use O_EXCL with O_CREAT to fail if object exists. Use unique, unpredictable names. Prefer memfd (anonymous, no namespace). Check ownership before using existing objects.
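Two of these mitigations, exclusive creation and an ownership check on existing objects, can be sketched as follows. The helper names `create_shm_exclusive` and `open_shm_checked` are assumptions of this example, and the policy in the second helper (same effective UID, no group or other access) is one reasonable choice rather than a fixed rule. On older glibc versions, linking may require `-lrt`.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Creates a brand-new object. Fails with errno == EEXIST if someone,
 * possibly an attacker, has already created an object with this name. */
int create_shm_exclusive(const char *name, off_t size) {
    assert(name != NULL && name[0] == '/');
    int fd = shm_open(name, O_CREAT | O_EXCL | O_RDWR, 0600);
    if (fd == -1) return -1;
    if (ftruncate(fd, size) == -1) {
        close(fd);
        shm_unlink(name);
        return -1;
    }
    return fd;
}

/* Opens an existing object, but only accepts it if it belongs to the
 * calling user and grants no access to group or others. */
int open_shm_checked(const char *name) {
    int fd = shm_open(name, O_RDWR, 0);
    if (fd == -1) return -1;

    struct stat st;
    if (fstat(fd, &st) == -1 ||
        st.st_uid != geteuid() ||
        (st.st_mode & (S_IRWXG | S_IRWXO)) != 0) {
        close(fd);          /* wrong owner or overly permissive mode */
        return -1;
    }
    return fd;
}
```

The check uses `fstat()` on the already opened descriptor rather than `stat()` on the path, so the object that was inspected is the same one that gets mapped.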

Vulnerability: Memory Mapping Side Channels

•Attack: Attacker in one process learns about another process's memory access patterns through shared resources (e.g., cache lines, TLB, page table entries).
•Example: Flush+Reload attack on shared libraries reveals crypto key-dependent memory accesses.
•Mitigation: Constant-time algorithms for security-sensitive code. Avoid sharing between mutually distrusting processes. Use process isolation (separate VMs, containers with memory isolation).
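As one small instance of the constant-time idea, the comparison below touches every byte regardless of where the first mismatch occurs, so its memory access pattern and running time do not depend on the secret contents. It is a sketch: `ct_compare` is a name chosen here, and whether a given compiler preserves the property should be verified for security-critical use (vetted library routines such as OpenSSL's `CRYPTO_memcmp` exist for that reason).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Compares two buffers of equal length without an early exit.
 * Returns 0 if they are equal and a nonzero value otherwise. */
int ct_compare(const void *a, const void *b, size_t len) {
    assert(len == 0 || (a != NULL && b != NULL));
    const volatile uint8_t *pa = a;
    const volatile uint8_t *pb = b;
    uint8_t diff = 0;

    /* Accumulate differences; no branch depends on the data. */
    for (size_t i = 0; i < len; i++) {
        diff |= (uint8_t)(pa[i] ^ pb[i]);
    }
    return diff;
}
```

By contrast, `memcmp()` may return as soon as it finds a differing byte, which lets an observer who can time the call or watch cache activity learn how long the matching prefix is.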

Real-World Attack: Dirty COW (CVE-2016-5195)

Dirty COW exploited a race condition in the Linux kernel's handling of copy-on-write mappings. An attacker could write to files they only had read access to (like /etc/passwd) by racing the COW page fault handler. The fix required careful synchronization in the page fault path. This attack demonstrates that even kernel-enforced protections can have vulnerabilities.

Container and Namespace Isolation

Containers use Linux namespaces to isolate shared memory between groups of processes. This is crucial for multi-tenant systems where different customers' containers run on the same host.

Namespace Impact on Shared Memory
Namespace	Effect on Shared Memory	Isolation Level
IPC namespace	Separate System V IPC identifiers and POSIX message queues	Strong for System V shm: containers cannot see each other's segments
Mount namespace	Separate /dev/shm tmpfs mount per container	Strong for POSIX shm, provided each container mounts its own /dev/shm
PID namespace	Different PID views; affects how IPC tools report PIDs	Indirect: provides no access isolation by itself
User namespace	UID/GID mapping; affects permission checks	Variable: Depends on mapping

namespace_isolation.sh
# Demonstrate IPC namespace isolation
 
# Create shared memory in host namespace
$ ipcmk -M 4096
Shared memory id: 12345
 
# Verify it exists
$ ipcs -m | grep 12345
0x... 12345 user 644 4096 0
 
# Run a new shell in a NEW IPC namespace
$ sudo unshare --ipc bash
 
# In the new namespace, the segment is NOT visible!
$ ipcs -m
------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch
# (empty!)
 
# Create a segment in the new namespace
$ ipcmk -M 2048
Shared memory id: 0  # IDs start fresh in new namespace
 
# Exit and check host namespace - new segment not visible there
$ exit
$ ipcs -m | grep "2048"
# (nothing - the segment exists only in the isolated namespace)
 
# Docker containers use IPC namespace isolation by default:
$ docker run --ipc=host ...     # Share host IPC namespace (less secure)
$ docker run --ipc=private ...  # Isolated IPC namespace (default, secure)
$ docker run --ipc=container:X  # Share with container X's namespace

Container Shared Memory Best Practices

•Use private IPC namespaces — Docker/Podman default. Prevents cross-container shared memory access.
•Size limits for /dev/shm — Containers should limit shm size to prevent resource exhaustion: docker run --shm-size=256m ...
•Avoid --ipc=host — Only for specific use cases (GPU sharing, legacy apps). Breaks isolation.
•Separate namespaces per trust boundary — Don't share IPC namespace between mutually distrusting containers.
•Audit IPC resource usage — In containerized environments, orphaned shared memory can accumulate.

GPU and Shared Memory

GPU workloads often require large shared memory regions for inter-process GPU buffer sharing. This creates tension with container isolation—some GPU use cases require --ipc=host or carefully crafted shared IPC namespaces. Evaluate the security trade-offs: is GPU sharing more important than container isolation?

Secure Coding Patterns for Shared Memory

Building secure shared memory systems requires disciplined patterns. Here are battle-tested approaches used in production systems.

Pattern 1: Defense in Depth

•Layer 1: Kernel access control (permissions on shm_open/shmget)
•Layer 2: Hardware protection (PROT_READ vs PROT_WRITE)
•Layer 3: Data validation (never trust shared memory content)
•Layer 4: Cryptographic integrity (HMAC on critical data)
•Principle: An attacker must compromise multiple layers to succeed

defense_in_depth.c
#include <openssl/crypto.h>
#include <openssl/evp.h>
#include <openssl/hmac.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Critical shared data includes integrity protection
typedef struct {
    uint32_t sequence;
    uint32_t data_length;
    uint8_t data[1024];
    uint8_t hmac[32];  // SHA-256 HMAC
} SecureMessage;

static const uint8_t SECRET_KEY[32] = { /* ... */ };

// Writer: Compute and store HMAC
bool write_secure(SecureMessage *msg, const uint8_t *data, size_t len) {
    if (len > sizeof(msg->data)) {
        return false;  // Refuse to overflow the data field
    }

    // Build the message privately so the HMAC covers exactly what is published
    SecureMessage local;
    memset(&local, 0, sizeof(local));
    local.sequence = msg->sequence + 1;
    local.data_length = (uint32_t)len;
    memcpy(local.data, data, len);

    // Compute HMAC over sequence + length + data
    unsigned int hmac_len = 0;
    if (HMAC(EVP_sha256(), SECRET_KEY, sizeof(SECRET_KEY),
             (const uint8_t *)&local, offsetof(SecureMessage, hmac),
             local.hmac, &hmac_len) == NULL) {
        return false;
    }

    *msg = local;  // Publish to shared memory
    return true;
}

// Reader: Verify HMAC before trusting data
bool read_secure(SecureMessage *msg, uint8_t *out, size_t *len) {
    // FIRST: Copy to local storage (prevent TOCTOU)
    SecureMessage local = *msg;

    // SECOND: Validate length
    if (local.data_length > sizeof(local.data)) {
        return false;  // Invalid length
    }

    // THIRD: Verify integrity
    uint8_t expected_hmac[32];
    unsigned int hmac_len = 0;
    if (HMAC(EVP_sha256(), SECRET_KEY, sizeof(SECRET_KEY),
             (const uint8_t *)&local, offsetof(SecureMessage, hmac),
             expected_hmac, &hmac_len) == NULL) {
        return false;
    }

    // Constant-time comparison avoids leaking how many bytes matched
    if (CRYPTO_memcmp(local.hmac, expected_hmac, sizeof(expected_hmac)) != 0) {
        return false;  // Integrity check failed!
    }

    // FOURTH: Data is trusted, copy out
    memcpy(out, local.data, local.data_length);
    *len = local.data_length;
    return true;
}

Pattern 2: Immutable Shared Configuration

•Create and populate shared memory in a trusted init process
•Seal the memory using memfd seals (F_SEAL_WRITE together with F_SEAL_SHRINK, F_SEAL_GROW, and F_SEAL_SEAL) to guarantee immutability
•Distribute FD to worker processes via Unix socket
•Workers map read-only and can trust the data won't change
•Update pattern: Create new memfd, seal, distribute, workers switch, old one unmapped
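The create-populate-seal steps of this pattern might look like the sketch below, assuming Linux with `memfd_create()` and file sealing (glibc 2.27 or newer). The function name `create_sealed_config` is hypothetical. All four seals are applied so that the contents cannot be written, resized, or unsealed afterwards.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: creates an anonymous memory file holding a copy of
 * `data`, then seals it. Returns the descriptor, or -1 on failure. */
int create_sealed_config(const void *data, size_t len) {
    assert(data != NULL && len > 0);

    int fd = memfd_create("sealed_config", MFD_CLOEXEC | MFD_ALLOW_SEALING);
    if (fd == -1) return -1;

    /* Populate while the object is still writable. */
    if (write(fd, data, len) != (ssize_t)len) {
        close(fd);
        return -1;
    }

    /* After this call nobody, including this process, can modify or
     * resize the contents, and the seal set itself is frozen. */
    if (fcntl(fd, F_ADD_SEALS,
              F_SEAL_WRITE | F_SEAL_SHRINK | F_SEAL_GROW | F_SEAL_SEAL) == -1) {
        close(fd);
        return -1;
    }
    return fd;
}
```

A worker that receives the descriptor can confirm the seals with `fcntl(fd, F_GET_SEALS)` before trusting the data, then map it with `PROT_READ`. Using `MAP_PRIVATE` for that read-only mapping is the conservative choice, because some kernel versions refuse any `MAP_SHARED` mapping of a write-sealed memfd, even a read-only one.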

Pattern 3: Guard Pages

•Surround shared regions with PROT_NONE pages
•Purpose: Detect buffer overflows/underflows that would escape the region
•Implementation: Allocate extra pages, mprotect the edges to PROT_NONE
•Cost: One page (4KB) wasted per boundary, but catches corruption early
•Use case: High-security environments, debugging shared memory issues

guard_pages.c
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

void *create_guarded_shared_memory(size_t data_size) {
    size_t page_size = (size_t)sysconf(_SC_PAGESIZE);

    // Round the usable size up to whole pages: mprotect() needs a
    // page-aligned address, so the trailing guard must start on a page boundary
    size_t usable_size = (data_size + page_size - 1) / page_size * page_size;

    // Allocate: guard + data + guard
    size_t total_size = page_size + usable_size + page_size;

    void *region = mmap(NULL, total_size,
                        PROT_READ | PROT_WRITE,
                        MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (region == MAP_FAILED) return NULL;

    // First page: No access (guard)
    if (mprotect(region, page_size, PROT_NONE) != 0) {
        munmap(region, total_size);
        return NULL;
    }

    // Last page: No access (guard)
    void *last_page = (char *)region + page_size + usable_size;
    if (mprotect(last_page, page_size, PROT_NONE) != 0) {
        munmap(region, total_size);
        return NULL;
    }

    // Return pointer to the usable region (between guards)
    return (char *)region + page_size;
}

// Any access that strays into a guard page triggers SIGSEGV. Because the
// usable size is rounded up, a small overrun past data_size that stays
// inside the last data page is not detected.

Auditing and Monitoring Shared Memory

In production systems, visibility into shared memory usage is essential for security monitoring, debugging, and capacity planning.

audit_shared_memory.sh
#!/bin/bash
# Comprehensive shared memory audit script

echo "=== POSIX Shared Memory (/dev/shm) ==="
ls -la /dev/shm/
du -sh /dev/shm/

echo ""
echo "=== System V Shared Memory ==="
ipcs -m -t  # With timestamps
echo ""
echo "Orphaned segments (nattch = 0):"
ipcs -m | awk 'NR>3 && $6==0 {print $0}'

echo ""
echo "=== Per-Process Shared Mappings ==="
for pid in $(pgrep -f "my_application"); do
    echo "PID $pid:"
    grep "shm\|shared" "/proc/$pid/maps" 2>/dev/null || echo "  (no shared mappings)"
    echo "  Shared memory totals:"
    grep -E 'Shared_(Clean|Dirty)' "/proc/$pid/smaps" 2>/dev/null | \
        awk '{sum += $2} END {printf "    Shared: %d KB\n", sum}'
done

echo ""
echo "=== Large Shared Memory Consumers ==="
# Find processes with most shared memory
for smaps in /proc/[0-9]*/smaps; do
    pid=${smaps#/proc/}
    pid=${pid%/smaps}
    grep -E 'Shared_(Clean|Dirty)' "$smaps" 2>/dev/null | \
        awk -v pid="$pid" '{sum += $2} END {if (sum > 0) print sum " KB " pid}'
done | sort -rn | head -10

echo ""
echo "=== Security Concerns ==="
# Shared memory readable or writable by "others"
echo "World-accessible in /dev/shm:"
find /dev/shm -mindepth 1 -perm /006 -ls 2>/dev/null

# Suspiciously named shared memory
echo "Hidden files in /dev/shm:"
find /dev/shm -mindepth 1 -name '.*' -ls 2>/dev/null

Monitoring Metrics for Shared Memory
Metric	Source	Alert Threshold
/dev/shm usage	df /dev/shm	80% capacity
Orphaned System V segments	ipcs -m (nattch=0)	10 segments or growing
Shared memory per process	/proc/PID/smaps	Unusual growth pattern
World-writable shm objects	find /dev/shm -perm	Any occurrence
shm_open/shmget syscalls	auditd, strace, eBPF	From unexpected processes

eBPF for Shared Memory Tracing

Modern Linux systems can use eBPF to trace shared memory operations with minimal overhead. Tools like bpftrace can hook shmget, shmat, shm_open, mmap, and report in real-time which processes access which shared memory. This is invaluable for security monitoring and debugging complex multi-process systems.

Summary: Protection Considerations

We've explored the comprehensive landscape of protection mechanisms for shared memory. Let's consolidate the key takeaways:

Key Takeaways

•Hardware protection is the foundation — Page table bits (R/W/NX) are enforced by the CPU on every memory access. This is non-bypassable and has zero performance cost.
•Per-mapping permissions enable asymmetric access — The same physical page can be read-write for one process and read-only for another. Use this for producer/consumer patterns.
•Kernel access control gates access — Permissions on shared memory objects (POSIX or System V) determine who can open and map them. Follow principle of least privilege.
•TOCTOU is a critical vulnerability — Always copy shared data to private memory before validating and using. Never trust in-place shared memory content.
•Container namespaces provide isolation — IPC namespaces create separate shared memory domains. Essential for multi-tenant security.
•Defense in depth is mandatory — Layer kernel ACLs, hardware protection, data validation, and cryptographic integrity. No single layer is sufficient.
•Audit and monitor continuously — Know what shared memory exists, who's using it, and alert on anomalies. Orphaned segments and world-writable objects are red flags.

What's Next:

Now that we've covered the theoretical foundations and security aspects of shared memory, the final page will bring everything together with implementation details — how real operating systems like Linux implement shared memory, the kernel data structures involved, and how all the pieces we've studied fit together in practice.

Page Complete

You now understand shared memory protection comprehensively: hardware mechanisms (page table protection bits), kernel access control (permissions), common vulnerabilities (TOCTOU, disclosure, races), container isolation (namespaces), and secure coding patterns. This knowledge enables you to build secure shared memory systems and audit existing ones.