We have established that capabilities are unforgeable tokens encoding access rights. But where do these tokens live? How does a process keep track of the capabilities it holds? How does the operating system organize and protect this critical security information?

The answer lies in Capability Lists, often abbreviated as C-lists—the data structures that store a subject's complete set of capabilities. If the access matrix is the abstract model of all access rights in a system, then a capability list is a concrete implementation of one row of that matrix: all the objects a particular subject can access and the associated rights.

Understanding capability lists takes us from theory to implementation. We will examine how C-lists are structured, where they are stored, how they are protected, and how different systems have approached these design challenges.
By the end of this page, you will understand the structure and organization of capability lists, how they are protected from unauthorized access and modification, different implementation approaches (kernel-managed vs. process-managed), capability addressing modes, and the performance and security trade-offs in C-list design.
A Capability List (C-list) is an ordered collection of capabilities associated with a particular subject (typically a process or protection domain). It represents the complete authority of that subject—everything it can access and how.

Formal Definition

Mathematically, if we have an access matrix A where A[s,o] represents the rights that subject s has to object o, then the capability list for subject s is:

C-list(s) = { (o, A[s,o]) | o is an object and A[s,o] ≠ ∅ }

In other words, the C-list contains all (object, rights) pairs where the subject has non-empty rights.

Key Characteristics
```c
// Conceptual structure of a Capability List
// In real systems, this is managed by the kernel

#define MAX_CAPABILITIES 256  // Typical limit per process

typedef struct {
    bool valid;           // Is this slot occupied?
    uint32_t rights;      // Access rights bitmask
    ObjectRef object;     // Reference to the protected object
    uint64_t generation;  // For revocation checking
    uint32_t flags;       // Additional metadata
} CapabilitySlot;

typedef struct {
    uint32_t owner_pid;        // Process that owns this C-list
    uint32_t num_capabilities; // Current count of valid capabilities
    uint32_t max_capabilities; // Maximum allowed
    CapabilitySlot slots[MAX_CAPABILITIES];

    // Capability list metadata
    bool sealed;               // Can new capabilities be added?
    uint64_t created_time;
    uint64_t last_modified;
} CapabilityList;

// User processes never see CapabilityList directly
// They only hold indices (capability handles)
typedef uint32_t CapHandle; // Actually just an index into the C-list

// When a process wants to access an object:
// 1. It provides a CapHandle (index)
// 2. Kernel looks up C-list[index]
// 3. Kernel checks if slot is valid and has required rights
// 4. Kernel performs the operation on behalf of the process
```

Relationship to the Access Matrix

Recall that the access matrix can be viewed in two ways:

- Column-wise (ACLs): For each object, list all subjects that can access it
- Row-wise (C-lists): For each subject, list all objects it can access

C-lists are the row-wise decomposition of the access matrix. This structural choice has profound implications:

- Easy to determine all objects a subject can access: Simply enumerate the C-list
- Hard to determine all subjects that can access an object: Must search all C-lists in the system
- Natural support for least privilege: A subject's authority is explicit and bounded
- Delegation is straightforward: Copy capabilities between C-lists
The security of a capability system fundamentally depends on protecting capability lists from unauthorized access and modification. If a process could directly manipulate its C-list, it could grant itself arbitrary access rights, defeating the entire purpose of the capability model.

The Core Protection Problem

C-lists must satisfy two seemingly contradictory requirements:

1. Accessibility: The system must be able to quickly locate and read a process's capabilities during access control checks
2. Unmodifiability: Processes must not be able to directly add, modify, or forge capabilities in their C-lists

Different systems resolve this tension through different architectural choices.
C-list protection is not optional security hardening—it is essential to capability semantics. A system where processes can modify their own C-lists is not a capability system at all; it is an honor system with no security guarantees. Every C-list implementation must provide absolute protection against unauthorized modification.
Kernel-Space Storage

The most common approach stores C-lists entirely in kernel memory, inaccessible to user processes:

- C-lists reside in kernel address space
- Processes hold only opaque handles (indices)
- All capability operations go through system calls
- Kernel validates every capability use

This approach provides strong protection but requires kernel involvement for every capability operation.
```c
// Kernel-managed capability list implementation

// === KERNEL SIDE ===
// C-lists stored in protected kernel memory

struct clist_table {
    struct capability_list* lists[MAX_PROCESSES];
    spinlock_t lock;
};

static struct clist_table global_clist_table;

// Create capability for a process (kernel internal)
static int kernel_add_capability(pid_t pid, struct object* obj, uint32_t rights) {
    struct capability_list* clist = global_clist_table.lists[pid];

    spin_lock(&clist->lock);

    // Find empty slot
    for (int i = 0; i < clist->max_slots; i++) {
        if (!clist->slots[i].valid) {
            clist->slots[i].valid = true;
            clist->slots[i].object = obj;
            clist->slots[i].rights = rights;
            clist->slots[i].generation = obj->generation;
            spin_unlock(&clist->lock);
            return i; // Return slot index as handle
        }
    }

    spin_unlock(&clist->lock);
    return -ENOMEM; // No free slots
}

// Validate capability use (called on every capability operation)
static int validate_capability(pid_t pid, int cap_handle, uint32_t required_rights) {
    struct capability_list* clist = global_clist_table.lists[pid];

    // Bounds check
    if (cap_handle < 0 || cap_handle >= clist->max_slots)
        return -EINVAL;

    struct capability_slot* slot = &clist->slots[cap_handle];

    // Validity check
    if (!slot->valid)
        return -EBADF; // Invalid capability

    // Revocation check (generational)
    if (slot->generation != slot->object->generation)
        return -ESTALE; // Capability has been revoked

    // Rights check
    if ((slot->rights & required_rights) != required_rights)
        return -EACCES; // Insufficient rights

    return 0; // Success
}

// === USER SIDE ===
// User processes only see opaque handles

// System call to read through a capability
ssize_t cap_read(int cap_handle, void* buffer, size_t count) {
    // Kernel:
    // 1. Validate cap_handle in current process's C-list
    // 2. Check CAP_READ right
    // 3. Perform read on the object
    // 4. Return result to user
    return syscall(SYS_cap_read, cap_handle, buffer, count);
}
```

Tagged Memory Storage

Hardware capability systems can store C-lists in user-accessible memory by using tagged memory—each memory location has an associated tag bit that indicates whether the location contains a capability.

- Capabilities can be stored anywhere in process memory
- Tag bits are maintained by hardware, not accessible via normal loads and stores
- Only special capability instructions can set tag bits
- The tag bit is cleared when a location is written by a non-capability instruction ("capability tainting")

This approach provides better performance (no kernel trap for capability operations) while maintaining unforgeability through hardware protection.
| Approach | Storage Location | Protection Mechanism | Trade-offs |
|---|---|---|---|
| Kernel Tables | Kernel memory only | Address space isolation | System call overhead, strong isolation |
| Tagged Memory | User memory with tags | Hardware tag bits | Fast access, requires special hardware |
| Encrypted Caps | User memory | Cryptographic MACs | Portable, computational overhead |
| Segmented Memory | Protected segments | Segment descriptors | Historic approach, complex addressing |
| Typed Objects | Language runtime | Type system enforcement | Language-specific, garbage collection |
```c
// Tagged Memory Capability Storage (CHERI-style)

// In hardware capability systems, capabilities ARE pointers
// They can be stored anywhere but have special hardware properties

// A CHERI capability is 128 or 256 bits wide, containing:
// - Virtual address (pointer value)
// - Base address (lower bound of valid access)
// - Length (size of valid region)
// - Permissions (read, write, execute, etc.)
// - Object type (for sealing)
// - Tag bit (set by hardware, indicates validity)

// User code can have arrays of capabilities
void* __capability my_capabilities[100]; // Legal!

// But user code CANNOT forge a capability
void* __capability attempt_forge(void) {
    // This creates an integer, not a capability
    uintptr_t addr = 0x12345678;

    // This does NOT create a valid capability
    void* __capability forged = (void* __capability)addr;
    // The tag bit is NOT set
    // Any attempt to dereference 'forged' will cause a hardware trap

    // The only ways to get capabilities:
    // 1. Derive from an existing capability (bounds can only shrink)
    // 2. Receive from another holder (IPC, memory sharing)
    // 3. Get from kernel (system calls that return capabilities)
    return forged; // Tag is clear, so this value is unusable
}

// Capability derivation - parent to child relationship
void* __capability derive_subset(void* __capability parent,
                                 size_t offset, size_t len) {
    // Create capability to a subset of parent's region
    // Hardware ensures: derived.base >= parent.base
    //                   derived.base + derived.len <= parent.base + parent.len
    void* __capability child = cheri_offset_set(parent, offset);
    child = cheri_bounds_set(child, len);

    // Child can access  [parent.base+offset, parent.base+offset+len)
    // Parent still sees [parent.base, parent.base+parent.len)
    return child;
}

// Storing capabilities - tag bits follow the data
void capability_array_example(void) {
    void* __capability caps[10];

    // Store a capability (hardware sets tag bit in memory)
    caps[0] = get_some_capability(); // Tag bit = 1

    // Overwrite with non-capability data (hardware clears tag bit)
    *(uintptr_t*)&caps[0] = 0; // Tag bit = 0 (tainted)

    // caps[0] is now invalid; dereferencing it will trap
}
```

When a process wants to exercise a capability, it must identify which capability to use. The mechanism by which processes refer to their capabilities is called capability addressing. Different addressing schemes have different performance, security, and usability characteristics.

Indexed Addressing (Capability Slots)

The most common approach assigns each capability a slot number within the C-list. Processes refer to capabilities by their slot indices—small integers that serve as opaque handles.
```c
// Different capability addressing approaches

// === APPROACH 1: Simple Indexed Addressing ===
// Used by: Unix file descriptors, Windows handles

typedef int CapIndex; // Simple slot number

#define MAX_CAPS 256

struct simple_clist {
    struct capability slots[MAX_CAPS];
    uint64_t valid_bitmap[MAX_CAPS / 64]; // Which slots are in use
};

// Fast O(1) lookup
struct capability* lookup_indexed(struct simple_clist* clist, CapIndex idx) {
    if (idx < 0 || idx >= MAX_CAPS)
        return NULL;
    if (!(clist->valid_bitmap[idx / 64] & (1ULL << (idx % 64))))
        return NULL;
    return &clist->slots[idx];
}

// === APPROACH 2: Hierarchical Addressing (CSpace) ===
// Used by: seL4

// Capabilities organized in a tree structure
// Address is a path through the tree: (guard, depth, slot_bits...)

typedef struct {
    uint64_t guard;      // Prefix bits that must match
    uint32_t guard_size; // How many bits in guard
    uint32_t radix;      // Branching factor (2^radix children per node)
    uint32_t depth;      // Current depth in tree
    uint64_t path;       // Path bits used for navigation
} CSpaceAddress;

// seL4 CNode structure (capability node in the tree)
struct cnode {
    struct cnode_slot {
        bool is_cnode; // true = points to another CNode, false = leaf cap
        union {
            struct cnode* next_level;
            struct capability cap;
        };
    } slots[/* 2^radix */];
};

// Lookup: walk the tree following path bits
struct capability* lookup_hierarchical(struct cnode* root, CSpaceAddress* addr) {
    struct cnode* current = root;

    while (addr->depth > 0) {
        // Extract next slot index from path
        uint32_t slot_idx = (addr->path >> (addr->depth - addr->radix))
                            & ((1 << addr->radix) - 1);

        struct cnode_slot* slot = &current->slots[slot_idx];

        if (slot->is_cnode) {
            current = slot->next_level;
            addr->depth -= addr->radix;
        } else {
            return &slot->cap; // Found leaf capability
        }
    }
    return NULL; // Not found
}

// === APPROACH 3: Sparse Table with Hashing ===
// Used when capability counts vary widely

struct sparse_clist {
    struct hash_table ht; // Object ID -> capability mapping
};

struct capability* lookup_sparse(struct sparse_clist* clist, uint64_t object_id) {
    return hash_table_lookup(&clist->ht, object_id);
}
```

seL4's Capability Space (CSpace)

seL4 uses a sophisticated hierarchical capability addressing scheme called CSpace. The CSpace is a tree structure where:

- CNodes are internal nodes containing fixed-size arrays of capability slots
- Leaf capabilities are stored in CNode slots
- Addresses are bit strings that encode the path through the tree
- Guards allow efficient representation of sparse address spaces

This design enables:

- Flexible capability organization (more CNodes can be added dynamically)
- Efficient sparse representation (most slots empty)
- Delegation of subtrees (transfer an entire CNode)
- Per-address-space capability lookup isolation
Hierarchical addressing solves the problem of scaling. With flat tables, the kernel must pre-allocate slots for every process. With hierarchical structures, capability storage grows on demand. A process needing 10 capabilities uses ~10 slots; a process needing 10,000 capabilities extends its CSpace to accommodate them. This is analogous to page tables for virtual memory—sparse structures with demand allocation.
Direct Capability References (Hardware Systems)

In hardware capability systems like CHERI, capabilities are often referenced directly as values rather than through indices:

- A capability is a fat pointer (128-256 bits)
- The capability value itself contains all information
- No table lookup needed—just use the capability directly
- Capabilities can be stored in registers or memory

This provides the best performance but requires hardware support and larger pointer sizes.
| Method | Lookup Cost | Space Efficiency | Max Capabilities | Best For |
|---|---|---|---|---|
| Simple Index | O(1) | Fixed table size | Fixed limit | Small, known sets (file descriptors) |
| Hierarchical (CSpace) | O(depth) | Demand-allocated | Very large | Systems needing flexibility |
| Hash Table | O(1) amortized | Proportional to count | Memory-limited | Sparse, dynamic sets |
| Direct (Hardware) | O(1) | Per-capability (large) | Memory-limited | Hardware capability systems |
C-lists are not static—they change as processes acquire, transfer, duplicate, and release capabilities. The operating system provides well-defined operations on C-lists that maintain security invariants while enabling flexible capability management.

Fundamental C-List Operations
```c
// Capability List Operations - Kernel Implementation

// Duplicate: Copy capability within same C-list
int cap_dup(pid_t pid, CapHandle src, CapHandle* dest) {
    struct clist* cl = get_clist(pid);
    struct cap_slot* src_slot = get_slot(cl, src);

    if (!src_slot->valid)
        return -EBADF;

    // Find free slot
    CapHandle new_slot = find_free_slot(cl);
    if (new_slot < 0)
        return -ENOMEM;

    // Copy capability (same rights, same object)
    cl->slots[new_slot] = *src_slot;
    *dest = new_slot;
    return 0;
}

// Attenuate: Create capability with reduced rights
int cap_attenuate(pid_t pid, CapHandle src, uint32_t new_rights, CapHandle* dest) {
    struct clist* cl = get_clist(pid);
    struct cap_slot* src_slot = get_slot(cl, src);

    if (!src_slot->valid)
        return -EBADF;

    // CRITICAL: Can only REMOVE rights, never ADD
    if ((new_rights & ~src_slot->rights) != 0)
        return -EACCES; // Attempt to add rights - DENIED

    CapHandle new_slot = find_free_slot(cl);
    if (new_slot < 0)
        return -ENOMEM;

    // Copy with reduced rights
    cl->slots[new_slot] = *src_slot;
    cl->slots[new_slot].rights = new_rights; // Subset of original
    *dest = new_slot;
    return 0;
}

// Grant: Transfer capability copy to another process
int cap_grant(pid_t sender, CapHandle cap, pid_t receiver, CapHandle* recv_slot) {
    struct clist* sender_cl = get_clist(sender);
    struct cap_slot* src = get_slot(sender_cl, cap);

    if (!src->valid)
        return -EBADF;

    // Check if sender has GRANT right
    if (!(src->rights & CAP_RIGHT_GRANT))
        return -EACCES; // No permission to grant

    struct clist* recv_cl = get_clist(receiver);
    CapHandle new_slot = find_free_slot(recv_cl);
    if (new_slot < 0)
        return -ENOMEM;

    // Copy capability to receiver (sender keeps original)
    recv_cl->slots[new_slot] = *src;
    // Note: Receiver gets same rights as sender had (minus GRANT if not inheritable)
    *recv_slot = new_slot;
    return 0;
}

// Transfer: Move capability to another process (sender loses it)
int cap_transfer(pid_t sender, CapHandle cap, pid_t receiver, CapHandle* recv_slot) {
    int ret = cap_grant(sender, cap, receiver, recv_slot);
    if (ret < 0)
        return ret;

    // Remove from sender
    cap_delete(sender, cap);
    return 0;
}

// Delete: Remove capability from C-list
int cap_delete(pid_t pid, CapHandle cap) {
    struct clist* cl = get_clist(pid);
    struct cap_slot* slot = get_slot(cl, cap);

    if (!slot->valid)
        return -EBADF;

    // Mark slot as invalid
    slot->valid = false;

    // May trigger revocation tracking updates
    notify_revocation_system(slot->object, slot);
    return 0;
}
```

The Monotonicity Principle

A crucial invariant in capability systems is rights monotonicity: capabilities can only be weakened (attenuated), never strengthened. This means:

- If you have read-only access, you cannot derive read-write
- If you have access to a subregion, you cannot derive access to the full region
- If your capability expires at time T, you cannot derive one that expires later

This principle ensures that the capability graph forms a partial order: authority can only decrease as capabilities are derived, never increase. Combined with the unforgeability requirement, this means processes cannot escalate their privileges through capability manipulation.
Standard capability operations never increase rights. However, some systems intentionally provide controlled 'rights amplification' for specific purposes—like the setuid bit in Unix, or Hydra's amplification templates. These exceptions are carefully designed and tightly controlled, never available to arbitrary code.
Capability Invocation

The most common C-list operation is invocation—using a capability to perform an operation on its object. The invocation process typically involves:

1. Addressing: Process specifies which capability to use (handle, index, or direct reference)
2. Validation: Kernel retrieves capability and checks validity
3. Authorization: Kernel verifies the capability has the required rights
4. Execution: Kernel performs the operation on the object
5. Result: Operation result returned to the process

In high-performance systems, steps 2-4 must be extremely fast, as they occur on nearly every I/O operation.
The operating system kernel bears ultimate responsibility for managing capability lists. This includes creating C-lists for new processes, populating them with initial capabilities, maintaining their integrity, and cleaning up when processes terminate.

Process Creation and Initial Capabilities

When a process is created, it needs an initial set of capabilities to do useful work. Where do these come from?
```c
// Process creation with capability inheritance

struct new_process_caps {
    CapHandle caps[MAX_INHERITED_CAPS];
    int count;
    uint32_t rights_mask[MAX_INHERITED_CAPS]; // May be attenuated
};

pid_t spawn_process(const char* program, struct new_process_caps* init_caps) {
    // 1. Create new process structure
    struct process* child = create_process();
    child->clist = allocate_clist();

    // 2. Populate C-list with granted capabilities
    for (int i = 0; i < init_caps->count; i++) {
        struct cap_slot* parent_cap = get_current_cap(init_caps->caps[i]);

        if (!parent_cap->valid)
            continue;

        // Check if we have grant rights
        if (!(parent_cap->rights & CAP_RIGHT_GRANT))
            continue;

        // Copy to child with optional attenuation
        struct cap_slot* child_cap = &child->clist->slots[i];
        child_cap->valid = true;
        child_cap->object = parent_cap->object;
        child_cap->rights = parent_cap->rights & init_caps->rights_mask[i];
        child_cap->generation = parent_cap->generation;
    }

    // 3. Kernel may add essential capabilities
    add_essential_caps(child); // e.g., IPC endpoint to parent

    // 4. Start execution
    schedule_process(child);
    return child->pid;
}

// Essential capabilities typically include:
void add_essential_caps(struct process* proc) {
    // Capability to communicate with logging service
    grant_cap(proc, get_log_endpoint_cap(), CAP_SEND);

    // Capability to its own memory allocator pool (for malloc)
    grant_cap(proc, get_memory_pool_cap(), CAP_WRITE);

    // Capability to its own TCB (for thread control)
    grant_cap(proc, proc->tcb_cap, CAP_OWNER);

    // In pure capability systems, there's NO ambient file system access
    // Any file access must be through explicit capabilities
}
```

C-List Cleanup on Process Termination

When a process terminates, its C-list must be properly cleaned up. This involves:

1. Capability Release: Decrement reference counts on held objects
2. Revocation Tracking: Update any revocation metadata
3. Resource Reclamation: Free C-list memory
4. Notification: Inform relevant subsystems of capability loss

In systems with capability-based lifetime management (like EROS/CapROS), releasing the last capability to an object may trigger garbage collection of the object itself.
C-lists naturally track what resources a process is using. When the last capability to an object is deleted, the object may become garbage collectible. This unifies access control with resource management—a powerful property that simplifies kernel design.
C-List Quota Management

Processes typically have limits on how many capabilities they can hold (C-list size limits). This prevents:

- Denial-of-service through C-list exhaustion
- Unbounded kernel memory consumption
- Resource hoarding

Quotas can be static (a fixed maximum) or dynamic (allocated from a pool shared with other resources). seL4, for example, uses explicit memory objects as the backing store for CNodes, so C-list growth consumes visible resources.

C-List Persistence

Some capability systems (KeyKOS, EROS) support persistent capabilities—capabilities that survive system shutdown and restart. This requires:

- Serializing C-lists to stable storage
- Maintaining object identity across restarts
- Handling revocation state persistence
- Checkpointing capability graphs

Persistent capabilities enable systems that can resume exactly where they left off, including all security relationships.
Examining real-world C-list implementations reveals the practical trade-offs and design decisions that shape capability systems. Let us explore how several influential systems organize their capability lists.

Unix File Descriptor Tables

While Unix is not a pure capability system, its file descriptor mechanism exhibits many C-list properties:
```c
// Unix file descriptor table - a limited capability list

// Per-process file descriptor table (simplified)
struct files_struct {
    atomic_t count;       // Reference count
    struct fdtable* fdt;  // Pointer to fd table
    spinlock_t file_lock;
    int next_fd;          // Next likely free slot (optimization)

    // The actual table
    struct file* fd_array[NR_OPEN_DEFAULT]; // Typically 64
    // Expanded dynamically if more fds needed
};

struct file {
    struct path f_path;                 // File path (for display)
    const struct file_operations* f_op; // Operations
    void* private_data;
    fmode_t f_mode;                     // Access mode: READ, WRITE, etc.
    // ... more fields
};

// File descriptor properties making them capability-like:
// 1. Unforgeable: fds are indices into a kernel-protected table
// 2. Access rights: f_mode encodes permitted operations
// 3. Object reference: points to a specific file structure
// 4. Transferable: can be passed via SCM_RIGHTS

// But NOT pure capabilities because:
// 1. Ambient authority: can open("/etc/passwd") by name
// 2. Identity-based: permission checks use process UID
// 3. No fine-grained delegation without sendmsg complexity
```

seL4 CSpace Implementation

seL4's CSpace is a carefully designed hierarchical C-list structure optimized for embedded and high-assurance systems. Key characteristics:

- CNodes are fixed-size arrays of capability slots (2^n entries for configurable n)
- Guards allow skipping levels in sparse spaces
- Memory-backed: CNode memory is explicitly allocated from untyped memory
- Formally verified: the CSpace implementation is mathematically proven correct
```c
// seL4 Capability Space structure (simplified)

// CTE: Capability Table Entry
typedef struct cte {
    cap_t cap;             // The capability itself
    mdb_node_t cteMDBNode; // For revocation: doubly-linked list
} cte_t;

// cap_t: The capability value
typedef struct cap {
    uint64_t words[2]; // 128-bit capability encoding:
                       // - Type tag (endpoint, memory frame, cnode, etc.)
                       // - Object pointer
                       // - Rights bitmask
                       // - Type-specific fields
} cap_t;

// CNode: A node in the CSpace tree
// Contains 2^radix capability slots
// Addressed by 'radix' bits of the capability address

// Example CSpace structure:
//
// Root CNode (radix=4, 16 slots)
//  |
//  +--[0] -> Endpoint cap (to IPC partner)
//  +--[1] -> Frame cap (some memory page)
//  +--[2] -> CNode cap (to nested CNode) ---> Child CNode (16 more slots)
//  +--[3] -> ...                               |
//  ...                                         +--[0] -> Frame cap
//                                              +--[1] -> Endpoint cap
//                                              ...

// Lookup: resolve capability path
// Address: 0x21 = 0b0010_0001
//   bits [7:4] = 2 -> slot 2 in root (CNode cap)
//   bits [3:0] = 1 -> slot 1 in child (Endpoint cap)

cte_t* resolve_address_bits(cte_t* root, word_t cap_ptr, word_t n_bits) {
    cte_t* slot = root + (cap_ptr >> (n_bits - root->cap.radix));
    n_bits -= root->cap.radix;

    if (n_bits == 0 || !isCNodeCap(slot->cap))
        return slot; // Found it, or can't continue

    // Recurse into nested CNode
    return resolve_address_bits(getCNodeSlots(slot->cap), cap_ptr, n_bits);
}
```

CHERI: Capabilities in Memory

CHERI takes a radically different approach: capabilities are stored throughout memory, protected by hardware rather than kernel isolation:
```c
// CHERI: Capabilities stored in regular memory with hardware protection

// A CHERI capability (simplified 128-bit format)
typedef struct {
    uint64_t cursor; // Current pointer value (virtual address)

    // Compressed bounds (base + length reconstructed from this)
    unsigned int E:6;  // Exponent for bounds
    unsigned int B:14; // Base adjustment
    unsigned int T:14; // Top (limit) adjustment

    // Permissions and metadata
    unsigned int perms:16; // LOAD, STORE, EXECUTE, etc.
    unsigned int otype:18; // Object type (for sealing)
    unsigned int flags:2;  // Special flags
    unsigned int reserved:6;

    // Tag bit: 1 bit stored separately by hardware
    // NOT in this struct - maintained in hidden tag memory
} cheri_capability_t;

// Key insight: This IS the C-list
// Every pointer a process can use is a capability
// There's no separate C-list structure in the kernel
// The set of valid capabilities a process holds =
//   all tagged values in its registers and memory

// Memory layout example:
// Address    Tag    Content
// 0x1000     0      Integer data
// 0x1008     0      Integer data
// 0x1010     1      Capability to buffer   <- Valid capability!
// 0x1018     0      Integer data
// 0x1020     1      Capability to function <- Valid capability!

// Process's authority = all words with tag=1 it can reach

// Stack frame with capabilities:
struct stack_frame {
    uint64_t saved_regs[10];         // Plain integers (tag=0)
    void* __capability return_cap;   // Return capability (tag=1)
    void* __capability caller_frame; // Caller's frame (tag=1)
    void* __capability local_buffer; // Buffer cap (tag=1)
};
```

| Aspect | Unix FD Table | seL4 CSpace | CHERI Memory |
|---|---|---|---|
| Storage | Kernel memory | Kernel CNodes | Tagged user memory |
| Addressing | Integer index | Hierarchical path | Direct pointer |
| Max Capabilities | ~1024 per process | Bounded by CNodes | Bounded by memory |
| Lookup Cost | O(1) array | O(depth) tree walk | O(1) direct |
| Protection | Kernel isolation | Kernel isolation | Hardware tags |
| Delegation | SCM_RIGHTS | Cap grant syscall | Memory sharing |
C-list design has significant performance implications. Since capability lookups occur on nearly every I/O operation and object interaction, even small inefficiencies multiply across millions of operations.

Lookup Performance

The most critical operation is looking up a capability given its address. This happens on every capability invocation, so it must be fast.
Memory Overhead

C-lists consume memory: each capability requires storage for the object reference, rights, and metadata. For systems with many processes and many capabilities per process, this adds up.

Per-Capability Overhead:

- Unix fd table entry: ~8-16 bytes
- seL4 capability slot: ~16 bytes
- CHERI capability: 16-32 bytes (but replaces pointers)

Mitigation strategies:

- Compress rights and metadata into the capability value
- Share common capabilities across processes (copy-on-write)
- Use sparse data structures for processes with few capabilities
- Hardware: CHERI encodes bounds efficiently using a floating-point-like representation
CHERI doubles pointer size (64→128 or 128→256 bits), increasing memory footprint for pointer-heavy data structures. However, this replaces separate capability tables entirely, and CHERI enables memory safety benefits that often justify the overhead. Measurement shows 10-50% memory increase depending on workload—significant but manageable.
System Call Overhead

In kernel-managed C-lists, every capability operation requires a system call—a context switch into kernel mode, which is expensive (hundreds to thousands of cycles). Hardware capability systems avoid this:

- CHERI capability operations execute in user mode
- Bounds checks happen in hardware, no trap required
- Invalid capability dereference traps, but valid operations don't

This performance difference can be 100x or more for tight loops with many pointer dereferences (and thus bounds checks).
| Metric | Kernel Tables | Hardware Capabilities | Difference |
|---|---|---|---|
| Capability Lookup | ~100-500 cycles (syscall) | ~1-5 cycles (inline) | 50-500x faster |
| Bounds Check | Kernel validation | Hardware inline | Similar to lookup |
| Memory per Cap | 8-32 bytes | 16-32 bytes | Similar or slightly larger |
| Creation Cost | Syscall + allocation | Derive from parent | Faster derivation |
| Transfer Cost | IPC message | Memory copy | Depends on mechanism |
| Revocation Cost | Table scan or indirect | Per-object or table scan | Similar complexity |
We have explored capability lists in depth—from their formal definition as rows of the access matrix to their practical implementation in real operating systems. Let us consolidate the key concepts:
What's Next

Now that we understand how capabilities are stored and organized, we need to examine how they move between subjects. The next page explores Capability Delegation—how capabilities are transferred, shared, and propagated through a system while maintaining security invariants.
You now understand capability lists—how they organize a subject's authority, how they are protected, addressed, and managed, and how real systems implement them. Continue to the next page to learn how capabilities are delegated between subjects.