OS system design problems are the most open-ended and challenging category of operating systems interview questions. Unlike numerical problems with exact answers, system design questions evaluate your ability to make architectural decisions, reason about tradeoffs, and communicate complex designs clearly.
These problems simulate real engineering work: you're given requirements and constraints, and must design a working system. Interviewers assess not just correctness, but also your design methodology, ability to handle ambiguity, and consideration of edge cases.
System design questions reveal how you think. They expose whether you can decompose problems systematically, identify key abstractions, and anticipate failure modes. This page provides frameworks and detailed examples for every common OS system design question.
By the end of this page, you will: (1) Apply a systematic methodology for OS design problems, (2) Design schedulers with specific properties, (3) Design memory allocators and virtual memory systems, (4) Design file systems meeting given constraints, (5) Design synchronization mechanisms and concurrent data structures.
Before diving into specific designs, establish a systematic approach. This methodology applies to any OS design question.
The DRIVE Framework:
D - Define Requirements
R - Review Existing Approaches
I - Identify Core Abstractions
V - Validate with Examples
E - Evaluate Tradeoffs
| Category | Example Problems | Key Considerations |
|---|---|---|
| Scheduling | Thread pool, Task scheduler, Fair scheduler | Fairness, priority, starvation, overhead |
| Memory | malloc/free, Object pool, Memory-mapped files | Fragmentation, speed, metadata overhead |
| File Systems | Log-structured FS, Distributed FS | Consistency, crash recovery, performance |
| Synchronization | Custom locks, Lock-free structures | Correctness, progress, performance |
| IPC | Message queue, Shared memory system | Latency, copying overhead, ordering |
Spend the first 5 minutes asking questions and restating requirements. This prevents designing the wrong system. Good questions: What's the expected load? What are the failure modes we care about? What's the priority ordering for tradeoffs? Is there existing infrastructure we can leverage?
Problem Statement:
Design a CPU scheduler for a general-purpose operating system. The scheduler should provide good interactive response time while maintaining throughput for batch processes. It should prevent starvation and adapt to process behavior.
Clarifying Questions:
How many cores must the scheduler support? (Design for a single core first, then extend to SMP.)
What is the timer tick granularity, and how configurable should quanta be?
Are there hard real-time requirements, or is "good interactive response" a soft goal?
Can processes change behavior over time (e.g., a compiler alternating between I/O and CPU phases)?
Design: Multi-Level Feedback Queue (MLFQ)
After analyzing requirements, MLFQ is the best fit because:
It prioritizes interactive (I/O-bound) tasks without requiring advance knowledge of process behavior.
It adapts dynamically: demotion penalizes CPU-bound tasks, promotion rewards tasks that block early.
A periodic priority boost prevents starvation of long-running batch work.
Per-level quanta trade responsiveness (short quanta at high levels) against throughput (long quanta at low levels).
Core Data Structures:
// Per-priority-level queue
struct run_queue {
struct list_head tasks; // Linked list of runnable tasks
int time_quantum; // Time slice for this level
int num_tasks; // Count for O(1) length check
};
// Global scheduler state
struct scheduler {
struct run_queue queues[NUM_PRIORITY_LEVELS]; // e.g., 40 levels
int highest_nonempty; // Optimization: track highest priority queue
spinlock_t lock; // Protect scheduler state
uint64_t boost_timer; // Time until next boost
};
// Per-task scheduling information
struct task_sched_info {
int priority; // Current priority level
int time_remaining; // Time left in current quantum
int time_used_at_level; // Used for demotion decisions
enum { CPU_BOUND, IO_BOUND, UNKNOWN } behavior;
};
Key Algorithms:
1. Task Selection (pick_next_task):
struct task* pick_next_task(struct scheduler* sched) {
// Start from highest priority
for (int i = sched->highest_nonempty; i >= 0; i--) {
if (!list_empty(&sched->queues[i].tasks)) {
// Round-robin within priority level
struct task* t = list_first_entry(
&sched->queues[i].tasks, struct task, run_list);
list_move_tail(&t->run_list, &sched->queues[i].tasks);
t->sched.time_remaining = sched->queues[i].time_quantum;
return t;
}
}
return idle_task; // No runnable tasks
}
2. Timer Tick Handler:
void scheduler_tick(struct task* current) {
current->sched.time_remaining--;
current->sched.time_used_at_level++;
if (current->sched.time_remaining == 0) {
// Quantum expired: demote (CPU-bound behavior)
if (current->sched.priority > 0) {
demote_task(current);
}
schedule(); // Trigger reschedule
}
// Check for periodic boost
if (should_boost()) {
boost_all_tasks();
}
}
3. I/O Return Handler:
void wake_from_io(struct task* t) {
// I/O-bound behavior: keep or boost priority
// QUANTUM stands for the time_quantum of t's current level
if (t->sched.time_used_at_level < QUANTUM / 2) {
// Used little CPU before blocking: likely interactive
if (t->sched.priority < MAX_PRIORITY) {
promote_task(t);
}
}
t->sched.time_used_at_level = 0;
enqueue_task(t);
}
A malicious process could game MLFQ by yielding just before quantum expiration to avoid demotion. Mitigation: Track total CPU time at each level, not just current quantum. Demote after cumulative threshold, regardless of individual quantum usage.
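One way to implement this mitigation, as a minimal sketch building on the structures above (the per-level budget constant LEVEL_CPU_BUDGET is an assumed tunable):
// Charge ticks against a cumulative per-level budget so that yielding
// just before quantum expiry no longer resets the demotion clock
void charge_cpu_time(struct task* t, int ticks) {
    t->sched.time_used_at_level += ticks;
    if (t->sched.time_used_at_level >= LEVEL_CPU_BUDGET &&
        t->sched.priority > 0) {
        demote_task(t);                   // Demote on total use at this level
        t->sched.time_used_at_level = 0;  // Fresh budget at the new level
    }
}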
Evaluation:
| Requirement | How Design Satisfies It |
|---|---|
| Interactive response | High priority for I/O-bound behavior |
| Batch throughput | Long quanta at low priority levels reduce context-switch overhead |
| No starvation | Periodic boost brings all to top |
| Adapts to behavior | Demotion on CPU use, promotion on I/O |
| Configurable | Per-level quantum, boost interval adjustable |
SMP Extension:
For multiprocessors, give each CPU its own MLFQ instance (per-CPU run queues) so the common scheduling path takes no global lock. Periodically rebalance: an idle CPU steals runnable tasks from the busiest CPU, and affinity keeps cache-warm tasks on their last CPU.
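A minimal sketch of that extension, reusing pick_next_task() from above (busiest_cpu() and migrate_task() are hypothetical helpers):
// One scheduler instance per CPU; an idle CPU steals from a loaded one
struct scheduler per_cpu_sched[NR_CPUS];

struct task* smp_pick_next(int cpu) {
    struct task* t = pick_next_task(&per_cpu_sched[cpu]);
    if (t != idle_task) {
        return t;
    }
    int victim = busiest_cpu();           // CPU with the most runnable tasks
    if (victim >= 0 && victim != cpu) {
        return migrate_task(victim, cpu); // Pull one runnable task over
    }
    return idle_task;
}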
Problem Statement:
Design malloc() and free() functions for heap memory management. The allocator should minimize fragmentation, provide fast allocation/deallocation, and handle a wide range of allocation sizes.
Requirements Clarification:
Must it be thread-safe? (Yes: assume a multi-threaded process.)
What is the size distribution? (Mostly small allocations, with occasional large buffers.)
What alignment is guaranteed? (At least 8/16 bytes, per the platform ABI.)
Must memory ever be returned to the OS, or may the heap only grow?
Design: Segregated Free Lists with Buddy System for Large Blocks
Hybrid approach to handle different size classes optimally:
Small Allocation: Segregated Free Lists
Size classes: 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096 bytes
#define NUM_SIZE_CLASSES 10
struct free_list_node {
struct free_list_node* next;
};
struct allocator {
struct free_list_node* free_lists[NUM_SIZE_CLASSES];
size_t size_class_sizes[NUM_SIZE_CLASSES]; // 8, 16, 32...
spinlock_t locks[NUM_SIZE_CLASSES]; // Fine-grained locking
};
int size_class_index(size_t size) {
// Add header overhead, round up to power of 2
size_t total = size + sizeof(block_header);
if (total <= 8) return 0;
if (total <= 16) return 1;
// ... (or use bit manipulation: 63 - clz(total-1) - 2)
return NUM_SIZE_CLASSES - 1;
}
Block Header (minimal metadata):
struct block_header {
size_t size; // Actual allocation size (for free)
uint8_t size_class; // Which free list this belongs to
uint8_t flags; // Allocated, large block, etc.
};
malloc() Implementation:
void* malloc(size_t size) {
if (size == 0) return NULL;
if (size > LARGE_THRESHOLD) {
return large_alloc(size); // Buddy or mmap
}
int class_idx = size_class_index(size);
lock(&allocator.locks[class_idx]);
if (allocator.free_lists[class_idx] == NULL) {
    // Refill: get a page from the OS, carve it into class-sized blocks
    refill_free_list(class_idx);
    if (allocator.free_lists[class_idx] == NULL) {
        unlock(&allocator.locks[class_idx]);
        return NULL; // OS refused to give us more memory
    }
}
struct free_list_node* block = allocator.free_lists[class_idx];
allocator.free_lists[class_idx] = block->next;
unlock(&allocator.locks[class_idx]);
// Set up header
struct block_header* header = (struct block_header*)block;
header->size = size;
header->size_class = class_idx;
header->flags = ALLOCATED;
return (void*)(header + 1); // Return pointer after header
}
free() Implementation:
void free(void* ptr) {
if (ptr == NULL) return;
struct block_header* header = ((struct block_header*)ptr) - 1;
if (header->flags & LARGE_BLOCK) {
large_free(header);
return;
}
int class_idx = header->size_class;
lock(&allocator.locks[class_idx]);
struct free_list_node* node = (struct free_list_node*)header;
node->next = allocator.free_lists[class_idx];
allocator.free_lists[class_idx] = node;
unlock(&allocator.locks[class_idx]);
}
For high-performance multi-threaded allocation, use thread-local caches. Each thread has its own small free lists. Only when thread cache is empty/full do we access shared global state. This eliminates lock contention for common cases.
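A minimal sketch of the fast path, assuming C11 thread-local storage (fetch_batch_from_global(), which takes the per-class lock once and grabs several blocks, is a hypothetical helper):
// Per-thread cache: a small free list per size class, touched without locks
static _Thread_local struct free_list_node* tcache[NUM_SIZE_CLASSES];

void* tcache_alloc(int class_idx) {
    struct free_list_node* block = tcache[class_idx];
    if (block != NULL) {
        tcache[class_idx] = block->next; // Lock-free pop: list is thread-private
        return block;
    }
    // Miss: hit shared state once, refilling the thread cache in bulk
    return fetch_batch_from_global(class_idx);
}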
Large Block Allocation: Buddy System
For blocks >4KB, use buddy allocation:
void* buddy_alloc(size_t size) {
int order = get_order(size); // Which power of 2
// Find smallest available block >= required order
for (int o = order; o < MAX_ORDER; o++) {
if (buddy_free_lists[o] != NULL) {
struct buddy_block* block = buddy_free_lists[o];
buddy_free_lists[o] = block->next;
// Split block down to required size
while (o > order) {
o--;
// Create buddy and add to free list (pointer arithmetic is in
// minimum-block units: assumes sizeof(struct buddy_block) == MIN_BLOCK)
struct buddy_block* buddy = block + (1 << o);
buddy->next = buddy_free_lists[o];
buddy_free_lists[o] = buddy;
}
return block;
}
}
return NULL; // Out of memory
}
Coalescing on Free:
void buddy_free(void* ptr, int order) {
struct buddy_block* block = ptr;
// Try to coalesce with buddy
while (order < MAX_ORDER - 1) {
struct buddy_block* buddy = find_buddy(block, order);
if (!is_free(buddy) || buddy->order != order) {
break; // Can't coalesce
}
remove_from_free_list(buddy, order);
block = min(block, buddy); // Take lower address
order++;
}
add_to_free_list(block, order);
}
| Metric | Our Design | Why |
|---|---|---|
| Small alloc time | O(1) | Direct free list lookup |
| Large alloc time | O(log n) worst | Buddy split/coalesce |
| External fragmentation | Low | Buddy coalescing |
| Internal fragmentation | ~25% avg for small | Power-of-2 classes waste up to ~50% worst case |
| Thread scalability | Good | Per-size-class locks or TLS caches |
Problem Statement:
Design a file system optimized for a modern SSD with high write throughput and crash consistency. The file system should support POSIX-like semantics and scale to millions of files.
Requirements Clarification:
What consistency guarantee on crash? (Metadata always consistent; data since the last commit may be lost.)
Expected file size distribution? (Many small files plus some very large ones.)
Are snapshots/clones desirable? (CoW makes them nearly free, so yes.)
Single device or multi-device? (Assume a single SSD; don't preclude RAID later.)
Design: Copy-on-Write B-tree File System (inspired by Btrfs/ZFS)
Key insight: SSDs change the tradeoff landscape:
Random reads cost nearly the same as sequential reads, so scattering data (as CoW does) is cheap.
Flash requires erase-before-write, so in-place overwrites are expensive; writing new copies elsewhere matches the hardware.
Wear leveling favors spreading writes over repeatedly hammering hot blocks (such as a fixed journal region).
On-Disk Structures:
1. Superblock:
Offset 0: Magic number, version, block size
Offset 32: Root tree location (block number)
Offset 48: Free space tree location
Offset 64: Checksum tree location
Offset 80: Filesystem size, used blocks
Offset 128: Commit generation number
Two superblocks (mirrored) for reliability.
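The same layout expressed as a C struct; a sketch only, with illustrative field names rather than a real on-disk format:
struct superblock {
    uint64_t magic;            // Identifies the filesystem
    uint32_t version;
    uint32_t block_size;
    uint64_t root_tree;        // Block number of the root B-tree
    uint64_t free_space_tree;  // Block number of the free-space tree
    uint64_t checksum_tree;    // Block number of the checksum tree
    uint64_t total_blocks;
    uint64_t used_blocks;
    uint64_t generation;       // Monotonic commit counter; higher = newer
    uint8_t  checksum[32];     // Covers the whole superblock
};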
2. B-tree Nodes:
All data organized in B-trees:
struct btree_node {
uint32_t magic;
uint32_t generation; // For CoW detection
uint16_t num_items;
uint16_t level; // 0 = leaf
uint8_t checksum[32]; // Self-checksum
// Keys and values/child pointers follow
};
struct btree_key {
uint64_t objectid; // Inode number
uint8_t type; // INODE_ITEM, DIR_ITEM, EXTENT_DATA, etc.
uint64_t offset; // Interpretation depends on type
};
Key Operations:
1. File Write (Copy-on-Write):
1. Allocate new blocks for modified data
2. Write new data to new blocks
3. Update B-tree leaf with new extent pointers (CoW the leaf)
4. CoW up the tree to root
5. Atomically update root pointer in superblock
6. Old blocks become free after checkpoint
2. Directory Lookup:
1. Hash filename
2. Search directory B-tree with key (parent_inode, hash, name)
3. Return child inode number
4. Complexity: O(log n) for n directory entries
3. Crash Recovery:
1. Read both superblocks, use one with higher valid generation
2. Tree is always consistent (CoW ensures atomicity)
3. Roll forward any pending committed transactions
4. No fsck needed!
CoW provides atomic updates without journaling overhead. A transaction is committed by atomically updating the root pointer. Either the new tree is visible (committed) or the old tree is visible (rolled back). No torn writes possible.
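A sketch of that commit step, under the assumptions above (write_block(), flush(), recompute_checksum(), and the superblock locations are hypothetical):
// Commit: everything lands in NEW blocks first; the superblock update
// is the single atomic step that makes the new tree visible
void commit_transaction(struct superblock* sb, uint64_t new_root) {
    flush();                          // 1. Make all new tree blocks durable
    sb->root_tree = new_root;         // 2. Point at the new root
    sb->generation++;                 // 3. Higher generation wins at recovery
    recompute_checksum(sb);           //    Lets recovery detect a torn write
    write_block(SUPERBLOCK_ALT, sb);  // 4. Update the mirrored copies one at
    flush();                          //    a time, so at least one valid
    write_block(SUPERBLOCK_MAIN, sb); //    superblock exists at every instant
    flush();
}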
Free Space Management:
Extent-based Allocation:
struct extent {
uint64_t start_block;
uint64_t num_blocks;
};
Free space B-tree stores (start, length) extents. Allocation strategy: Best-fit or next-fit within preferred region.
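A best-fit search might look like the following sketch (in practice the free-space B-tree would be queried rather than a flat array; all names here are illustrative):
// Best-fit: the smallest free extent that still satisfies the request
struct extent* best_fit(struct extent* free_extents, int n, uint64_t need) {
    struct extent* best = NULL;
    for (int i = 0; i < n; i++) {
        if (free_extents[i].num_blocks >= need &&
            (best == NULL || free_extents[i].num_blocks < best->num_blocks)) {
            best = &free_extents[i];
        }
    }
    return best; // Caller splits the extent and updates the free-space tree
}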
Write Amplification Mitigation:
CoW rewrites the whole path from leaf to root, so batch many changes into one transaction (group commit) to amortize that path rewrite.
Use large extents and delayed allocation so data reaches the SSD in big sequential chunks its FTL handles well.
Inline tiny files in B-tree leaves rather than dedicating whole blocks to them.
Scalability Features:
| Challenge | Solution |
|---|---|
| Many small files | Inline data in B-tree leaves (< 2KB) |
| Large directories | B-tree with O(log n) lookup |
| Parallel access | Per-inode locking, concurrent B-tree |
| Large files | Extent-based allocation, sparse files |
| Requirement | Solution | Complexity |
|---|---|---|
| SSD optimization | CoW avoids random overwrites | O(log n) for updates |
| Crash consistency | Atomic root pointer update | Instant recovery |
| Scale to millions | B-tree directories | O(log n) lookup |
| POSIX semantics | Full inode/directory model | Standard API |
| Space efficiency | Inline data, extents | Minimal overhead |
Problem Statement:
Design a reader-writer lock that provides writer preference (writers don't starve) while allowing high read concurrency.
Requirements:
Many readers may hold the lock simultaneously; a writer gets exclusive access.
A waiting writer must eventually acquire the lock even under a continuous stream of readers (writer preference).
Overhead on the read path should be low, since reads dominate.
Design: Fair Reader-Writer Lock with Writer Preference
Data Structure:
struct rw_lock {
pthread_mutex_t mutex; // Protects all state
pthread_cond_t readers_cv; // Readers wait here
pthread_cond_t writers_cv; // Writers wait here
int readers_active; // Currently reading
int readers_waiting; // Waiting to read
int writers_active; // 0 or 1
int writers_waiting; // Waiting to write
};
void rw_lock_init(struct rw_lock* lock) {
pthread_mutex_init(&lock->mutex, NULL);
pthread_cond_init(&lock->readers_cv, NULL);
pthread_cond_init(&lock->writers_cv, NULL);
lock->readers_active = 0;
lock->readers_waiting = 0;
lock->writers_active = 0;
lock->writers_waiting = 0;
}
Read Lock:
void read_lock(struct rw_lock* lock) {
pthread_mutex_lock(&lock->mutex);
// Writer preference: wait if writers are waiting or active
lock->readers_waiting++;
while (lock->writers_active > 0 || lock->writers_waiting > 0) {
pthread_cond_wait(&lock->readers_cv, &lock->mutex);
}
lock->readers_waiting--;
lock->readers_active++;
pthread_mutex_unlock(&lock->mutex);
}
void read_unlock(struct rw_lock* lock) {
pthread_mutex_lock(&lock->mutex);
lock->readers_active--;
// If last reader and writers waiting, signal a writer
if (lock->readers_active == 0 && lock->writers_waiting > 0) {
pthread_cond_signal(&lock->writers_cv);
}
pthread_mutex_unlock(&lock->mutex);
}
Write Lock:
void write_lock(struct rw_lock* lock) {
pthread_mutex_lock(&lock->mutex);
lock->writers_waiting++;
while (lock->readers_active > 0 || lock->writers_active > 0) {
pthread_cond_wait(&lock->writers_cv, &lock->mutex);
}
lock->writers_waiting--;
lock->writers_active = 1;
pthread_mutex_unlock(&lock->mutex);
}
void write_unlock(struct rw_lock* lock) {
pthread_mutex_lock(&lock->mutex);
lock->writers_active = 0;
// Prefer writers: signal another writer if waiting
if (lock->writers_waiting > 0) {
pthread_cond_signal(&lock->writers_cv);
} else {
// No writers waiting: wake all readers
pthread_cond_broadcast(&lock->readers_cv);
}
pthread_mutex_unlock(&lock->mutex);
}
With pure writer preference, continuous writer arrivals could starve readers. To guarantee bounded waiting: (1) Use a ticket system ordering arrivals, or (2) After N successive writers, allow waiting readers through. Our design above does starve readers under heavy write load—acknowledge this limitation!
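Mitigation (2) needs only small changes, sketched below: two hypothetical fields added to the struct above (int writers_in_a_row, bool readers_turn) plus a tunable MAX_CONSECUTIVE_WRITERS. Readers yield to writers unless it is explicitly their turn:
void read_lock(struct rw_lock* lock) {
    pthread_mutex_lock(&lock->mutex);
    lock->readers_waiting++;
    while (lock->writers_active > 0 ||
           (lock->writers_waiting > 0 && !lock->readers_turn)) {
        pthread_cond_wait(&lock->readers_cv, &lock->mutex);
    }
    lock->readers_waiting--;
    lock->readers_active++;
    if (lock->readers_waiting == 0) {
        lock->readers_turn = false;   // Cohort admitted; writers go next
    }
    pthread_mutex_unlock(&lock->mutex);
}

void write_unlock(struct rw_lock* lock) {
    pthread_mutex_lock(&lock->mutex);
    lock->writers_active = 0;
    if (lock->writers_waiting > 0 &&
        (lock->readers_waiting == 0 ||
         ++lock->writers_in_a_row < MAX_CONSECUTIVE_WRITERS)) {
        pthread_cond_signal(&lock->writers_cv);    // Keep preferring writers...
    } else {
        lock->writers_in_a_row = 0;                // ...but every N hand-offs
        lock->readers_turn = true;                 // drain the waiting reader
        pthread_cond_broadcast(&lock->readers_cv); // cohort before continuing
    }
    pthread_mutex_unlock(&lock->mutex);
}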
Alternative: Fair Lock with Ticket System
struct fair_rw_lock {
atomic_uint64_t next_ticket; // Next ticket to issue
atomic_uint64_t serving; // Currently serving ticket
atomic_int readers_active;
bool writer_active;
// Per-ticket type: READER or WRITER
uint8_t ticket_types[MAX_PENDING];
};
Each arrival takes a ticket. Tickets are served in order. When a reader ticket is served, all consecutive reader tickets are served together. This provides FIFO fairness while maintaining read concurrency.
Lock-Free Readers (Optimistic Read):
For read-heavy workloads, optimistic reading can avoid locks entirely:
struct seqlock {
atomic_uint64_t sequence; // Odd = write in progress
// Protected data follows
};
// Reader: retry until the sequence is stable across the read
uint64_t seq1, seq2;
do {
    while ((seq1 = atomic_load(&lock->sequence)) & 1)
        ; // Odd: write in progress, spin until it finishes
    // ... copy out the protected data here ...
    seq2 = atomic_load(&lock->sequence);
} while (seq1 != seq2); // Sequence moved: data changed mid-read, retry
Readers never block or take locks—ideal for frequently-read, rarely-written data.
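The writer side is what keeps the sequence protocol honest; a minimal sketch with C11 atomics (memory-ordering details simplified, and writers assumed to be serialized by an external mutex):
// Writer: bump sequence to odd before writing, back to even after
void seq_write_begin(struct seqlock* lock) {
    atomic_fetch_add(&lock->sequence, 1); // Now odd: readers will retry
}
void seq_write_end(struct seqlock* lock) {
    atomic_fetch_add(&lock->sequence, 1); // Even again: readers may proceed
}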
Problem Statement:
Design the virtual memory subsystem for an OS. It should support demand paging, memory-mapped files, copy-on-write for fork(), and shared memory between processes.
Key Components:
Per-process address space descriptor: a list/tree of VMAs plus the page table root
Page fault handler: demand paging, permission checks, CoW resolution
File-backed and anonymous page paths: page cache pages vs. zero-filled pages
Physical page allocator and page replacement (eviction policy)
Address Space Data Structure:
// Virtual Memory Area - represents a contiguous region
struct vm_area {
uint64_t start; // Start virtual address
uint64_t end; // End virtual address
uint32_t flags; // VM_READ, VM_WRITE, VM_EXEC, VM_SHARED
struct file* mapped_file; // NULL for anonymous memory
uint64_t file_offset; // Offset in file
struct vm_area* next; // Next VMA in address space
};
// Process address space
struct mm_struct {
struct vm_area* vma_list; // List of VMAs (sorted by address)
struct rb_root vma_tree; // Red-black tree for fast lookup
pgd_t* page_table; // Top-level page table
atomic_int users; // Reference count
spinlock_t lock;
// Statistics
size_t total_vm, resident_vm, shared_vm;
};
VMA Operations:
find_vma(mm, addr): Find VMA containing address (O(log n) with tree)
insert_vma(mm, vma): Add new VMA, merge with adjacent if possible
split_vma(mm, vma, addr): Split VMA at address (for partial unmap)
Page Fault Handler:
int handle_page_fault(struct mm_struct* mm, uint64_t fault_addr,
uint32_t error_code) {
struct vm_area* vma = find_vma(mm, fault_addr);
// Check if address is valid
if (vma == NULL || fault_addr < vma->start) {
return -SIGSEGV; // Segmentation fault
}
// Check permissions
if (error_code & WRITE_FAULT) {
    if (!(vma->flags & VM_WRITE)) {
        return -SIGSEGV; // Write to a read-only mapping
    }
    // VMA is writable but the PTE is read-only: this is copy-on-write
    if (is_cow_page(mm, fault_addr)) {
        return handle_cow_fault(mm, fault_addr);
    }
}
// Page not present - need to bring it in
if (vma->mapped_file) {
return handle_file_fault(mm, vma, fault_addr);
} else {
return handle_anonymous_fault(mm, vma, fault_addr);
}
}
int handle_anonymous_fault(struct mm_struct* mm, struct vm_area* vma,
uint64_t fault_addr) {
// Allocate physical page
struct page* page = alloc_page(GFP_USER);
if (!page) {
return -ENOMEM;
}
// Zero the page (security)
memset(page_address(page), 0, PAGE_SIZE);
// Install in page table
pte_t pte = mk_pte(page, vma_prot(vma));
set_pte(mm->page_table, fault_addr, pte);
return 0;
}
Copy-on-Write for fork():
struct mm_struct* copy_mm(struct mm_struct* parent) {
struct mm_struct* child = alloc_mm();
// Copy VMA list
for (struct vm_area* vma = parent->vma_list; vma; vma = vma->next) {
struct vm_area* new_vma = copy_vma(vma);
insert_vma(child, new_vma);
}
// Mark all writable pages as COW (read-only + COW flag)
for_each_pte(parent->page_table, pte) {
if (pte_write(pte)) {
pte_clear_write(pte); // Make read-only
pte_set_cow(pte); // Mark as COW
// Increment page reference count
struct page* page = pte_page(pte);
atomic_inc(&page->refcount);
}
// Copy PTE to child (pointing to same physical page)
copy_pte(child->page_table, parent->page_table, pte_addr);
}
return child;
}
int handle_cow_fault(struct mm_struct* mm, uint64_t addr) {
pte_t* pte = lookup_pte(mm->page_table, addr);
struct page* old_page = pte_page(*pte);
if (atomic_read(&old_page->refcount) == 1) {
// We're the only user - just make writable
pte_set_write(pte);
pte_clear_cow(pte);
return 0;
}
// Allocate new page and copy
struct page* new_page = alloc_page(GFP_USER);
if (!new_page) {
    return -ENOMEM;
}
memcpy(page_address(new_page), page_address(old_page), PAGE_SIZE);
// Update PTE to point to new page
set_pte(mm->page_table, addr, mk_pte(new_page, VM_READ | VM_WRITE));
// Decrement old page refcount
put_page(old_page);
return 0;
}
For page replacement, use Clock (Second-Chance) algorithm: maintain a circular list of pages with reference bits. On eviction, clear reference bit and advance; evict first page with cleared bit. This approximates LRU with O(1) amortized cost.
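A compact sketch of the sweep (the frame table, its fields, and the hand cursor are illustrative):
// Circular sweep: every referenced page gets one second chance
struct frame { struct page* page; bool referenced; };

int clock_evict(struct frame* frames, int nframes, int* hand) {
    for (;;) {
        struct frame* f = &frames[*hand];
        int idx = *hand;
        *hand = (*hand + 1) % nframes; // Always advance the clock hand
        if (!f->referenced) {
            return idx;                // Victim: unreferenced since last pass
        }
        f->referenced = false;         // Second chance: clear and continue
    }
}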
Beyond the technical content, how you present your design matters enormously. Here are proven strategies for OS system design interviews.
Communication Strategies:
Think Out Loud: Narrate your reasoning as you go; interviewers can only credit thinking they can hear.
Draw Diagrams: Sketch data structures, queues, and data flow; a picture of an MLFQ or a B-tree path saves minutes of description.
Start Simple, Then Elaborate: Get a correct baseline working end-to-end before adding optimizations like per-CPU queues or thread caches.
Acknowledge Tradeoffs: State what your design gives up (e.g., internal fragmentation for speed) before the interviewer has to ask.
| ❌ Red Flags | ✅ Green Flags |
|---|---|
| Jumping to code immediately | Thoroughly understanding requirements first |
| Single solution without alternatives | Discussing multiple approaches |
| Ignoring edge cases | Proactively considering failure modes |
| Claiming design is perfect | Acknowledging limitations |
| Silent thinking | Continuous communication |
| Rigid design | Iterating based on feedback |
For a 45-minute design interview: 5 min requirements, 25 min core design with data structures and algorithms, 10 min deep dive on one component, 5 min tradeoffs and alternatives. Adjust based on interviewer cues.
Congratulations! You've completed the Interview Problem Types module. You now have comprehensive knowledge of scheduling, synchronization, memory, file system, and system design problems. Practice these patterns with real problems, and you'll be prepared for any OS technical interview.