File system calculations test your understanding of how files and directories are organized on disk. These problems involve computing maximum file sizes, block allocation overhead, FAT table dimensions, inode requirements, and access patterns for different file operations.
Unlike memory calculations which deal with fixed-size pages, file systems introduce variable-size files, indirect block hierarchies, and the interplay between metadata and data storage. This adds layers of complexity that require careful accounting.
These problems reward systematic thinking. Each file system structure (FAT, indexed allocation, extent-based) has its own set of formulas. Mastering all three prepares you for any file system question.
By the end of this page, you will: (1) Calculate maximum file sizes for indexed allocation with direct/indirect blocks, (2) Compute FAT table size and overhead, (3) Determine inode requirements and storage efficiency, (4) Analyze disk access patterns for file operations, (5) Evaluate tradeoffs between allocation strategies.
Disk space is divided into fixed-size blocks (also called clusters or allocation units). All file systems manage allocation at the block granularity.
Key Definitions:
- Block (also cluster or allocation unit): the fixed-size unit in which disk space is allocated to files.
- Block pointer: a number that identifies one block on the disk; its size depends on how many blocks must be addressable.
- Pointers per block: how many block pointers fit into a single block, which determines the fan-out of indirect blocks and FAT-style tables.
Fundamental Calculations:
Total blocks on disk = Disk capacity / Block size
Block pointer size = ⌈log₂(Total blocks) / 8⌉ bytes (round up)
Pointers per block = Block size / Block pointer size
Example:
| Disk Size | Block Size | Total Blocks | Pointer Size | Pointers/Block |
|---|---|---|---|---|
| 16 GB | 4 KB | 2²² = 4M | 4 bytes (3 by the formula; rounded up to 4 in practice) | 1024 |
| 1 TB | 4 KB | 2²⁸ = 256M | 4 bytes | 1024 |
| 16 TB | 4 KB | 2³² = 4G | 4 bytes | 1024 |
| 1 EB | 4 KB | 2⁴⁸ | 6 bytes | ~682 |
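The arithmetic behind the table is easy to script. A minimal sketch (function and key names are my own) that reproduces a row from disk capacity and block size:

```python
import math

def block_params(disk_bytes: int, block_size: int) -> dict:
    """Derive block count, minimum pointer size, and pointers per block."""
    total_blocks = disk_bytes // block_size
    # Minimum whole bytes needed to address every block
    pointer_size = math.ceil(math.log2(total_blocks) / 8)
    return {
        "total_blocks": total_blocks,
        "pointer_size_bytes": pointer_size,
        "pointers_per_block": block_size // pointer_size,
    }

# 1 TB disk, 4 KB blocks -> 2^28 blocks, 4-byte pointers, 1024 pointers per block
print(block_params(2**40, 4096))
```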
Files rarely fill their last block exactly. On average, each file wastes half a block. With 4 KB blocks and 1000 files, expect ~2 MB wasted to internal fragmentation. Small block sizes reduce fragmentation but increase metadata overhead.
Storage Efficiency Analysis:
For a file of size S bytes with block size B:
Blocks required = ⌈S / B⌉
Actual space used = Blocks required × B
Internal fragmentation = Actual space - S
Average fragmentation per file = B / 2
Example: a 10,000-byte file with 4 KB blocks needs ⌈10,000 / 4,096⌉ = 3 blocks, occupies 12,288 bytes on disk, and wastes 2,288 bytes to internal fragmentation.
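The same calculation as a quick sketch (the helper name is illustrative, not from any particular file system):

```python
import math

def fragmentation(file_size: int, block_size: int) -> tuple[int, int]:
    """Return (blocks required, internal fragmentation in bytes)."""
    blocks = math.ceil(file_size / block_size)
    return blocks, blocks * block_size - file_size

# 10,000-byte file with 4 KB blocks: 3 blocks, 2,288 bytes wasted
print(fragmentation(10_000, 4096))
```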
Contiguous allocation stores each file as a contiguous sequence of blocks. It's simple and provides excellent read performance but suffers from external fragmentation.
Directory Entry:
Filename | Starting Block | Length (in blocks)
Maximum File Size:
Limited by the largest contiguous free region on disk. Theoretically:
Max file size = Disk capacity (if entirely free and unfragmented)
Practically, fragmentation limits file size severely.
Access Time Analysis:
For a file starting at block B with length L: block i of the file (0 ≤ i < L) is simply disk block B + i, so any block can be located with arithmetic alone, using a single disk I/O with no pointer chasing.
Contiguous allocation is optimal for sequential access patterns.
Worked Example: Contiguous Allocation Overhead
Given:
- Disk: 100 GB, block size 4 KB
- 10,000 files, average size 50 MB (sanity-checked in part a)
- Directory entry fields: 32-byte filename, 4-byte starting block, 4-byte length
Questions: a) Total data storage b) Directory entry size c) Total directory size d) Fragmentation loss estimate
Solutions:
a) Total data: 10,000 × 50 MB = 500 GB (doesn't fit in 100 GB disk!)
Let's revise: 10,000 files, average 1 MB each.
a) Total data: 10,000 × 1 MB = 10 GB
b) Directory entry: 32-byte filename + 4-byte starting block + 4-byte length = 40 bytes per entry
c) Directory size: 10,000 × 40 bytes = 400 KB
d) Fragmentation: ~10,000 files × 2 KB average = ~20 MB (internal only, external is filesystem-dependent)
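For checking answers like these, a small helper (names are my own) that computes directory size and the expected internal fragmentation:

```python
def contiguous_overhead(num_files: int, entry_bytes: int, block_size: int) -> dict:
    """Directory size and expected internal fragmentation for contiguous allocation."""
    return {
        "directory_bytes": num_files * entry_bytes,
        # On average each file wastes half a block
        "expected_internal_frag_bytes": num_files * block_size // 2,
    }

# 10,000 files, 40-byte entries, 4 KB blocks -> 400,000-byte directory (~400 KB), ~20 MB fragmentation
print(contiguous_overhead(10_000, 40, 4096))
```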
After many file creations and deletions, disk becomes fragmented with many small holes. A 10 MB file might not fit even with 50 MB total free space if no contiguous 10 MB region exists. This is why modern file systems don't use pure contiguous allocation.
Linked allocation eliminates external fragmentation by chaining blocks together. FAT (File Allocation Table) centralizes the chain pointers in a separate table.
Simple Linked Allocation:
Each block contains: [Data | Next block pointer]
Data per block = Block size - Pointer size
Problem: Random access is O(n) — must traverse chain from start.
FAT (File Allocation Table):
Centralized table with one entry per disk block: each entry holds the block number of the next block in the file's chain, an end-of-file marker, or a free-block marker.
FAT Size Calculation:
FAT size = Number of blocks × Entry size
= (Disk capacity / Block size) × Entry size
FAT Entry Size: fixed by the FAT variant, 12, 16, or 32 bits per entry (see the table below).
Worked Example: FAT32 Calculations
Given: 1 TB volume, 32 KB clusters, 32-bit (4-byte) FAT entries, 2 copies of the FAT.
Calculate: a) Number of clusters b) FAT size c) Total overhead with 2 FAT copies d) Maximum file size limitation
Solutions:
a) Number of clusters: 1 TB / 32 KB = 2^40 / 2^15 = 2^25 = 33,554,432 clusters
b) FAT size: 2^25 × 4 bytes = 2^27 bytes = 128 MB
c) With 2 FAT copies: 2 × 128 MB = 256 MB overhead (0.025% of disk)
d) Maximum file size: FAT32 uses 28 bits for cluster numbers, allowing up to 2^28 clusters per chain, but the FAT32 specification caps file size at 2^32 - 1 bytes ≈ 4 GB (independent of the cluster limit).
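The same FAT sizing arithmetic as a short sketch (function and parameter names are illustrative):

```python
def fat_overhead(disk_bytes: int, cluster_bytes: int, entry_bytes: int, copies: int = 2) -> dict:
    """FAT size and total overhead for a FAT-style volume."""
    clusters = disk_bytes // cluster_bytes
    fat_bytes = clusters * entry_bytes
    return {
        "clusters": clusters,
        "fat_bytes": fat_bytes,
        "total_overhead_bytes": copies * fat_bytes,
        "overhead_fraction": copies * fat_bytes / disk_bytes,
    }

# 1 TB volume, 32 KB clusters, 4-byte entries, 2 FAT copies
# -> 2^25 clusters, 128 MB per FAT, 256 MB total (~0.025% of the disk)
print(fat_overhead(2**40, 32 * 1024, 4))
```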
| FAT Type | Entry Size | Max Clusters | Max Volume Size | Max File Size |
|---|---|---|---|---|
| FAT12 | 12 bits | 4,078 | 32 MB (typical) | Volume size |
| FAT16 | 16 bits | 65,524 | 2 GB | 2 GB |
| FAT32 | 32 bits (28 used) | ~268 million | 2 TB (with 512B sectors) | ≈4 GB |
The FAT is typically cached in memory for fast access. With 128 MB FAT, random file access requires only one disk I/O (for the data block) since chain traversal is in-memory. This makes FAT reasonably efficient despite the linked structure.
FAT Access Pattern Analysis:
For a file of N blocks, reading block k:
Without FAT caching: each step along the chain may require reading a FAT sector from disk, so reaching block k costs up to k FAT reads plus 1 read for the data block.
With FAT cached in memory: the chain is followed entirely in RAM, so only 1 disk I/O (the data block itself) is needed.
This is why FAT performance is reasonable for sequential access and even for random access when FAT is cached.
Indexed allocation uses an index block (inode in Unix) that contains pointers to all data blocks. This supports both sequential and random access efficiently.
Unix inode Structure (Traditional):
A typical inode contains: file metadata (owner, permissions, timestamps, size, link count) plus a fixed set of block pointers: 12 direct pointers, 1 single indirect, 1 double indirect, and 1 triple indirect.
Block Pointer Hierarchy:
Direct: Points directly to data block
Single indirect: Points to block of direct pointers
Double indirect: Points to block of single indirect pointers
Triple indirect: Points to block of double indirect pointers
Maximum File Size Calculation:
Let: B = block size, d = number of direct pointers in the inode, n = pointers per block (block size / pointer size).
Max file size = (Direct) + (Single Indirect) + (Double Indirect) + (Triple Indirect)
= d×B + n×B + n²×B + n³×B
= B × (d + n + n² + n³)
Worked Example: Classic Unix (4 KB blocks, 4-byte pointers, 12 direct)
Parameters: B = 4 KB, pointer size = 4 bytes, d = 12, n = 4,096 / 4 = 1,024 pointers per block.
Direct: 12 × 4 KB = 48 KB
Single indirect: 1,024 × 4 KB = 4 MB
Double indirect: 1,024² × 4 KB = 4 GB
Triple indirect: 1,024³ × 4 KB = 4 TB
Maximum file size = 48 KB + 4 MB + 4 GB + 4 TB ≈ 4 TB
The calculated maximum assumes the inode structure is the only limit. Real file systems have additional constraints: file size field width (32-bit limits to 4 GB), maximum blocks per file (ext2/3 limits), and filesystem specification limits.
Worked Example: Block Lookup Depth
Given the same parameters, how many disk accesses to read byte at position X?
Case 1: X = 10,000 (block 2, within the 12 direct blocks): 1 disk I/O, the data block itself (its pointer is in the inode).
Case 2: X = 50,000 (block 12, just past the direct blocks): 2 disk I/Os, the single indirect block and then the data block.
Case 3: X = 5,000,000 (block 1,220, past direct + single indirect = 1,036 blocks): 3 disk I/Os, the double indirect block, a single indirect block, then the data block.
Case 4: Within triple indirect (offsets beyond ~4 GB): 4 disk I/Os, one per tree level plus the data block.
(All cases assume the inode itself is already in memory.)
| Level | Blocks Covered | Cumulative Range | Disk I/Os |
|---|---|---|---|
| Direct (12 ptrs) | 0 - 11 | 0 - 48 KB | 1 |
| Single Indirect | 12 - 1,035 | 48 KB - 4 MB | 2 |
| Double Indirect | 1,036 - 1,049,611 | 4 MB - 4 GB | 3 |
| Triple Indirect | 1,049,612 - ~10⁹ | 4 GB - 4 TB | 4 |
```python
def calculate_inode_params(block_size: int, pointer_size: int, direct_count: int) -> dict:
    """Calculate inode structure parameters and maximum file size."""
    ptrs_per_block = block_size // pointer_size

    # Blocks covered by each level
    direct_blocks = direct_count
    single_blocks = ptrs_per_block
    double_blocks = ptrs_per_block ** 2
    triple_blocks = ptrs_per_block ** 3

    # Maximum file size
    total_blocks = direct_blocks + single_blocks + double_blocks + triple_blocks
    max_file_size = total_blocks * block_size

    return {
        "ptrs_per_block": ptrs_per_block,
        "direct_blocks": direct_blocks,
        "single_indirect_blocks": single_blocks,
        "double_indirect_blocks": double_blocks,
        "triple_indirect_blocks": triple_blocks,
        "total_blocks": total_blocks,
        "max_file_size_bytes": max_file_size,
        "max_file_size_human": format_size(max_file_size)
    }


def get_indirection_level(byte_offset: int, block_size: int,
                          pointer_size: int, direct_count: int) -> tuple[str, int]:
    """
    Determine indirection level and disk I/Os for accessing a byte offset.
    Returns (level_name, disk_ios).
    """
    ptrs_per_block = block_size // pointer_size
    block_num = byte_offset // block_size

    # Direct
    if block_num < direct_count:
        return ("direct", 1)
    block_num -= direct_count

    # Single indirect
    if block_num < ptrs_per_block:
        return ("single_indirect", 2)
    block_num -= ptrs_per_block

    # Double indirect
    if block_num < ptrs_per_block ** 2:
        return ("double_indirect", 3)
    block_num -= ptrs_per_block ** 2

    # Triple indirect
    if block_num < ptrs_per_block ** 3:
        return ("triple_indirect", 4)

    return ("beyond_max", -1)


def format_size(size_bytes: int) -> str:
    """Format bytes to human readable."""
    for unit in ['B', 'KB', 'MB', 'GB', 'TB', 'PB']:
        if size_bytes < 1024:
            return f"{size_bytes:.1f} {unit}"
        size_bytes /= 1024
    return f"{size_bytes:.1f} EB"


# Example: Unix-style inode
result = calculate_inode_params(
    block_size=4096,    # 4 KB
    pointer_size=4,     # 4 bytes
    direct_count=12     # 12 direct pointers
)
print(f"Max file size: {result['max_file_size_human']}")
print(f"Pointers per block: {result['ptrs_per_block']}")
```

Modern file systems (ext4, XFS, NTFS) use extents instead of individual block pointers. An extent describes a contiguous range of blocks.
Extent Structure:
Extent = (Starting block, Length in blocks)
Typically: 4+4 = 8 bytes per extent (or 4+2 = 6 bytes for smaller lengths)
Advantages: far less metadata for large contiguous files (one extent can describe thousands of blocks), efficient sequential I/O, and faster mapping from file offset to disk block.
Example Comparison:
100 MB contiguous file with 4 KB blocks:
Block pointer approach: 100 MB / 4 KB = 25,600 blocks, so 25,600 pointers × 4 bytes = 100 KB of pointer metadata (requiring indirect blocks).
Extent approach (single extent): one (starting block, length) pair, about 8 bytes of metadata.
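A small sketch comparing the two metadata costs under these assumptions (8-byte extents, 4-byte block pointers; the helper name is my own):

```python
import math

def metadata_bytes(file_bytes: int, block_size: int, ptr_bytes: int,
                   extents: int, extent_bytes: int = 8) -> dict:
    """Compare per-file metadata: one pointer per block vs. one entry per extent."""
    blocks = math.ceil(file_bytes / block_size)
    return {
        "blocks": blocks,
        "block_pointer_metadata": blocks * ptr_bytes,
        "extent_metadata": extents * extent_bytes,
    }

# 100 MB contiguous file, 4 KB blocks, 4-byte pointers, stored as a single extent
# -> 25,600 blocks, 102,400 bytes (100 KB) of block pointers vs. 8 bytes for one extent
print(metadata_bytes(100 * 1024 * 1024, 4096, 4, extents=1))
```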
ext4 Extent Tree:
ext4 uses a B-tree-like structure for extents:
Maximum Extents:
The inode holds 4 extents inline (in its 60-byte block-pointer area). With an extent tree of depth up to 5, the theoretical maximum is billions of extents.
Maximum File Size in ext4: with 4 KB blocks the extent format addresses up to 2³² logical blocks, so a single file can reach 16 TiB.
Extent efficiency degrades with fragmentation. A perfectly contiguous file needs 1 extent. The same file fragmented into 1000 pieces needs 1000 extents. Heavy fragmentation can exhaust extent storage and hurt performance.
| Strategy | Metadata per Block | Random Access | Sequential Access | Fragmentation Sensitivity |
|---|---|---|---|---|
| Contiguous | None (2 values total) | O(1) | Optimal | High (external) |
| Linked List | 1 pointer | O(n) | Good | None |
| FAT | 1 FAT entry | O(n) or O(1) cached | Good | None |
| Indexed (inode) | 1 pointer per block | O(1) to O(4) | Good | None |
| Extent-based | 1 extent per range | O(log n) with tree | Excellent | Medium |
Directories are special files containing mappings from filenames to file metadata (inode numbers or equivalent). Directory structure affects lookup performance.
Linear Directory Structure:
Simplest approach: array of (filename, inode number) entries.
Entry size (fixed-length names) = maximum filename length + inode number size (e.g. a 28-byte name field + a 4-byte inode number = 32 bytes per entry).
Lookup: O(n) linear scan for n entries
Directory Size Calculation:
Directory size = Number of entries × Entry size
For variable-length names, entries are typically aligned to 4-byte boundaries.
Worked Example: Path Lookup Cost
Given:
- Path to open: /home/user/documents/report.pdf
- Inode-based file system; each directory is small enough to fit in one data block
- Nothing is cached (not even the root inode), so each path component costs one inode read plus one directory data-block read
How many disk I/Os to open the file?
Path components: / → home → user → documents → report.pdf
Total: 9 disk I/Os (8 for path traversal + 1 for target inode)
With directory inode caching (common): ~4-5 disk I/Os
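A toy model of the lookup cost, using the same counting convention as the worked example (one inode read plus one data-block read per directory; a simplification, not how any specific kernel accounts for I/O):

```python
def path_lookup_ios(directories: int, root_inode_cached: bool = False) -> int:
    """Disk I/Os to open a file, in the simplified model used above.

    Each directory on the path costs one inode read plus one data-block read;
    the target file costs one final inode read. For /home/user/documents/report.pdf
    the traversed directories are /, home, user, documents -> directories = 4.
    """
    ios = 2 * directories + 1
    return ios - 1 if root_inode_cached else ios


print(path_lookup_ios(4))                          # 9, matching the worked example
print(path_lookup_ios(4, root_inode_cached=True))  # 8
```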
Linux maintains a directory entry cache (dcache) that maps full paths to inodes. Repeated access to /home/user/documents requires only dcache lookup, not disk access. This cache is critical for performance.
B-tree Directories:
Modern filesystems (ext4 with htree, XFS) use hash trees or B-trees for large directories.
Linear vs B-tree Lookup:
| Entries | Linear Lookup | B-tree Lookup |
|---|---|---|
| 100 | O(100) = 100 comparisons | O(log 100) ≈ 7 comparisons |
| 10,000 | O(10,000) = 10,000 | O(log 10,000) ≈ 14 |
| 1,000,000 | O(1,000,000) = 1M | O(log 1,000,000) ≈ 20 |
For directories with millions of files, B-tree is essential.
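A quick sketch reproducing the counts in the table, using ⌈log₂ n⌉ as a rough stand-in for balanced-tree lookup cost:

```python
import math

def lookup_comparisons(entries: int) -> tuple[int, int]:
    """Approximate comparisons: linear scan vs. balanced-tree lookup."""
    return entries, math.ceil(math.log2(entries))

for n in (100, 10_000, 1_000_000):
    linear, tree = lookup_comparisons(n)
    print(f"{n:>9} entries: linear ~{linear}, tree ~{tree}")
```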
ext4 htree: the dir_index feature hashes each filename and uses a shallow hash tree (typically one or two levels of index blocks) to locate the directory block containing the entry, keeping lookups near-constant even for very large directories.
Disk I/O performance depends on mechanical seek and rotation for HDDs. Understanding access time components is essential for performance analysis.
Disk Access Time Components (HDD):
Access Time = Seek Time + Rotational Latency + Transfer Time
Seek Time: Time to move disk head to target track
Rotational Latency: Time for target sector to rotate under head
Transfer Time: Time to read/write data
Worked Example: Sequential vs Random I/O
Given:
- Average seek time: 8 ms
- 7,200 RPM spindle → average rotational latency = (60,000 ms / 7,200) / 2 ≈ 4.17 ms
- Transfer rate ≈ 150 MB/s, so one 4 KB block transfers in ≈ 0.027 ms
Calculate: Read 100 random blocks vs 100 sequential blocks
Random I/O (100 blocks):
Each block requires a full access (seek + rotational latency + transfer):
Per block: 8 + 4.17 + 0.027 ≈ 12.2 ms
Total: 100 × 12.2 ms = 1.22 seconds
Sequential I/O (100 blocks):
First block: full access ≈ 12.2 ms
Remaining 99 blocks: transfer only, since the blocks are adjacent on the same track and need no further seeks or rotational waits: 99 × 0.027 ≈ 2.67 ms
Total: 8 + 4.17 + 2.67 ≈ 15 ms
Ratio: Random / Sequential = 1220 / 15 ≈ 80×
Sequential I/O is 80× faster than random for this workload!
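The same estimate as a short sketch; the model is deliberately simplified (one seek and one rotational delay for the whole sequential run, function name my own):

```python
def hdd_read_ms(blocks: int, seek_ms: float, rot_ms: float,
                xfer_ms: float, sequential: bool) -> float:
    """Estimated time to read `blocks` blocks from an HDD (simplified model).

    Random: every block pays seek + rotational latency + transfer.
    Sequential: one seek and one rotational delay, then pure transfer
    (assumes the whole run fits on one track, so no extra seeks).
    """
    if sequential:
        return seek_ms + rot_ms + blocks * xfer_ms
    return blocks * (seek_ms + rot_ms + xfer_ms)

random_ms = hdd_read_ms(100, 8, 4.17, 0.027, sequential=False)  # ~1220 ms
seq_ms = hdd_read_ms(100, 8, 4.17, 0.027, sequential=True)      # ~15 ms
print(f"random: {random_ms:.0f} ms, sequential: {seq_ms:.1f} ms, "
      f"ratio: {random_ms / seq_ms:.0f}x")
```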
| Algorithm | Avg Seek Distance | Variance | Starvation Risk |
|---|---|---|---|
| FCFS | ~1/3 cylinders | High | None |
| SSTF | Low | Medium | High (requests far from the head) |
| SCAN (Elevator) | Medium | Low | None |
| C-SCAN | Medium | Very low | None |
| LOOK | Medium-low | Low | None |
Worked Example: SCAN Scheduling
Given:
- Cylinders numbered 0 - 199
- Current head position: cylinder 53, moving toward cylinder 0
- Pending request queue: 98, 183, 37, 122, 14, 124, 65, 67
Calculate total head movement:
SCAN (go to 0, then reverse):
Order: 53 → 37 → 14 → 0 → 65 → 67 → 98 → 122 → 124 → 183
Movements: 53→37 (16), 37→14 (23), 14→0 (14), 0→65 (65), 65→67 (2), 67→98 (31), 98→122 (24), 122→124 (2), 124→183 (59)
Total: 16 + 23 + 14 + 65 + 2 + 31 + 24 + 2 + 59 = 236 cylinders
Compare to FCFS (service in arrival order): 45 + 85 + 146 + 85 + 108 + 110 + 59 + 2 = 640 cylinders.
Compare to SSTF (greedy nearest-first): 236 cylinders, the same as SCAN for this queue, though SSTF can starve distant requests. LOOK, which reverses at 14 instead of sweeping all the way to 0, needs only 208.
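A small sketch of the FCFS and SCAN totals for this queue (the SCAN helper assumes the head sweeps toward cylinder 0 first and that at least one request lies above the head):

```python
def fcfs_distance(head: int, requests: list[int]) -> int:
    """Total head movement when requests are serviced in arrival order."""
    total = 0
    for r in requests:
        total += abs(r - head)
        head = r
    return total


def scan_toward_zero_distance(head: int, requests: list[int]) -> int:
    """SCAN total movement: sweep down to cylinder 0, then back up to the
    highest pending request (assumes at least one request above the head)."""
    return head + max(requests)


queue = [98, 183, 37, 122, 14, 124, 65, 67]
print(fcfs_distance(53, queue))              # 640
print(scan_toward_zero_distance(53, queue))  # 53 + 183 = 236
```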
SSDs have no mechanical seek or rotation. Random vs sequential performance difference is much smaller (~2-4× vs 80× for HDD). Disk scheduling algorithms are mostly irrelevant for SSDs. Key SSD factors: write amplification, wear leveling, garbage collection.
You now have comprehensive knowledge of file system calculations for OS interviews. The next page covers system design problems—designing schedulers, file systems, virtual memory systems, and other OS components from scratch.