File system calculations test your understanding of how files and directories are organized on disk. These problems involve computing maximum file sizes, block allocation overhead, FAT table dimensions, inode requirements, and access patterns for different file operations.
Unlike memory calculations which deal with fixed-size pages, file systems introduce variable-size files, indirect block hierarchies, and the interplay between metadata and data storage. This adds layers of complexity that require careful accounting.
These problems reward systematic thinking. Each file system structure (FAT, indexed allocation, extent-based) has its own set of formulas. Mastering all three prepares you for any file system question.
By the end of this page, you will: (1) Calculate maximum file sizes for indexed allocation with direct/indirect blocks, (2) Compute FAT table size and overhead, (3) Determine inode requirements and storage efficiency, (4) Analyze disk access patterns for file operations, (5) Evaluate tradeoffs between allocation strategies.
Disk space is divided into fixed-size blocks (also called clusters or allocation units). All file systems manage allocation at the block granularity.
Key Definitions:
- Block (also cluster or allocation unit): the fixed-size unit in which disk space is allocated to files.
- Block pointer: a number that identifies one block on the disk; its size depends on how many blocks must be addressable.
- Pointers per block: how many block pointers fit into a single block, which determines the fan-out of indirect blocks and FAT-style tables.
Fundamental Calculations:
Total blocks on disk = Disk capacity / Block size
Block pointer size = ⌈log₂(Total blocks) / 8⌉ bytes (round up)
Pointers per block = Block size / Block pointer size
Example:
| Disk Size | Block Size | Total Blocks | Pointer Size | Pointers/Block |
|---|---|---|---|---|
| 16 GB | 4 KB | 2²² = 4M | 4 bytes (3 by the formula; rounded up to 4 in practice) | 1024 |
| 1 TB | 4 KB | 2²⁸ = 256M | 4 bytes | 1024 |
| 16 TB | 4 KB | 2³² = 4G | 4 bytes | 1024 |
| 1 EB | 4 KB | 2⁴⁸ | 6 bytes | ~682 |
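The arithmetic behind the table is easy to script. A minimal sketch (function and key names are my own) that reproduces a row from disk capacity and block size:

```python
import math

def block_params(disk_bytes: int, block_size: int) -> dict:
    """Derive block count, minimum pointer size, and pointers per block."""
    total_blocks = disk_bytes // block_size
    # Minimum whole bytes needed to address every block
    pointer_size = math.ceil(math.log2(total_blocks) / 8)
    return {
        "total_blocks": total_blocks,
        "pointer_size_bytes": pointer_size,
        "pointers_per_block": block_size // pointer_size,
    }

# 1 TB disk, 4 KB blocks -> 2^28 blocks, 4-byte pointers, 1024 pointers per block
print(block_params(2**40, 4096))
```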
Files rarely fill their last block exactly. On average, each file wastes half a block. With 4 KB blocks and 1000 files, expect ~2 MB wasted to internal fragmentation. Small block sizes reduce fragmentation but increase metadata overhead.
Storage Efficiency Analysis:
For a file of size S bytes with block size B:
Blocks required = ⌈S / B⌉
Actual space used = Blocks required × B
Internal fragmentation = Actual space - S
Average fragmentation per file = B / 2
Example: a 10,000-byte file with 4 KB blocks needs ⌈10,000 / 4,096⌉ = 3 blocks, occupies 12,288 bytes on disk, and wastes 2,288 bytes to internal fragmentation.
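The same calculation as a quick sketch (the helper name is illustrative, not from any particular file system):

```python
import math

def fragmentation(file_size: int, block_size: int) -> tuple[int, int]:
    """Return (blocks required, internal fragmentation in bytes)."""
    blocks = math.ceil(file_size / block_size)
    return blocks, blocks * block_size - file_size

# 10,000-byte file with 4 KB blocks: 3 blocks, 2,288 bytes wasted
print(fragmentation(10_000, 4096))
```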
Contiguous allocation stores each file as a contiguous sequence of blocks. It's simple and provides excellent read performance but suffers from external fragmentation.
Directory Entry:
Filename | Starting Block | Length (in blocks)
Maximum File Size:
Limited by the largest contiguous free region on disk. Theoretically:
Max file size = Disk capacity (if entirely free and unfragmented)
Practically, fragmentation limits file size severely.
Access Time Analysis:
For a file starting at block B with length L: block i of the file (0 ≤ i < L) is simply disk block B + i, so any block can be located with arithmetic alone, using a single disk I/O with no pointer chasing.
Contiguous allocation is optimal for sequential access patterns.
Worked Example: Contiguous Allocation Overhead
Given:
- Disk: 100 GB, block size 4 KB
- 10,000 files, average size 50 MB (sanity-checked in part a)
- Directory entry fields: 32-byte filename, 4-byte starting block, 4-byte length
Questions: a) Total data storage b) Directory entry size c) Total directory size d) Fragmentation loss estimate
Solutions:
a) Total data: 10,000 × 50 MB = 500 GB (doesn't fit in 100 GB disk!)
Let's revise: 10,000 files, average 1 MB each.
a) Total data: 10,000 × 1 MB = 10 GB
b) Directory entry: 32-byte filename + 4-byte starting block + 4-byte length = 40 bytes per entry
c) Directory size: 10,000 × 40 bytes = 400 KB
d) Fragmentation: ~10,000 files × 2 KB average = ~20 MB (internal only, external is filesystem-dependent)
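For checking answers like these, a small helper (names are my own) that computes directory size and the expected internal fragmentation:

```python
def contiguous_overhead(num_files: int, entry_bytes: int, block_size: int) -> dict:
    """Directory size and expected internal fragmentation for contiguous allocation."""
    return {
        "directory_bytes": num_files * entry_bytes,
        # On average each file wastes half a block
        "expected_internal_frag_bytes": num_files * block_size // 2,
    }

# 10,000 files, 40-byte entries, 4 KB blocks -> 400,000-byte directory (~400 KB), ~20 MB fragmentation
print(contiguous_overhead(10_000, 40, 4096))
```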
After many file creations and deletions, disk becomes fragmented with many small holes. A 10 MB file might not fit even with 50 MB total free space if no contiguous 10 MB region exists. This is why modern file systems don't use pure contiguous allocation.
Linked allocation eliminates external fragmentation by chaining blocks together. FAT (File Allocation Table) centralizes the chain pointers in a separate table.
Simple Linked Allocation:
Each block contains: [Data | Next block pointer]
Data per block = Block size - Pointer size
Problem: Random access is O(n) — must traverse chain from start.
FAT (File Allocation Table):
Centralized table with one entry per disk block: each entry holds the block number of the next block in the file's chain, an end-of-file marker, or a free-block marker.
FAT Size Calculation:
FAT size = Number of blocks × Entry size
= (Disk capacity / Block size) × Entry size
FAT Entry Size: fixed by the FAT variant, 12, 16, or 32 bits per entry (see the table below).
Worked Example: FAT32 Calculations
Given: 1 TB volume, 32 KB clusters, 32-bit (4-byte) FAT entries, 2 copies of the FAT.
Calculate: a) Number of clusters b) FAT size c) Total overhead with 2 FAT copies d) Maximum file size limitation
Solutions:
a) Number of clusters: 1 TB / 32 KB = 2^40 / 2^15 = 2^25 = 33,554,432 clusters
b) FAT size: 2^25 × 4 bytes = 2^27 bytes = 128 MB
c) With 2 FAT copies: 2 × 128 MB = 256 MB overhead (0.025% of disk)
d) Maximum file size: FAT32 uses 28 bits for cluster numbers, allowing up to 2^28 clusters per chain, but the FAT32 specification caps file size at 2^32 - 1 bytes ≈ 4 GB (independent of the cluster limit).
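The same FAT sizing arithmetic as a short sketch (function and parameter names are illustrative):

```python
def fat_overhead(disk_bytes: int, cluster_bytes: int, entry_bytes: int, copies: int = 2) -> dict:
    """FAT size and total overhead for a FAT-style volume."""
    clusters = disk_bytes // cluster_bytes
    fat_bytes = clusters * entry_bytes
    return {
        "clusters": clusters,
        "fat_bytes": fat_bytes,
        "total_overhead_bytes": copies * fat_bytes,
        "overhead_fraction": copies * fat_bytes / disk_bytes,
    }

# 1 TB volume, 32 KB clusters, 4-byte entries, 2 FAT copies
# -> 2^25 clusters, 128 MB per FAT, 256 MB total (~0.025% of the disk)
print(fat_overhead(2**40, 32 * 1024, 4))
```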
| FAT Type | Entry Size | Max Clusters | Max Volume Size | Max File Size |
|---|---|---|---|---|
| FAT12 | 12 bits | 4,078 | 32 MB (typical) | Volume size |
| FAT16 | 16 bits | 65,524 | 2 GB | 2 GB |
| FAT32 | 32 bits (28 used) | ~268 million | 2 TB (with 512B sectors) | ≈4 GB |
The FAT is typically cached in memory for fast access. With 128 MB FAT, random file access requires only one disk I/O (for the data block) since chain traversal is in-memory. This makes FAT reasonably efficient despite the linked structure.
FAT Access Pattern Analysis:
For a file of N blocks, reading block k:
Without FAT caching: each step along the chain may require reading a FAT sector from disk, so reaching block k costs up to k FAT reads plus 1 read for the data block.
With FAT cached in memory: the chain is followed entirely in RAM, so only 1 disk I/O (the data block itself) is needed.
This is why FAT performance is reasonable for sequential access and even for random access when FAT is cached.
Indexed allocation uses an index block (inode in Unix) that contains pointers to all data blocks. This supports both sequential and random access efficiently.
Unix inode Structure (Traditional):
A typical inode contains: file metadata (owner, permissions, timestamps, size, link count) plus a fixed set of block pointers: 12 direct pointers, 1 single indirect, 1 double indirect, and 1 triple indirect.
Block Pointer Hierarchy:
Direct: Points directly to data block
Single indirect: Points to block of direct pointers
Double indirect: Points to block of single indirect pointers
Triple indirect: Points to block of double indirect pointers
Maximum File Size Calculation:
Let: B = block size, d = number of direct pointers in the inode, n = pointers per block (block size / pointer size).
Max file size = (Direct) + (Single Indirect) + (Double Indirect) + (Triple Indirect)
= d×B + n×B + n²×B + n³×B
= B × (d + n + n² + n³)
Worked Example: Classic Unix (4 KB blocks, 4-byte pointers, 12 direct)
Parameters: B = 4 KB, pointer size = 4 bytes, d = 12, n = 4,096 / 4 = 1,024 pointers per block.
Direct: 12 × 4 KB = 48 KB
Single indirect: 1,024 × 4 KB = 4 MB
Double indirect: 1,024² × 4 KB = 4 GB
Triple indirect: 1,024³ × 4 KB = 4 TB
Maximum file size = 48 KB + 4 MB + 4 GB + 4 TB ≈ 4 TB
The calculated maximum assumes the inode structure is the only limit. Real file systems have additional constraints: file size field width (32-bit limits to 4 GB), maximum blocks per file (ext2/3 limits), and filesystem specification limits.
Worked Example: Block Lookup Depth
Given the same parameters, how many disk accesses to read byte at position X?
Case 1: X = 10,000 (block 2, within the 12 direct blocks): 1 disk I/O, the data block itself (its pointer is in the inode).
Case 2: X = 50,000 (block 12, just past the direct blocks): 2 disk I/Os, the single indirect block and then the data block.
Case 3: X = 5,000,000 (block 1,220, past direct + single indirect = 1,036 blocks): 3 disk I/Os, the double indirect block, a single indirect block, then the data block.
Case 4: Within triple indirect (offsets beyond ~4 GB): 4 disk I/Os, one per tree level plus the data block.
(All cases assume the inode itself is already in memory.)
| Level | Blocks Covered | Cumulative Range | Disk I/Os |
|---|---|---|---|
| Direct (12 ptrs) | 0 - 11 | 0 - 48 KB | 1 |
| Single Indirect | 12 - 1,035 | 48 KB - 4 MB | 2 |
| Double Indirect | 1,036 - 1,049,611 | 4 MB - 4 GB | 3 |
| Triple Indirect | 1,049,612 - ~10⁹ | 4 GB - 4 TB | 4 |
```python
def calculate_inode_params(block_size: int, pointer_size: int, direct_count: int) -> dict:
    """Calculate inode structure parameters and maximum file size."""
    ptrs_per_block = block_size // pointer_size

    # Blocks covered by each level
    direct_blocks = direct_count
    single_blocks = ptrs_per_block
    double_blocks = ptrs_per_block ** 2
    triple_blocks = ptrs_per_block ** 3

    # Maximum file size
    total_blocks = direct_blocks + single_blocks + double_blocks + triple_blocks
    max_file_size = total_blocks * block_size

    return {
        "ptrs_per_block": ptrs_per_block,
        "direct_blocks": direct_blocks,
        "single_indirect_blocks": single_blocks,
        "double_indirect_blocks": double_blocks,
        "triple_indirect_blocks": triple_blocks,
        "total_blocks": total_blocks,
        "max_file_size_bytes": max_file_size,
        "max_file_size_human": format_size(max_file_size)
    }


def get_indirection_level(byte_offset: int, block_size: int,
                          pointer_size: int, direct_count: int) -> tuple[str, int]:
    """
    Determine indirection level and disk I/Os for accessing a byte offset.
    Returns (level_name, disk_ios).
    """
    ptrs_per_block = block_size // pointer_size
    block_num = byte_offset // block_size

    # Direct
    if block_num < direct_count:
        return ("direct", 1)
    block_num -= direct_count

    # Single indirect
    if block_num < ptrs_per_block:
        return ("single_indirect", 2)
    block_num -= ptrs_per_block

    # Double indirect
    if block_num < ptrs_per_block ** 2:
        return ("double_indirect", 3)
    block_num -= ptrs_per_block ** 2

    # Triple indirect
    if block_num < ptrs_per_block ** 3:
        return ("triple_indirect", 4)

    return ("beyond_max", -1)


def format_size(size_bytes: int) -> str:
    """Format bytes to human readable."""
    for unit in ['B', 'KB', 'MB', 'GB', 'TB', 'PB']:
        if size_bytes < 1024:
            return f"{size_bytes:.1f} {unit}"
        size_bytes /= 1024
    return f"{size_bytes:.1f} EB"


# Example: Unix-style inode
result = calculate_inode_params(
    block_size=4096,    # 4 KB
    pointer_size=4,     # 4 bytes
    direct_count=12     # 12 direct pointers
)
print(f"Max file size: {result['max_file_size_human']}")
print(f"Pointers per block: {result['ptrs_per_block']}")
```

Modern file systems (ext4, XFS, NTFS) use extents instead of individual block pointers. An extent describes a contiguous range of blocks.
Extent Structure:
Extent = (Starting block, Length in blocks)
Typically: 4+4 = 8 bytes per extent (or 4+2 = 6 bytes for smaller lengths)
Advantages: far less metadata for large contiguous files (one extent can describe thousands of blocks), efficient sequential I/O, and faster mapping from file offset to disk block.
Example Comparison:
100 MB contiguous file with 4 KB blocks:
Block pointer approach: 100 MB / 4 KB = 25,600 blocks, so 25,600 pointers × 4 bytes = 100 KB of pointer metadata (requiring indirect blocks).
Extent approach (single extent): one (starting block, length) pair, about 8 bytes of metadata.
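A small sketch comparing the two metadata costs under these assumptions (8-byte extents, 4-byte block pointers; the helper name is my own):

```python
import math

def metadata_bytes(file_bytes: int, block_size: int, ptr_bytes: int,
                   extents: int, extent_bytes: int = 8) -> dict:
    """Compare per-file metadata: one pointer per block vs. one entry per extent."""
    blocks = math.ceil(file_bytes / block_size)
    return {
        "blocks": blocks,
        "block_pointer_metadata": blocks * ptr_bytes,
        "extent_metadata": extents * extent_bytes,
    }

# 100 MB contiguous file, 4 KB blocks, 4-byte pointers, stored as a single extent
# -> 25,600 blocks, 102,400 bytes (100 KB) of block pointers vs. 8 bytes for one extent
print(metadata_bytes(100 * 1024 * 1024, 4096, 4, extents=1))
```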
ext4 Extent Tree:
ext4 uses a B-tree-like structure for extents:
Maximum Extents:
The inode holds 4 extents inline (in its 60-byte block-pointer area). With an extent tree of depth up to 5, the theoretical maximum is billions of extents.
Maximum File Size in ext4: with 4 KB blocks the extent format addresses up to 2³² logical blocks, so a single file can reach 16 TiB.
Extent efficiency degrades with fragmentation. A perfectly contiguous file needs 1 extent. The same file fragmented into 1000 pieces needs 1000 extents. Heavy fragmentation can exhaust extent storage and hurt performance.
| Strategy | Metadata per Block | Random Access | Sequential Access | Fragmentation Sensitivity |
|---|---|---|---|---|
| Contiguous | None (2 values total) | O(1) | Optimal | High (external) |
| Linked List | 1 pointer | O(n) | Good | None |
| FAT | 1 FAT entry | O(n) or O(1) cached | Good | None |
| Indexed (inode) | 1 pointer per block | O(1) to O(4) | Good | None |
| Extent-based | 1 extent per range | O(log n) with tree | Excellent | Medium |
Directories are special files containing mappings from filenames to file metadata (inode numbers or equivalent). Directory structure affects lookup performance.
Linear Directory Structure:
Simplest approach: array of (filename, inode number) entries.
Entry size (fixed-length names) = maximum filename length + inode number size (e.g. a 28-byte name field + a 4-byte inode number = 32 bytes per entry).
Lookup: O(n) linear scan for n entries
Directory Size Calculation:
Directory size = Number of entries × Entry size
For variable-length names, entries are typically aligned to 4-byte boundaries.
Worked Example: Path Lookup Cost
Given:
- Path to open: /home/user/documents/report.pdf
- Inode-based file system; each directory is small enough to fit in one data block
- Nothing is cached (not even the root inode), so each path component costs one inode read plus one directory data-block read
How many disk I/Os to open the file?
Path components: / → home → user → documents → report.pdf
Total: 9 disk I/Os (8 for path traversal + 1 for target inode)
With directory inode caching (common): ~4-5 disk I/Os
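A toy model of the lookup cost, using the same counting convention as the worked example (one inode read plus one data-block read per directory; a simplification, not how any specific kernel accounts for I/O):

```python
def path_lookup_ios(directories: int, root_inode_cached: bool = False) -> int:
    """Disk I/Os to open a file, in the simplified model used above.

    Each directory on the path costs one inode read plus one data-block read;
    the target file costs one final inode read. For /home/user/documents/report.pdf
    the traversed directories are /, home, user, documents -> directories = 4.
    """
    ios = 2 * directories + 1
    return ios - 1 if root_inode_cached else ios


print(path_lookup_ios(4))                          # 9, matching the worked example
print(path_lookup_ios(4, root_inode_cached=True))  # 8
```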
Linux maintains a directory entry cache (dcache) that maps full paths to inodes. Repeated access to /home/user/documents requires only dcache lookup, not disk access. This cache is critical for performance.
B-tree Directories:
Modern filesystems (ext4 with htree, XFS) use hash trees or B-trees for large directories.
Linear vs B-tree Lookup:
| Entries | Linear Lookup | B-tree Lookup |
|---|---|---|
| 100 | O(100) = 100 comparisons | O(log 100) ≈ 7 comparisons |
| 10,000 | O(10,000) = 10,000 | O(log 10,000) ≈ 14 |
| 1,000,000 | O(1,000,000) = 1M | O(log 1,000,000) ≈ 20 |
For directories with millions of files, B-tree is essential.
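A quick sketch reproducing the counts in the table, using ⌈log₂ n⌉ as a rough stand-in for balanced-tree lookup cost:

```python
import math

def lookup_comparisons(entries: int) -> tuple[int, int]:
    """Approximate comparisons: linear scan vs. balanced-tree lookup."""
    return entries, math.ceil(math.log2(entries))

for n in (100, 10_000, 1_000_000):
    linear, tree = lookup_comparisons(n)
    print(f"{n:>9} entries: linear ~{linear}, tree ~{tree}")
```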
ext4 htree: the dir_index feature hashes each filename and uses a shallow hash tree (typically one or two levels of index blocks) to locate the directory block containing the entry, keeping lookups near-constant even for very large directories.
Disk I/O performance depends on mechanical seek and rotation for HDDs. Understanding access time components is essential for performance analysis.
Disk Access Time Components (HDD):
Access Time = Seek Time + Rotational Latency + Transfer Time
Seek Time: Time to move disk head to target track
Rotational Latency: Time for target sector to rotate under head
Transfer Time: Time to read/write data
Worked Example: Sequential vs Random I/O
Given:
- Average seek time: 8 ms
- 7,200 RPM spindle → average rotational latency = (60,000 ms / 7,200) / 2 ≈ 4.17 ms
- Transfer rate ≈ 150 MB/s, so one 4 KB block transfers in ≈ 0.027 ms
Calculate: Read 100 random blocks vs 100 sequential blocks
Random I/O (100 blocks):
Each block requires a full access (seek + rotational latency + transfer):
Per block: 8 + 4.17 + 0.027 ≈ 12.2 ms
Total: 100 × 12.2 ms = 1.22 seconds
Sequential I/O (100 blocks):
First block: full access ≈ 12.2 ms
Remaining 99 blocks: transfer only, since the blocks are adjacent on the same track and need no further seeks or rotational waits: 99 × 0.027 ≈ 2.67 ms
Total: 8 + 4.17 + 2.67 ≈ 15 ms
Ratio: Random / Sequential = 1220 / 15 ≈ 80×
Sequential I/O is 80× faster than random for this workload!
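The same estimate as a short sketch; the model is deliberately simplified (one seek and one rotational delay for the whole sequential run, function name my own):

```python
def hdd_read_ms(blocks: int, seek_ms: float, rot_ms: float,
                xfer_ms: float, sequential: bool) -> float:
    """Estimated time to read `blocks` blocks from an HDD (simplified model).

    Random: every block pays seek + rotational latency + transfer.
    Sequential: one seek and one rotational delay, then pure transfer
    (assumes the whole run fits on one track, so no extra seeks).
    """
    if sequential:
        return seek_ms + rot_ms + blocks * xfer_ms
    return blocks * (seek_ms + rot_ms + xfer_ms)

random_ms = hdd_read_ms(100, 8, 4.17, 0.027, sequential=False)  # ~1220 ms
seq_ms = hdd_read_ms(100, 8, 4.17, 0.027, sequential=True)      # ~15 ms
print(f"random: {random_ms:.0f} ms, sequential: {seq_ms:.1f} ms, "
      f"ratio: {random_ms / seq_ms:.0f}x")
```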
| Algorithm | Avg Seek Distance | Variance | Starvation Risk |
|---|---|---|---|
| FCFS | ~1/3 cylinders | High | None |
| SSTF | Low | Medium | High (requests far from the head) |
| SCAN (Elevator) | Medium | Low | None |
| C-SCAN | Medium | Very low | None |
| LOOK | Medium-low | Low | None |
Worked Example: SCAN Scheduling
Given:
- Cylinders numbered 0 - 199
- Current head position: cylinder 53, moving toward cylinder 0
- Pending request queue: 98, 183, 37, 122, 14, 124, 65, 67
Calculate total head movement:
SCAN (go to 0, then reverse):
Order: 53 → 37 → 14 → 0 → 65 → 67 → 98 → 122 → 124 → 183
Movements: 53→37 (16), 37→14 (23), 14→0 (14), 0→65 (65), 65→67 (2), 67→98 (31), 98→122 (24), 122→124 (2), 124→183 (59)
Total: 16 + 23 + 14 + 65 + 2 + 31 + 24 + 2 + 59 = 236 cylinders
Compare to FCFS (service in arrival order): 45 + 85 + 146 + 85 + 108 + 110 + 59 + 2 = 640 cylinders.
Compare to SSTF (greedy nearest-first): 236 cylinders, the same as SCAN for this queue, though SSTF can starve distant requests. LOOK, which reverses at 14 instead of sweeping all the way to 0, needs only 208.
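A small sketch of the FCFS and SCAN totals for this queue (the SCAN helper assumes the head sweeps toward cylinder 0 first and that at least one request lies above the head):

```python
def fcfs_distance(head: int, requests: list[int]) -> int:
    """Total head movement when requests are serviced in arrival order."""
    total = 0
    for r in requests:
        total += abs(r - head)
        head = r
    return total


def scan_toward_zero_distance(head: int, requests: list[int]) -> int:
    """SCAN total movement: sweep down to cylinder 0, then back up to the
    highest pending request (assumes at least one request above the head)."""
    return head + max(requests)


queue = [98, 183, 37, 122, 14, 124, 65, 67]
print(fcfs_distance(53, queue))              # 640
print(scan_toward_zero_distance(53, queue))  # 53 + 183 = 236
```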
SSDs have no mechanical seek or rotation. Random vs sequential performance difference is much smaller (~2-4× vs 80× for HDD). Disk scheduling algorithms are mostly irrelevant for SSDs. Key SSD factors: write amplification, wear leveling, garbage collection.
You now have comprehensive knowledge of file system calculations for OS interviews. The next page covers system design problems—designing schedulers, file systems, virtual memory systems, and other OS components from scratch.