Traditional free space management techniques—bitmaps, linked lists, counting—all share a fundamental vulnerability: they're modified in place. Every allocation or deallocation changes the existing data structure on disk. This creates a window where a crash can leave the free space metadata in an inconsistent state, potentially causing data loss or corruption.
Space maps, pioneered by the ZFS file system (Sun Microsystems, 2005), take a radically different approach: they treat free space changes as a log of operations rather than immediate updates to a bitmap or tree. Combined with ZFS's copy-on-write (COW) architecture, space maps achieve constant-time recording of allocations and frees, inherent crash consistency, and scalable management of very large pools.
Space maps represent the state-of-the-art in free space management, synthesizing lessons from decades of file system development into an elegant, unified design. Understanding them reveals both the pinnacle of traditional techniques and the paradigm shift enabled by copy-on-write file systems.
By the end of this page, you will understand the copy-on-write foundation that enables space maps, how space maps log allocations and frees rather than updating data structures, the condensing process that converts logs into sorted trees, metaslab allocation for scalable management, crash consistency guarantees, and how space maps compare to traditional approaches.
Before diving into space maps specifically, we must understand the copy-on-write (COW) architecture that enables them.
Traditional (In-Place) Updates:
Before write:
Block 100: [old data]
During write:
Block 100: [partial new data] ← CRASH HERE = CORRUPTION!
After write:
Block 100: [new data]
If the system crashes mid-write, block 100 contains neither old data nor valid new data—it's corrupted.
Copy-on-Write Updates:
Before write:
Block 100: [old data]
During write:
Block 100: [old data] (unchanged)
Block 200: [new data] (write to NEW location)
Commit (atomic pointer update):
Parent now points to 200 instead of 100
After commit:
Block 100: [old data] (now free)
Block 200: [new data]
Key insight: The old data is never modified. We write new data to a fresh location, then atomically update the pointer. A crash at any point either leaves the old state intact (if before atomic commit) or the new state intact (if after commit).
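The pattern can be sketched in a few lines of C. This is illustration only: the allocate_new_block, write_block, flush, and schedule_free helpers are hypothetical stand-ins, and in real ZFS the parent block is itself rewritten copy-on-write (all the way up to the überblock) rather than updated in place.

/**
 * Illustrative COW update (not ZFS code). Helpers are hypothetical;
 * in ZFS the "pointer update" is itself a COW rewrite of the parent,
 * made visible only by the atomic überblock update at TXG commit.
 */
#include <stdint.h>
#include <stddef.h>

uint64_t allocate_new_block(void);
void     write_block(uint64_t blkno, const void *data, size_t len);
void     flush(void);
void     schedule_free(uint64_t blkno);

typedef struct {
    uint64_t child_blkno;   // the only pointer to the data being updated
} Parent;

void cow_update(Parent *parent, const void *new_data, size_t len) {
    // 1. Write new data to a fresh block; the old block is never touched
    uint64_t new_blkno = allocate_new_block();
    write_block(new_blkno, new_data, len);
    flush();                          // new data is durable before we commit

    // 2. Commit: repoint the parent (an atomic 8-byte update in this sketch)
    uint64_t old_blkno = parent->child_blkno;
    parent->child_blkno = new_blkno;
    flush();

    // 3. Only after the commit is the old block a candidate for reuse
    schedule_free(old_blkno);
}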
How This Applies to Free Space:
With COW, we can't update a bitmap in place—that would violate the COW principle. Instead, ZFS logs free space changes as part of the transaction, and those changes become durable only when the entire transaction commits.
COW trades in-place updates for additional writes (writing to new locations plus updating parent pointers). However, this enables perfect crash consistency, instant snapshots (old blocks aren't freed while referenced by snapshots), and checksummed self-healing data structures. For ZFS, these benefits far outweigh the extra writes.
Transaction Group Model:
ZFS batches operations into transaction groups (TXGs). Every 5-30 seconds (configurable), a TXG commits, atomically writing all accumulated changes—data, metadata, and free space updates—to disk.
Space maps exist within this transaction model—changes to free space are recorded as part of the TXG commit process, ensuring they're always consistent with the rest of the pool state.
A space map is a log-structured data structure that records free space changes over time. Rather than maintaining a current snapshot of free/allocated status, it records the history of allocations and frees.
Space Map Entry Format:
Each entry describes an operation (allocate or free) on a range of blocks:
┌────────────────────────────────────────────────────────────┐
│ SPACE MAP ENTRY (64 bits) │
├────────┬────────────────────────────────────────┬──────────┤
│ Type │ Offset (Block Number) │ Run │
│ A/F │ (variable bits) │ Length │
│ 1 bit │ (typically 47 bits) │ (16 bits)│
└────────┴────────────────────────────────────────┴──────────┘
Type: 0 = ALLOCATE, 1 = FREE
Offset: Starting block number of the range
Run Length: Number of contiguous blocks in this operation
Example Space Map Log:
Transaction 1: Create file, allocate blocks 1000-1099
Entry: [ALLOC, 1000, 100]
Transaction 2: Delete file, free blocks 1000-1099
Entry: [FREE, 1000, 100]
Transaction 3: Create new file, allocate blocks 500-549
Entry: [ALLOC, 500, 50]
Space Map after 3 transactions:
[ALLOC, 1000, 100]
[FREE, 1000, 100]
[ALLOC, 500, 50]
Key Observation: The log contains redundant information! The first two entries cancel out. Processing the log produces the actual free space state, but accumulating entries forever would be wasteful.
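To see the cancellation concretely, here is a tiny self-contained replay of the three-entry log above. It is illustration only: it uses a plain allocated-flag array rather than ZFS's range trees.

/**
 * Replay the three-entry example log over a small region.
 * The first two entries cancel; only blocks 500-549 end up in use.
 */
#include <stdio.h>
#include <stdbool.h>
#include <string.h>

#define NBLOCKS 2000

static bool allocated[NBLOCKS];   // false = free

static void apply(bool is_free, int offset, int run) {
    for (int b = offset; b < offset + run; b++)
        allocated[b] = !is_free;
}

int main(void) {
    memset(allocated, 0, sizeof(allocated));   // metaslab starts fully free

    apply(false, 1000, 100);   // [ALLOC, 1000, 100]
    apply(true,  1000, 100);   // [FREE,  1000, 100]  cancels the first entry
    apply(false,  500,  50);   // [ALLOC,  500,  50]

    int in_use = 0;
    for (int b = 0; b < NBLOCKS; b++)
        in_use += allocated[b];

    printf("blocks in use after replay: %d (blocks 500-549)\n", in_use);   // prints 50
    return 0;
}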
#include <stdint.h>
#include <stdbool.h>

/**
 * Space Map Entry encoding
 *
 * ZFS uses variable-width encoding for efficiency.
 * Simplified here as fixed 64-bit entries.
 */

#define SM_ALLOC 0
#define SM_FREE  1

typedef struct __attribute__((packed)) {
    uint64_t type   : 1;    // ALLOC (0) or FREE (1)
    uint64_t offset : 47;   // Starting block (supports 128 PB at 1K blocks)
    uint64_t run    : 16;   // Length (up to 65535 blocks per entry)
} SpaceMapEntry;

/**
 * Encode an allocation/free operation
 */
SpaceMapEntry sm_encode(bool is_free, uint64_t start, uint16_t count) {
    SpaceMapEntry entry;
    entry.type = is_free ? SM_FREE : SM_ALLOC;
    entry.offset = start;
    entry.run = count;
    return entry;
}

/**
 * Space Map on-disk structure (simplified)
 * In reality, ZFS uses a more complex block pointer structure.
 */
#define SM_BLOCK_ENTRIES (4096 / sizeof(SpaceMapEntry))   // 512 entries per 4KB block

typedef struct {
    SpaceMapEntry entries[SM_BLOCK_ENTRIES];
} SpaceMapBlock;

typedef struct {
    uint64_t object_id;         // ZFS object identifier
    uint64_t block_count;       // Number of blocks in this space map
    uint64_t entry_count;       // Total entries across all blocks
    uint64_t allocated_space;   // Computed: total allocated bytes
    uint64_t free_space;        // Computed: total free bytes
    uint64_t *block_pointers;   // Block pointers follow the header on disk
} SpaceMapHeader;

/**
 * Space Map in-memory representation
 * After loading and processing, entries are converted to range trees
 */
typedef struct {
    // In-memory range tree of free segments
    void *free_tree;            // AVL tree of (offset, length) pairs

    // Pending operations not yet written to disk
    SpaceMapEntry *pending;
    uint32_t pending_count;
    uint32_t pending_capacity;

    // Statistics
    uint64_t alloc_count;
    uint64_t free_count;
    uint64_t histogram[64];     // Distribution of free segment sizes
} SpaceMapLoaded;

Real ZFS space maps use variable-width encoding: small runs (the common case) use fewer bits, while large runs use more. This achieves better compression than the fixed 64-bit format shown. Additionally, debug entries can annotate runs of operations with the transaction group and sync pass that produced them.
Space maps support three primary operations: recording allocations, recording frees, and loading (processing the log to determine current state).
Recording Allocations/Frees (O(1)):
When blocks are allocated or freed, the operation is simply appended to the in-memory pending list:
void sm_record(SpaceMapLoaded *sm, bool is_free,
               uint64_t offset, uint32_t size) {
    // Grow the pending array if needed (error handling omitted)
    if (sm->pending_count == sm->pending_capacity) {
        sm->pending_capacity = sm->pending_capacity ? sm->pending_capacity * 2 : 64;
        sm->pending = realloc(sm->pending,
                              sm->pending_capacity * sizeof(SpaceMapEntry));
    }

    // Append to the pending list - amortized O(1)
    sm->pending[sm->pending_count++] = sm_encode(is_free, offset, size);

    // Update statistics
    if (is_free) {
        sm->free_count++;
    } else {
        sm->alloc_count++;
    }
}
No tree traversal, no bitmap updates—just an append. This is why allocations are so fast in ZFS.
Syncing to Disk:
At TXG commit time, pending entries are written as a new block appended to the space map object:
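A minimal sketch of what this sync might look like, building on the simplified types above. The exact signature and the allocate_new_block/disk_write helpers are assumptions for illustration, not ZFS's real interface.

/**
 * Sketch: flush pending entries at TXG commit.
 * Partially filled blocks stay zero-filled past the last entry,
 * which is the end marker that sm_load() checks for.
 */
#include <string.h>

uint64_t allocate_new_block(void *disk_handle);
void     disk_write(void *disk_handle, uint64_t blkno, const void *buf, size_t len);

void sm_sync_to_disk(SpaceMapLoaded *sm, SpaceMapHeader *header,
                     void *disk_handle) {
    uint32_t written = 0;

    while (written < sm->pending_count) {
        SpaceMapBlock block;
        memset(&block, 0, sizeof(block));

        // Copy up to one block's worth of pending entries
        uint32_t n = sm->pending_count - written;
        if (n > SM_BLOCK_ENTRIES) {
            n = SM_BLOCK_ENTRIES;
        }
        memcpy(block.entries, &sm->pending[written], n * sizeof(SpaceMapEntry));

        // COW: entries always go to a freshly allocated block, never in place
        uint64_t blkno = allocate_new_block(disk_handle);
        disk_write(disk_handle, blkno, &block, sizeof(block));

        // Record the new block (growth of the in-memory pointer array omitted)
        header->block_pointers[header->block_count++] = blkno;
        header->entry_count += n;
        written += n;
    }

    // The pending list can be reset once the TXG carrying these blocks commits
    sm->pending_count = 0;
}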
Loading (Processing the Log):
When mounting a pool or accessing a space map for the first time, the log must be "replayed" to determine current state:
for each entry in space_map_log:
if entry.type == ALLOC:
remove (entry.offset, entry.run) from free_tree
else: // FREE
add (entry.offset, entry.run) to free_tree
After processing, free_tree contains the current free space as a sorted tree of (offset, length) pairs—essentially the "counting" representation from the previous page.
/**
 * Space Map Operations
 */

#include <stdlib.h>      // malloc/free
#include "avl_tree.h"    // ZFS uses AVL trees for range management

typedef struct {
    avl_node_t node;     // AVL tree linkage
    uint64_t offset;     // Starting offset
    uint64_t size;       // Size in bytes/blocks
} range_seg_t;

// Forward declarations for the range tree helpers defined below
void range_tree_add(avl_tree_t *tree, uint64_t offset, uint64_t size);
void range_tree_remove(avl_tree_t *tree, uint64_t offset, uint64_t size);

/**
 * Load space map from disk and build in-memory free tree
 * Called once when metaslab is first accessed
 */
int sm_load(SpaceMapLoaded *sm, SpaceMapHeader *header, void *disk_handle) {
    // Initialize empty AVL tree
    avl_tree_t *free_tree = avl_create(range_compare, NULL, NULL);

    // Process each block of the space map log
    for (uint64_t blk = 0; blk < header->block_count; blk++) {
        SpaceMapBlock block;
        disk_read(disk_handle, header->block_pointers[blk], &block, sizeof(block));

        for (uint32_t i = 0; i < SM_BLOCK_ENTRIES; i++) {
            SpaceMapEntry *e = &block.entries[i];

            // Check for end marker
            if (e->offset == 0 && e->run == 0 && e->type == 0) {
                break;
            }

            if (e->type == SM_ALLOC) {
                // Remove range from free tree
                range_tree_remove(free_tree, e->offset, e->run);
            } else {
                // Add range to free tree (with coalescing)
                range_tree_add(free_tree, e->offset, e->run);
            }
        }
    }

    sm->free_tree = free_tree;
    return 0;
}

/**
 * Range tree operations with automatic coalescing
 */
void range_tree_add(avl_tree_t *tree, uint64_t offset, uint64_t size) {
    uint64_t end = offset + size;

    // Find adjacent ranges
    range_seg_t *left  = avl_find_predecessor(tree, offset);
    range_seg_t *right = avl_find_successor(tree, offset);

    bool merge_left  = (left && left->offset + left->size == offset);
    bool merge_right = (right && right->offset == end);

    if (merge_left && merge_right) {
        // Three-way merge
        left->size += size + right->size;
        avl_remove(tree, right);
        free(right);
    } else if (merge_left) {
        left->size += size;
    } else if (merge_right) {
        right->offset = offset;
        right->size += size;
    } else {
        // Insert new range
        range_seg_t *new_seg = malloc(sizeof(range_seg_t));
        new_seg->offset = offset;
        new_seg->size = size;
        avl_insert(tree, new_seg);
    }
}

void range_tree_remove(avl_tree_t *tree, uint64_t offset, uint64_t size) {
    uint64_t end = offset + size;

    range_seg_t *seg = avl_find_containing(tree, offset);
    if (!seg) {
        // Range not in tree - might be partial overlap
        // Handle edge cases...
        return;
    }

    uint64_t seg_end = seg->offset + seg->size;

    if (seg->offset == offset && seg->size == size) {
        // Exact match - remove entirely
        avl_remove(tree, seg);
        free(seg);
    } else if (seg->offset == offset) {
        // Remove from start
        seg->offset = end;
        seg->size -= size;
    } else if (seg_end == end) {
        // Remove from end
        seg->size -= size;
    } else {
        // Split in middle
        range_seg_t *right = malloc(sizeof(range_seg_t));
        right->offset = end;
        right->size = seg_end - end;
        avl_insert(tree, right);
        seg->size = offset - seg->offset;
    }
}

/**
 * Allocate from space map (using loaded free tree)
 */
uint64_t sm_alloc(SpaceMapLoaded *sm, uint64_t size) {
    // Find suitable range in free tree (first-fit, best-fit, etc.)
    range_seg_t *seg = find_suitable_range(sm->free_tree, size);
    if (!seg) {
        return 0;   // ENOSPC
    }

    uint64_t offset = seg->offset;

    // Remove allocated portion from tree
    range_tree_remove(sm->free_tree, offset, size);

    // Record in pending log - O(1)
    sm_record(sm, false, offset, size);

    return offset;
}

If we only appended entries to the space map log, it would grow without bound. Condensing is the process of periodically rewriting the space map to eliminate redundant entries.
When Condensing is Needed:
Condensing is triggered when the log holds far more entries than there are distinct free ranges, when the space map object grows past a size threshold, or when many ALLOC/FREE pairs cancel each other out—exactly the conditions checked by sm_should_condense below.
Condensing Algorithm:
1. Load current space map (process log to build range tree)
2. Walk range tree to enumerate all free ranges
3. Write new space map with single FREE entry per range
4. Update space map header to point to new blocks
5. Free old space map blocks
Before Condensing (10,000 entries):
[ALLOC 100, 50]
[FREE 100, 50]
[ALLOC 200, 100]
[ALLOC 500, 25]
[FREE 500, 25]
[ALLOC 300, 50]
[FREE 200, 100]
... (thousands more entries)
After Condensing (50 entries):
[FREE 0, 100] // Blocks 0-99 free
[FREE 250, 50] // Blocks 250-299 free
[FREE 350, 150] // Blocks 350-499 free
... (other free ranges)
The condensed form is equivalent to the "counting" representation—just free extents, no history.
/**
 * Space Map Condensing
 *
 * Rewrites space map as minimal set of FREE entries,
 * eliminating all historical ALLOC/FREE pairs.
 */

typedef struct {
    uint64_t entries_before;
    uint64_t entries_after;
    uint64_t blocks_before;
    uint64_t blocks_after;
    uint64_t time_ms;
} CondenseStats;

/**
 * Condense a space map to minimal representation
 */
CondenseStats sm_condense(SpaceMapLoaded *sm, SpaceMapHeader *header,
                          void *disk_handle) {
    CondenseStats stats = {0};
    uint64_t start_time = get_timestamp_ms();

    stats.entries_before = header->entry_count;
    stats.blocks_before  = header->block_count;

    // Step 1: Ensure space map is loaded
    if (!sm->free_tree) {
        sm_load(sm, header, disk_handle);
    }

    // Step 2: Count free ranges
    uint64_t range_count = avl_numnodes(sm->free_tree);

    // Step 3: Allocate new space map blocks
    uint64_t blocks_needed = (range_count + SM_BLOCK_ENTRIES - 1) / SM_BLOCK_ENTRIES;
    uint64_t *new_block_ptrs = malloc(blocks_needed * sizeof(uint64_t));

    // Step 4: Write condensed entries
    avl_iterator_t iter;
    avl_iter_init(&iter, sm->free_tree);

    uint64_t entry_idx = 0;
    SpaceMapBlock current_block;
    memset(&current_block, 0, sizeof(current_block));

    range_seg_t *seg;
    while ((seg = avl_iter_next(&iter)) != NULL) {
        uint64_t block_idx = entry_idx / SM_BLOCK_ENTRIES;
        uint64_t local_idx = entry_idx % SM_BLOCK_ENTRIES;

        // Write full block if needed
        if (local_idx == 0 && entry_idx > 0) {
            new_block_ptrs[block_idx - 1] = allocate_new_block(disk_handle);
            disk_write(disk_handle, new_block_ptrs[block_idx - 1],
                       &current_block, sizeof(current_block));
            memset(&current_block, 0, sizeof(current_block));
        }

        // Create FREE entry for this range
        // (assumes each range fits in a 16-bit run; larger ranges would be split)
        current_block.entries[local_idx] = sm_encode(true, seg->offset, seg->size);
        entry_idx++;
    }

    // Write final partial block
    if (entry_idx > 0) {
        uint64_t last_block = (entry_idx - 1) / SM_BLOCK_ENTRIES;
        new_block_ptrs[last_block] = allocate_new_block(disk_handle);
        disk_write(disk_handle, new_block_ptrs[last_block],
                   &current_block, sizeof(current_block));
    }

    // Step 5: Update header (atomically via TXG)
    uint64_t *old_blocks = header->block_pointers;
    uint64_t old_count = header->block_count;

    header->block_pointers = new_block_ptrs;
    header->block_count = blocks_needed;
    header->entry_count = range_count;

    // Step 6: Mark old blocks for freeing (after TXG commit)
    for (uint64_t i = 0; i < old_count; i++) {
        schedule_free(old_blocks[i]);
    }

    stats.entries_after = range_count;
    stats.blocks_after  = blocks_needed;
    stats.time_ms = get_timestamp_ms() - start_time;

    return stats;
}

/**
 * Check if condensing is worthwhile
 */
bool sm_should_condense(SpaceMapLoaded *sm, SpaceMapHeader *header) {
    // Condense if:

    // 1. Entry count is much larger than range count
    uint64_t expected_ranges = avl_numnodes(sm->free_tree);
    if (header->entry_count > expected_ranges * 10) {
        return true;
    }

    // 2. Space map exceeds size threshold (e.g., 1MB)
    if (header->block_count * 4096 > 1024 * 1024) {
        return true;
    }

    // 3. Alloc/free counts indicate many canceling pairs
    uint64_t canceling = (sm->alloc_count < sm->free_count)
                             ? sm->alloc_count : sm->free_count;
    if (canceling > header->entry_count / 2) {
        return true;
    }

    return false;
}

Condensing is essentially converting from log form to counting form, then back to log form (with only FREE entries). It's similar to LSM-tree compaction in databases: accept write amplification periodically to maintain read performance. ZFS triggers condensing based on space map size and fragmentation metrics, balancing immediate write cost against future load time.
Space maps don't manage an entire vdev (virtual device) directly. Instead, ZFS divides each vdev into metaslabs—self-contained allocation units, each with its own space map.
Metaslab Structure:
┌─────────────────────────────────────────────────────────────┐
│ VDEV (e.g., 1TB disk) │
├───────────────┬───────────────┬───────────────┬─────────────┤
│ Metaslab 0 │ Metaslab 1 │ Metaslab 2 │ ... │
│ (128 MB) │ (128 MB) │ (128 MB) │ │
├───────────────┼───────────────┼───────────────┼─────────────┤
│ ┌───────────┐ │ ┌───────────┐ │ ┌───────────┐ │ │
│ │Space Map 0│ │ │Space Map 1│ │ │Space Map 2│ │ │
│ └───────────┘ │ └───────────┘ │ └───────────┘ │ │
│ ┌───────────┐ │ ┌───────────┐ │ ┌───────────┐ │ │
│ │Data Blocks│ │ │Data Blocks│ │ │Data Blocks│ │ │
│ └───────────┘ │ └───────────┘ │ └───────────┘ │ │
└───────────────┴───────────────┴───────────────┴─────────────┘
Why Metaslabs?
Dividing a vdev into metaslabs keeps each space map small enough to load, condense, and sync quickly; lets ZFS load space maps lazily, only for the metaslabs actually in use; and allows concurrent allocations to proceed in different metaslabs in parallel.
Metaslab Selection:
When allocating, ZFS must choose which metaslab to use:
/**
 * Metaslab Allocator
 *
 * ZFS allocates from metaslabs, not directly from vdevs.
 * Each metaslab has its own space map for tracking.
 */

typedef struct {
    uint64_t ms_id;                  // Metaslab identifier
    uint64_t ms_start;               // Starting offset in vdev
    uint64_t ms_size;                // Total size
    uint64_t ms_free;                // Current free space
    SpaceMapLoaded *ms_sm;           // Space map (NULL if not loaded)
    bool ms_loaded;                  // Whether space map is loaded
    bool ms_loading;                 // Loading in progress
    void *vdev_handle;               // I/O handle for the owning vdev

    // Allocation state
    uint64_t ms_weight;              // Selection priority (based on free space)
    uint64_t ms_activation_count;    // Times selected for allocation

    // Statistics
    uint64_t ms_histogram[64];       // Free segment size distribution
} Metaslab;

typedef struct {
    uint64_t vdev_id;
    uint64_t vdev_size;
    Metaslab **vd_metaslabs;
    uint32_t vd_ms_count;

    // Allocation groups (for parallelism)
    uint32_t vd_mg_count;
} Vdev;

void metaslab_load(Metaslab *ms);    // defined below

/**
 * Select metaslab for allocation
 * Uses weighted selection based on free space
 */
Metaslab* vdev_select_metaslab(Vdev *vd, uint64_t size) {
    Metaslab *best = NULL;
    uint64_t best_weight = 0;

    for (uint32_t i = 0; i < vd->vd_ms_count; i++) {
        Metaslab *ms = vd->vd_metaslabs[i];

        // Skip if not enough free space
        if (ms->ms_free < size) {
            continue;
        }

        // Calculate weight (free space minus overhead to load)
        uint64_t weight = ms->ms_free;
        if (!ms->ms_loaded) {
            // Penalty for needing to load space map
            weight = weight * 90 / 100;
        }

        if (weight > best_weight) {
            best_weight = weight;
            best = ms;
        }
    }

    return best;
}

/**
 * Allocate from a specific metaslab
 */
uint64_t metaslab_alloc(Metaslab *ms, uint64_t size) {
    // Ensure space map is loaded
    if (!ms->ms_loaded) {
        metaslab_load(ms);
    }

    // Allocate from space map's free tree
    uint64_t offset = sm_alloc(ms->ms_sm, size);

    if (offset != 0) {
        ms->ms_free -= size;
        ms->ms_activation_count++;
        recalculate_weight(ms);
    }

    return offset;   // Returns absolute vdev offset
}

/**
 * Free space within a metaslab
 */
void metaslab_free(Metaslab *ms, uint64_t offset, uint64_t size) {
    // Load if necessary
    if (!ms->ms_loaded) {
        // For frees, we can defer to space map sync
        // Just record in a deferred free list
        defer_free(ms, offset, size);
        return;
    }

    // Add to free tree
    range_tree_add(ms->ms_sm->free_tree, offset, size);
    sm_record(ms->ms_sm, true, offset, size);

    ms->ms_free += size;
    recalculate_weight(ms);
}

/**
 * Lazy loading: load space map only when metaslab is accessed
 */
void metaslab_load(Metaslab *ms) {
    if (ms->ms_loaded || ms->ms_loading) {
        return;
    }

    ms->ms_loading = true;

    // Allocate space map structure
    ms->ms_sm = malloc(sizeof(SpaceMapLoaded));
    memset(ms->ms_sm, 0, sizeof(SpaceMapLoaded));

    // Read space map header
    SpaceMapHeader header;
    read_space_map_header(ms, &header);

    // Load and process the space map log
    sm_load(ms->ms_sm, &header, ms->vdev_handle);

    // Build histogram for allocation strategy
    build_free_histogram(ms);

    ms->ms_loading = false;
    ms->ms_loaded = true;
}

| Vdev Size | Metaslabs | Metaslab Size | Space Maps in Memory (if all loaded) |
|---|---|---|---|
| 100 GB | ~780 | 128 MB | ~50 MB |
| 1 TB | ~8,000 | 128 MB | ~500 MB |
| 10 TB | ~80,000 | 128 MB | ~5 GB |
| 100 TB | ~780,000 | 128 MB | ~50 GB |
Each loaded metaslab space map consumes memory for its AVL tree of free ranges. With very large pools, ZFS can unload inactive metaslab space maps to conserve memory, reloading them when needed. This is why lazy loading is crucial—only active metaslabs consume memory resources.
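As a rough sanity check on the table above, here is a small back-of-the-envelope calculation. The ~1,000 free ranges per metaslab and ~64 bytes per in-memory range node are assumed figures, chosen only to illustrate the scale.

/**
 * Back-of-the-envelope estimate for the 1 TB row of the table
 * (assumed fragmentation and per-node overhead; not measured values).
 */
#include <stdio.h>
#include <stdint.h>

int main(void) {
    uint64_t vdev_bytes     = 1ULL << 40;        // 1 TB vdev
    uint64_t metaslab_bytes = 128ULL << 20;      // 128 MB metaslabs
    uint64_t metaslabs      = vdev_bytes / metaslab_bytes;   // 8,192

    uint64_t ranges_per_ms  = 1000;              // assumed free ranges per metaslab
    uint64_t bytes_per_seg  = 64;                // assumed AVL node + bookkeeping
    uint64_t mem_if_loaded  = metaslabs * ranges_per_ms * bytes_per_seg;

    printf("%llu metaslabs, ~%llu MB if every space map is loaded\n",
           (unsigned long long)metaslabs,
           (unsigned long long)(mem_if_loaded >> 20));   // ~8192, ~500 MB
    return 0;
}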
Space maps inherit ZFS's copy-on-write crash consistency without requiring separate journaling. The key insight is that space map changes are part of the atomic TXG commit.
Why Space Maps are Inherently Consistent:
No in-place updates: Space map blocks are never modified; new entries go to new blocks
Atomic überblock update: The überblock (root of the pool tree) is updated atomically using a two-phase commit—first all new blocks for the TXG (data, metadata, and space map blocks) are written and flushed, then the new überblock, protected by its own checksum, is written to a rotating label slot and flushed; on open, ZFS uses the newest überblock whose checksum verifies
Transaction group semantics: All changes in a TXG (data, metadata, space maps) become visible atomically or not at all
Self-healing: Checksums on every block detect corruption; mirrored/RAID-Z pools can reconstruct from redundancy
Crash Scenarios:
Scenario 1: Crash during TXG write (before commit)
State before crash:
- New data blocks written to disk
- New space map entries written
- Überblock NOT yet updated
After reboot:
- Überblock points to previous TXG
- Space maps reflect previous TXG state
- Blocks written by the interrupted TXG were never recorded as allocated, so they remain free—nothing is leaked and no repair pass is needed
Scenario 2: Crash after TXG commit
State:
- All new blocks written
- Überblock points to new TXG
After reboot:
- Everything is consistent
- No recovery needed
/**
 * ZFS Transaction Group (TXG) Commit with Space Maps
 *
 * Demonstrates how space map updates are atomic with pool state.
 */

typedef struct {
    uint64_t ub_txg;           // Transaction group number
    uint64_t ub_timestamp;     // Commit time
    uint64_t ub_rootbp;        // Block pointer to MOS (Meta-Object Set)
    uint64_t ub_checksum[4];   // 256-bit checksum
} Uberblock;

/**
 * Atomic TXG commit process
 */
void txg_commit(Pool *pool, TxgState *txg) {
    // Phase 1: Write all data blocks
    sync_data_blocks(txg);

    // Phase 2: Write updated metadata (includes space maps)
    // (Pool is assumed to hold a vdevs[] array and vdev_count)
    for (uint32_t v = 0; v < pool->vdev_count; v++) {
        Vdev *vd = pool->vdevs[v];
        for (uint32_t m = 0; m < vd->vd_ms_count; m++) {
            Metaslab *ms = vd->vd_metaslabs[m];
            if (ms->ms_sm && ms->ms_sm->pending_count > 0) {
                // Append pending entries to space map object
                sm_sync_to_disk(ms->ms_sm);
            }
        }
    }

    // Phase 3: Write new MOS root block
    uint64_t new_mos_bp = write_mos(pool);

    // Phase 4: Flush so every block of this TXG is durable
    //          before the new überblock can appear on disk
    disk_flush(pool->all_vdevs);

    // Phase 5: Calculate new überblock
    Uberblock new_ub = {
        .ub_txg = txg->txg_number,
        .ub_timestamp = current_time(),
        .ub_rootbp = new_mos_bp,
    };
    calculate_checksum(&new_ub);

    // Phase 6: Write new überblock to next slot (rotating through slots),
    //          then flush again to make the commit durable
    uint64_t ub_slot = (pool->current_ub_slot + 1) % NUM_UB_SLOTS;
    write_uberblock(pool, ub_slot, &new_ub);
    disk_flush(pool->all_vdevs);

    // Phase 7: Commit complete - update in-memory state
    pool->current_ub_slot = ub_slot;
    pool->current_txg = new_ub.ub_txg;

    // Phase 8: Previous TXG's space is now available for reuse
    process_deferred_frees(pool, txg->txg_number - 1);
}

/**
 * Pool open - find most recent valid überblock
 */
int spa_load(Pool *pool) {
    Uberblock best_ub = {0};

    // Check all überblock slots, choose most recent valid one
    for (int i = 0; i < NUM_UB_SLOTS; i++) {
        Uberblock ub;
        read_uberblock(pool, i, &ub);

        if (verify_checksum(&ub) && ub.ub_txg > best_ub.ub_txg) {
            best_ub = ub;
            pool->current_ub_slot = i;
        }
    }

    if (best_ub.ub_txg == 0) {
        return ECORRUPT;   // No valid überblock found
    }

    pool->current_txg = best_ub.ub_txg;

    // Load MOS from überblock's root pointer
    load_mos(pool, best_ub.ub_rootbp);

    // Space maps are loaded lazily when metaslabs are accessed
    // They will reflect state as of best_ub.ub_txg

    printf("Pool loaded at TXG %lu\n", best_ub.ub_txg);
    return 0;
}

Because ZFS is always consistent (thanks to COW plus the atomic überblock update), there is no need for a file system check (fsck) after crashes. On mount, ZFS simply finds the most recent valid überblock and loads from there. Any blocks written by the interrupted TXG sit in space the committed space maps still record as free, so they are simply reused by later allocations—no leak detection or cleanup pass is required.
Space maps represent a sophisticated synthesis of ideas from earlier techniques. Let's compare how they stack up against the traditional approaches we've studied.
| Aspect | Bitmap | Counting | Space Maps |
|---|---|---|---|
| Allocation time | O(N) scan | O(log N) tree | O(1) append |
| Free time | O(1) | O(log N) tree | O(1) append |
| Space overhead | Fixed (1 bit/block) | Variable (extents) | Log + tree (variable) |
| Crash consistency | Needs journal | Needs journal | Inherent (COW) |
| Load time | O(N) scan | O(extents) tree build | O(entries) replay |
| Parallelism | Bitmap sections | Separate trees | Metaslabs |
| Best for | Small-medium FS | Large sequential | All workloads |
Space maps are optimal for ZFS's design goals (enterprise storage, data integrity, pooled storage) but may be overkill for embedded systems or simple use cases. Bitmaps remain excellent for their simplicity, and counting/extent tracking works well in ext4 and XFS where journaling handles consistency. Each technique has its place.
Space maps represent the culmination of decades of file system evolution, combining the best aspects of logging, extent tracking, and copy-on-write semantics. They demonstrate that free space management isn't just about tracking which blocks are available—it's deeply intertwined with crash consistency, performance, and scalability.
With this page, we complete our exploration of free space management techniques—from the elegant simplicity of bitmaps to the sophisticated log-structured approach of space maps.
You've mastered the fundamental techniques for tracking free space in file systems—from the simplest (bitmap) to the most sophisticated (space maps). These techniques form the foundation for understanding how any file system manages its storage resources. You're now equipped to analyze file system design decisions and understand the trade-offs that shape modern storage systems.