Consider a freshly formatted 1TB disk. Every single block is free—268 million of them. A bitmap would require 32MB to represent this state. A linked list would require traversing 268 million blocks. Even grouping would need over 500,000 container blocks.
But there's a simpler representation: "Block 0, count 268,435,456." Just two numbers capture the entire free space state. This is the essence of counting-based free space management.
Counting takes advantage of a crucial observation: free space often exists in contiguous regions (called extents or runs). Rather than track individual blocks, counting tracks the start and length of each free region. When free space is highly contiguous, this representation is extraordinarily compact. When free space is fragmented into single-block regions, counting degrades to something similar to grouping.
This technique underlies the extent-based allocation strategies used by modern file systems like ext4, XFS, and Btrfs, making it essential knowledge for understanding contemporary storage architecture.
By the end of this page, you will understand:
- How counting represents free space as extents
- The data structures for extent tracking (lists, trees, skip lists)
- Allocation algorithms including best-fit and first-fit for extents
- Fragmentation effects on counting efficiency
- The relationship between counting and extent-based file systems
- Real-world implementations in production systems
Instead of storing individual free block addresses, counting stores free extents—contiguous regions of free blocks characterized by a starting block and a block count:
Free Extent: (start_block, block_count)
Example - disk with scattered free regions:
┌────────────────────────────────────────────────────────────────────┐
│Block: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 │
│State: A F F F A A F F A A F F F F A │
└────────────────────────────────────────────────────────────────────┘
Bitmap representation: [0,1,1,1,0,0,1,1,0,0,1,1,1,1,0]
Linked list representation: 1→2→3→6→7→10→11→12→13 (9 entries)
Counting representation: (1,3), (6,2), (10,4) (3 entries!)
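The counting representation is just run-length encoding of the free-block map. A minimal sketch (the function and type names here are illustrative, not from any real file system):

```c
#include <stddef.h>
#include <stdint.h>

typedef struct { uint64_t start, count; } Extent;

/* Run-length encode the free blocks ('F') of a block-state string
 * into (start, count) extents; returns the number of extents. */
size_t blocks_to_extents(const char *state, size_t n_blocks, Extent *out) {
    size_t n_extents = 0;
    for (size_t i = 0; i < n_blocks; ) {
        if (state[i] == 'F') {
            size_t start = i;
            while (i < n_blocks && state[i] == 'F') i++;   /* consume the run */
            out[n_extents].start = start;
            out[n_extents].count = i - start;
            n_extents++;
        } else {
            i++;   /* allocated block: skip */
        }
    }
    return n_extents;
}
```

Applied to the 15-block map above (`"AFFFAAFFAAFFFFA"`), this yields exactly the three extents (1,3), (6,2), (10,4).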
Extent Structure:
typedef struct {
uint64_t start; // Starting block number
uint64_t count; // Number of contiguous free blocks
} FreeExtent;
With 16 bytes per extent and 4KB storage blocks, one block can store 256 extents. If average extent size is 1000 blocks, one storage block represents 256,000 free blocks—compared to ~32 KB of bitmap (eight 4KB blocks) covering the same range.
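The arithmetic is worth checking, using the block and record sizes given above (a sketch; the helper names are mine):

```c
#include <stdint.h>

/* One 4 KB block holding 16-byte (start, count) records */
enum { BLOCK_SIZE = 4096, EXTENT_REC = 16 };

/* Free blocks described by one block of extent records,
 * given an average extent length. */
uint64_t blocks_described(uint64_t avg_extent_len) {
    return (uint64_t)(BLOCK_SIZE / EXTENT_REC) * avg_extent_len;  /* 256 records */
}

/* Bitmap bytes needed to cover the same range (1 bit per block). */
uint64_t bitmap_bytes(uint64_t n_blocks) {
    return n_blocks / 8;
}
```

`blocks_described(1000)` is 256,000, and covering those blocks with a bitmap costs `bitmap_bytes(256000)` = 32,000 bytes, i.e. about eight 4 KB bitmap blocks.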
Space Efficiency Analysis:
| Scenario | Free Blocks | Extents Needed | Counting Size | Bitmap Size | Ratio |
|---|---|---|---|---|---|
| Fresh disk (1TB) | 268M | 1 | 16 bytes | 32 MB | 2,000,000:1 |
| 50% full, large files | 134M | ~1,000 | 16 KB | 32 MB | 2,000:1 |
| 50% full, tiny files | 134M | ~134M | ~2 GB | 32 MB | 1:64 |
| Worst case (alternating) | 134M | 134M | ~2 GB | 32 MB | 1:64 |
Counting efficiency depends entirely on fragmentation level. With highly contiguous free space, counting is extraordinarily compact. With pathological fragmentation (every other block free), counting spends 16 bytes on every single-block extent vs. the bitmap's 1 bit per block, about 64× more space in total (the 1:64 ratio above). Real-world file systems typically fall between these extremes, but counting's variable overhead must be accounted for in design.
The choice of data structure for storing extents significantly impacts operation performance. Common approaches include sorted linked lists, balanced trees, and B-trees.
Approach 1: Sorted Linked List
Extents stored in a linked list sorted by starting block number.
Approach 2: Balanced BST (e.g., Red-Black Tree)
Primary tree sorted by starting block, possibly secondary index by size.
Approach 3: B-tree / B+ tree
Disk-optimized tree with high fan-out. Used extensively in database systems and modern file systems.
```c
#include <stdint.h>
#include <stdbool.h>
#include <stdlib.h>

/**
 * Free Extent representation
 */
typedef struct {
    uint64_t start;   // Starting block number
    uint64_t count;   // Number of contiguous blocks
} FreeExtent;

/**
 * Sorted Linked List Implementation
 */
typedef struct ExtentNode {
    FreeExtent extent;
    struct ExtentNode *next;
} ExtentNode;

typedef struct {
    ExtentNode *head;
    uint64_t extent_count;   // Number of extents
    uint64_t total_free;     // Total free blocks (sum of all counts)
} ExtentList;

/**
 * Find extent containing a specific block
 * O(N) linear search
 */
FreeExtent* list_find_containing(ExtentList *el, uint64_t block) {
    ExtentNode *current = el->head;
    while (current) {
        if (block >= current->extent.start &&
            block < current->extent.start + current->extent.count) {
            return &current->extent;
        }
        // Early termination: list is sorted, no point continuing
        if (block < current->extent.start) {
            return NULL;
        }
        current = current->next;
    }
    return NULL;
}

/**
 * Red-Black Tree Node for Extent Storage
 * (Balanced BST, O(log N) operations)
 */
typedef enum { RB_RED, RB_BLACK } RBColor;

typedef struct RBNode {
    FreeExtent extent;
    RBColor color;
    struct RBNode *left, *right, *parent;
} RBNode;

typedef struct {
    RBNode *root;
    uint64_t extent_count;
    uint64_t total_free;
} ExtentTree;

/**
 * Find extent containing block using BST search
 * O(log N) time complexity
 */
FreeExtent* tree_find_containing(ExtentTree *et, uint64_t block) {
    RBNode *current = et->root;
    while (current) {
        if (block < current->extent.start) {
            current = current->left;
        } else if (block >= current->extent.start + current->extent.count) {
            current = current->right;
        } else {
            return &current->extent;   // Found!
        }
    }
    return NULL;
}

/**
 * B+ Tree Node for Disk-Efficient Extent Storage
 * Used by production file systems like XFS
 */
#define BTREE_ORDER 128   // Fan-out: ~128 children per node

typedef struct BPlusNode {
    bool is_leaf;
    uint16_t key_count;
    union {
        struct {
            uint64_t keys[BTREE_ORDER - 1];
            struct BPlusNode *children[BTREE_ORDER];
        } internal;
        struct {
            FreeExtent extents[BTREE_ORDER - 1];
            struct BPlusNode *next_leaf;   // For range scans
        } leaf;
    };
} BPlusNode;

typedef struct {
    BPlusNode *root;
    uint64_t extent_count;
    uint64_t total_free;
    uint32_t height;
} BPlusTree;

/**
 * Find extent of size >= requested in B+ tree
 * O(log_B N) with a size-based secondary index.
 * Simplified here to a left-to-right leaf scan (still efficient due
 * to sequential access); find_leftmost_leaf is assumed defined elsewhere.
 */
FreeExtent* bplus_find_by_size(BPlusTree *bt, uint64_t min_size) {
    BPlusNode *leaf = find_leftmost_leaf(bt->root);
    while (leaf) {
        for (int i = 0; i < leaf->key_count; i++) {
            if (leaf->leaf.extents[i].count >= min_size) {
                return &leaf->leaf.extents[i];
            }
        }
        leaf = leaf->leaf.next_leaf;
    }
    return NULL;
}
```

| Operation | Sorted List | Red-Black Tree | B+ Tree (disk) |
|---|---|---|---|
| Find by block | O(N) | O(log N) | O(log_B N) |
| Find by size | O(N) | O(log N)* | O(log_B N)* |
| Insert | O(N) | O(log N) | O(log_B N) |
| Delete | O(N) | O(log N) | O(log_B N) |
| Merge adjacent | O(1) after find | O(log N) | O(log_B N) |
| Space overhead | Low | Moderate | Highest (nodes) |
| Disk I/O per op | O(N) | O(log N) | O(log_B N) |

\* With a secondary index keyed by extent size; otherwise a scan is required.
When allocating from extent-tracked free space, several strategies determine which extent to use when multiple candidates exist.
First-Fit: Use the first extent large enough to satisfy the request.
Extents: [(100, 50), (300, 1000), (1500, 200), (2000, 100)]
Request: 80 blocks
First-fit: (300, 1000) → allocate 300-379, remainder (380, 920)
Best-Fit: Use the smallest extent that can satisfy the request.
Extents: [(100, 50), (300, 1000), (1500, 200), (2000, 100)]
Request: 80 blocks
Best-fit: (2000, 100) → allocate 2000-2079, remainder (2080, 20)
Worst-Fit: Use the largest extent, maximizing the leftover.
Request: 80 blocks
Worst-fit: (300, 1000) → allocate 300-379, remainder (380, 920)
Next-Fit: Like first-fit, but resume search from where last allocation ended.
```c
/**
 * Extent Allocation Strategies
 */
typedef enum {
    ALLOC_FIRST_FIT,
    ALLOC_BEST_FIT,
    ALLOC_WORST_FIT,
    ALLOC_NEXT_FIT
} AllocationStrategy;

typedef struct {
    uint64_t start;   // First block of allocation
    uint64_t count;   // Blocks allocated
    bool success;
} AllocationResult;

/**
 * Find best extent using specified strategy
 */
ExtentNode* find_extent(ExtentList *el, uint64_t needed,
                        AllocationStrategy strategy) {
    ExtentNode *best = NULL;
    ExtentNode *current = el->head;
    while (current) {
        if (current->extent.count >= needed) {
            switch (strategy) {
            case ALLOC_FIRST_FIT:
                return current;   // Take immediately
            case ALLOC_BEST_FIT:
                if (!best || current->extent.count < best->extent.count) {
                    best = current;
                }
                break;
            case ALLOC_WORST_FIT:
                if (!best || current->extent.count > best->extent.count) {
                    best = current;
                }
                break;
            case ALLOC_NEXT_FIT:
                // Would need to track last allocation position
                return current;   // Simplified to first-fit
            }
        }
        current = current->next;
    }
    return best;
}

/**
 * Allocate from extent, updating or splitting as needed
 */
AllocationResult allocate_from_extent(ExtentList *el, ExtentNode *node,
                                      ExtentNode *prev, uint64_t needed) {
    AllocationResult result = { 0, 0, false };
    if (!node || node->extent.count < needed) {
        return result;
    }
    result.start = node->extent.start;
    result.count = needed;
    result.success = true;

    if (node->extent.count == needed) {
        // Exact fit - remove extent entirely
        if (prev) {
            prev->next = node->next;
        } else {
            el->head = node->next;
        }
        free(node);
        el->extent_count--;
    } else {
        // Partial allocation - shrink extent
        node->extent.start += needed;
        node->extent.count -= needed;
    }
    el->total_free -= needed;
    return result;
}

/**
 * Allocate contiguous blocks - counting excels here!
 */
AllocationResult allocate_contiguous(ExtentList *el, uint64_t needed,
                                     AllocationStrategy strategy) {
    ExtentNode *prev = NULL;
    ExtentNode *current = el->head;
    ExtentNode *best = NULL;
    ExtentNode *best_prev = NULL;

    // Find appropriate extent based on strategy
    while (current) {
        if (current->extent.count >= needed) {
            if (strategy == ALLOC_FIRST_FIT) {
                return allocate_from_extent(el, current, prev, needed);
            }
            bool is_better = false;
            if (strategy == ALLOC_BEST_FIT) {
                is_better = !best || current->extent.count < best->extent.count;
            } else if (strategy == ALLOC_WORST_FIT) {
                is_better = !best || current->extent.count > best->extent.count;
            }
            if (is_better) {
                best = current;
                best_prev = prev;
            }
        }
        prev = current;
        current = current->next;
    }

    if (best) {
        return allocate_from_extent(el, best, best_prev, needed);
    }
    return (AllocationResult){ 0, 0, false };
}
```

Unlike bitmaps or linked lists where finding contiguous space requires scanning, extent tracking makes contiguous allocation trivial—if an extent of sufficient size exists, it's directly usable. This is why extent-based allocation is dominant in modern file systems designed for large files and sequential I/O.
When freeing blocks, a critical operation is coalescing: merging newly freed blocks with adjacent free extents. Without coalescing, each free operation creates a new extent, leading to extent explosion.
The Coalescing Problem:
Before freeing block 399:
Extents: [(100, 50), (300, 99), (500, 200)]
         (blocks 100-149) (blocks 300-398) (blocks 500-699)
Without coalescing, free block 399:
Extents: [(100, 50), (300, 99), (399, 1), (500, 200)] ← 4 extents now
With coalescing, free block 399:
Check: Does an extent end just before 399? 300+99=399, YES!
Check: Does an extent start at 400? No (500 is next)
Result: Extend (300, 99) to (300, 100)
Extents: [(100, 50), (300, 100), (500, 200)] ← still 3 extents
Coalescing Cases: the freed region may touch no free extent (insert a new one), exactly one neighbor (extend it), or two neighbors (bridge them into a single extent).
Example: Bridging merge:
Before freeing block 150:
Extents: [(100, 50), (151, 49)] ← blocks 100-149 and 151-199 free
Free block 150:
- first extent ends just before 150 (100+50=150 ✓)
- 151 is start of second extent (✓)
- Merge all: (100, 100) ← blocks 100-199 all free
```c
/**
 * Extent Coalescing: Merge adjacent free regions on free()
 *
 * Handles all three cases:
 * 1. No adjacent extents - create new extent
 * 2. Adjacent to one extent - extend it
 * 3. Bridges two extents - merge into one
 */
void free_with_coalesce(ExtentList *el, uint64_t start, uint64_t count) {
    uint64_t end = start + count;   // First block after freed region
    ExtentNode *prev = NULL;
    ExtentNode *current = el->head;
    ExtentNode *left_neighbor = NULL;    // Extent ending exactly at start
    ExtentNode *right_neighbor = NULL;   // Extent starting exactly at end
    ExtentNode *right_prev = NULL;

    // Find neighbors in a single pass over the sorted list
    while (current) {
        uint64_t extent_end = current->extent.start + current->extent.count;
        // Extent ending just before our freed region
        if (extent_end == start) {
            left_neighbor = current;
        }
        // Extent starting just after our freed region
        if (current->extent.start == end) {
            right_neighbor = current;
            right_prev = prev;
        }
        prev = current;
        current = current->next;
    }

    el->total_free += count;

    if (left_neighbor && right_neighbor) {
        // Case 3: Bridge merge - combine three regions
        left_neighbor->extent.count += count + right_neighbor->extent.count;
        // Unlink right neighbor (in a sorted list it directly follows left)
        if (left_neighbor->next == right_neighbor) {
            left_neighbor->next = right_neighbor->next;
        } else if (right_prev) {
            right_prev->next = right_neighbor->next;
        }
        free(right_neighbor);
        el->extent_count--;   // Net: removed one extent by merging
    } else if (left_neighbor) {
        // Case 2a: Extend left neighbor to the right
        left_neighbor->extent.count += count;
    } else if (right_neighbor) {
        // Case 2b: Extend right neighbor to the left
        right_neighbor->extent.start = start;
        right_neighbor->extent.count += count;
    } else {
        // Case 1: No neighbors - create new extent, insert in sorted order
        ExtentNode *new_node = malloc(sizeof(ExtentNode));
        new_node->extent.start = start;
        new_node->extent.count = count;

        ExtentNode *insert_prev = NULL;
        ExtentNode *insert_pos = el->head;
        while (insert_pos && insert_pos->extent.start < start) {
            insert_prev = insert_pos;
            insert_pos = insert_pos->next;
        }
        new_node->next = insert_pos;
        if (insert_prev) {
            insert_prev->next = new_node;
        } else {
            el->head = new_node;
        }
        el->extent_count++;
    }
}

/**
 * Optimized coalescing using BST - O(log N) neighbor finding
 * (tree_find_predecessor/successor, tree_insert/delete assumed elsewhere)
 */
void tree_free_with_coalesce(ExtentTree *et, uint64_t start, uint64_t count) {
    uint64_t end = start + count;

    // Potential left neighbor: last extent starting before our start
    RBNode *left = tree_find_predecessor(et, start);
    bool coalesce_left =
        (left && left->extent.start + left->extent.count == start);

    // Potential right neighbor: first extent starting after our start
    RBNode *right = tree_find_successor(et, start);
    bool coalesce_right = (right && right->extent.start == end);

    et->total_free += count;

    if (coalesce_left && coalesce_right) {
        // Extend left, delete right
        left->extent.count += count + right->extent.count;
        tree_delete(et, right);
        et->extent_count--;
    } else if (coalesce_left) {
        left->extent.count += count;
    } else if (coalesce_right) {
        right->extent.start = start;
        right->extent.count += count;
    } else {
        // Insert new extent
        tree_insert(et, (FreeExtent){ start, count });
        et->extent_count++;
    }
}
```

Without coalescing, extent count can grow to equal the number of free operations, destroying the space efficiency advantage. Coalescing ensures that freeing N contiguous blocks results in at most 1 extent, maintaining compactness. This is why extent-based systems are careful to always check for adjacent regions during free operations.
XFS (developed by Silicon Graphics in the 1990s, now widely used in Linux) is perhaps the most sophisticated real-world example of counting/extent-based free space management. It uses dual B+ trees per allocation group for optimal lookups.
XFS Free Space Structure:
Each Allocation Group (AG) contains:
BNO Tree: B+ tree indexed by block number
CNT Tree: B+ tree indexed by count (length)
Free extents: (200,150), (500,100), (800,50)

BNO Tree (by block):              CNT Tree (by count):

        ┌─────┐                         ┌─────┐
        │ 500 │                         │ 100 │
        └──┬──┘                         └──┬──┘
      ┌────┴────┐                    ┌─────┴────┐
   ┌──┴──┐   ┌──┴──┐              ┌──┴──┐   ┌──┴──┐
   │ 200 │   │ 800 │              │ 50  │   │ 150 │
   └─────┘   └─────┘              └─────┘   └─────┘

In-order (BNO): (200,150) (500,100) (800,50)
In-order (CNT): (800,50) (500,100) (200,150)   ← ordered differently!
Why Dual Trees?
Different allocation patterns require different lookups: extending an existing file wants an extent *near* a given block (served by the BNO tree), while placing a new large file wants an extent of *at least* a given size (served by the CNT tree).
Maintaining two trees costs 2× space and update time, but eliminates O(N) scans entirely.
```c
/**
 * XFS-style Dual B+ Tree Free Space Management
 *
 * Two trees provide optimal access for different query patterns:
 * - BNO tree: keyed by starting block (for locality)
 * - CNT tree: keyed by extent length (for size-based queries)
 */
typedef struct {
    BPlusTree *bno_tree;     // By block number
    BPlusTree *cnt_tree;     // By count (length)
    uint64_t ag_start;       // Starting block of this allocation group
    uint64_t ag_length;      // Total blocks in this AG
    uint64_t free_blocks;    // Free blocks in this AG
    uint64_t extent_count;   // Number of free extents
} AllocationGroup;

/**
 * Allocate extent near target block (locality-preserving)
 * Used when extending files or creating related data
 * (bno_tree_* / cnt_tree_* helpers assumed defined elsewhere)
 */
FreeExtent alloc_near(AllocationGroup *ag, uint64_t target, uint64_t needed) {
    FreeExtent result = { 0, 0 };

    // Search BNO tree for nearest extent with sufficient size:
    // 1. Find first extent starting at or after target
    BPlusIter *iter = bno_tree_find_ge(ag->bno_tree, target);
    FreeExtent *forward = bplus_iter_get(iter);
    // 2. Also look backwards for extent ending near target
    FreeExtent *backward = bno_tree_find_predecessor(ag->bno_tree, target);

    // 3. Choose closest one that's large enough
    uint64_t forward_dist = forward ? (forward->start - target) : UINT64_MAX;
    uint64_t backward_dist = UINT64_MAX;
    if (backward) {
        uint64_t back_end = backward->start + backward->count;
        backward_dist = (target >= back_end) ? (target - back_end) : 0;
    }

    FreeExtent *chosen = NULL;
    if (forward && forward->count >= needed && forward_dist <= backward_dist) {
        chosen = forward;
    } else if (backward && backward->count >= needed) {
        chosen = backward;
    }

    if (!chosen) {
        // No extent near target - fall back to best-fit
        return alloc_bestfit(ag, needed);
    }

    result = *chosen;
    result.count = needed;   // Allocate only what's needed

    // Remove from both trees and re-insert remainder if any
    remove_and_update_trees(ag, chosen, needed);
    return result;
}

/**
 * Allocate using best-fit strategy (from CNT tree)
 * Finds smallest extent that can satisfy request
 */
FreeExtent alloc_bestfit(AllocationGroup *ag, uint64_t needed) {
    FreeExtent result = { 0, 0 };

    // Query CNT tree for smallest extent >= needed
    FreeExtent *extent = cnt_tree_find_ge(ag->cnt_tree, needed);
    if (!extent) {
        return result;   // No extent large enough
    }
    result.start = extent->start;
    result.count = needed;
    remove_and_update_trees(ag, extent, needed);
    return result;
}

/**
 * Remove allocated portion from both trees
 */
void remove_and_update_trees(AllocationGroup *ag, FreeExtent *extent,
                             uint64_t allocated) {
    // Remove from both trees
    bno_tree_remove(ag->bno_tree, extent->start);
    cnt_tree_remove(ag->cnt_tree, extent->count, extent->start);
    ag->free_blocks -= allocated;

    if (extent->count > allocated) {
        // Re-insert remainder
        FreeExtent remainder = {
            .start = extent->start + allocated,
            .count = extent->count - allocated
        };
        bno_tree_insert(ag->bno_tree, &remainder);
        cnt_tree_insert(ag->cnt_tree, &remainder);
    } else {
        ag->extent_count--;
    }
}

/**
 * Free extent with coalescing - update both trees
 */
void xfs_free(AllocationGroup *ag, uint64_t start, uint64_t count) {
    // Find neighbors in BNO tree (O(log N))
    FreeExtent *left = bno_tree_find_predecessor(ag->bno_tree, start);
    FreeExtent *right = bno_tree_find_successor(ag->bno_tree, start);

    bool merge_left = (left && left->start + left->count == start);
    bool merge_right = (right && right->start == start + count);

    FreeExtent new_extent = { start, count };

    if (merge_left) {
        // Remove left from both trees
        bno_tree_remove(ag->bno_tree, left->start);
        cnt_tree_remove(ag->cnt_tree, left->count, left->start);
        new_extent.start = left->start;
        new_extent.count += left->count;
        ag->extent_count--;
    }
    if (merge_right) {
        // Remove right from both trees
        bno_tree_remove(ag->bno_tree, right->start);
        cnt_tree_remove(ag->cnt_tree, right->count, right->start);
        new_extent.count += right->count;
        ag->extent_count--;
    }

    // Insert coalesced extent into both trees
    bno_tree_insert(ag->bno_tree, &new_extent);
    cnt_tree_insert(ag->cnt_tree, &new_extent);
    ag->extent_count++;
    ag->free_blocks += count;
}
```

By maintaining two indexes over the same data, XFS achieves O(log N) for both position-based and size-based queries. The overhead of maintaining both trees is justified because allocations are far more frequent than the occasional bulk operations that would benefit from a linear scan. This is a textbook example of trading space (2× index) for time (O(log N) vs O(N)).
Counting-based free space management has a fundamental relationship with fragmentation: its efficiency directly reflects the fragmentation state of the disk.
Fragmentation Metrics: useful measures include the free extent count, the average extent length (free blocks ÷ extent count), and the resulting tracking overhead (extent count × 16 bytes).
Fragmentation Progression:
Stage 1: Fresh disk
Free: 1,000,000 blocks
Extents: 1
Avg extent: 1,000,000 blocks
Counting overhead: 16 bytes
Stage 2: After random allocations/deallocations
Free: 500,000 blocks
Extents: 10,000
Avg extent: 50 blocks
Counting overhead: 160 KB
Stage 3: Heavily fragmented
Free: 500,000 blocks
Extents: 500,000 (worst case: every other block)
Avg extent: 1 block
Counting overhead: 8 MB
Stage 4: After defragmentation
Free: 500,000 blocks
Extents: 100
Avg extent: 5,000 blocks
Counting overhead: 1.6 KB
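The stages above can be reproduced with the two metrics from this section (a sketch; the helper names are mine, the 16-byte record size is the one used throughout):

```c
#include <stdint.h>

/* Average free extent length: free blocks per extent. */
uint64_t avg_extent_len(uint64_t free_blocks, uint64_t extent_count) {
    return extent_count ? free_blocks / extent_count : 0;
}

/* Counting overhead: one 16-byte (start, count) record per extent. */
uint64_t counting_overhead_bytes(uint64_t extent_count) {
    return extent_count * 16;
}
```

For Stage 2 (500,000 free blocks in 10,000 extents) this gives an average extent of 50 blocks and 160,000 bytes (160 KB) of overhead; for Stage 3 the average drops to 1 block and the overhead grows to 8 MB.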
| Fragmentation Level | Extent Count | Overhead | vs Bitmap (125 KB) |
|---|---|---|---|
| None (contiguous) | 1 | 16 bytes | ~7,800× better |
| Low (large extents) | 100 | 1.6 KB | 78× better |
| Moderate | 10,000 | 160 KB | 1.3× worse |
| High | 100,000 | 1.6 MB | 13× worse |
| Extreme (single blocks) | 500,000 | 8 MB | 64× worse |

(Figures assume the 1,000,000-block disk from the progression above, whose bitmap needs 1,000,000 bits = 125 KB.)
Fragmentation Prevention Strategies: common tactics include delayed allocation (batching writes so they land contiguously), preallocating space for growing files, spreading unrelated data across allocation groups, and background defragmentation.
When Counting Beats Bitmap:
Rule of thumb: Counting is more space-efficient when:
extent_count × 16 bytes < total_blocks / 8
extent_count < total_blocks / 128
For 1 million blocks, break-even is ~7,800 extents. With fewer extents, counting wins; with more, bitmap wins.
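The rule of thumb can be written out directly (a sketch; 16-byte extent records and a 1-bit-per-block bitmap, as above):

```c
#include <stdint.h>
#include <stdbool.h>

/* Counting beats the bitmap while its table is smaller:
 * extent_count * 16 bytes  <  total_blocks / 8 bytes */
bool counting_wins(uint64_t extent_count, uint64_t total_blocks) {
    return extent_count * 16 < total_blocks / 8;
}

/* The same inequality solved for extent_count. */
uint64_t break_even_extents(uint64_t total_blocks) {
    return total_blocks / 128;
}
```

For 1,000,000 blocks, `break_even_extents` returns 7,812, matching the ~7,800 figure above.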
Hybrid Approaches:
Some file systems use counting for large regions and bitmaps for small or fragmented areas, choosing the cheaper representation per region.
In the worst case (every other block free), counting uses 64× more space than a bitmap. This is why production extent-based file systems monitor fragmentation levels, perform background defragmentation, and use strategies like delayed allocation to maintain large contiguous regions. Counting is brilliant with contiguity but costly without it.
Counting represents the natural evolution of free space tracking for extent-based file systems. By storing (start, length) pairs instead of individual block addresses, it achieves remarkable compression when free space is contiguous—exactly the condition that extent-based file systems strive to maintain.
What's Next:
We've covered traditional free space management techniques: bitmaps, linked lists, grouping, and counting. The final page explores Space Maps—ZFS's innovative copy-on-write approach that combines counting concepts with transactional semantics for superior crash consistency and performance.
You now understand counting-based free space management—from extent representation through allocation algorithms to XFS's dual-tree implementation. This technique underpins modern extent-based file systems and is essential knowledge for understanding contemporary storage architecture.