Loading learning content...
In the previous page, we saw how adding a single field—subtree size—transforms a basic BST into a powerful Order Statistics Tree capable of O(log n) rank queries. This wasn't magic or a special data structure; it was an application of a general pattern: tree augmentation.
Tree augmentation is the technique of storing additional computed information at each node to efficiently support operations beyond basic insert, delete, and search. The key insight is that trees naturally aggregate information along paths—if we carefully choose what to aggregate, we can answer sophisticated queries without traversing the entire structure.
This page generalizes the augmentation pattern, exploring multiple examples and providing a framework for designing your own augmentations.
By the end of this page, you will understand the general augmentation methodology, see how it applies to interval trees, range queries, and other patterns, learn the rules for what can and cannot be augmented efficiently, and gain a framework for designing custom augmentations for your specific requirements.
Not all node properties can be efficiently maintained during tree modifications. The Augmentation Theorem tells us exactly what can work:
Augmentation Theorem (informal): A field f can be efficiently maintained in a BST if and only if f at any node can be computed solely from:
- Information stored in the node itself (key, value, other node fields)
- f values of its children
- O(1) additional information
In formal terms, if we can express:
node.f = compute(node.key, node.value, node.left.f, node.right.f)
...where compute runs in O(1) time, then the augmentation is efficient.
The theorem works because tree modifications (insert, delete, rotate) only affect nodes on a single path from leaf to root. If each node's augmented field depends only on itself and its children, we can update all affected nodes in O(h) time by walking up the path and recomputing. In a balanced tree, this is O(log n).
Examples of valid augmentations:
| Augmentation | Formula | Use Case |
|---|---|---|
| Subtree size | 1 + size(left) + size(right) | Order statistics |
| Subtree height | 1 + max(height(left), height(right)) | AVL balance |
| Subtree max | max(key, max(left), max(right)) | Interval trees |
| Subtree sum | value + sum(left) + sum(right) | Range sum queries |
| Black height | Based on children's black heights | Red-Black trees |
Examples of invalid augmentations:
| Attempted Augmentation | Why It Fails |
|---|---|
| Ancestor count | Depends on path to root, not children |
| Median of subtree | Requires O(n) knowledge of all subtree values |
| Parent's key | Requires parent reference, not derived from children |
| Depth | Depends on ancestors, not descendants |
Efficient augmentation requires locality—each node's augmented data must be computable from local information (itself and children). Any augmentation requiring knowledge of ancestors, cousins, or the global tree structure cannot be maintained in O(log n) per modification.
One of the most powerful applications of augmentation is enabling efficient range queries. Consider a BST where we want to quickly answer: 'What is the sum of all values with keys in range [L, R]?'
Naive approach: Traverse all nodes, check if key is in range, sum values. Time: O(n).
Augmented approach: Store the sum of values in each subtree. Use the BST property to prune subtrees that are entirely outside the range. Time: O(log n + k) where k is the number of keys in range, or O(log n) with clever aggregation.
1234567891011121314151617181920212223242526272829303132333435363738394041424344454647484950515253545556575859606162636465666768697071727374757677787980818283848586
interface RangeSumNode { key: number; value: number; left: RangeSumNode | null; right: RangeSumNode | null; subtreeSum: number; // Sum of values in this subtree} function getSum(node: RangeSumNode | null): number { return node === null ? 0 : node.subtreeSum;} function updateNode(node: RangeSumNode): void { node.subtreeSum = node.value + getSum(node.left) + getSum(node.right);} /** * Range sum query: sum of values where key ∈ [low, high] * * Time Complexity: O(log n) when range spans a small fraction of keys * O(n) worst case when range includes all keys * * The key insight: we don't need to visit every node in range. * We can use subtreeSum to include entire subtrees when they're * fully contained in the range. */function rangeSum( node: RangeSumNode | null, low: number, high: number): number { if (node === null) return 0; // Case 1: Current node is below range - search right only if (node.key < low) { return rangeSum(node.right, low, high); } // Case 2: Current node is above range - search left only if (node.key > high) { return rangeSum(node.left, low, high); } // Case 3: Current node is in range [low, high] // Include this node's value plus contributions from children let sum = node.value; // Left child: we need keys >= low if (node.left !== null) { if (node.left.key >= low) { // Entire left subtree might be in range sum += rangeSumOptimized(node.left, low, high); } else { sum += rangeSum(node.left, low, high); } } // Right child: we need keys <= high if (node.right !== null) { if (node.right.key <= high) { // Entire right subtree might be in range sum += rangeSumOptimized(node.right, low, high); } else { sum += rangeSum(node.right, low, high); } } return sum;} /** * Optimized range sum when we know entire subtree is in range. * Uses the subtreeSum augmentation directly. */function rangeSumOptimized( node: RangeSumNode | null, low: number, high: number): number { if (node === null) return 0; // Check if entire subtree is within range // (This requires also storing subtreeMin and subtreeMax) // For simplicity, we do standard rangeSum return rangeSum(node, low, high);}Enhanced version with min/max augmentation:
To truly achieve O(log n) range sums, we can add subtreeMin and subtreeMax fields. When processing a node:
[subtreeMin, subtreeMax] is entirely within [low, high] → return subtreeSum directly[subtreeMin, subtreeMax] is entirely outside [low, high] → return 0This allows us to short-circuit entire subtrees, achieving O(log n) for typical range queries.
Interval Trees are a canonical example of BST augmentation. They solve the interval overlap query problem:
Given a set of intervals [start, end], efficiently find all intervals that overlap with a query interval [q_start, q_end].
Applications include:
The augmentation: Each node stores the maximum endpoint value in its subtree.
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120
interface Interval { start: number; end: number;} interface IntervalNode { interval: Interval; left: IntervalNode | null; right: IntervalNode | null; maxEnd: number; // Maximum 'end' value in this subtree} /** * Get the maximum end value in a subtree. */function getMaxEnd(node: IntervalNode | null): number { return node === null ? -Infinity : node.maxEnd;} /** * Update the maxEnd augmentation after modification. */function updateMaxEnd(node: IntervalNode): void { node.maxEnd = Math.max( node.interval.end, getMaxEnd(node.left), getMaxEnd(node.right) );} /** * Insert an interval. BST is keyed by interval.start. */function insertInterval( root: IntervalNode | null, interval: Interval): IntervalNode { if (root === null) { return { interval, left: null, right: null, maxEnd: interval.end, }; } if (interval.start < root.interval.start) { root.left = insertInterval(root.left, interval); } else { root.right = insertInterval(root.right, interval); } updateMaxEnd(root); return root;} /** * Two intervals overlap if: a.start <= b.end AND b.start <= a.end */function intervalsOverlap(a: Interval, b: Interval): boolean { return a.start <= b.end && b.start <= a.end;} /** * Find ONE interval that overlaps with the query. * Time: O(log n) expected */function findOverlapping( node: IntervalNode | null, query: Interval): Interval | null { if (node === null) return null; // Check current node if (intervalsOverlap(node.interval, query)) { return node.interval; } // Decide which subtree to search: // If left subtree exists AND its maxEnd >= query.start, // there MIGHT be an overlapping interval on the left. if (node.left !== null && node.left.maxEnd >= query.start) { const found = findOverlapping(node.left, query); if (found !== null) return found; } // Otherwise, search right subtree return findOverlapping(node.right, query);} /** * Find ALL intervals that overlap with the query. * Time: O(log n + k) where k is the number of overlapping intervals */function findAllOverlapping( node: IntervalNode | null, query: Interval, results: Interval[] = []): Interval[] { if (node === null) return results; // Add current node if it overlaps if (intervalsOverlap(node.interval, query)) { results.push(node.interval); } // Search left if there might be overlaps if (node.left !== null && node.left.maxEnd >= query.start) { findAllOverlapping(node.left, query, results); } // Search right if there might be overlaps // The right subtree can have overlaps if its intervals start // before the query ends if (node.right !== null && node.interval.start <= query.end) { findAllOverlapping(node.right, query, results); } return results;}The maxEnd field enables powerful pruning. When searching for overlaps with query [q_start, q_end], if a subtree's maxEnd < q_start, we know NO interval in that subtree can overlap with the query (all intervals end before the query starts). This single augmented field transforms O(n) searching into O(log n + k).
Interval tree complexity:
| Operation | Time Complexity |
|---|---|
| Insert | O(log n) |
| Delete | O(log n) |
| Find any overlap | O(log n) |
| Find all k overlaps | O(log n + k) |
| Space | O(n) |
The interval tree is a textbook example of how a single augmented field (maxEnd) enables an entirely new category of queries while preserving the O(log n) modification complexity.
We're not limited to a single augmented field. Complex applications often require multiple augmentations that work together:
Example: A comprehensive statistics tree
Imagine we need:
We can combine multiple augmentations in a single structure:
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117
interface StatisticsNode { key: number; value: number; left: StatisticsNode | null; right: StatisticsNode | null; // Multiple augmentations size: number; // For order statistics sum: number; // For range sum queries min: number; // Minimum key in subtree max: number; // Maximum key in subtree height: number; // For balance checking} /** * Helper functions for null-safe access */function getSize(node: StatisticsNode | null): number { return node === null ? 0 : node.size;} function getSum(node: StatisticsNode | null): number { return node === null ? 0 : node.sum;} function getMin(node: StatisticsNode | null): number { return node === null ? Infinity : node.min;} function getMax(node: StatisticsNode | null): number { return node === null ? -Infinity : node.max;} function getHeight(node: StatisticsNode | null): number { return node === null ? -1 : node.height;} /** * Update ALL augmented fields after modification. * Each field follows the augmentation theorem: * - Computable from self and children only * - O(1) computation time */function updateAugmentations(node: StatisticsNode): void { node.size = 1 + getSize(node.left) + getSize(node.right); node.sum = node.value + getSum(node.left) + getSum(node.right); node.min = Math.min(node.key, getMin(node.left), getMin(node.right)); node.max = Math.max(node.key, getMax(node.left), getMax(node.right)); node.height = 1 + Math.max(getHeight(node.left), getHeight(node.right));} /** * Create a new leaf node with all augmentations initialized. */function createStatsNode(key: number, value: number): StatisticsNode { return { key, value, left: null, right: null, size: 1, sum: value, min: key, max: key, height: 0, };} /** * Now we can implement multiple operations efficiently: */ // Order statistics: Select k-th elementfunction select(node: StatisticsNode | null, k: number): StatisticsNode | null { if (node === null || k < 1 || k > getSize(node)) return null; const leftSize = getSize(node.left); if (k <= leftSize) return select(node.left, k); if (k === leftSize + 1) return node; return select(node.right, k - leftSize - 1);} // Range sum: sum of values in key range [lo, hi]function rangeSum( node: StatisticsNode | null, lo: number, hi: number): number { if (node === null) return 0; // Use min/max to check if entire subtree is in range if (node.min >= lo && node.max <= hi) { return node.sum; // Entire subtree in range! } if (node.max < lo || node.min > hi) { return 0; // Entire subtree out of range } // Partial overlap: recurse let result = 0; if (node.key >= lo && node.key <= hi) { result += node.value; } result += rangeSum(node.left, lo, hi); result += rangeSum(node.right, lo, hi); return result;} // Check AVL balance using heightfunction isBalanced(node: StatisticsNode | null): boolean { if (node === null) return true; const balanceFactor = getHeight(node.left) - getHeight(node.right); return Math.abs(balanceFactor) <= 1 && isBalanced(node.left) && isBalanced(node.right);}Each augmented field adds: • O(1) space per node • O(1) time per modification (for updates)
With k augmented fields, insertions and deletions become O(k × log n). For a fixed number of augmentations, this is still O(log n). Be mindful of adding too many augmentations if modification performance is critical.
When you need an operation not covered by standard augmentations, follow this systematic process:
Step 1: Define the query precisely
What exactly do you need to compute? Write it mathematically.
Step 2: Identify what information would help
What aggregate data, if available at each node, would let you answer the query without full traversal?
Step 3: Check the augmentation theorem
Can this data be computed from the node itself and its children's augmented values? If not, the augmentation won't be efficient.
Step 4: Design the update formula
Write the explicit recurrence for computing the field from children.
Step 5: Handle edge cases
What value does a null (empty) subtree contribute? This becomes your base case.
Worked example: Second-maximum in subtree
Query: For any subtree, find the second-largest key.
Why useful: Efficiently find 'runner-up' in range queries.
Analysis:
Can we compute secondMax from children? Let's think:
The secondMax of current subtree is the second element in:
{key, max_L, max_R, secondMax_L, secondMax_R}
This can be computed in O(1) by sorting 5 values and taking the second.
12345678910111213141516171819202122232425262728293031323334353637383940414243444546474849505152535455565758596061626364656667686970717273747576777879808182838485868788899091929394959697
interface SecondMaxNode { key: number; left: SecondMaxNode | null; right: SecondMaxNode | null; max: number; // Maximum in subtree secondMax: number; // Second maximum in subtree} /** * Compute max and secondMax from children. * * We need the two largest values among: * - Current node's key * - Left child's max and secondMax * - Right child's max and secondMax */function updateSecondMax(node: SecondMaxNode): void { // Collect all candidate values const candidates: number[] = [node.key]; if (node.left !== null) { candidates.push(node.left.max); if (node.left.secondMax !== -Infinity) { candidates.push(node.left.secondMax); } } if (node.right !== null) { candidates.push(node.right.max); if (node.right.secondMax !== -Infinity) { candidates.push(node.right.secondMax); } } // Find max and second max from candidates let max1 = -Infinity; let max2 = -Infinity; for (const val of candidates) { if (val > max1) { max2 = max1; max1 = val; } else if (val > max2 && val !== max1) { max2 = val; } } node.max = max1; node.secondMax = max2;} /** * Edge case: When is secondMax = -Infinity? * - Single-node subtree: only one value, no second max * - All nodes have same key: no distinct second max * * Client code should check for secondMax === -Infinity to handle these. */ /** * Application: Find second-largest in a key range [lo, hi] * * This is more complex because we need to filter by range, * but the augmentation still helps by providing quick answers * for fully-contained subtrees. */function secondMaxInRange( node: SecondMaxNode | null, lo: number, hi: number): number { if (node === null) return -Infinity; // If entire subtree is in range, use augmented values // (Would need subtreeMin, subtreeMax augmentations for this check) // For now, collect and process const values: number[] = []; collectInRange(node, lo, hi, values); // Sort and get second max values.sort((a, b) => b - a); return values.length >= 2 ? values[1] : -Infinity;} function collectInRange( node: SecondMaxNode | null, lo: number, hi: number, results: number[]): void { if (node === null) return; if (node.key > lo) collectInRange(node.left, lo, hi, results); if (node.key >= lo && node.key <= hi) results.push(node.key); if (node.key < hi) collectInRange(node.right, lo, hi, results);}For augmented BSTs to be truly useful, we need O(log n) height—which requires self-balancing. The good news: augmentations that satisfy the augmentation theorem work seamlessly with AVL or Red-Black trees.
Key insight: Rotations (the core balancing operation) affect only O(1) nodes at a time. After each rotation, we update augmented fields on the affected nodes. Since each update is O(1), rotations remain O(1), and balancing remains O(log n).
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120
interface AugmentedNode<T> { key: number; value: T; left: AugmentedNode<T> | null; right: AugmentedNode<T> | null; height: number; // For AVL balance size: number; // For order statistics} /** * Right rotation with augmentation updates. * * Y X * / \ / \ * X γ → α Y * / \ / \ * α β β γ * * Both X and Y may have augmented fields that need updating. * Order matters: update Y first (now lower), then X (now higher). */function rotateRight<T>(y: AugmentedNode<T>): AugmentedNode<T> { const x = y.left!; const beta = x.right; // Perform rotation x.right = y; y.left = beta; // Update augmentations bottom-up updateAugmentations(y); // Y is now lower updateAugmentations(x); // X is now root, uses Y's updated values return x;} /** * Left rotation with augmentation updates. * Mirror of right rotation. */function rotateLeft<T>(x: AugmentedNode<T>): AugmentedNode<T> { const y = x.right!; const beta = y.left; // Perform rotation y.left = x; x.right = beta; // Update augmentations bottom-up updateAugmentations(x); // X is now lower updateAugmentations(y); // Y is now root return y;} function updateAugmentations<T>(node: AugmentedNode<T>): void { node.height = 1 + Math.max( getHeight(node.left), getHeight(node.right) ); node.size = 1 + getSize(node.left) + getSize(node.right);} function getHeight<T>(node: AugmentedNode<T> | null): number { return node === null ? -1 : node.height;} function getSize<T>(node: AugmentedNode<T> | null): number { return node === null ? 0 : node.size;} /** * Complete AVL insert with augmentation maintenance. */function avlInsertAugmented<T>( node: AugmentedNode<T> | null, key: number, value: T): AugmentedNode<T> { // Standard BST insert if (node === null) { return { key, value, left: null, right: null, height: 0, size: 1 }; } if (key < node.key) { node.left = avlInsertAugmented(node.left, key, value); } else { node.right = avlInsertAugmented(node.right, key, value); } // Update augmentations before potential rotations updateAugmentations(node); // Get balance factor const balance = getHeight(node.left) - getHeight(node.right); // Left-Left case if (balance > 1 && key < node.left!.key) { return rotateRight(node); } // Right-Right case if (balance < -1 && key > node.right!.key) { return rotateLeft(node); } // Left-Right case if (balance > 1 && key > node.left!.key) { node.left = rotateLeft(node.left!); return rotateRight(node); } // Right-Left case if (balance < -1 && key < node.right!.key) { node.right = rotateRight(node.right!); return rotateLeft(node); } return node;}After any restructuring (rotation, node removal), always update augmentations from bottom to top. A node's augmented field may depend on its children's fields, so children must be updated first. Missing or incorrect update order is a common source of bugs in augmented tree implementations.
Augmented BSTs power critical systems across many domains:
We've explored the powerful pattern of tree augmentation. Let's consolidate the key insights:
What's next:
Our exploration of BST extensions concludes with understanding how BSTs connect to the broader family of balanced tree structures. The final page examines the connection to other balanced trees—how concepts from BSTs extend to AVL trees, Red-Black trees, B-trees, and beyond, and how to choose the right variant for your needs.
You now understand augmented BSTs as a general pattern—how to identify valid augmentations, implement them correctly, maintain them through modifications and rotations, and apply them to solve real-world problems. This knowledge transforms BSTs from a basic data structure into a flexible framework for building specialized solutions.