Loading content...
Floor and ceiling let us find boundary values. But what about extracting all values within a range? This is the range query problem:
Given a BST and bounds [low, high], find all values k such that low ≤ k ≤ high.
Range queries are ubiquitous in databases, analytics, and real-time systems. Consider:
A naive approach would check every node, taking O(n) time. But the BST property enables us to do much better by pruning subtrees that cannot contain valid results.
By the end of this page, you will understand how to efficiently extract all values within a range from a BST, master the pruning strategy that skips irrelevant subtrees, analyze the output-sensitive O(h + k) complexity, and implement multiple variants including counting and sum queries.
Formal Definition:
Given:
Return:
The key insight is that the BST property tells us:
low, no need to explore its left subtreehigh, no need to explore its right subtreeThis pruning allows us to skip entire subtrees, potentially eliminating most of the tree from consideration.
Example: Range Query [30, 55]
In the tree above:
Notice that we don't need to visit:
This pruning is what gives range queries their efficiency.
The pruning strategy is the heart of efficient range queries. At each node, we make intelligent decisions about which subtrees to explore:
| Node Value | Left Subtree | Include Node? | Right Subtree |
|---|---|---|---|
| < low | Skip (all values < low) | No | Explore (might contain valid values) |
| in [low, high] | Explore (might contain valid values) | Yes | Explore (might contain valid values) |
high | Explore (might contain valid values) | No | Skip (all values > high) |
Why This Works:
If node.val < low: Due to the BST property, all values in the left subtree are even smaller. They're all below low, so we can safely skip the entire left subtree.
If node.val > high: Similarly, all values in the right subtree are even larger. They're all above high, so we can skip the entire right subtree.
If low ≤ node.val ≤ high: The current node is in range, and both subtrees might contain valid values—we must explore both.
This is a form of branch and bound optimization, where we use problem structure to eliminate large portions of the search space.
In the best case, if the range is small relative to the tree, we might skip most nodes. In the worst case (range covers all values), we still visit all nodes—but no algorithm can do better when we need all values.
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107
def range_query(root, low, high): """ Find all values in the BST within [low, high]. Returns values in sorted order (due to inorder traversal). Time Complexity: O(h + k) where h = height, k = number of results Space Complexity: O(h + k) for recursion stack and result storage """ result = [] def traverse(node): if node is None: return # Only explore left subtree if it might contain valid values # This happens when node.val > low (left subtree could have values >= low) if node.val > low: traverse(node.left) # Include current node if within range if low <= node.val <= high: result.append(node.val) # Only explore right subtree if it might contain valid values # This happens when node.val < high (right subtree could have values <= high) if node.val < high: traverse(node.right) traverse(root) return result def range_query_iterative(root, low, high): """ Iterative range query using explicit stack. Same complexity, but avoids recursion stack overflow for very deep trees. """ result = [] stack = [] current = root # Modified inorder traversal with pruning while stack or current: # Go left only if we might find values >= low while current and current.val > low: stack.append(current) current = current.left # If current is too small, try going right if current and current.val < low: current = current.right continue # Current could be in range or None if current: if current.val <= high: result.append(current.val) current = current.right else: # Current > high, break break elif stack: current = stack.pop() if low <= current.val <= high: result.append(current.val) if current.val < high: current = current.right else: current = None return result # Cleaner iterative version using standard inorder with pruningdef range_query_iterative_clean(root, low, high): """ Cleaner iterative implementation using bounded inorder. """ result = [] stack = [] current = root while stack or current: # Go as far left as possible, but stop if we're below low while current: stack.append(current) # Prune: if current.val <= low, no need to go further left if current.val <= low: break current = current.left current = stack.pop() # If in range, add to result if low <= current.val <= high: result.append(current.val) # If current.val >= high, no need to explore right if current.val >= high: current = None else: current = current.right return resultRange query complexity is output-sensitive—it depends not just on the tree size, but on how many results we find:
Time Complexity: O(h + k)
Where:
Breaking this down:
O(h) for navigation: We traverse at most two root-to-leaf paths—one to find where values ≥ low begin, another to find where values ≤ high end.
O(k) for collection: Once we're in the valid range, we visit each of the k nodes in order.
The O(h + k) bound is tight. We can't avoid the O(h) navigation cost (we must descend to the range boundaries), and we must touch each output node at least once.
| Scenario | Tree Size (n) | Height (h) | Results (k) | Time |
|---|---|---|---|---|
| Balanced, small range | 1,000,000 | 20 | 100 | O(120) |
| Balanced, large range | 1,000,000 | 20 | 500,000 | O(500,020) |
| Skewed, small range | 1,000,000 | 1,000,000 | 100 | O(1,000,100) |
| Empty range | 1,000,000 | 20 | 0 | O(20) |
Space Complexity: O(h + k)
If you use a streaming/generator approach and process results one by one, space reduces to O(h).
In a sorted array, range query takes O(log n + k) using binary search to find range boundaries. The BST version takes O(h + k), which is O(log n + k) for balanced trees—equivalent! But BSTs also support O(log n) insert/delete, whereas sorted arrays require O(n).
Often we don't need all values—we need aggregate information about them. Common variants include:
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148
def count_in_range(root, low, high): """ Count values in [low, high] range. Time: O(h + k) - still need to visit all k nodes Space: O(h) - no result storage needed """ def count(node): if node is None: return 0 total = 0 if node.val > low: total += count(node.left) if low <= node.val <= high: total += 1 if node.val < high: total += count(node.right) return total return count(root) def sum_in_range(root, low, high): """ Sum all values in [low, high] range. Time: O(h + k) Space: O(h) """ def range_sum(node): if node is None: return 0 total = 0 if node.val > low: total += range_sum(node.left) if low <= node.val <= high: total += node.val if node.val < high: total += range_sum(node.right) return total return range_sum(root) def min_in_range(root, low, high): """ Find minimum value in [low, high] range. The minimum is either ceiling(low) (if it exists and <= high) or no valid value exists. Time: O(h) - just need to find ceiling Space: O(h) or O(1) iteratively """ # Use ceiling to find smallest value >= low result = None current = root while current: if current.val < low: current = current.right elif current.val > high: current = current.left else: # current.val in [low, high] # This is a candidate, but maybe something smaller exists on the left result = current.val current = current.left return result def max_in_range(root, low, high): """ Find maximum value in [low, high] range. The maximum is either floor(high) (if it exists and >= low) or no valid value exists. Time: O(h) Space: O(h) or O(1) iteratively """ result = None current = root while current: if current.val > high: current = current.left elif current.val < low: current = current.right else: # current.val in [low, high] # This is a candidate, but maybe something larger exists on the right result = current.val current = current.right return result def range_stats(root, low, high): """ Compute multiple statistics in a single traversal. Returns: (count, sum, min, max) for values in [low, high] More efficient than calling each function separately. """ stats = { 'count': 0, 'sum': 0, 'min': float('inf'), 'max': float('-inf') } def traverse(node): if node is None: return if node.val > low: traverse(node.left) if low <= node.val <= high: stats['count'] += 1 stats['sum'] += node.val stats['min'] = min(stats['min'], node.val) stats['max'] = max(stats['max'], node.val) if node.val < high: traverse(node.right) traverse(root) # Handle empty range if stats['count'] == 0: stats['min'] = None stats['max'] = None return (stats['count'], stats['sum'], stats['min'], stats['max'])For min/max in range, we can achieve O(h) without visiting all k nodes. Minimum is the ceiling of low (if within range), maximum is the floor of high (if within range). Count and sum still require O(h + k) in basic BSTs, but augmented BSTs can achieve O(log n) for these too.
When dealing with large ranges, storing all results in a list may be impractical. A generator (or iterator) approach yields values one at a time, reducing space from O(k) to O(h):
1234567891011121314151617181920212223242526272829303132333435363738394041424344454647484950515253545556575859606162636465666768697071727374757677787980818283
def range_query_generator(root, low, high): """ Generator that yields values in [low, high] one at a time. Memory-efficient: O(h) space instead of O(h + k). Caller can stop early without processing entire range. Time: O(h + k) total if all values consumed Space: O(h) at any point """ def traverse(node): if node is None: return # Traverse left if it might contain valid values if node.val > low: yield from traverse(node.left) # Yield current if in range if low <= node.val <= high: yield node.val # Traverse right if it might contain valid values if node.val < high: yield from traverse(node.right) return traverse(root) # Usage examples:def print_range(root, low, high): """Print all values in range without storing them.""" for value in range_query_generator(root, low, high): print(value) def first_n_in_range(root, low, high, n): """Get first n values in range (early termination).""" result = [] for value in range_query_generator(root, low, high): result.append(value) if len(result) >= n: break return result def exists_in_range(root, low, high): """Check if any value exists in range.""" for _ in range_query_generator(root, low, high): return True return False # Iterative generator (no recursion)def range_query_iterator(root, low, high): """ Truly iterative range iterator using explicit stack. Avoids Python's generator recursion limit issues. """ stack = [] current = root while stack or current: # Go left, but only if values >= low might exist while current: if current.val > low: stack.append(current) current = current.left else: stack.append(current) break if stack: current = stack.pop() if low <= current.val <= high: yield current.val if current.val < high: current = current.right else: current = NoneBenefits of Streaming:
The basic range count takes O(h + k) because we must visit every node in the range. But what if we need only the count, not the actual values? With augmented BSTs, we can achieve O(log n)!
Key Idea: Store the size of each subtree at every node. This allows us to count nodes without visiting them.
1234567891011121314151617181920212223242526272829303132333435363738394041424344454647484950515253545556575859606162636465666768697071727374757677787980818283848586878889909192939495
class AugmentedNode: """BST node augmented with subtree size.""" def __init__(self, val): self.val = val self.left = None self.right = None self.size = 1 # Size of subtree rooted at this node def get_size(node): """Helper to safely get size (handles None).""" return node.size if node else 0 def update_size(node): """Update size based on children sizes.""" if node: node.size = 1 + get_size(node.left) + get_size(node.right) def count_less_than(root, x): """ Count nodes with value < x. Time: O(h) = O(log n) for balanced tree """ count = 0 current = root while current: if current.val < x: # Current node and entire left subtree are < x count += 1 + get_size(current.left) current = current.right else: # Current node >= x, only left subtree might have values < x current = current.left return count def count_less_or_equal(root, x): """ Count nodes with value <= x. Time: O(h) = O(log n) for balanced tree """ count = 0 current = root while current: if current.val <= x: # Current node and entire left subtree are <= x count += 1 + get_size(current.left) current = current.right else: # Current node > x, only left subtree might have values <= x current = current.left return count def count_in_range_optimized(root, low, high): """ Count nodes in [low, high] range in O(log n). Uses the identity: count([low, high]) = count(<= high) - count(< low) = count(<= high) - count(<= low - 1) Time: O(h) = O(log n) for balanced tree Space: O(1) """ # For integers: count(<= high) - count(<= low - 1) # General: count(<= high) - count(< low) return count_less_or_equal(root, high) - count_less_than(root, low) # Example: Maintaining size during insertionsdef insert_augmented(root, val): """ Insert while maintaining subtree sizes. Time: O(h) """ if root is None: return AugmentedNode(val) if val < root.val: root.left = insert_augmented(root.left, val) else: root.right = insert_augmented(root.right, val) update_size(root) return rootAugmentation requires maintaining size information during every insert and delete. This adds constant-factor overhead to mutations. Use augmented BSTs when range count queries are frequent enough to justify the maintenance cost.
Range queries are a workhorse operation in many systems:
Example: Java's NavigableSet
Java's TreeSet (backed by a Red-Black Tree) provides range query methods:
NavigableSet<Integer> set = new TreeSet<>();
set.addAll(Arrays.asList(10, 20, 30, 40, 50, 60, 70));
// Range query: all values in [25, 55]
SortedSet<Integer> subset = set.subSet(25, true, 55, true);
// Result: [30, 40, 50]
// Exclusive bounds
SortedSet<Integer> exclusive = set.subSet(30, false, 60, false);
// Result: [40, 50]
The subSet method returns a view backed by the original tree—changes to the view affect the tree and vice versa. This is the streaming/iterator pattern in action.
Range queries have several subtle pitfalls:
>= instead of > (or vice versa) changes whether you include boundary values. Be consistent with your range definition.| Case | Input | Expected Output |
|---|---|---|
| Empty tree | root=null, [1, 10] | [] |
| No values in range | [100, 200] in tree with values 1-50 | [] |
| All values in range | [1, 100] in tree with values 10-50 | [10, 20, ..., 50] |
| Single value | [25, 25] in tree with 25 | [25] |
| Single value missing | [26, 26] in tree without 26 | [] |
| Low equals high | [50, 50] | [50] if exists, else [] |
| Reversed bounds | [100, 50] | [] (invalid range) |
Range queries leverage the BST property to efficiently extract subsets of data. Let's consolidate the key insights:
Looking Ahead:
We've now covered validation, floor/ceiling, and range queries—all operations on existing BSTs. But how do we construct a BST efficiently? The next page addresses converting a sorted array into a balanced BST, ensuring optimal O(log n) height from the start.
You now understand how to efficiently extract values within a range from a BST using pruning strategies. You can implement basic and streaming range queries, compute range aggregates, and appreciate when augmented BSTs provide additional speedups. Next, we'll explore converting sorted arrays to balanced BSTs.