Common BST Patterns - Learning Module

Range Queries in BST

Extracting Subsets Efficiently

Floor and ceiling let us find boundary values. But what about extracting all values within a range? This is the range query problem:

Given a BST and bounds [low, high], find all values k such that low ≤ k ≤ high.

Range queries are ubiquitous in databases, analytics, and real-time systems. Consider:

"Find all transactions between $100 and $500"
"List all events scheduled between 2pm and 5pm"
"Retrieve all users with ratings between 4.0 and 4.5"

A naive approach would check every node, taking O(n) time. But the BST property enables us to do much better by pruning subtrees that cannot contain valid results.

What You Will Learn

By the end of this page, you will understand how to efficiently extract all values within a range from a BST, master the pruning strategy that skips irrelevant subtrees, analyze the output-sensitive O(h + k) complexity, and implement multiple variants including counting and sum queries.

The Range Query Problem

Formal Definition:

Given:

A Binary Search Tree T with n nodes
A range [low, high] where low ≤ high

Return:

All values v in T such that low ≤ v ≤ high
Values should be returned in sorted order (a natural consequence of BST traversal)

The key insight is that the BST property tells us:

If a node's value is less than low, no need to explore its left subtree
If a node's value is greater than high, no need to explore its right subtree

This pruning allows us to skip entire subtrees, potentially eliminating most of the tree from consideration.

[Figure: example BST with in-range nodes and pruned subtrees highlighted]

Example: Range Query [30, 55]

In the tree above:

Green nodes (30, 35, 40) are within range and in the left portion
Blue nodes (50, 55, 60) show the path and boundary
Values in range: [30, 35, 40, 50, 55]

Notice that we don't need to visit:

The subtree rooted at 10 (all values < 30)
The subtree rooted at 90 (all values > 55)
Node 65 (> 55) and node 75 (> 55)

This pruning is what gives range queries their efficiency.
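To watch pruning happen, we can record every node a pruned traversal touches. The sketch below is a minimal illustration on a small hypothetical tree (not the one in the figure); `Node`, `insert`, and `range_query_traced` are names assumed here for the example.

```python
class Node:
    """Minimal BST node (hypothetical helper for this sketch)."""
    def __init__(self, val):
        self.val = val
        self.left = None
        self.right = None


def insert(root, val):
    """Plain unbalanced BST insert; this sketch assumes distinct values."""
    if root is None:
        return Node(val)
    if val < root.val:
        root.left = insert(root.left, val)
    else:
        root.right = insert(root.right, val)
    return root


def range_query_traced(root, low, high):
    """Return (values in [low, high], every node value the traversal touched)."""
    result, visited = [], []

    def traverse(node):
        if node is None:
            return
        visited.append(node.val)
        if node.val > low:           # left subtree may hold values >= low
            traverse(node.left)
        if low <= node.val <= high:
            result.append(node.val)
        if node.val < high:          # right subtree may hold values <= high
            traverse(node.right)

    traverse(root)
    return result, visited


root = None
for v in [50, 20, 80, 10, 30, 70, 90, 5, 15, 25, 35]:
    root = insert(root, v)

values, visited = range_query_traced(root, 22, 60)
print(values)           # [25, 30, 35, 50]
print(sorted(visited))  # [20, 25, 30, 35, 50, 70, 80]
```

In this run the subtree rooted at 10 (values 5, 10, 15) and the node 90 are never touched, while 20, 70, and 80 are visited only because they lie on the paths toward the range boundaries.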

The Pruning Strategy

The pruning strategy is the heart of efficient range queries. At each node, we make intelligent decisions about which subtrees to explore:

Pruning Decision Matrix
Node Value	Left Subtree	Include Node?	Right Subtree
< low	Skip (all values < low)	No	Explore (might contain valid values)
in [low, high]	Explore (might contain valid values)	Yes	Explore (might contain valid values)
> high	Explore (might contain valid values)	No	Skip (all values > high)

Why This Works:

If node.val < low: Due to the BST property, all values in the left subtree are even smaller. They're all below low, so we can safely skip the entire left subtree.
If node.val > high: Similarly, all values in the right subtree are even larger. They're all above high, so we can skip the entire right subtree.
If low ≤ node.val ≤ high: The current node is in range, and both subtrees might contain valid values—we must explore both.

This is a form of branch and bound optimization, where we use problem structure to eliminate large portions of the search space.
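The decision matrix can be written as a tiny function that, for a single node value, says which subtrees to explore and whether to include the node. This is only a sketch; `pruning_decision` is a hypothetical helper name, and it assumes distinct keys.

```python
def pruning_decision(node_val, low, high):
    """Return (explore_left, include_node, explore_right) for one node.

    Assumes distinct keys and an inclusive range [low, high].
    """
    explore_left = node_val > low        # left subtree may contain values >= low
    include_node = low <= node_val <= high
    explore_right = node_val < high      # right subtree may contain values <= high
    return explore_left, include_node, explore_right


print(pruning_decision(10, 30, 55))  # (False, False, True)   node < low
print(pruning_decision(40, 30, 55))  # (True, True, True)     node in range
print(pruning_decision(90, 30, 55))  # (True, False, False)   node > high
print(pruning_decision(30, 30, 55))  # (False, True, True)    node == low
```

The last call shows a small refinement of the matrix: when a node equals low exactly, its left subtree holds only smaller values and can be skipped too. Exploring it anyway, as the matrix suggests for in-range nodes, is still correct, just slightly more work.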

The Power of Pruning

In the best case, if the range is small relative to the tree, we might skip most nodes. In the worst case (range covers all values), we still visit all nodes—but no algorithm can do better when we need all values.

Basic Range Query Implementation

range_query.py
def range_query(root, low, high):
    """
    Find all values in the BST within [low, high].
    
    Returns values in sorted order (due to inorder traversal).
    
    Time Complexity: O(h + k) where h = height, k = number of results
    Space Complexity: O(h + k) for recursion stack and result storage
    """
    result = []
    
    def traverse(node):
        if node is None:
            return
        
        # Only explore left subtree if it might contain valid values
        # This happens when node.val > low (left subtree could have values >= low)
        if node.val > low:
            traverse(node.left)
        
        # Include current node if within range
        if low <= node.val <= high:
            result.append(node.val)
        
        # Only explore right subtree if it might contain valid values
        # This happens when node.val < high (right subtree could have values <= high)
        if node.val < high:
            traverse(node.right)
    
    traverse(root)
    return result
 
 
def range_query_iterative(root, low, high):
    """
    Iterative range query using explicit stack.
    
    Same complexity, but avoids recursion stack overflow
    for very deep trees.
    """
    result = []
    stack = []
    current = root
    
    # Modified inorder traversal with pruning
    while stack or current:
        # Go left only if we might find values >= low
        while current and current.val > low:
            stack.append(current)
            current = current.left
        
        # If current is too small, try going right
        if current and current.val < low:
            current = current.right
            continue
        
        # Current could be in range or None
        if current:
            if current.val <= high:
                result.append(current.val)
                current = current.right
            else:
                # Only reachable when low > high (reversed bounds),
                # in which case no value can match
                break
        elif stack:
            current = stack.pop()
            if low <= current.val <= high:
                result.append(current.val)
            if current.val < high:
                current = current.right
            else:
                current = None
    
    return result
 
 
# Cleaner iterative version using standard inorder with pruning
def range_query_iterative_clean(root, low, high):
    """
    Cleaner iterative implementation using bounded inorder.
    """
    result = []
    stack = []
    current = root
    
    while stack or current:
        # Go as far left as possible, but stop if we're below low
        while current:
            stack.append(current)
            # Prune: if current.val <= low, no need to go further left
            if current.val <= low:
                break
            current = current.left
        
        current = stack.pop()
        
        # If in range, add to result
        if low <= current.val <= high:
            result.append(current.val)
        
        # If current.val >= high, no need to explore right
        if current.val >= high:
            current = None
        else:
            current = current.right
    
    return result
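
Because the iterative variants are easy to get subtly wrong, it helps to cross-check them against a brute-force filter on random trees. The following is a sketch under stated assumptions: `Node`, `insert`, `range_query_pruned`, and `cross_check` are hypothetical names, keys are distinct, and the pruned traversal is a compact restatement of the bounded inorder idea rather than a copy of any listing above.

```python
import random


class Node:
    def __init__(self, val):
        self.val = val
        self.left = None
        self.right = None


def insert(root, val):
    """Plain unbalanced BST insert (distinct values assumed)."""
    if root is None:
        return Node(val)
    if val < root.val:
        root.left = insert(root.left, val)
    else:
        root.right = insert(root.right, val)
    return root


def range_query_pruned(root, low, high):
    """Iterative inorder traversal that prunes on both sides of [low, high]."""
    result, stack, current = [], [], root
    while stack or current:
        while current:
            stack.append(current)
            if current.val <= low:   # nothing further left can be >= low
                break
            current = current.left
        current = stack.pop()
        if low <= current.val <= high:
            result.append(current.val)
        # Descend right only if larger in-range values may exist there
        current = current.right if current.val < high else None
    return result


def cross_check(trials=200, seed=0):
    """Compare the pruned traversal with a brute-force filter."""
    rng = random.Random(seed)
    for _ in range(trials):
        values = rng.sample(range(1000), rng.randint(0, 60))
        root = None
        for v in values:
            root = insert(root, v)
        low = rng.randint(-10, 1010)
        high = rng.randint(low, 1010)
        expected = [v for v in sorted(values) if low <= v <= high]
        assert range_query_pruned(root, low, high) == expected
    return True


print(cross_check())
```

The same harness can be pointed at any other implementation by swapping the function called inside `cross_check`.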

Complexity Analysis

Range query complexity is output-sensitive—it depends not just on the tree size, but on how many results we find:

Time Complexity: O(h + k)

Where:

h = height of the tree (O(log n) for a balanced tree, up to n for a skewed one)
k = number of values in the range (the output size)

Breaking this down:

O(h) for navigation: We traverse at most two root-to-leaf paths—one to find where values ≥ low begin, another to find where values ≤ high end.
O(k) for collection: Once we're in the valid range, we visit each of the k nodes in order.

The O(h + k) bound is tight. We can't avoid the O(h) navigation cost (we must descend to the range boundaries), and we must touch each output node at least once.

Complexity Examples
Scenario	Tree Size (n)	Height (h)	Results (k)	Approx. Node Visits (h + k)
Balanced, small range	1,000,000	20	100	~120
Balanced, large range	1,000,000	20	500,000	~500,020
Skewed, small range	1,000,000	1,000,000	100	~1,000,100
Empty range	1,000,000	20	0	~20
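
The output-sensitive behavior can be checked empirically by counting node visits. The sketch below builds a balanced tree from a sorted list (the `build_balanced` helper is an assumption made for this example) and checks that the number of visited nodes stays within k plus two root-to-leaf path lengths.

```python
class Node:
    def __init__(self, val):
        self.val = val
        self.left = None
        self.right = None


def build_balanced(sorted_vals, lo=0, hi=None):
    """Build a height-balanced BST from a sorted list (hypothetical helper)."""
    if hi is None:
        hi = len(sorted_vals)
    if lo >= hi:
        return None
    mid = (lo + hi) // 2
    node = Node(sorted_vals[mid])
    node.left = build_balanced(sorted_vals, lo, mid)
    node.right = build_balanced(sorted_vals, mid + 1, hi)
    return node


def levels(node):
    """Number of nodes on the longest root-to-leaf path."""
    if node is None:
        return 0
    return 1 + max(levels(node.left), levels(node.right))


def count_visits(root, low, high):
    """Return (k, visits): matches found and nodes touched by the pruned traversal."""
    k = 0
    visits = 0

    def traverse(node):
        nonlocal k, visits
        if node is None:
            return
        visits += 1
        if node.val > low:
            traverse(node.left)
        if low <= node.val <= high:
            k += 1
        if node.val < high:
            traverse(node.right)

    traverse(root)
    return k, visits


n = 2**16 - 1
root = build_balanced(list(range(n)))
h = levels(root)

for low, high in [(1000, 1099), (0, n - 1), (-50, -1)]:
    k, visits = count_visits(root, low, high)
    # Out-of-range nodes are only touched along the two boundary search paths
    assert visits <= k + 2 * h
    print(f"range=[{low}, {high}]  k={k}  visits={visits}  n={n}")
```

For the small range, the visit count is a tiny fraction of n; for the full range it equals n, which matches the worst case described above.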

Space Complexity: O(h + k)

O(h) for the recursion stack (or explicit stack in iterative version)
O(k) for storing the result list

If you use a streaming/generator approach and process results one by one, space reduces to O(h).

Comparison with Array Binary Search

In a sorted array, range query takes O(log n + k) using binary search to find range boundaries. The BST version takes O(h + k), which is O(log n + k) for balanced trees—equivalent! But BSTs also support O(log n) insert/delete, whereas sorted arrays require O(n).
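
For comparison, here is a minimal sketch of the sorted-array version using Python's standard `bisect` module; `range_query_sorted` is a hypothetical helper name.

```python
from bisect import bisect_left, bisect_right


def range_query_sorted(values, low, high):
    """Return the elements v of a sorted list with low <= v <= high."""
    start = bisect_left(values, low)    # first index with values[i] >= low
    end = bisect_right(values, high)    # first index with values[i] > high
    return values[start:end]            # O(k) copy of the matching slice


data = [10, 20, 30, 40, 50, 60, 70]
print(range_query_sorted(data, 25, 55))  # [30, 40, 50]
print(range_query_sorted(data, 55, 25))  # [] (reversed bounds give an empty slice)
```

The two binary searches cost O(log n) and the slice costs O(k). Keeping the list sorted under insertions is the expensive part, since each insertion shifts up to n elements.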

Variations — Count, Sum, Min, Max in Range

Often we don't need all values—we need aggregate information about them. Common variants include:

range_aggregates.py
def count_in_range(root, low, high):
    """
    Count values in [low, high] range.
    
    Time: O(h + k) - still need to visit all k nodes
    Space: O(h) - no result storage needed
    """
    def count(node):
        if node is None:
            return 0
        
        total = 0
        
        if node.val > low:
            total += count(node.left)
        
        if low <= node.val <= high:
            total += 1
        
        if node.val < high:
            total += count(node.right)
        
        return total
    
    return count(root)
 
 
def sum_in_range(root, low, high):
    """
    Sum all values in [low, high] range.
    
    Time: O(h + k)
    Space: O(h)
    """
    def range_sum(node):
        if node is None:
            return 0
        
        total = 0
        
        if node.val > low:
            total += range_sum(node.left)
        
        if low <= node.val <= high:
            total += node.val
        
        if node.val < high:
            total += range_sum(node.right)
        
        return total
    
    return range_sum(root)
 
 
def min_in_range(root, low, high):
    """
    Find minimum value in [low, high] range.
    
    The minimum is either ceiling(low) (if it exists and <= high)
    or no valid value exists.
    
    Time: O(h) - just need to find ceiling
    Space: O(1) - this version is iterative
    """
    # Use ceiling to find smallest value >= low
    result = None
    current = root
    
    while current:
        if current.val < low:
            current = current.right
        elif current.val > high:
            current = current.left
        else:
            # current.val in [low, high]
            # This is a candidate, but maybe something smaller exists on the left
            result = current.val
            current = current.left
    
    return result
 
 
def max_in_range(root, low, high):
    """
    Find maximum value in [low, high] range.
    
    The maximum is either floor(high) (if it exists and >= low)
    or no valid value exists.
    
    Time: O(h)
    Space: O(1) - this version is iterative
    """
    result = None
    current = root
    
    while current:
        if current.val > high:
            current = current.left
        elif current.val < low:
            current = current.right
        else:
            # current.val in [low, high]
            # This is a candidate, but maybe something larger exists on the right
            result = current.val
            current = current.right
    
    return result
 
 
def range_stats(root, low, high):
    """
    Compute multiple statistics in a single traversal.
    
    Returns: (count, sum, min, max) for values in [low, high]
    
    More efficient than calling each function separately.
    """
    stats = {
        'count': 0,
        'sum': 0,
        'min': float('inf'),
        'max': float('-inf')
    }
    
    def traverse(node):
        if node is None:
            return
        
        if node.val > low:
            traverse(node.left)
        
        if low <= node.val <= high:
            stats['count'] += 1
            stats['sum'] += node.val
            stats['min'] = min(stats['min'], node.val)
            stats['max'] = max(stats['max'], node.val)
        
        if node.val < high:
            traverse(node.right)
    
    traverse(root)
    
    # Handle empty range
    if stats['count'] == 0:
        stats['min'] = None
        stats['max'] = None
    
    return (stats['count'], stats['sum'], stats['min'], stats['max'])

Optimization Insight

For min/max in range, we can achieve O(h) without visiting all k nodes. The minimum is the ceiling of low (if it is ≤ high), and the maximum is the floor of high (if it is ≥ low). Count and sum still require O(h + k) in a basic BST, but a BST augmented with per-subtree counts or sums can answer them in O(h), which is O(log n) when the tree is balanced.
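
As an illustration of the augmentation idea applied to sums, the sketch below stores the sum of each subtree at its root and answers a range sum with two O(h) descents. All names here (`SumNode`, `subtree_sum`, `sum_less_than`, `sum_in_range_augmented`) are assumptions for this example, and the tree is not rebalanced.

```python
class SumNode:
    """BST node that also stores the sum of its whole subtree."""
    def __init__(self, val):
        self.val = val
        self.left = None
        self.right = None
        self.subtree_sum = val


def subtree_sum(node):
    return node.subtree_sum if node else 0


def insert(root, val):
    """Insert while keeping subtree_sum correct on the path back up."""
    if root is None:
        return SumNode(val)
    if val < root.val:
        root.left = insert(root.left, val)
    else:
        root.right = insert(root.right, val)
    root.subtree_sum = root.val + subtree_sum(root.left) + subtree_sum(root.right)
    return root


def sum_less_than(root, x, inclusive=False):
    """Sum of values < x (or <= x when inclusive=True) in O(h)."""
    total, cur = 0, root
    while cur:
        if cur.val < x or (inclusive and cur.val == x):
            # This node and its entire left subtree qualify
            total += cur.val + subtree_sum(cur.left)
            cur = cur.right
        else:
            cur = cur.left
    return total


def sum_in_range_augmented(root, low, high):
    """Sum of values in [low, high] without visiting every match."""
    if low > high:
        return 0
    return sum_less_than(root, high, inclusive=True) - sum_less_than(root, low)


root = None
for v in [50, 30, 70, 20, 40, 60, 80]:
    root = insert(root, v)

print(sum_in_range_augmented(root, 30, 60))  # 180 (30 + 40 + 50 + 60)
```

The price is the extra bookkeeping on every mutation: any operation that changes a subtree must refresh `subtree_sum` on the affected path.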

Streaming Range Query with Generators

When dealing with large ranges, storing all results in a list may be impractical. A generator (or iterator) approach yields values one at a time, reducing space from O(k) to O(h):

streaming_range.py
def range_query_generator(root, low, high):
    """
    Generator that yields values in [low, high] one at a time.
    
    Memory-efficient: O(h) space instead of O(h + k).
    Caller can stop early without processing entire range.
    
    Time: O(h + k) total if all values consumed
    Space: O(h) at any point
    """
    def traverse(node):
        if node is None:
            return
        
        # Traverse left if it might contain valid values
        if node.val > low:
            yield from traverse(node.left)
        
        # Yield current if in range
        if low <= node.val <= high:
            yield node.val
        
        # Traverse right if it might contain valid values
        if node.val < high:
            yield from traverse(node.right)
    
    return traverse(root)
 
 
# Usage examples:
def print_range(root, low, high):
    """Print all values in range without storing them."""
    for value in range_query_generator(root, low, high):
        print(value)
 
 
def first_n_in_range(root, low, high, n):
    """Get first n values in range (early termination)."""
    result = []
    for value in range_query_generator(root, low, high):
        result.append(value)
        if len(result) >= n:
            break
    return result
 
 
def exists_in_range(root, low, high):
    """Check if any value exists in range."""
    for _ in range_query_generator(root, low, high):
        return True
    return False
 
 
# Iterative generator (no recursion)
def range_query_iterator(root, low, high):
    """
    Truly iterative range iterator using explicit stack.
    
    Avoids Python's generator recursion limit issues.
    """
    stack = []
    current = root
    
    while stack or current:
        # Go left, but only if values >= low might exist
        while current:
            if current.val > low:
                stack.append(current)
                current = current.left
            else:
                stack.append(current)
                break
        
        if stack:
            current = stack.pop()
            
            if low <= current.val <= high:
                yield current.val
            
            if current.val < high:
                current = current.right
            else:
                current = None

Benefits of Streaming:

Memory efficiency: Only O(h) space instead of O(h + k)
Early termination: Stop as soon as you have enough results
Composability: Chain with other operations (map, filter, take)
Lazy evaluation: Don't compute until values are needed
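
A short sketch of the composability point: because a range generator is an ordinary Python iterator, it can be combined with `itertools.islice`, `map`, `filter`, and `any`. The helper names `Node`, `insert`, and `iter_range` are assumptions made for this example.

```python
from itertools import islice


class Node:
    def __init__(self, val):
        self.val = val
        self.left = None
        self.right = None


def insert(root, val):
    if root is None:
        return Node(val)
    if val < root.val:
        root.left = insert(root.left, val)
    else:
        root.right = insert(root.right, val)
    return root


def iter_range(node, low, high):
    """Lazily yield values in [low, high] in sorted order."""
    if node is None:
        return
    if node.val > low:
        yield from iter_range(node.left, low, high)
    if low <= node.val <= high:
        yield node.val
    if node.val < high:
        yield from iter_range(node.right, low, high)


root = None
for v in [50, 30, 70, 20, 40, 60, 80, 35, 45, 65]:
    root = insert(root, v)

# Take only the first three matches; the rest of the range is never traversed
first_three = list(islice(iter_range(root, 30, 70), 3))
print(first_three)  # [30, 35, 40]

# Chain filter and map over the stream without building an intermediate list
doubled = list(map(lambda v: v * 2,
                   filter(lambda v: v % 10 == 0, iter_range(root, 30, 70))))
print(doubled)  # [60, 80, 100, 120, 140]

# any() stops at the first value that satisfies the predicate
print(any(v > 60 for v in iter_range(root, 30, 70)))  # True
```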

Augmented BSTs for O(log n) Range Count

The basic range count takes O(h + k) because we must visit every node in the range. But what if we need only the count, not the actual values? With an augmented BST, we can count in O(h), which is O(log n) when the tree is balanced.

Key Idea: Store the size of each subtree at every node. This allows us to count nodes without visiting them.

augmented_count.py
class AugmentedNode:
    """BST node augmented with subtree size."""
    def __init__(self, val):
        self.val = val
        self.left = None
        self.right = None
        self.size = 1  # Size of subtree rooted at this node
 
 
def get_size(node):
    """Helper to safely get size (handles None)."""
    return node.size if node else 0
 
 
def update_size(node):
    """Update size based on children sizes."""
    if node:
        node.size = 1 + get_size(node.left) + get_size(node.right)
 
 
def count_less_than(root, x):
    """
    Count nodes with value < x.
    
    Time: O(h) = O(log n) for balanced tree
    """
    count = 0
    current = root
    
    while current:
        if current.val < x:
            # Current node and entire left subtree are < x
            count += 1 + get_size(current.left)
            current = current.right
        else:
            # Current node >= x, only left subtree might have values < x
            current = current.left
    
    return count
 
 
def count_less_or_equal(root, x):
    """
    Count nodes with value <= x.
    
    Time: O(h) = O(log n) for balanced tree
    """
    count = 0
    current = root
    
    while current:
        if current.val <= x:
            # Current node and entire left subtree are <= x
            count += 1 + get_size(current.left)
            current = current.right
        else:
            # Current node > x, only left subtree might have values <= x
            current = current.left
    
    return count
 
 
def count_in_range_optimized(root, low, high):
    """
    Count nodes in [low, high] range in O(h).
    
    Uses the identity:
    count([low, high]) = count(<= high) - count(< low)
    For integer keys only, count(< low) equals count(<= low - 1).
    
    Time: O(h) = O(log n) for balanced tree
    Space: O(1)
    """
    # Reversed bounds describe an empty range; without this guard
    # the subtraction below could return a negative number
    if low > high:
        return 0
    return count_less_or_equal(root, high) - count_less_than(root, low)
 
 
# Example: Maintaining size during insertions
def insert_augmented(root, val):
    """
    Insert while maintaining subtree sizes.
    
    Time: O(h)
    """
    if root is None:
        return AugmentedNode(val)
    
    if val < root.val:
        root.left = insert_augmented(root.left, val)
    else:
        # Equal values go right, so duplicates are kept and counted
        root.right = insert_augmented(root.right, val)
    
    update_size(root)
    return root

Trade-off

Augmentation requires maintaining size information during every insert and delete. This adds constant-factor overhead to mutations. Use augmented BSTs when range count queries are frequent enough to justify the maintenance cost.
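
To gain confidence that size-based counting matches a direct count, including when duplicates are present, we can compare the two on random data. This is a compact sketch, not the listing above: `SizedNode`, `size`, `count_below`, and `cross_check_counts` are hypothetical names, and the tree is not rebalanced.

```python
import random


class SizedNode:
    def __init__(self, val):
        self.val = val
        self.left = None
        self.right = None
        self.size = 1            # nodes in the subtree rooted here


def size(node):
    return node.size if node else 0


def insert(root, val):
    """Insert (duplicates go right) and refresh sizes on the way back up."""
    if root is None:
        return SizedNode(val)
    if val < root.val:
        root.left = insert(root.left, val)
    else:
        root.right = insert(root.right, val)
    root.size = 1 + size(root.left) + size(root.right)
    return root


def count_below(root, x, inclusive=False):
    """Count values < x (or <= x when inclusive=True) in O(h)."""
    count, cur = 0, root
    while cur:
        if cur.val < x or (inclusive and cur.val == x):
            count += 1 + size(cur.left)   # this node plus its whole left subtree
            cur = cur.right
        else:
            cur = cur.left
    return count


def count_in_range(root, low, high):
    if low > high:
        return 0
    return count_below(root, high, inclusive=True) - count_below(root, low)


def cross_check_counts(trials=100, seed=1):
    rng = random.Random(seed)
    for _ in range(trials):
        values = [rng.randint(0, 50) for _ in range(rng.randint(0, 200))]
        root = None
        for v in values:
            root = insert(root, v)
        low, high = rng.randint(-5, 55), rng.randint(-5, 55)
        expected = sum(1 for v in values if low <= v <= high)
        assert count_in_range(root, low, high) == expected
    return True


print(cross_check_counts())
```

The check deliberately draws bounds independently, so reversed ranges (low > high) are exercised as well.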

Real-World Applications

Range queries are a workhorse operation in many systems:

Production Use Cases

•Database Indices: B-trees and B+ trees (generalizations of BSTs) power database range scans. "SELECT * FROM orders WHERE date BETWEEN '2024-01-01' AND '2024-03-31'" is a range query.
•Time-Series Data: IoT sensors, financial tickers, and logging systems constantly query 'all values between timestamps T1 and T2'.
•Geographic Information Systems: Spatial databases use tree structures (like R-trees) for range queries over coordinates ('find all restaurants within these lat/long bounds').
•Analytics Dashboards: Aggregations like 'sum of sales between $100 and $1000' or 'count of users with age 25-35' are range aggregate queries.
•Auto-complete Systems: Finding all words that start with a prefix (trie range query) or all entries within a score threshold.
•Memory Allocators: Free-list managers query for memory blocks within a size range.
•Calendar Applications: Finding all events within a time window is a range query over event start times.

Example: Java's NavigableSet

Java's TreeSet (backed by a Red-Black Tree) provides range query methods:

NavigableSet<Integer> set = new TreeSet<>();
set.addAll(Arrays.asList(10, 20, 30, 40, 50, 60, 70));

// Range query: all values in [25, 55]
SortedSet<Integer> subset = set.subSet(25, true, 55, true);
// Result: [30, 40, 50]

// Exclusive bounds
SortedSet<Integer> exclusive = set.subSet(30, false, 60, false);
// Result: [40, 50]

The subSet method returns a view backed by the original tree—changes to the view affect the tree and vice versa. This is the streaming/iterator pattern in action.

Common Mistakes and Edge Cases

Range queries have several subtle pitfalls:

Common Mistakes

•Incorrect pruning conditions: Using >= instead of > (or vice versa) changes whether you include boundary values. Be consistent with your range definition.
•Off-by-one in boundaries: Clarify whether bounds are inclusive or exclusive. [low, high] vs (low, high) vs [low, high) all behave differently.
•Missing base case: Forgetting to handle null nodes causes null pointer exceptions.
•Not handling empty range: When low > high, or when no values exist in the range, ensure you return an empty result (not null or error).
•Visiting nodes twice: In poorly structured code, you might process the same node multiple times. The standard inorder-with-pruning pattern avoids this.
•Inefficient pruning: Pruning too late (after descending) wastes work. The condition checks should happen before recursing.
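
One way to keep boundary handling explicit is to make inclusivity a parameter, in the spirit of the four-argument subSet call shown earlier. The sketch below assumes distinct keys; `range_query_bounds` and its flag names are hypothetical.

```python
class Node:
    def __init__(self, val):
        self.val = val
        self.left = None
        self.right = None


def insert(root, val):
    if root is None:
        return Node(val)
    if val < root.val:
        root.left = insert(root.left, val)
    else:
        root.right = insert(root.right, val)
    return root


def range_query_bounds(root, low, high, low_inclusive=True, high_inclusive=True):
    """Range query with configurable inclusive/exclusive bounds."""
    result = []

    def above_low(v):
        return v >= low if low_inclusive else v > low

    def below_high(v):
        return v <= high if high_inclusive else v < high

    def traverse(node):
        if node is None:
            return
        # Pruning uses strict comparisons; with distinct keys these are
        # safe for both inclusive and exclusive bounds
        if node.val > low:
            traverse(node.left)
        # Inclusion is the only place where inclusivity matters
        if above_low(node.val) and below_high(node.val):
            result.append(node.val)
        if node.val < high:
            traverse(node.right)

    traverse(root)
    return result


root = None
for v in [40, 20, 60, 10, 30, 50, 70]:
    root = insert(root, v)

print(range_query_bounds(root, 25, 55))                # [30, 40, 50]
print(range_query_bounds(root, 30, 60, False, False))  # [40, 50]
print(range_query_bounds(root, 30, 60, True, False))   # [30, 40, 50]
```

Separating the pruning test from the inclusion test keeps each comparison easy to reason about: pruning decides where to look, inclusion decides what to keep.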

Edge Cases to Test
Case	Input	Expected Output
Empty tree	root=null, [1, 10]	[]
No values in range	[100, 200] in tree with values 1-50	[]
All values in range	[1, 100] in tree with values 10-50	[10, 20, ..., 50]
Single value	[25, 25] in tree with 25	[25]
Single value missing	[26, 26] in tree without 26	[]
Low equals high	[50, 50]	[50] if exists, else []
Reversed bounds	[100, 50]	[] (invalid range)
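
The table translates directly into a handful of assertions. The following sketch uses a hypothetical `Node`, `build`, and a compact recursive `range_query` with an explicit guard for reversed bounds; the sample trees are assumptions chosen to match each row.

```python
class Node:
    def __init__(self, val):
        self.val = val
        self.left = None
        self.right = None


def build(values):
    """Build an unbalanced BST by inserting distinct values in order."""
    def insert(root, val):
        if root is None:
            return Node(val)
        if val < root.val:
            root.left = insert(root.left, val)
        else:
            root.right = insert(root.right, val)
        return root

    root = None
    for v in values:
        root = insert(root, v)
    return root


def range_query(root, low, high):
    result = []

    def traverse(node):
        if node is None:
            return
        if node.val > low:
            traverse(node.left)
        if low <= node.val <= high:
            result.append(node.val)
        if node.val < high:
            traverse(node.right)

    if low <= high:          # reversed bounds describe an empty range
        traverse(root)
    return result


tree = build([30, 10, 50, 20, 40])                        # values 10..50

assert range_query(None, 1, 10) == []                     # empty tree
assert range_query(tree, 100, 200) == []                  # no values in range
assert range_query(tree, 1, 100) == [10, 20, 30, 40, 50]  # all values in range
assert range_query(build([25]), 25, 25) == [25]           # single value
assert range_query(tree, 26, 26) == []                    # single value missing
assert range_query(tree, 50, 50) == [50]                  # low equals high
assert range_query(tree, 100, 50) == []                   # reversed bounds
print("all edge cases pass")
```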

Summary and Key Takeaways

Range queries leverage the BST property to efficiently extract subsets of data. Let's consolidate the key insights:

Key Takeaways

•Pruning is the key optimization: Skip left subtree when node.val < low; skip right subtree when node.val > high. This turns O(n) into O(h + k).
•Output-sensitive complexity O(h + k): Time depends on tree height (navigation) plus output size (collection). Efficient for small ranges.
•Streaming reduces space to O(h): Generator/iterator patterns yield values one at a time, enabling early termination and memory efficiency.
•Variants are straightforward: Count and sum reuse the same pruned traversal with different accumulation logic; min and max need only a single O(h) descent.
•Augmented BSTs enable O(log n) counting: By storing subtree sizes, we can count without visiting each node—useful for frequent count queries.
•Standard libraries provide range queries: TreeSet, TreeMap, and equivalents offer built-in methods like subSet, subMap, which are backed by these algorithms.

Looking Ahead:

We've now covered validation, floor/ceiling, and range queries—all operations on existing BSTs. But how do we construct a BST efficiently? The next page addresses converting a sorted array into a balanced BST, ensuring optimal O(log n) height from the start.

Page Complete

You now understand how to efficiently extract values within a range from a BST using pruning strategies. You can implement basic and streaming range queries, compute range aggregates, and appreciate when augmented BSTs provide additional speedups. Next, we'll explore converting sorted arrays to balanced BSTs.