After the complexities of insertion with rotations and deletion with its many cases, search comes as a breath of fresh air. The search operation in balanced trees is exactly the same as in standard Binary Search Trees—no modifications, no additional logic, no rebalancing.
Why is this significant?
The entire purpose of balanced trees is to guarantee fast search times. We accepted the overhead of maintaining balance during insertions and deletions precisely so that every search would be fast. If search itself became more complex or slower due to balancing, we'd have defeated our purpose.
Balanced trees achieve the ideal outcome: the tree maintains balance automatically during modifications, but exploits that balance during search without any additional work. The complexity is front-loaded into writes; reads are clean and fast.
By the end of this page, you will understand:
• Why search doesn't need modification for balanced trees
• How the BST property enables binary search regardless of tree shape
• Why guaranteed height bounds translate directly to guaranteed search time
• The relationship between tree height and search complexity
• How search is the primary beneficiary of all the rebalancing work
• Why read-heavy workloads are ideally suited for balanced trees
The BST search algorithm is one of the most elegant algorithms in computer science. Its simplicity belies its power: by exploiting the ordering property of BSTs, we can eliminate half the remaining candidates with each comparison.
The algorithm:
This is binary search in tree form. At each step, we're deciding: could the target be in the left subtree, or the right subtree? The BST property guarantees that if the target exists, there's exactly one correct direction to go.
```python
def search(node, key):
    """
    Search for a key in a BST rooted at node.
    Returns the node containing the key, or None if not found.

    This exact code works for:
    - Standard BST
    - AVL Trees
    - Red-Black Trees
    - Any balanced BST variant

    The algorithm is identical; only the tree structure differs.
    """
    # Base case: empty tree or key not found
    if node is None:
        return None

    # Found the key
    if key == node.key:
        return node

    # BST property: key < node.key means search left
    if key < node.key:
        return search(node.left, key)

    # BST property: key > node.key means search right
    return search(node.right, key)


def search_iterative(root, key):
    """
    Iterative version of BST search.
    Same logic, but uses a loop instead of recursion.
    Often preferred for efficiency (no function call overhead).
    """
    current = root
    while current is not None:
        if key == current.key:
            return current
        elif key < current.key:
            current = current.left
        else:
            current = current.right
    return None
```

Why this code needs no modifications for balanced trees:
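To make the algorithm concrete, here is a minimal usage sketch. The `Node` class and the hand-built tree are hypothetical scaffolding for the demo, not part of any particular library; note that the search logic never looks at balance information:

```python
class Node:
    """Minimal BST node; AVL/Red-Black nodes would add a height or color field."""
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def search(node, key):
    # Same algorithm as above: one comparison per level, one direction per step.
    if node is None or key == node.key:
        return node
    if key < node.key:
        return search(node.left, key)
    return search(node.right, key)

# Hand-built tree with keys 20..80; search never inspects balance info.
root = Node(50,
            Node(30, Node(20), Node(40)),
            Node(70, Node(60), Node(80)))

print(search(root, 40).key)   # 40: found after 3 comparisons (50, 30, 40)
print(search(root, 55))       # None: 55 is absent
```

The same calls would work unchanged if `root` were the root of an AVL or Red-Black tree.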
The search algorithm depends only on the BST property: for every node, all keys in the left subtree are smaller than the node's key, and all keys in the right subtree are larger.
Balanced trees (AVL, Red-Black, etc.) maintain this property exactly—they add additional constraints (balance factors, color rules), but they never violate the fundamental BST ordering. Therefore, the same search algorithm works unchanged.
What balanced trees guarantee:
In a standard BST, the search path length (number of comparisons) equals the depth of the target node, which is at most the tree's height h. The problem is that h can be as large as n-1 for a degenerate tree.
In a balanced tree, the height h is guaranteed to be O(log n). This transforms the worst-case from O(n) to O(log n):
| Tree Type | Maximum Height | Search Complexity |
|---|---|---|
| Standard BST | n - 1 | O(n) worst case |
| AVL Tree | 1.44 log₂(n) | O(log n) guaranteed |
| Red-Black Tree | 2 log₂(n) | O(log n) guaranteed |
This is separation of concerns at its finest. The BST property handles correctness (we find the right node). The balance property handles performance (we find it quickly). Search relies on the first and benefits from the second without needing to know about the second at all.
One might wonder: should we rebalance during search? After all, we're traversing the tree anyway—why not take the opportunity to improve its structure?
The answer is multifaceted: theoretical, practical, and architectural.
Theoretical reason: Search doesn't change structure
Rebalancing is triggered by structural changes—adding or removing nodes that alter subtree heights or color distributions. Search is a read-only operation. It traverses the tree without modifying any pointers or properties. If the tree was balanced before the search, it's balanced after. There's nothing to fix.
Practical reason: Search would become slower
If we added rebalancing logic to search, every search would pay the cost of checking balance conditions at each node along the path, performing any resulting rotations, and writing back updated metadata (heights or colors).
This would slow down the most common operation. The whole point of balancing the tree during modifications is to avoid any overhead during reads.
Architectural reason: Read-write separation
In concurrent systems, treating search as purely read-only is crucial. Read operations can proceed without locks (in certain implementations) or with read-locks that allow parallel readers. If search could trigger writes (rebalancing), we'd need write-locks for reads, eliminating scalability benefits.
A note on splay trees:
There is a variant called the splay tree that does rebalance during search. After finding a node, it 'splays' that node to the root through a sequence of rotations. This provides amortized O(log n) performance but suffers from exactly the drawbacks mentioned above: searches modify the tree, making them unsuitable for concurrent access and adding overhead to every read.
Splay trees demonstrate that the choice to not rebalance during search is a design decision, not necessarily the only option. But for most applications, the AVL/Red-Black approach of keeping search simple is preferred.
In systems with a single writer and multiple readers (common for data stores), keeping search read-only is essential. Writers can take exclusive locks during insert/delete and their rebalancing, but readers never need to wait for each other. If search could modify the tree, this pattern would break.
The performance guarantee of search derives directly from the height guarantee of balanced trees. Let's examine this relationship rigorously.
Search complexity = O(height)
Each search comparison moves us one level deeper into the tree. In the worst case, we traverse from root to a leaf—a path of length equal to the tree's height. Therefore:
Time(search) = O(h) where h is tree height
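To see the O(h) relationship concretely, here is a small sketch (hypothetical `Node` and `insert` helpers, plain Python) that counts comparisons when searching for the same key in a degenerate chain versus a bushy tree of the same 15 keys:

```python
class Node:
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None

def insert(node, key):
    # Plain BST insert (no rebalancing), used here only to build shapes.
    if node is None:
        return Node(key)
    if key < node.key:
        node.left = insert(node.left, key)
    else:
        node.right = insert(node.right, key)
    return node

def comparisons_to_find(root, key):
    count, node = 0, root
    while node is not None:
        count += 1
        if key == node.key:
            return count
        node = node.left if key < node.key else node.right
    return count

# Degenerate shape: sorted insertion order produces a right-leaning chain.
degenerate = None
for k in range(1, 16):
    degenerate = insert(degenerate, k)

# Bushy shape: an insertion order that happens to build a perfectly balanced tree.
bushy = None
for k in [8, 4, 12, 2, 6, 10, 14, 1, 3, 5, 7, 9, 11, 13, 15]:
    bushy = insert(bushy, k)

print(comparisons_to_find(degenerate, 15))  # 15: path length tracks height
print(comparisons_to_find(bushy, 15))       # 4: about log2(16) comparisons
```

Same keys, same search code; only the height differs, and the comparison count follows it exactly.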
Balanced tree height = O(log n)
Different balanced tree types achieve different height bounds, but all are logarithmic:
AVL Trees:
The strict balance condition (|balance factor| ≤ 1) yields the tightest bound:
h(n) ≤ 1.44 × log₂(n + 2) - 1.33
For practical purposes: h ≤ 1.5 × log₂(n)
This means an AVL tree with 1 million nodes has height at most ~30.
Red-Black Trees:
The looser balance condition (all paths have same black-height, longest path ≤ 2× shortest) yields:
h(n) ≤ 2 × log₂(n + 1)
A Red-Black tree with 1 million nodes has height at most ~40.
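These bounds are easy to evaluate directly. A quick sketch (function names are ours, constants taken from the bounds above):

```python
import math

def avl_max_height(n):
    # AVL bound: h <= 1.44 * log2(n + 2) - 1.33 (approximate constants)
    return 1.44 * math.log2(n + 2) - 1.33

def rb_max_height(n):
    # Red-Black bound: h <= 2 * log2(n + 1)
    return 2 * math.log2(n + 1)

n = 1_000_000
print(math.ceil(avl_max_height(n)))  # 28 by the tight bound; the text's
                                     # "~30" uses the looser 1.5*log2(n) rule
print(math.ceil(rb_max_height(n)))   # 40 levels
```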
The logarithmic guarantee is what matters:
Both bounds are O(log n). The constant factors (1.45 vs 2) matter in practice but don't change the fundamental guarantee. Doubling the data doesn't double the search time—it adds just one or two extra comparisons.
| n | Standard BST (worst) | AVL Tree (max) | Red-Black Tree (max) |
|---|---|---|---|
| 100 | 99 comparisons | 10 comparisons | 14 comparisons |
| 1,000 | 999 comparisons | 15 comparisons | 20 comparisons |
| 10,000 | 9,999 comparisons | 20 comparisons | 27 comparisons |
| 100,000 | 99,999 comparisons | 25 comparisons | 34 comparisons |
| 1,000,000 | 999,999 comparisons | 30 comparisons | 40 comparisons |
| 1,000,000,000 | 999,999,999 comparisons | 45 comparisons | 60 comparisons |
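The table above can be reproduced from the practical bounds in the text (1.5 × log₂ n for AVL, 2 × log₂ n for Red-Black); this sketch simply recomputes each row:

```python
import math

# Standard BST worst case: n - 1. AVL: ceil(1.5 * log2(n)). RB: ceil(2 * log2(n)).
for n in [100, 1_000, 10_000, 100_000, 1_000_000, 1_000_000_000]:
    bst = n - 1
    avl = math.ceil(1.5 * math.log2(n))
    rb = math.ceil(2 * math.log2(n))
    print(f"{n:>13,} | {bst:>13,} | {avl:>3} | {rb:>3}")
```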
Interpreting the numbers:
For 1 billion elements:
At 1 nanosecond per comparison, that's the difference between roughly 1 second (degenerate BST) and 60 nanoseconds (Red-Black tree). For a database serving thousands of queries per second, that is the difference between catastrophic and imperceptible latency.
The scaling benefit:
As data grows 10×, search time grows by only ~3-4 comparisons. As data grows 1000×, search time grows by only ~10 comparisons. This sublinear scaling is the hallmark of logarithmic algorithms and the primary motivation for using balanced trees.
Whether we measure height in log₂, log₁₀, or ln, the big-O complexity is the same: O(log n). This is because all logarithms differ only by constant factors: log₂(n) = log₁₀(n)/log₁₀(2) ≈ 3.32 × log₁₀(n). In practice, we typically use log₂ because it directly reflects the binary branching of these trees.
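The change-of-base identity mentioned above can be verified in one line:

```python
import math

n = 1_000_000
log2_n = math.log2(n)
via_log10 = math.log10(n) / math.log10(2)  # change-of-base identity
via_ln = math.log(n) / math.log(2)         # same trick with natural log

print(round(log2_n, 2))  # 19.93
assert abs(log2_n - via_log10) < 1e-9
assert abs(log2_n - via_ln) < 1e-9
```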
Why do we go through all the trouble of maintaining balance—the rotations, the height tracking, the complex deletion cases? The answer: for the sake of search.
The read-heavy reality:
Most data structures are read much more often than they are written. Consider typical systems: database indexes are queried far more often than their tables are updated, caches exist precisely because reads dominate, and configuration or routing data is consulted on every request but changed rarely.
In a 100:1 read:write ratio (conservative for many applications), optimizing reads at the cost of writes is clearly worthwhile.
The amortization perspective:
Suppose each insertion takes O(log n) time including rebalancing overhead, and each search takes O(log n) time. The O(log n) search is made possible by the tree shape that the O(log n) insertions maintained.
In a standard BST, insertions are O(log n) average but searching becomes O(n) worst-case if the tree degenerates. We'd have 'saved' time on writes but paid a much higher cost on reads.
With a balanced tree, every operation, search or insert, costs O(log n): about 10 comparisons for n = 1000. With a degenerate BST, each search can cost up to n comparisons. For n = 1000 and a workload of roughly 100,000 searches plus a handful of inserts, the balanced tree does ~100,010 × 10 ≈ 1M units of work while the degenerate BST does ~100,000 × 1,000 ≈ 100M units, a 100× difference that grows with both n and the read:write ratio.
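The arithmetic behind that comparison, as a sketch (the workload split of 10 writes and 100,000 reads is an assumption consistent with the numbers above):

```python
import math

n, writes, reads = 1000, 10, 100_000
log_n = math.ceil(math.log2(n))  # ~10 comparisons per operation for n = 1000

# Balanced tree: every operation, read or write, costs about log2(n).
balanced_work = (writes + reads) * log_n

# Degenerate BST: each read can cost up to n comparisons.
degenerate_work = reads * n

print(balanced_work)    # 1,000,100 -> roughly 1M units
print(degenerate_work)  # 100,000,000 -> roughly 100M units
```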
Balanced trees embody a fundamental engineering trade-off: accept constant-factor overhead on infrequent writes to guarantee fast frequent reads. This trade-off is almost universally beneficial for real-world workloads.
While we've focused on point search (finding a single key), balanced trees excel at operations that extend beyond single lookups. The same O(log n) height guarantee benefits these operations too.
Range queries:
A range query asks: find all keys between L and R. The algorithm descends from the root to locate the smallest key ≥ L, then visits keys in sorted order until one exceeds R, for a total cost of O(log n + k).
The O(log n) term is finding the starting point (a search). The k term is visiting the k results. In a balanced tree, this is optimal—we can't do better than visiting each result once, and we reach the first result in logarithmic time.
Ordered iteration:
Want to process keys in sorted order? In-order traversal of a balanced BST visits all n keys in O(n) time—optimal for any ordered structure. The shape (balanced vs. unbalanced) doesn't affect traversal time; it's always O(n). But for a balanced tree, we get the bonus of fast search interleaved with traversal.
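A minimal sketch of that in-order traversal, written as a generator (the `Node` class and hand-built tree are our own scaffolding for the demo):

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def inorder(node):
    """Yield keys in sorted order: left subtree, node, right subtree.
    O(n) total work regardless of tree shape."""
    if node is not None:
        yield from inorder(node.left)
        yield node.key
        yield from inorder(node.right)

# A small hand-built BST; the traversal order depends only on the BST property.
root = Node(50,
            Node(30, Node(20), Node(40)),
            Node(70, Node(60), Node(80)))

print(list(inorder(root)))  # [20, 30, 40, 50, 60, 70, 80]
```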
Floor and ceiling queries:
Both are O(log n) in balanced trees—essentially modified searches that track the best candidate while traversing.
```python
def range_query(root, low, high):
    """
    Find all keys in the BST between low and high (inclusive).
    Time: O(log n + k) where k is the number of keys in range.
    Space: O(h) for recursion stack, O(k) for result list.
    """
    result = []
    _range_query_helper(root, low, high, result)
    return result


def _range_query_helper(node, low, high, result):
    """Recursive helper for range query."""
    if node is None:
        return

    # If current key > low, there might be results in left subtree
    if node.key > low:
        _range_query_helper(node.left, low, high, result)

    # If current key is in range, include it
    if low <= node.key <= high:
        result.append(node.key)

    # If current key < high, there might be results in right subtree
    if node.key < high:
        _range_query_helper(node.right, low, high, result)


def floor(root, key):
    """
    Find the largest key ≤ given key.
    Returns None if all keys are greater than given key.
    Time: O(log n) for balanced tree.
    """
    result = None
    current = root
    while current is not None:
        if current.key == key:
            return current.key
        elif current.key < key:
            result = current.key      # Candidate for floor
            current = current.right   # Look for larger candidate
        else:
            current = current.left    # Current key too large
    return result
```

Database indexes often need range scans (SELECT ... WHERE date BETWEEN x AND y). A balanced tree index enables O(log n) time to find the start of the range, then linear time to scan results. This is why B-trees (multi-way balanced trees) are the dominant database index structure—they provide the same logarithmic search guarantees with better disk I/O characteristics.
The theoretical O(log n) guarantee is reassuring, but how does search perform in real systems? Several factors influence practical performance beyond the basic complexity analysis.
Cache effects:
Modern CPUs have multi-level caches that dramatically speed up memory access, provided the data is cache-resident. When searching through a balanced tree, the top few levels are visited by every search and tend to stay cached, while deeper nodes are more likely to cause cache misses.
For a 1-million-node tree with 30 levels, the top 10 levels (about 1000 nodes) might fit in L2 cache. Most searches traverse these cached levels, then only 20 more levels—not 30 fresh cache lines from scratch.
Comparison costs:
Our analysis assumes O(1) comparisons. For integers, this is accurate. For strings, comparisons can be O(m) where m is string length. For complex objects with expensive comparison functions, the comparison cost dominates.
Real complexity for string keys: O(m × log n) where m is average string length.
Memory layout:
Balanced trees are pointer-based structures. Each node is allocated separately, scattered in memory. This is less cache-friendly than arrays. For small n, linear search through a sorted array can outperform tree search despite O(n) vs O(log n) complexity—the array's cache-friendliness wins for small data.
| Factor | Impact | Mitigation |
|---|---|---|
| Cache locality | Tree nodes scattered in memory | Use arena allocation to place nodes contiguously |
| Pointer chasing | Each level requires following a pointer | CPU prefetching helps; consider B-trees for disk |
| String keys | Comparison cost is O(key length) | Use prefix-sharing or hash acceleration |
| Small n | Linear search may be faster | Hybrid: linear for small, tree for large |
| Balancing overhead | Extra height/color fields per node | Minimal: 1-2 bytes per node |
When balanced trees shine:
Large datasets: n > 10,000 is definitely tree territory. The O(log n) vs O(n) difference becomes undeniable.
Frequent reads: The more searches relative to modifications, the more the zero-overhead search pays off.
Ordered operations needed: If you need range queries, ordered iteration, or floor/ceiling, trees are the right structure.
Worst-case guarantees required: For latency-sensitive applications, the guaranteed O(log n) is essential.
When alternatives might win:
Point lookups only, unordered: Hash tables offer O(1) expected lookup.
Tiny datasets: Array or list search is simpler and cache-friendlier.
Static data: If data never changes, a sorted array with binary search is simpler and faster.
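For the static-data case, Python's standard `bisect` module gives the same O(log n) lookup over a plain sorted list, with contiguous, cache-friendly memory and no per-node pointers (the `contains` helper is our own wrapper):

```python
import bisect

keys = sorted([50, 30, 70, 20, 40, 60, 80])

def contains(sorted_keys, key):
    # bisect_left returns the leftmost insertion point for key.
    i = bisect.bisect_left(sorted_keys, key)
    return i < len(sorted_keys) and sorted_keys[i] == key

print(contains(keys, 40))  # True
print(contains(keys, 55))  # False
```

The trade-off: lookups are fast and simple, but inserting into a sorted array is O(n), which is exactly why mutable ordered data calls for a balanced tree instead.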
Disk-based storage: B-trees (not binary balanced trees) are preferred due to reduced I/O.
In practice, you rarely implement balanced trees yourself. Standard libraries (TreeMap in Java, std::map in C++, SortedDict in Python's sortedcontainers) provide well-optimized implementations. Knowing how they work helps you choose the right structure and predict performance, even if you use library code.
The search operation in balanced trees demonstrates the elegance of the self-balancing approach: all the complexity is absorbed by modifications, leaving search as a simple, fast, read-only operation that directly benefits from the maintained structure.
What's next:
With search, insertion, and deletion all achieving O(log n) time, we're ready to consolidate these results into the unifying guarantee of balanced trees: O(log n) for all operations, regardless of input order. The next page examines this guarantee formally and explores its implications for system design.
You now understand why search in balanced trees is identical to standard BST search, and why this is a feature rather than a limitation. The separation of concerns—ordering handled by the BST property, performance handled by the balance property—yields an elegant and efficient search operation that benefits fully from the tree's guaranteed structure.