When you ask your phone "Find restaurants near me," an R-tree springs into action. But unlike B+-tree search, which follows a single path, R-tree operations must navigate the complexity of overlapping bounding rectangles and multi-dimensional space.
This page dissects the four fundamental R-tree operations: search, insertion, node splitting, and deletion.
Each operation involves careful algorithmic choices that dramatically impact performance. A good insertion strategy produces tight MBRs and minimal overlap; a poor one creates a bloated, slow tree.
By the end of this page, you will understand complete algorithms for R-tree search (range and nearest-neighbor), the ChooseSubtree algorithm for optimal insertion paths, multiple node splitting strategies (linear, quadratic, R*-tree), and deletion with condense tree rebalancing. You'll see how every decision traces back to MBR quality metrics.
The R-tree search algorithm exemplifies the power of hierarchical pruning. Given a query rectangle Q, we want all objects whose MBRs intersect Q. The algorithm descends through the tree, exploring only branches whose MBRs overlap the query.
Key Difference from B+-tree: In a B+-tree, exactly one path leads to any key. In an R-tree, multiple paths may need exploration because sibling MBRs can overlap.
```
// Range search: Find all objects whose MBRs intersect query rectangle Q
function search(node, queryMBR):
    results = []
    if node.isLeaf():
        // Leaf node: check each entry against query
        for each entry in node.entries:
            if intersects(entry.mbr, queryMBR):
                results.add(entry.objectPointer)
    else:
        // Internal node: recursively search qualifying children
        for each entry in node.entries:
            if intersects(entry.mbr, queryMBR):
                results.addAll(search(entry.childNode, queryMBR))
    return results

// Main search entry point
function rangeSearch(tree, queryMBR):
    if tree.root == null:
        return []
    return search(tree.root, queryMBR)

// Example usage
// Find all restaurants within visible map bounds
queryBounds = MBR(-122.45, -122.40, 37.75, 37.80)
results = rangeSearch(restaurantTree, queryBounds)

// Cost Analysis:
// Let n = number of objects, h = tree height, f = fanout
// Best case (no overlap): O(h + k) where k = result size
// Worst case (high overlap): O(n) - degenerates to full scan
// Average case: O(h * p^(h-1) + k) where p = probability of MBR overlap
```

Search Traversal Visualization:
                 ROOT
            [0,100,0,100]
           /      |       \
          A       B        C
  [0,40,0,50] [30,70,40,90] [60,100,0,60]
     /  \       /  \          /  \
   L1    L2   L3    L4      L5    L6
Query: [35,45,55,65]
1. Check ROOT MBR [0,100,0,100] → intersects? YES
2. Check A [0,40,0,50] → intersects [35,45,55,65]? NO (A.maxY=50 < Q.minY=55) ✗
3. Check B [30,70,40,90] → intersects? YES → descend
   - Check L3 entries...
   - Check L4 entries...
4. Check C [60,100,0,60] → intersects? NO (C.minX=60 > Q.maxX=45) ✗
Result: Only subtree B explored; A and C pruned entirely!
Performance Insight: The key to fast search is aggressive pruning. When 90% of sibling MBRs don't overlap the query, we skip 90% of the tree. But if 90% of siblings overlap (high overlap tree), we explore almost everything.
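The pruning logic above can be exercised with a small runnable sketch. This is a minimal Python illustration, not a production index; the `Node` class and the `(minX, maxX, minY, maxY)` tuple layout are assumptions made for this example.

```python
# Minimal runnable sketch of the range-search pseudocode above.
# MBRs are (minX, maxX, minY, maxY) tuples; Node is an illustrative
# class for this example, not a production structure.

def intersects(a, b):
    """True if two MBRs overlap on both axes."""
    return a[0] <= b[1] and b[0] <= a[1] and a[2] <= b[3] and b[2] <= a[3]

class Node:
    def __init__(self, entries, is_leaf):
        self.entries = entries   # list of (mbr, payload) pairs
        self.is_leaf = is_leaf   # payload: object at leaves, child Node otherwise

def range_search(node, query):
    results = []
    for mbr, payload in node.entries:
        if intersects(mbr, query):            # prune non-overlapping branches
            if node.is_leaf:
                results.append(payload)
            else:
                results.extend(range_search(payload, query))
    return results

# Tiny two-level tree: only the first leaf overlaps the query
leaf_b = Node([((35, 40, 60, 65), "cafe"), ((50, 55, 80, 85), "bar")], is_leaf=True)
leaf_c = Node([((70, 90, 10, 50), "deli")], is_leaf=True)
root = Node([((30, 70, 40, 90), leaf_b), ((60, 100, 0, 60), leaf_c)], is_leaf=False)

print(range_search(root, (35, 45, 55, 65)))   # ['cafe']
```

Note how `leaf_c` is never visited: its MBR fails the intersection test at the root, so the whole subtree is pruned.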
Finding the k nearest objects to a query point is more complex than range search. We can't simply filter by MBR intersection because we don't know the result distances in advance. Instead, we use a priority-queue based traversal that explores nodes in order of their minimum distance to the query point.
```
// k-Nearest Neighbor Search using a priority queue (best-first traversal)
function kNearestNeighbors(tree, queryPoint, k):
    results = []                 // Result collector
    pq = PriorityQueue()         // Min-heap ordered by distance

    // Initialize with root
    pq.insert(tree.root, 0)      // Root has distance 0 (contains everything)

    while not pq.isEmpty() and results.size() < k:
        item, dist = pq.extractMin()

        if item is LeafEntry:
            // Actual object: distances come off the heap in order, so it is final
            results.insert(item.object, dist)
        else:
            // Internal or leaf node - expand its entries
            for each entry in item.entries:
                if item.isLeaf():
                    // Leaf entries are enqueued with their exact object distance
                    exactDist = distance(queryPoint, entry.object)
                    pq.insert(entry, exactDist)
                else:
                    // Internal entries are enqueued with MINDIST to their MBR
                    minDist = minDistanceToMBR(queryPoint, entry.mbr)
                    pq.insert(entry.childNode, minDist)
        // Optional pruning: entries whose MINDIST already exceeds the
        // current k-th best distance need never be enqueued at all
    return results

// Minimum distance from point to MBR
// This is the closest the point could possibly be to any object in the MBR
function minDistanceToMBR(point, mbr):
    dx = 0
    dy = 0
    if point.x < mbr.minX:   dx = mbr.minX - point.x
    elif point.x > mbr.maxX: dx = point.x - mbr.maxX
    if point.y < mbr.minY:   dy = mbr.minY - point.y
    elif point.y > mbr.maxY: dy = point.y - mbr.maxY
    return sqrt(dx*dx + dy*dy)
```

By exploring nodes in order of their minimum possible distance, we find nearby objects first. Once we have k candidates, any node whose minimum distance exceeds our current k-th distance can be pruned: objects in that subtree can't be among the k nearest. This best-first traversal typically examines far fewer nodes than depth-first search.
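A runnable sketch of this best-first traversal, using Python's `heapq`. The toy two-level tree is built from plain dicts and tuples as an illustration; it is not a real R-tree library.

```python
import heapq
import math

# Best-first k-NN sketch: nodes are dicts {"leaf": bool, "entries": [...]},
# points are plain (x, y) tuples. Illustrative layout, not a library API.

def min_dist_to_mbr(p, mbr):
    """MINDIST: the closest a point could be to anything inside the MBR."""
    minx, maxx, miny, maxy = mbr
    dx = max(minx - p[0], 0, p[0] - maxx)
    dy = max(miny - p[1], 0, p[1] - maxy)
    return math.hypot(dx, dy)

def knn(root, q, k):
    heap = [(0.0, 0, root)]       # (distance, tiebreaker, node-or-point)
    counter = 1
    results = []
    while heap and len(results) < k:
        dist, _, item = heapq.heappop(heap)
        if isinstance(item, tuple):
            # A point: distances pop in increasing order, so it is final
            results.append((item, dist))
        else:
            for mbr, payload in item["entries"]:
                if item["leaf"]:
                    d = math.dist(q, payload)        # exact object distance
                else:
                    d = min_dist_to_mbr(q, mbr)      # optimistic MINDIST
                heapq.heappush(heap, (d, counter, payload))
                counter += 1
    return results

leaf1 = {"leaf": True, "entries": [((1, 1, 1, 1), (1, 1)), ((4, 4, 4, 4), (4, 4))]}
leaf2 = {"leaf": True, "entries": [((10, 10, 10, 10), (10, 10))]}
root = {"leaf": False, "entries": [((1, 4, 1, 4), leaf1), ((10, 10, 10, 10), leaf2)]}

print(knn(root, (0, 0), 2))   # leaf2 stays in the heap, never expanded
```

Because `leaf2`'s MINDIST exceeds the distances of the two points found in `leaf1`, it is never popped: exactly the pruning the prose describes.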
MINDIST vs. MINMAXDIST:
Two distance bounds are useful for pruning:
MINDIST — Minimum possible distance from query to any point in MBR
MINMAXDIST — Upper bound on the distance to the nearest object in the MBR: the minimum, over the MBR's nearer faces, of the farthest distance to a point on that face. Because an MBR is minimal, every face touches at least one object, so some object lies within MINMAXDIST.
Query point: Q = (50, 50)
MBR A: [60, 80] × [70, 90]
MINDIST = √((60-50)² + (70-50)²) = √(100 + 400) = √500 ≈ 22.4
MINMAXDIST: For each axis, pair the nearer face with the farthest point along it:
- Left face (x=60): farthest at (60, 90), dist = √(100+1600) = √1700 ≈ 41.2
- Bottom face (y=70): farthest at (80, 70), dist = √(900+400) = √1300 ≈ 36.1
- ...minimum of these is MINMAXDIST ≈ 36.1
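The worked example can be checked in a few lines of Python. `mindist` and `minmaxdist` here are hand-rolled 2-D helpers written for this check, not library functions.

```python
import math

# Verifying the worked example: Q = (50, 50), MBR [60, 80] x [70, 90].
# MBRs are (minX, maxX, minY, maxY) tuples.

def mindist(q, mbr):
    """Minimum possible distance from q to any point in the MBR."""
    minx, maxx, miny, maxy = mbr
    dx = max(minx - q[0], 0, q[0] - maxx)
    dy = max(miny - q[1], 0, q[1] - maxy)
    return math.hypot(dx, dy)

def minmaxdist(q, mbr):
    """2-D MINMAXDIST: for each axis, pair the nearer face on that axis
    with the farther edge on the other axis; take the minimum."""
    minx, maxx, miny, maxy = mbr
    near_x = minx if abs(q[0] - minx) <= abs(q[0] - maxx) else maxx
    far_x  = maxx if abs(q[0] - maxx) >= abs(q[0] - minx) else minx
    near_y = miny if abs(q[1] - miny) <= abs(q[1] - maxy) else maxy
    far_y  = maxy if abs(q[1] - maxy) >= abs(q[1] - miny) else miny
    via_x = math.hypot(q[0] - near_x, q[1] - far_y)   # nearer x-face, farther y
    via_y = math.hypot(q[0] - far_x, q[1] - near_y)   # nearer y-face, farther x
    return min(via_x, via_y)

q, box = (50, 50), (60, 80, 70, 90)
print(round(mindist(q, box), 1))      # 22.4  (= sqrt(500))
print(round(minmaxdist(q, box), 1))   # 36.1  (= sqrt(1300))
```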
Insertion is where R-tree quality is determined. The ChooseSubtree algorithm decides which leaf should receive a new object. This single decision, repeated millions of times, shapes the entire tree's MBR structure and query performance.
Design Tension: Should we minimize enlargement (keep MBRs small) or minimize overlap (keep MBRs disjoint)? Different strategies balance this tradeoff differently.
```
// Insert a new object into the R-tree
function insert(tree, object, objectMBR):
    // Step 1: Find the best leaf to insert into
    leaf = chooseSubtree(tree.root, objectMBR)

    // Step 2: Add the entry to the leaf
    newEntry = Entry(objectMBR, object)
    leaf.addEntry(newEntry)

    // Step 3: Propagate MBR changes upward
    adjustTree(leaf)

    // Step 4: Handle overflow if necessary
    if leaf.entryCount() > M:
        handleOverflow(leaf)

// ChooseSubtree: Find best leaf for new entry
// Original R-tree algorithm (Guttman 1984)
function chooseSubtree(node, objectMBR):
    if node.isLeaf():
        return node

    // Find the child requiring minimum enlargement to contain objectMBR
    bestEntry = null
    minEnlargement = INFINITY
    for each entry in node.entries:
        enlargement = computeEnlargement(entry.mbr, objectMBR)
        if enlargement < minEnlargement:
            minEnlargement = enlargement
            bestEntry = entry
        elif enlargement == minEnlargement:
            // Tie-breaker: prefer the smaller MBR
            if area(entry.mbr) < area(bestEntry.mbr):
                bestEntry = entry
    return chooseSubtree(bestEntry.childNode, objectMBR)

// Compute how much an MBR grows to contain the new object
function computeEnlargement(existingMBR, newMBR):
    combined = union(existingMBR, newMBR)
    return area(combined) - area(existingMBR)
```

R*-tree ChooseSubtree Enhancement:
The R*-tree (1990) improved ChooseSubtree by considering overlap as well as enlargement:
```
// R*-tree ChooseSubtree (simplified)
function chooseSubtreeRStar(node, objectMBR, level):
    if level == 1:   // One level above leaves
        // Minimize OVERLAP enlargement, then area enlargement, then area
        return child minimizing (overlapEnlargement, areaEnlargement, area)
    else:
        // Higher levels: minimize area enlargement, then area
        return child minimizing (areaEnlargement, area)
```
Why Overlap Matters at Leaf Level:
At the leaf parent level, overlap directly affects query performance. Two overlapping leaf MBRs mean many queries will examine both leaves. At higher levels, overlap is less damaging because we're comparing larger regions anyway.
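To make the two metrics concrete, here is a small Python sketch comparing area enlargement with overlap enlargement for a hypothetical candidate child and one sibling. The rectangles and names are made up for the example.

```python
# Sketch of the two costs ChooseSubtree compares: area enlargement
# (classic R-tree) and overlap enlargement (R*-tree, at the leaf-parent
# level). MBRs are (minX, maxX, minY, maxY) tuples.

def area(m):
    return (m[1] - m[0]) * (m[3] - m[2])

def union(a, b):
    return (min(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), max(a[3], b[3]))

def intersection_area(a, b):
    w = min(a[1], b[1]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[2], b[2])
    return w * h if w > 0 and h > 0 else 0

def enlargement(existing, new):
    """How much the existing MBR's area grows to contain the new one."""
    return area(union(existing, new)) - area(existing)

def overlap_enlargement(candidate, siblings, new):
    """How much total overlap with siblings grows if candidate absorbs new."""
    grown = union(candidate, new)
    before = sum(intersection_area(candidate, s) for s in siblings)
    after = sum(intersection_area(grown, s) for s in siblings)
    return after - before

child = (0, 10, 0, 10)        # candidate child MBR
siblings = [(8, 20, 0, 10)]   # one overlapping sibling
new = (9, 12, 4, 6)           # entry to insert

print(enlargement(child, new))                    # 20 (area 100 -> 120)
print(overlap_enlargement(child, siblings, new))  # 20 (overlap 20 -> 40)
```

Here absorbing the new entry costs 20 units of area but also doubles the overlap with the sibling, which is why the R*-tree weighs both at the leaf-parent level.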
Unlike B+-trees where insert order doesn't affect final structure significantly, R-tree insert order profoundly impacts quality. Inserting spatially scattered objects randomly produces worse trees than inserting spatially coherent batches. This is why bulk loading algorithms often outperform repeated single inserts.
When a node exceeds capacity M, it must split into two nodes. The splitting algorithm determines how entries are distributed between the new nodes—a critical decision affecting MBR quality.
The Goal: Create two groups whose MBRs have minimal total area, minimal overlap, and reasonable shape (not too elongated).
| Algorithm | Time Complexity | Split Quality | Used In |
|---|---|---|---|
| Linear Split | O(M) | Poor to Fair | Original R-tree option |
| Quadratic Split | O(M²) | Fair to Good | Original R-tree default |
| R*-tree Split | O(M² log M) | Excellent | R*-tree, PostGIS |
| Greene's Split | O(M log M) | Good | Some implementations |
```
// Quadratic Split (Guttman's original)
function quadraticSplit(entries):
    // Step 1: Pick seeds - two entries that waste most area if together
    seed1, seed2 = pickSeeds(entries)
    group1 = [seed1]
    group2 = [seed2]
    remaining = entries - {seed1, seed2}

    // Step 2: Assign remaining entries
    while remaining is not empty:
        // Check if one group needs all remaining to reach minimum fill m
        if group1.size() + remaining.size() == m:
            group1.addAll(remaining)
            break
        if group2.size() + remaining.size() == m:
            group2.addAll(remaining)
            break

        // Pick next: entry with maximum preference for one group
        entry, preferredGroup = pickNext(remaining, group1, group2)
        preferredGroup.add(entry)
        remaining.remove(entry)

    return group1, group2

// Pick seeds: Find the two entries that waste most space if together
function pickSeeds(entries):
    maxWaste = -INFINITY
    seed1, seed2 = null, null
    for each pair (e1, e2) in entries:
        combined = union(e1.mbr, e2.mbr)
        waste = area(combined) - area(e1.mbr) - area(e2.mbr)
        if waste > maxWaste:
            maxWaste = waste
            seed1, seed2 = e1, e2
    return seed1, seed2

// Pick next: Entry with maximum preference difference
function pickNext(remaining, group1, group2):
    maxDiff = -INFINITY
    chosen = null
    chosenGroup = null
    mbr1 = computeMBR(group1)
    mbr2 = computeMBR(group2)
    for each entry in remaining:
        d1 = enlargement(mbr1, entry.mbr)   // Cost to add to group1
        d2 = enlargement(mbr2, entry.mbr)   // Cost to add to group2
        diff = abs(d1 - d2)
        if diff > maxDiff:
            maxDiff = diff
            chosen = entry
            chosenGroup = (d1 < d2) ? group1 : group2
    return chosen, chosenGroup
```

R*-tree Split (Superior Quality):
The R*-tree uses a more sophisticated approach:
1. Choose split axis: Try splitting along X, then Y. Sum the margins (perimeters) of the resulting MBRs over all candidate distributions. Choose the axis with the minimum total margin.
2. Choose split position: Along the chosen axis, try all valid splits (maintaining minimum fill m). Choose the distribution minimizing overlap, then area.
3. Forced reinsert: Before splitting, try removing the roughly 30% of entries farthest from the node's center and reinserting them. This often creates better distributions than splitting.
R*-tree forced reinsert:
```
// On overflow at a given level (p ≈ 30% of M entries)
if overflowCount[level] == 0:       // First overflow at this level
    overflowCount[level] = 1
    reinsert(p entries farthest from the node's center)
    return                          // No split needed if reinsert resolves overflow
else:
    proceed with split              // Reinsert already tried at this level
```
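The margin-based axis choice described above can be sketched in Python. This is an illustrative fragment assuming `(minX, maxX, minY, maxY)` tuples and a minimum fill `m`, not the full R*-tree split.

```python
# Sketch of R*-tree split-axis selection: for each axis, sort the entries,
# enumerate every legal split point (each side keeps at least m entries),
# and total the margins (perimeters) of the two group MBRs. The axis with
# the smaller margin sum wins. Names are illustrative.

def union_all(mbrs):
    return (min(m[0] for m in mbrs), max(m[1] for m in mbrs),
            min(m[2] for m in mbrs), max(m[3] for m in mbrs))

def margin(m):
    """Perimeter of an MBR."""
    return 2 * ((m[1] - m[0]) + (m[3] - m[2]))

def margin_sum_for_axis(entries, m, key):
    s = sorted(entries, key=key)
    total = 0
    for split in range(m, len(s) - m + 1):   # every legal distribution
        total += margin(union_all(s[:split])) + margin(union_all(s[split:]))
    return total

def choose_split_axis(entries, m):
    x_sum = margin_sum_for_axis(entries, m, key=lambda e: (e[0], e[1]))
    y_sum = margin_sum_for_axis(entries, m, key=lambda e: (e[2], e[3]))
    return "x" if x_sum <= y_sum else "y"

# Four tall boxes forming two clear columns: an x-split gives tight groups
entries = [(0, 1, 0, 9), (1, 2, 1, 10), (8, 9, 0, 9), (9, 10, 1, 10)]
print(choose_split_axis(entries, m=2))   # "x"
```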
When a node splits, its parent gains an entry. If the parent was already full, it splits too, potentially cascading to the root. A root split is the only time R-tree height increases. Forced reinsert reduces cascade frequency by redistributing entries instead of splitting.
Deleting an object requires locating it, removing it, and handling the consequences: MBRs may shrink, and nodes may underflow (fall below minimum entries m).
```
// Delete an object from the R-tree
function delete(tree, object, objectMBR):
    // Step 1: Find the leaf containing the object
    leafPath = findLeaf(tree.root, objectMBR, object, [])
    if leafPath == null:
        return false   // Object not found
    leaf = leafPath.last()

    // Step 2: Remove the entry from the leaf
    entry = leaf.findEntry(object)
    leaf.removeEntry(entry)

    // Step 3: Condense tree - handle underflow and adjust MBRs
    condenseTree(tree, leafPath)

    // Step 4: Shrink the tree if the root has only one child
    if not tree.root.isLeaf() and tree.root.entryCount() == 1:
        tree.root = tree.root.entries[0].childNode
    return true

// Find the leaf containing an object
function findLeaf(node, objectMBR, object, path):
    path.append(node)
    if node.isLeaf():
        for each entry in node.entries:
            if entry.object == object:
                return path
        return null   // Not in this leaf
    else:
        for each entry in node.entries:
            if contains(entry.mbr, objectMBR):
                result = findLeaf(entry.childNode, objectMBR, object, path.copy())
                if result != null:
                    return result
        return null   // Not in any subtree

// Condense tree after deletion
function condenseTree(tree, path):
    orphans = []   // Entries from eliminated nodes

    // Walk up the path from leaf to root
    for i = path.length - 1 down to 1:
        node = path[i]
        parent = path[i-1]
        if node.entryCount() < m:
            // Node underflow: remove from parent, save entries for reinsert
            parent.removeChild(node)
            orphans.addAll(node.entries)
        else:
            // Adjust parent's MBR to fit the (now smaller) node
            parent.adjustMBRFor(node)

    // Reinsert orphaned entries
    for each entry in orphans:
        if entry.isLeafEntry:
            insert(tree, entry.object, entry.mbr)
        else:
            // Internal entry: reinsert the entire subtree at the correct level
            insertSubtree(tree, entry.childNode, entry.level)
```

Unlike B+-trees, which merge underflowing siblings, R-trees typically reinsert orphaned entries. Why? Because the 'sibling' with the best merge potential may not be a true sibling; it might be in a different branch. Reinsertion finds the globally best home for each orphan.
This is more expensive but produces better tree quality.
After any modification (insert, delete, split), MBRs along the path from the affected node to the root may need adjustment. The AdjustTree algorithm propagates these changes upward.
```
// Adjust MBRs and handle splits from leaf to root
function adjustTree(node, splitNode = null):
    while node != tree.root:
        parent = node.parent
        parentEntry = parent.entryFor(node)

        // Recalculate this node's MBR in the parent
        parentEntry.mbr = computeMBRFromChildren(node)

        // If the node was split, add the new sibling to the parent
        if splitNode != null:
            newEntry = Entry(computeMBRFromChildren(splitNode), splitNode)
            if parent.hasSpace():
                parent.addEntry(newEntry)
                splitNode = null
            else:
                // Parent overflows too - must split
                parent.addEntry(newEntry)   // Temporarily M+1 entries
                parent, splitNode = split(parent)
        node = parent

    // Handle root split
    if splitNode != null:
        newRoot = createNode()
        newRoot.addEntry(Entry(computeMBR(tree.root), tree.root))
        newRoot.addEntry(Entry(computeMBR(splitNode), splitNode))
        tree.root = newRoot
        tree.height += 1

// Compute the MBR of a node (union of all entry MBRs)
function computeMBRFromChildren(node):
    mbr = node.entries[0].mbr
    for i = 1 to node.entryCount() - 1:
        mbr = union(mbr, node.entries[i].mbr)
    return mbr
```

MBR Tightening:
After deletion, child MBRs may shrink, allowing parent MBRs to shrink too. This tightening propagates up the tree:
Before deletion: After deletion:
┌────────────────┐ ┌────────────┐
│ ┌────────────┐ │ │ ┌────────┐ │
│ │ ┌────┐ │ │ │ │ ┌────┐ │ │
│ │ │ A │ (B) │ │ → │ │ │ A │ │ │ (B deleted, MBRs shrink)
│ │ └────┘ │ │ │ │ └────┘ │ │
│ └────────────┘ │ │ └────────┘ │
└────────────────┘ └────────────┘
This automatic tightening keeps MBRs minimal, improving query performance even as data is deleted.
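A tiny Python sketch of the tightening step: recompute a parent MBR from its remaining children after a deletion and watch it shrink. The rectangles are made up for the example.

```python
# MBR tightening sketch: a parent's MBR is just the union of its children's
# MBRs, so removing a child can let the parent shrink.
# MBRs are (minX, maxX, minY, maxY) tuples.

def recompute_mbr(child_mbrs):
    """Union of all child MBRs (the tightest parent MBR)."""
    return (min(m[0] for m in child_mbrs), max(m[1] for m in child_mbrs),
            min(m[2] for m in child_mbrs), max(m[3] for m in child_mbrs))

children = [(0, 4, 0, 4), (2, 6, 2, 6), (10, 20, 10, 20)]   # last one is the outlier B
print(recompute_mbr(children))            # (0, 20, 0, 20)

children.remove((10, 20, 10, 20))         # delete B
print(recompute_mbr(children))            # (0, 6, 0, 6) - the parent tightens
```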
When loading a large dataset, repeated single inserts produce suboptimal trees. Bulk loading algorithms construct R-trees bottom-up with much better quality.
Why Bulk Loading Matters: bottom-up construction packs nodes to near-full capacity, produces tight MBRs with little overlap, and builds the index in a single pass over sorted data rather than through millions of individual ChooseSubtree descents and splits.
```
// Sort-Tile-Recursive (STR) Bulk Loading
// One of the most effective bulk loading algorithms
function STRBulkLoad(objects, M):
    n = objects.length
    P = ceil(n / M)        // Number of leaf-level pages
    S = ceil(sqrt(P))      // Number of vertical slices

    // Step 1: Sort by X-coordinate (or center X)
    objects.sortBy(o => center(o.mbr).x)

    // Step 2: Divide into S vertical slices
    sliceSize = ceil(n / S)
    slices = partition(objects, sliceSize)

    // Step 3: Within each slice, sort by Y and pack into nodes
    leaves = []
    for each slice in slices:
        slice.sortBy(o => center(o.mbr).y)
        for each group of M objects in slice:
            leaf = createLeafNode(group)
            leaves.add(leaf)

    // Step 4: Recursively build internal levels
    return buildInternalLevels(leaves, M)

function buildInternalLevels(nodes, M):
    if nodes.length <= M:
        // Can fit in a single root
        root = createInternalNode(nodes)
        return root

    // Need another level - apply STR to the node MBRs
    P = ceil(nodes.length / M)
    S = ceil(sqrt(P))
    nodes.sortBy(n => center(n.mbr).x)
    slices = partition(nodes, ceil(nodes.length / S))

    parents = []
    for each slice in slices:
        slice.sortBy(n => center(n.mbr).y)
        for each group of M nodes in slice:
            parent = createInternalNode(group)
            parents.add(parent)

    return buildInternalLevels(parents, M)
```

| Algorithm | Key Idea | Pros | Cons |
|---|---|---|---|
| STR (Sort-Tile-Recursive) | Sort by X, tile into strips, sort each by Y | Simple, effective, good packing | Not cache-optimal |
| Hilbert Packing | Sort by Hilbert curve value, pack sequentially | Excellent spatial clustering | Hilbert computation overhead |
| OMT (Overlap Minimizing Top-down) | Recursive spatial partitioning | Minimal overlap | Complex, slower to build |
| Buffer Tree | Lazy buffered inserts | Good for streaming data | Higher memory usage |
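The leaf-packing phase of STR is easy to run end to end. Here is a minimal Python sketch of that first phase only (illustrative names; the input is a grid of unit squares chosen so the arithmetic comes out exact):

```python
import math

# STR leaf packing: sort rectangle centers by x, cut into vertical slices,
# sort each slice by y, and emit capacity-M leaves. This covers only the
# leaf level of the pseudocode above; internal levels repeat the same idea.

def str_pack_leaves(mbrs, M):
    n = len(mbrs)
    P = math.ceil(n / M)                 # leaf pages needed
    S = math.ceil(math.sqrt(P))          # vertical slices
    by_x = sorted(mbrs, key=lambda m: (m[0] + m[1]) / 2)
    slice_size = math.ceil(n / S)
    leaves = []
    for i in range(0, n, slice_size):
        strip = sorted(by_x[i:i + slice_size], key=lambda m: (m[2] + m[3]) / 2)
        for j in range(0, len(strip), M):
            leaves.append(strip[j:j + M])   # one leaf per group of M
    return leaves

# 16 unit squares on a 4x4 grid with M = 4 pack into 4 full leaves
boxes = [(x, x + 1, y, y + 1) for x in range(4) for y in range(4)]
leaves = str_pack_leaves(boxes, M=4)
print(len(leaves), [len(leaf) for leaf in leaves])   # 4 [4, 4, 4, 4]
```

Every leaf ends up completely full and spatially coherent, which is exactly why bulk loading beats repeated single inserts on static data.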
If you're modifying more than 30-50% of the data, it's often faster to bulk-load a new index than to perform individual updates. PostGIS and other systems provide REINDEX commands that drop and rebuild spatial indexes using bulk loading for exactly this reason.
We've covered the complete algorithmic foundation of R-trees. The critical insights: search speed depends on how aggressively overlapping MBRs can be pruned; ChooseSubtree and the split strategy determine MBR quality and therefore query performance; deletion reinserts orphaned entries rather than merging siblings; and bulk loading builds far better trees than repeated single inserts.
What's Next:
With algorithmic foundations complete, we're ready to explore spatial queries in practice. The next page covers real-world query types including range queries, k-NN search, spatial joins, and complex predicates. You'll see how the algorithms we've learned translate into SQL and how query optimizers leverage R-trees for dramatic performance improvements.
You now understand the complete R-tree algorithm suite: search (range and k-NN), insert with ChooseSubtree, splitting strategies, deletion with condense tree, and bulk loading. This operational knowledge prepares you to implement, optimize, and troubleshoot spatial indexes in production systems.