When you ask your phone "Find restaurants near me," an R-tree springs into action. But unlike B+-tree search, which follows a single path, R-tree operations must navigate the complexity of overlapping bounding rectangles and multi-dimensional space.
This page dissects the four fundamental R-tree operations: search, insertion, node splitting, and deletion.
Each operation involves careful algorithmic choices that dramatically impact performance. A good insertion strategy produces tight MBRs and minimal overlap; a poor one creates a bloated, slow tree.
By the end of this page, you will understand complete algorithms for R-tree search (range and nearest-neighbor), the ChooseSubtree algorithm for optimal insertion paths, multiple node splitting strategies (linear, quadratic, R*-tree), and deletion with condense tree rebalancing. You'll see how every decision traces back to MBR quality metrics.
The R-tree search algorithm exemplifies the power of hierarchical pruning. Given a query rectangle Q, we want all objects whose MBRs intersect Q. The algorithm descends through the tree, exploring only branches whose MBRs overlap the query.
Key Difference from B+-tree: In a B+-tree, exactly one path leads to any key. In an R-tree, multiple paths may need exploration because sibling MBRs can overlap.
```
// Range search: Find all objects whose MBRs intersect query rectangle Q
function search(node, queryMBR):
    results = []
    if node.isLeaf():
        // Leaf node: check each entry against query
        for each entry in node.entries:
            if intersects(entry.mbr, queryMBR):
                results.add(entry.objectPointer)
    else:
        // Internal node: recursively search qualifying children
        for each entry in node.entries:
            if intersects(entry.mbr, queryMBR):
                results.addAll(search(entry.childNode, queryMBR))
    return results

// Main search entry point
function rangeSearch(tree, queryMBR):
    if tree.root == null:
        return []
    return search(tree.root, queryMBR)

// Example usage
// Find all restaurants within visible map bounds
queryBounds = MBR(-122.45, -122.40, 37.75, 37.80)
results = rangeSearch(restaurantTree, queryBounds)

// Cost Analysis:
// Let n = number of objects, h = tree height, f = fanout
// Best case (no overlap): O(h + k) where k = result size
// Worst case (high overlap): O(n) - degenerates to full scan
// Average case: O(h * p^(h-1) + k) where p = probability of MBR overlap
```

Search Traversal Visualization:
                 ROOT
            [0,100,0,100]
           /      |       \
          A       B        C
  [0,40,0,50] [30,70,40,90] [60,100,0,60]
     /  \       /  \          /  \
   L1    L2   L3    L4      L5    L6
Query: [35,45,55,65]
1. Check ROOT MBR [0,100,0,100] → intersects? YES
2. Check A [0,40,0,50] → intersects [35,45,55,65]? NO (A.maxY=50 < Q.minY=55) ✗
3. Check B [30,70,40,90] → intersects? YES → descend
   - Check L3 entries...
   - Check L4 entries...
4. Check C [60,100,0,60] → intersects? NO (C.minX=60 > Q.maxX=45) ✗
Result: Only subtree B explored; A and C pruned entirely!
Performance Insight: The key to fast search is aggressive pruning. When 90% of sibling MBRs don't overlap the query, we skip 90% of the tree. But if 90% of siblings overlap (high overlap tree), we explore almost everything.
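The pruning logic above can be exercised with a small runnable sketch. This is a minimal Python illustration, not a production index; the `Node` class and the `(minX, maxX, minY, maxY)` tuple layout are assumptions made for this example.

```python
# Minimal runnable sketch of the range-search pseudocode above.
# MBRs are (minX, maxX, minY, maxY) tuples; Node is an illustrative
# class for this example, not a production structure.

def intersects(a, b):
    """True if two MBRs overlap on both axes."""
    return a[0] <= b[1] and b[0] <= a[1] and a[2] <= b[3] and b[2] <= a[3]

class Node:
    def __init__(self, entries, is_leaf):
        self.entries = entries   # list of (mbr, payload) pairs
        self.is_leaf = is_leaf   # payload: object at leaves, child Node otherwise

def range_search(node, query):
    results = []
    for mbr, payload in node.entries:
        if intersects(mbr, query):            # prune non-overlapping branches
            if node.is_leaf:
                results.append(payload)
            else:
                results.extend(range_search(payload, query))
    return results

# Tiny two-level tree: only the first leaf overlaps the query
leaf_b = Node([((35, 40, 60, 65), "cafe"), ((50, 55, 80, 85), "bar")], is_leaf=True)
leaf_c = Node([((70, 90, 10, 50), "deli")], is_leaf=True)
root = Node([((30, 70, 40, 90), leaf_b), ((60, 100, 0, 60), leaf_c)], is_leaf=False)

print(range_search(root, (35, 45, 55, 65)))   # ['cafe']
```

Note how `leaf_c` is never visited: its MBR fails the intersection test at the root, so the whole subtree is pruned.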
Finding the k nearest objects to a query point is more complex than range search. We can't simply filter by MBR intersection because we don't know the result distances in advance. Instead, we use a priority-queue based traversal that explores nodes in order of their minimum distance to the query point.
```
// k-Nearest Neighbor Search using a priority queue (best-first traversal)
function kNearestNeighbors(tree, queryPoint, k):
    results = []                 // Result collector
    pq = PriorityQueue()         // Min-heap ordered by distance

    // Initialize with root
    pq.insert(tree.root, 0)      // Root has distance 0 (contains everything)

    while not pq.isEmpty() and results.size() < k:
        item, dist = pq.extractMin()

        if item is LeafEntry:
            // Actual object: distances come off the heap in order, so it is final
            results.insert(item.object, dist)
        else:
            // Internal or leaf node - expand its entries
            for each entry in item.entries:
                if item.isLeaf():
                    // Leaf entries are enqueued with their exact object distance
                    exactDist = distance(queryPoint, entry.object)
                    pq.insert(entry, exactDist)
                else:
                    // Internal entries are enqueued with MINDIST to their MBR
                    minDist = minDistanceToMBR(queryPoint, entry.mbr)
                    pq.insert(entry.childNode, minDist)
        // Optional pruning: entries whose MINDIST already exceeds the
        // current k-th best distance need never be enqueued at all
    return results

// Minimum distance from point to MBR
// This is the closest the point could possibly be to any object in the MBR
function minDistanceToMBR(point, mbr):
    dx = 0
    dy = 0
    if point.x < mbr.minX:   dx = mbr.minX - point.x
    elif point.x > mbr.maxX: dx = point.x - mbr.maxX
    if point.y < mbr.minY:   dy = mbr.minY - point.y
    elif point.y > mbr.maxY: dy = point.y - mbr.maxY
    return sqrt(dx*dx + dy*dy)
```

By exploring nodes in order of their minimum possible distance, we find nearby objects first. Once we have k candidates, any node whose minimum distance exceeds our current k-th distance can be pruned: objects in that subtree can't be among the k nearest. This best-first traversal typically examines far fewer nodes than depth-first search.
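A runnable sketch of this best-first traversal, using Python's `heapq`. The toy two-level tree is built from plain dicts and tuples as an illustration; it is not a real R-tree library.

```python
import heapq
import math

# Best-first k-NN sketch: nodes are dicts {"leaf": bool, "entries": [...]},
# points are plain (x, y) tuples. Illustrative layout, not a library API.

def min_dist_to_mbr(p, mbr):
    """MINDIST: the closest a point could be to anything inside the MBR."""
    minx, maxx, miny, maxy = mbr
    dx = max(minx - p[0], 0, p[0] - maxx)
    dy = max(miny - p[1], 0, p[1] - maxy)
    return math.hypot(dx, dy)

def knn(root, q, k):
    heap = [(0.0, 0, root)]       # (distance, tiebreaker, node-or-point)
    counter = 1
    results = []
    while heap and len(results) < k:
        dist, _, item = heapq.heappop(heap)
        if isinstance(item, tuple):
            # A point: distances pop in increasing order, so it is final
            results.append((item, dist))
        else:
            for mbr, payload in item["entries"]:
                if item["leaf"]:
                    d = math.dist(q, payload)        # exact object distance
                else:
                    d = min_dist_to_mbr(q, mbr)      # optimistic MINDIST
                heapq.heappush(heap, (d, counter, payload))
                counter += 1
    return results

leaf1 = {"leaf": True, "entries": [((1, 1, 1, 1), (1, 1)), ((4, 4, 4, 4), (4, 4))]}
leaf2 = {"leaf": True, "entries": [((10, 10, 10, 10), (10, 10))]}
root = {"leaf": False, "entries": [((1, 4, 1, 4), leaf1), ((10, 10, 10, 10), leaf2)]}

print(knn(root, (0, 0), 2))   # leaf2 stays in the heap, never expanded
```

Because `leaf2`'s MINDIST exceeds the distances of the two points found in `leaf1`, it is never popped: exactly the pruning the prose describes.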
MINDIST vs. MINMAXDIST:
Two distance bounds are useful for pruning:
MINDIST — Minimum possible distance from query to any point in MBR
MINMAXDIST — Upper bound on the distance to the nearest object in the MBR: the minimum, over the MBR's nearer faces, of the farthest distance to a point on that face. Because an MBR is minimal, every face touches at least one object, so some object lies within MINMAXDIST.
Query point: Q = (50, 50)
MBR A: [60, 80] × [70, 90]
MINDIST = √((60-50)² + (70-50)²) = √(100 + 400) = √500 ≈ 22.4
MINMAXDIST: For each axis, pair the nearer face with the farthest point along it:
- Left face (x=60): farthest at (60, 90), dist = √(100+1600) = √1700 ≈ 41.2
- Bottom face (y=70): farthest at (80, 70), dist = √(900+400) = √1300 ≈ 36.1
- ...minimum of these is MINMAXDIST ≈ 36.1
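The worked example can be checked in a few lines of Python. `mindist` and `minmaxdist` here are hand-rolled 2-D helpers written for this check, not library functions.

```python
import math

# Verifying the worked example: Q = (50, 50), MBR [60, 80] x [70, 90].
# MBRs are (minX, maxX, minY, maxY) tuples.

def mindist(q, mbr):
    """Minimum possible distance from q to any point in the MBR."""
    minx, maxx, miny, maxy = mbr
    dx = max(minx - q[0], 0, q[0] - maxx)
    dy = max(miny - q[1], 0, q[1] - maxy)
    return math.hypot(dx, dy)

def minmaxdist(q, mbr):
    """2-D MINMAXDIST: for each axis, pair the nearer face on that axis
    with the farther edge on the other axis; take the minimum."""
    minx, maxx, miny, maxy = mbr
    near_x = minx if abs(q[0] - minx) <= abs(q[0] - maxx) else maxx
    far_x  = maxx if abs(q[0] - maxx) >= abs(q[0] - minx) else minx
    near_y = miny if abs(q[1] - miny) <= abs(q[1] - maxy) else maxy
    far_y  = maxy if abs(q[1] - maxy) >= abs(q[1] - miny) else miny
    via_x = math.hypot(q[0] - near_x, q[1] - far_y)   # nearer x-face, farther y
    via_y = math.hypot(q[0] - far_x, q[1] - near_y)   # nearer y-face, farther x
    return min(via_x, via_y)

q, box = (50, 50), (60, 80, 70, 90)
print(round(mindist(q, box), 1))      # 22.4  (= sqrt(500))
print(round(minmaxdist(q, box), 1))   # 36.1  (= sqrt(1300))
```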
Insertion is where R-tree quality is determined. The ChooseSubtree algorithm decides which leaf should receive a new object. This single decision, repeated millions of times, shapes the entire tree's MBR structure and query performance.
Design Tension: Should we minimize enlargement (keep MBRs small) or minimize overlap (keep MBRs disjoint)? Different strategies balance this tradeoff differently.
```
// Insert a new object into the R-tree
function insert(tree, object, objectMBR):
    // Step 1: Find the best leaf to insert into
    leaf = chooseSubtree(tree.root, objectMBR)

    // Step 2: Add the entry to the leaf
    newEntry = Entry(objectMBR, object)
    leaf.addEntry(newEntry)

    // Step 3: Propagate MBR changes upward
    adjustTree(leaf)

    // Step 4: Handle overflow if necessary
    if leaf.entryCount() > M:
        handleOverflow(leaf)

// ChooseSubtree: Find best leaf for new entry
// Original R-tree algorithm (Guttman 1984)
function chooseSubtree(node, objectMBR):
    if node.isLeaf():
        return node

    // Find the child requiring minimum enlargement to contain objectMBR
    bestEntry = null
    minEnlargement = INFINITY
    for each entry in node.entries:
        enlargement = computeEnlargement(entry.mbr, objectMBR)
        if enlargement < minEnlargement:
            minEnlargement = enlargement
            bestEntry = entry
        elif enlargement == minEnlargement:
            // Tie-breaker: prefer the smaller MBR
            if area(entry.mbr) < area(bestEntry.mbr):
                bestEntry = entry
    return chooseSubtree(bestEntry.childNode, objectMBR)

// Compute how much an MBR grows to contain the new object
function computeEnlargement(existingMBR, newMBR):
    combined = union(existingMBR, newMBR)
    return area(combined) - area(existingMBR)
```

R*-tree ChooseSubtree Enhancement:
The R*-tree (1990) improved ChooseSubtree by considering overlap as well as enlargement:
```
// R*-tree ChooseSubtree (simplified)
function chooseSubtreeRStar(node, objectMBR, level):
    if level == 1:   // One level above leaves
        // Minimize OVERLAP enlargement, then area enlargement, then area
        return child minimizing (overlapEnlargement, areaEnlargement, area)
    else:
        // Higher levels: minimize area enlargement, then area
        return child minimizing (areaEnlargement, area)
```
Why Overlap Matters at Leaf Level:
At the leaf parent level, overlap directly affects query performance. Two overlapping leaf MBRs mean many queries will examine both leaves. At higher levels, overlap is less damaging because we're comparing larger regions anyway.
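To make the two metrics concrete, here is a small Python sketch comparing area enlargement with overlap enlargement for a hypothetical candidate child and one sibling. The rectangles and names are made up for the example.

```python
# Sketch of the two costs ChooseSubtree compares: area enlargement
# (classic R-tree) and overlap enlargement (R*-tree, at the leaf-parent
# level). MBRs are (minX, maxX, minY, maxY) tuples.

def area(m):
    return (m[1] - m[0]) * (m[3] - m[2])

def union(a, b):
    return (min(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), max(a[3], b[3]))

def intersection_area(a, b):
    w = min(a[1], b[1]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[2], b[2])
    return w * h if w > 0 and h > 0 else 0

def enlargement(existing, new):
    """How much the existing MBR's area grows to contain the new one."""
    return area(union(existing, new)) - area(existing)

def overlap_enlargement(candidate, siblings, new):
    """How much total overlap with siblings grows if candidate absorbs new."""
    grown = union(candidate, new)
    before = sum(intersection_area(candidate, s) for s in siblings)
    after = sum(intersection_area(grown, s) for s in siblings)
    return after - before

child = (0, 10, 0, 10)        # candidate child MBR
siblings = [(8, 20, 0, 10)]   # one overlapping sibling
new = (9, 12, 4, 6)           # entry to insert

print(enlargement(child, new))                    # 20 (area 100 -> 120)
print(overlap_enlargement(child, siblings, new))  # 20 (overlap 20 -> 40)
```

Here absorbing the new entry costs 20 units of area but also doubles the overlap with the sibling, which is why the R*-tree weighs both at the leaf-parent level.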
Unlike B+-trees where insert order doesn't affect final structure significantly, R-tree insert order profoundly impacts quality. Inserting spatially scattered objects randomly produces worse trees than inserting spatially coherent batches. This is why bulk loading algorithms often outperform repeated single inserts.
When a node exceeds capacity M, it must split into two nodes. The splitting algorithm determines how entries are distributed between the new nodes—a critical decision affecting MBR quality.
The Goal: Create two groups whose MBRs have minimal total area, minimal overlap, and reasonable shape (not too elongated).
| Algorithm | Time Complexity | Split Quality | Used In |
|---|---|---|---|
| Linear Split | O(M) | Poor to Fair | Original R-tree option |
| Quadratic Split | O(M²) | Fair to Good | Original R-tree default |
| R*-tree Split | O(M² log M) | Excellent | R*-tree, PostGIS |
| Greene's Split | O(M log M) | Good | Some implementations |
```
// Quadratic Split (Guttman's original)
function quadraticSplit(entries):
    // Step 1: Pick seeds - two entries that waste most area if together
    seed1, seed2 = pickSeeds(entries)
    group1 = [seed1]
    group2 = [seed2]
    remaining = entries - {seed1, seed2}

    // Step 2: Assign remaining entries
    while remaining is not empty:
        // Check if one group needs all remaining to reach minimum fill m
        if group1.size() + remaining.size() == m:
            group1.addAll(remaining)
            break
        if group2.size() + remaining.size() == m:
            group2.addAll(remaining)
            break

        // Pick next: entry with maximum preference for one group
        entry, preferredGroup = pickNext(remaining, group1, group2)
        preferredGroup.add(entry)
        remaining.remove(entry)

    return group1, group2

// Pick seeds: Find the two entries that waste most space if together
function pickSeeds(entries):
    maxWaste = -INFINITY
    seed1, seed2 = null, null
    for each pair (e1, e2) in entries:
        combined = union(e1.mbr, e2.mbr)
        waste = area(combined) - area(e1.mbr) - area(e2.mbr)
        if waste > maxWaste:
            maxWaste = waste
            seed1, seed2 = e1, e2
    return seed1, seed2

// Pick next: Entry with maximum preference difference
function pickNext(remaining, group1, group2):
    maxDiff = -INFINITY
    chosen = null
    chosenGroup = null
    mbr1 = computeMBR(group1)
    mbr2 = computeMBR(group2)
    for each entry in remaining:
        d1 = enlargement(mbr1, entry.mbr)   // Cost to add to group1
        d2 = enlargement(mbr2, entry.mbr)   // Cost to add to group2
        diff = abs(d1 - d2)
        if diff > maxDiff:
            maxDiff = diff
            chosen = entry
            chosenGroup = (d1 < d2) ? group1 : group2
    return chosen, chosenGroup
```

R*-tree Split (Superior Quality):
The R*-tree uses a more sophisticated approach:
1. Choose split axis: Try splitting along X, then Y. Sum the margins (perimeters) of the resulting MBRs over all candidate distributions. Choose the axis with the minimum total margin.
2. Choose split position: Along the chosen axis, try all valid splits (maintaining minimum fill m). Choose the distribution minimizing overlap, then area.
3. Forced reinsert: Before splitting, try removing the roughly 30% of entries farthest from the node's center and reinserting them. This often creates better distributions than splitting.
R*-tree forced reinsert:
```
// On overflow at a given level (p ≈ 30% of M entries)
if overflowCount[level] == 0:       // First overflow at this level
    overflowCount[level] = 1
    reinsert(p entries farthest from the node's center)
    return                          // No split needed if reinsert resolves overflow
else:
    proceed with split              // Reinsert already tried at this level
```
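The margin-based axis choice described above can be sketched in Python. This is an illustrative fragment assuming `(minX, maxX, minY, maxY)` tuples and a minimum fill `m`, not the full R*-tree split.

```python
# Sketch of R*-tree split-axis selection: for each axis, sort the entries,
# enumerate every legal split point (each side keeps at least m entries),
# and total the margins (perimeters) of the two group MBRs. The axis with
# the smaller margin sum wins. Names are illustrative.

def union_all(mbrs):
    return (min(m[0] for m in mbrs), max(m[1] for m in mbrs),
            min(m[2] for m in mbrs), max(m[3] for m in mbrs))

def margin(m):
    """Perimeter of an MBR."""
    return 2 * ((m[1] - m[0]) + (m[3] - m[2]))

def margin_sum_for_axis(entries, m, key):
    s = sorted(entries, key=key)
    total = 0
    for split in range(m, len(s) - m + 1):   # every legal distribution
        total += margin(union_all(s[:split])) + margin(union_all(s[split:]))
    return total

def choose_split_axis(entries, m):
    x_sum = margin_sum_for_axis(entries, m, key=lambda e: (e[0], e[1]))
    y_sum = margin_sum_for_axis(entries, m, key=lambda e: (e[2], e[3]))
    return "x" if x_sum <= y_sum else "y"

# Four tall boxes forming two clear columns: an x-split gives tight groups
entries = [(0, 1, 0, 9), (1, 2, 1, 10), (8, 9, 0, 9), (9, 10, 1, 10)]
print(choose_split_axis(entries, m=2))   # "x"
```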
When a node splits, its parent gains an entry. If the parent was already full, it splits too, potentially cascading to the root. A root split is the only time R-tree height increases. Forced reinsert reduces cascade frequency by redistributing entries instead of splitting.
Deleting an object requires locating it, removing it, and handling the consequences: MBRs may shrink, and nodes may underflow (fall below minimum entries m).
```
// Delete an object from the R-tree
function delete(tree, object, objectMBR):
    // Step 1: Find the leaf containing the object
    leafPath = findLeaf(tree.root, objectMBR, object, [])
    if leafPath == null:
        return false   // Object not found
    leaf = leafPath.last()

    // Step 2: Remove the entry from the leaf
    entry = leaf.findEntry(object)
    leaf.removeEntry(entry)

    // Step 3: Condense tree - handle underflow and adjust MBRs
    condenseTree(tree, leafPath)

    // Step 4: Shrink the tree if the root has only one child
    if not tree.root.isLeaf() and tree.root.entryCount() == 1:
        tree.root = tree.root.entries[0].childNode
    return true

// Find the leaf containing an object
function findLeaf(node, objectMBR, object, path):
    path.append(node)
    if node.isLeaf():
        for each entry in node.entries:
            if entry.object == object:
                return path
        return null   // Not in this leaf
    else:
        for each entry in node.entries:
            if contains(entry.mbr, objectMBR):
                result = findLeaf(entry.childNode, objectMBR, object, path.copy())
                if result != null:
                    return result
        return null   // Not in any subtree

// Condense tree after deletion
function condenseTree(tree, path):
    orphans = []   // Entries from eliminated nodes

    // Walk up the path from leaf to root
    for i = path.length - 1 down to 1:
        node = path[i]
        parent = path[i-1]
        if node.entryCount() < m:
            // Node underflow: remove from parent, save entries for reinsert
            parent.removeChild(node)
            orphans.addAll(node.entries)
        else:
            // Adjust parent's MBR to fit the (now smaller) node
            parent.adjustMBRFor(node)

    // Reinsert orphaned entries
    for each entry in orphans:
        if entry.isLeafEntry:
            insert(tree, entry.object, entry.mbr)
        else:
            // Internal entry: reinsert the entire subtree at the correct level
            insertSubtree(tree, entry.childNode, entry.level)
```

Unlike B+-trees, which merge underflowing siblings, R-trees typically reinsert orphaned entries. Why? Because the 'sibling' with the best merge potential may not be a true sibling; it might be in a different branch. Reinsertion finds the globally best home for each orphan.
This is more expensive but produces better tree quality.
After any modification (insert, delete, split), MBRs along the path from the affected node to the root may need adjustment. The AdjustTree algorithm propagates these changes upward.
```
// Adjust MBRs and handle splits from leaf to root
function adjustTree(node, splitNode = null):
    while node != tree.root:
        parent = node.parent
        parentEntry = parent.entryFor(node)

        // Recalculate this node's MBR in the parent
        parentEntry.mbr = computeMBRFromChildren(node)

        // If the node was split, add the new sibling to the parent
        if splitNode != null:
            newEntry = Entry(computeMBRFromChildren(splitNode), splitNode)
            if parent.hasSpace():
                parent.addEntry(newEntry)
                splitNode = null
            else:
                // Parent overflows too - must split
                parent.addEntry(newEntry)   // Temporarily M+1 entries
                parent, splitNode = split(parent)
        node = parent

    // Handle root split
    if splitNode != null:
        newRoot = createNode()
        newRoot.addEntry(Entry(computeMBR(tree.root), tree.root))
        newRoot.addEntry(Entry(computeMBR(splitNode), splitNode))
        tree.root = newRoot
        tree.height += 1

// Compute the MBR of a node (union of all entry MBRs)
function computeMBRFromChildren(node):
    mbr = node.entries[0].mbr
    for i = 1 to node.entryCount() - 1:
        mbr = union(mbr, node.entries[i].mbr)
    return mbr
```

MBR Tightening:
After deletion, child MBRs may shrink, allowing parent MBRs to shrink too. This tightening propagates up the tree:
Before deletion: After deletion:
┌────────────────┐ ┌────────────┐
│ ┌────────────┐ │ │ ┌────────┐ │
│ │ ┌────┐ │ │ │ │ ┌────┐ │ │
│ │ │ A │ (B) │ │ → │ │ │ A │ │ │ (B deleted, MBRs shrink)
│ │ └────┘ │ │ │ │ └────┘ │ │
│ └────────────┘ │ │ └────────┘ │
└────────────────┘ └────────────┘
This automatic tightening keeps MBRs minimal, improving query performance even as data is deleted.
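A tiny Python sketch of the tightening step: recompute a parent MBR from its remaining children after a deletion and watch it shrink. The rectangles are made up for the example.

```python
# MBR tightening sketch: a parent's MBR is just the union of its children's
# MBRs, so removing a child can let the parent shrink.
# MBRs are (minX, maxX, minY, maxY) tuples.

def recompute_mbr(child_mbrs):
    """Union of all child MBRs (the tightest parent MBR)."""
    return (min(m[0] for m in child_mbrs), max(m[1] for m in child_mbrs),
            min(m[2] for m in child_mbrs), max(m[3] for m in child_mbrs))

children = [(0, 4, 0, 4), (2, 6, 2, 6), (10, 20, 10, 20)]   # last one is the outlier B
print(recompute_mbr(children))            # (0, 20, 0, 20)

children.remove((10, 20, 10, 20))         # delete B
print(recompute_mbr(children))            # (0, 6, 0, 6) - the parent tightens
```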
When loading a large dataset, repeated single inserts produce suboptimal trees. Bulk loading algorithms construct R-trees bottom-up with much better quality.
Why Bulk Loading Matters: bottom-up construction packs nodes to near-full capacity, produces tight MBRs with little overlap, and builds the index in a single pass over sorted data rather than through millions of individual ChooseSubtree descents and splits.
```
// Sort-Tile-Recursive (STR) Bulk Loading
// One of the most effective bulk loading algorithms
function STRBulkLoad(objects, M):
    n = objects.length
    P = ceil(n / M)        // Number of leaf-level pages
    S = ceil(sqrt(P))      // Number of vertical slices

    // Step 1: Sort by X-coordinate (or center X)
    objects.sortBy(o => center(o.mbr).x)

    // Step 2: Divide into S vertical slices
    sliceSize = ceil(n / S)
    slices = partition(objects, sliceSize)

    // Step 3: Within each slice, sort by Y and pack into nodes
    leaves = []
    for each slice in slices:
        slice.sortBy(o => center(o.mbr).y)
        for each group of M objects in slice:
            leaf = createLeafNode(group)
            leaves.add(leaf)

    // Step 4: Recursively build internal levels
    return buildInternalLevels(leaves, M)

function buildInternalLevels(nodes, M):
    if nodes.length <= M:
        // Can fit in a single root
        root = createInternalNode(nodes)
        return root

    // Need another level - apply STR to the node MBRs
    P = ceil(nodes.length / M)
    S = ceil(sqrt(P))
    nodes.sortBy(n => center(n.mbr).x)
    slices = partition(nodes, ceil(nodes.length / S))

    parents = []
    for each slice in slices:
        slice.sortBy(n => center(n.mbr).y)
        for each group of M nodes in slice:
            parent = createInternalNode(group)
            parents.add(parent)

    return buildInternalLevels(parents, M)
```

| Algorithm | Key Idea | Pros | Cons |
|---|---|---|---|
| STR (Sort-Tile-Recursive) | Sort by X, tile into strips, sort each by Y | Simple, effective, good packing | Not cache-optimal |
| Hilbert Packing | Sort by Hilbert curve value, pack sequentially | Excellent spatial clustering | Hilbert computation overhead |
| OMT (Overlap Minimizing Top-down) | Recursive spatial partitioning | Minimal overlap | Complex, slower to build |
| Buffer Tree | Lazy buffered inserts | Good for streaming data | Higher memory usage |
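The leaf-packing phase of STR is easy to run end to end. Here is a minimal Python sketch of that first phase only (illustrative names; the input is a grid of unit squares chosen so the arithmetic comes out exact):

```python
import math

# STR leaf packing: sort rectangle centers by x, cut into vertical slices,
# sort each slice by y, and emit capacity-M leaves. This covers only the
# leaf level of the pseudocode above; internal levels repeat the same idea.

def str_pack_leaves(mbrs, M):
    n = len(mbrs)
    P = math.ceil(n / M)                 # leaf pages needed
    S = math.ceil(math.sqrt(P))          # vertical slices
    by_x = sorted(mbrs, key=lambda m: (m[0] + m[1]) / 2)
    slice_size = math.ceil(n / S)
    leaves = []
    for i in range(0, n, slice_size):
        strip = sorted(by_x[i:i + slice_size], key=lambda m: (m[2] + m[3]) / 2)
        for j in range(0, len(strip), M):
            leaves.append(strip[j:j + M])   # one leaf per group of M
    return leaves

# 16 unit squares on a 4x4 grid with M = 4 pack into 4 full leaves
boxes = [(x, x + 1, y, y + 1) for x in range(4) for y in range(4)]
leaves = str_pack_leaves(boxes, M=4)
print(len(leaves), [len(leaf) for leaf in leaves])   # 4 [4, 4, 4, 4]
```

Every leaf ends up completely full and spatially coherent, which is exactly why bulk loading beats repeated single inserts on static data.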
If you're modifying more than 30-50% of the data, it's often faster to bulk-load a new index than to perform individual updates. PostGIS and other systems provide REINDEX commands that drop and rebuild spatial indexes using bulk loading for exactly this reason.
We've covered the complete algorithmic foundation of R-trees. The critical insights: search speed depends on how aggressively overlapping MBRs can be pruned; ChooseSubtree and the split strategy determine MBR quality and therefore query performance; deletion reinserts orphaned entries rather than merging siblings; and bulk loading builds far better trees than repeated single inserts.
What's Next:
With algorithmic foundations complete, we're ready to explore spatial queries in practice. The next page covers real-world query types including range queries, k-NN search, spatial joins, and complex predicates. You'll see how the algorithms we've learned translate into SQL and how query optimizers leverage R-trees for dramatic performance improvements.
You now understand the complete R-tree algorithm suite: search (range and k-NN), insert with ChooseSubtree, splitting strategies, deletion with condense tree, and bulk loading. This operational knowledge prepares you to implement, optimize, and troubleshoot spatial indexes in production systems.