Spatial Indexes - Learning Module

Spatial Queries: From Theory to Production SQL

Bringing Spatial Search to Life

Every location-aware application—from food delivery to emergency response, from real estate search to fleet management—depends on efficient spatial queries. The R-tree algorithms we've studied are the engine; now we explore how applications drive that engine.

This page bridges theory and practice: you'll learn the spatial query vocabulary, master PostGIS/SQL syntax, understand query optimizer behavior, and develop intuition for writing queries that leverage indexes effectively. By the end, you'll be able to design spatial database schemas and write queries that scale to millions of objects.

What You Will Learn

By the end of this page, you will master range queries (window and radius), nearest-neighbor queries with proper index usage, containment and intersection predicates, spatial joins for relating two datasets, and performance optimization techniques. Real SQL examples using PostGIS demonstrate each concept.

Range Queries: Windows and Bounding Boxes

Range queries (also called window queries) find all objects within a specified rectangular region. This is the most fundamental spatial query: it runs every time a map view needs to load the features visible in its viewport.

range_queries.sql
-- Basic range query: Find all restaurants in visible map area
-- Using ST_Intersects (exact geometry test)
SELECT id, name, cuisine, ST_AsText(location)
FROM restaurants
WHERE ST_Intersects(
    location,
    ST_MakeEnvelope(-122.45, 37.75, -122.40, 37.80, 4326)
    -- MakeEnvelope(minX, minY, maxX, maxY, SRID)
);
 
-- Using && operator (MBR-only test, faster but may include false positives)
-- Useful when geometries are simple points
SELECT id, name, cuisine
FROM restaurants
WHERE location && ST_MakeEnvelope(-122.45, 37.75, -122.40, 37.80, 4326);
 
-- Combining with non-spatial filters (index assists spatial, then filters)
SELECT id, name, rating, price_level
FROM restaurants
WHERE location && ST_MakeEnvelope(-122.45, 37.75, -122.40, 37.80, 4326)
  AND cuisine = 'Italian'
  AND rating >= 4.0
  AND is_open = true
ORDER BY rating DESC
LIMIT 20;
 
-- EXPLAIN output showing index usage
EXPLAIN (ANALYZE, BUFFERS) 
SELECT * FROM restaurants
WHERE ST_Intersects(location, ST_MakeEnvelope(-122.45, 37.75, -122.40, 37.80, 4326));
 
/*
Bitmap Heap Scan on restaurants (cost=4.39..358.72 rows=84 width=168)
  Recheck Cond: (location && '...'::geometry)
  Filter: st_intersects(location, '...'::geometry)
  Heap Blocks: exact=82
  ->  Bitmap Index Scan on restaurants_location_idx (cost=0.00..4.37 rows=84)
        Index Cond: (location && '...'::geometry)
Planning Time: 0.285 ms
Execution Time: 2.341 ms
*/

The && Operator

In PostGIS, && tests bounding box overlap only (MBR intersection). It's faster than ST_Intersects because it skips exact geometry testing. For point data tested against an axis-aligned rectangle such as ST_MakeEnvelope, each point's MBR is the point itself, so && and ST_Intersects return the same rows. For polygons or lines, && may return false positives that ST_Intersects filters out.
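The false-positive case can be sketched outside the database. The following is a minimal illustration in plain Python, not PostGIS internals: `bbox_overlaps` plays the role of && and `point_in_polygon` (a simple ray-casting test) plays the role of the exact check. All function names and the triangle data are assumptions made for this example.

```python
# Filter-and-refine sketch: bounding-box test (like &&) vs. exact test (like ST_Intersects).
# Names and data are illustrative only.

def bbox(points):
    """Return (min_x, min_y, max_x, max_y) for a list of (x, y) vertices."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs), max(ys))

def bbox_overlaps(a, b):
    """MBR-only test: True when two boxes share at least one point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def point_in_polygon(pt, polygon):
    """Exact test via ray casting (boundary points are not handled specially)."""
    x, y = pt
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through pt can be crossed
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

triangle = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]
tri_box = bbox(triangle)

for pt in [(1.0, 1.0), (3.5, 3.5)]:
    pt_box = (pt[0], pt[1], pt[0], pt[1])  # a point's MBR is the point itself
    print(pt, "bbox:", bbox_overlaps(tri_box, pt_box),
          "exact:", point_in_polygon(pt, triangle))
```

The second point sits inside the triangle's bounding box but outside the triangle, so it passes the box test and fails the exact test.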

Query Pattern: Dynamic Map Bounds

Web mapping applications send the visible bounds with each pan/zoom:

// Frontend sends viewport bounds
const bounds = map.getBounds();
const query = {
    minLng: bounds.getWest(),
    maxLng: bounds.getEast(),
    minLat: bounds.getSouth(),
    maxLat: bounds.getNorth()
};

// API call
fetch(`/api/restaurants?bbox=${query.minLng},${query.minLat},${query.maxLng},${query.maxLat}`);

-- Backend query (parameterized)
SELECT id, name, ST_AsGeoJSON(location) as geojson
FROM restaurants
WHERE location && ST_MakeEnvelope($1, $2, $3, $4, 4326)
LIMIT 1000;  -- Prevent overwhelming the client
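Between the API call and the parameterized query, the backend has to turn the `bbox` string into four numbers. A minimal sketch of that step is shown here in Python; the function name `parse_bbox` and the decision to reject boxes that cross the antimeridian are assumptions for illustration, not part of the original application.

```python
def parse_bbox(raw):
    """Parse 'minLng,minLat,maxLng,maxLat' into a tuple of four floats.

    Raises ValueError on malformed or out-of-range input. Boxes that cross
    the antimeridian (minLng > maxLng) are rejected in this sketch.
    """
    parts = raw.split(",")
    if len(parts) != 4:
        raise ValueError("bbox needs exactly four comma-separated numbers")
    min_lng, min_lat, max_lng, max_lat = (float(p) for p in parts)
    if not (-180.0 <= min_lng <= 180.0 and -180.0 <= max_lng <= 180.0):
        raise ValueError("longitude out of range")
    if not (-90.0 <= min_lat <= 90.0 and -90.0 <= max_lat <= 90.0):
        raise ValueError("latitude out of range")
    if min_lng > max_lng or min_lat > max_lat:
        raise ValueError("min must not exceed max")
    return (min_lng, min_lat, max_lng, max_lat)

# The tuple maps directly onto $1..$4 of the parameterized query
params = parse_bbox("-122.45,37.75,-122.40,37.80")
print(params)
```

Passing the values as bound parameters, rather than interpolating the raw string into SQL, keeps the query safe from injection.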

Radius Queries: Finding Objects Within Distance

Radius queries find all objects within a specified distance from a point. Unlike rectangular range queries, radius queries define a circular search region—essential for "find nearby" features.

radius_queries.sql
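-- Find all hospitals within 5 km of user's location
-- Method 1: ST_DWithin (index-assisted; needs a GiST index on (location::geography)
-- or a geography column, because an index on the geometry column does not match the cast)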
SELECT id, name, address,
       ST_Distance(location::geography, 
                   ST_GeographyFromText('POINT(-122.4194 37.7749)')) AS distance_m
FROM hospitals
WHERE ST_DWithin(
    location::geography,                              -- Cast to geography for meters
    ST_GeographyFromText('POINT(-122.4194 37.7749)'), -- User location
    5000                                              -- 5000 meters = 5 km
)
ORDER BY distance_m;
 
-- Method 2: ST_Buffer + ST_Intersects (less efficient, avoid)
-- Creates a circular polygon and tests intersection
SELECT id, name
FROM hospitals
WHERE ST_Intersects(
    location,
    ST_Buffer(ST_GeomFromText('POINT(-122.4194 37.7749)', 4326)::geography, 5000)::geometry
);
-- Slower: ST_Buffer is expensive, and the polygon has many vertices
 
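-- Method 3: Bounding box pre-filter + exact distance (manual optimization)
-- Useful when the only spatial index is on the geometry column
WITH params AS (
    SELECT 
        ST_GeomFromText('POINT(-122.4194 37.7749)', 4326) AS center,
        5000.0 AS radius_m,
        -- Degrees of longitude covering 5 km at this latitude (about 88 km per degree);
        -- the same value over-covers north-south, which is harmless for a pre-filter
        5000.0 / (111320.0 * cos(radians(37.7749))) AS radius_deg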
)
SELECT h.id, h.name,
       ST_Distance(h.location::geography, p.center::geography) AS distance_m
FROM hospitals h, params p
WHERE h.location && ST_Expand(p.center, p.radius_deg)  -- Fast MBR filter
  AND ST_DWithin(h.location::geography, p.center::geography, p.radius_m)  -- Exact filter
ORDER BY distance_m;

Geography vs. Geometry for Distance

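ST_Distance on geometry type returns distance in the coordinate system's units. For WGS84 (SRID 4326) those units are degrees, which are meaningless as distance because a degree of longitude shrinks toward the poles. Cast to geography for distance in meters, or transform to a local projected coordinate system (like UTM). The geography cast adds computation cost, and an index built on the geometry column does not apply to the cast expression, so index (location::geography) or store a geography column when these queries are frequent.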
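To see why degrees make a poor distance unit, it helps to measure one degree in each direction from the example location. The sketch below uses the haversine formula on a spherical Earth, which is an approximation chosen for brevity; the PostGIS geography type computes on a spheroid by default, so its numbers differ slightly. The function name `haversine_m` is an assumption for this example.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean Earth radius (spherical approximation)

def haversine_m(lng1, lat1, lng2, lat2):
    """Great-circle distance in meters between two lng/lat points on a sphere."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lng2 - lng1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

lng, lat = -122.4194, 37.7749  # the example location used on this page
one_deg_east = haversine_m(lng, lat, lng + 1.0, lat)
one_deg_north = haversine_m(lng, lat, lng, lat + 1.0)

print(f"1 degree east:  {one_deg_east / 1000:.1f} km")
print(f"1 degree north: {one_deg_north / 1000:.1f} km")
```

At this latitude a degree of longitude is roughly 88 km while a degree of latitude is roughly 111 km, so a "distance" of 0.05 degrees means different things depending on direction.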

Performance Tip: Index Usage with ST_DWithin

ST_DWithin is specially optimized to use spatial indexes effectively. Internally it:

1. Expands the search point by the radius to create a search box
2. Uses the R-tree index to find candidates in the box
3. Calculates exact distance only for candidates

This is why ST_DWithin is much faster than ST_Distance(a,b) < radius—the latter cannot use the index and must calculate distance for every row.

-- SLOW: Can't use index efficiently
SELECT * FROM hospitals
WHERE ST_Distance(location::geography, $1::geography) < 5000;

-- FAST: Uses index via internal bounding box expansion
SELECT * FROM hospitals  
WHERE ST_DWithin(location::geography, $1::geography, 5000);
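The expand, filter, refine sequence can be sketched in plain Python. This is an illustration under stated assumptions, not the PostGIS implementation: a list comprehension stands in for the R-tree lookup, distances use a spherical haversine formula, and the hospital rows are made up for the example.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean Earth radius (spherical approximation)

def haversine_m(lng1, lat1, lng2, lat2):
    """Great-circle distance in meters between two lng/lat points on a sphere."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlam = math.radians(lng2 - lng1)
    a = (math.sin((phi2 - phi1) / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def within_radius(rows, center, radius_m):
    """Return names of rows within radius_m of center: expand, filter, refine."""
    c_lng, c_lat = center
    # Step 1: expand the center into a search box. Longitude degrees shrink with
    # latitude, hence the division by cos(latitude). The 1% padding keeps the box
    # a superset of the circle for small radii away from the poles.
    dlat = math.degrees(radius_m / EARTH_RADIUS_M)
    dlng = 1.01 * dlat / math.cos(math.radians(c_lat))
    box = (c_lng - dlng, c_lat - dlat, c_lng + dlng, c_lat + dlat)
    # Step 2: cheap box test. A real index answers this without scanning every row.
    candidates = [r for r in rows
                  if box[0] <= r[1] <= box[2] and box[1] <= r[2] <= box[3]]
    # Step 3: exact distance only for the surviving candidates.
    return [r[0] for r in candidates
            if haversine_m(c_lng, c_lat, r[1], r[2]) <= radius_m]

center = (-122.4194, 37.7749)
hospitals = [                      # (name, lng, lat): hypothetical data
    ("A", -122.4194, 37.7849),     # about 1.1 km north
    ("B", -122.4694, 37.7749),     # about 4.4 km west
    ("C", -122.4794, 37.7749),     # about 5.3 km west, outside the box
    ("D", -122.3744, 37.8149),     # inside the box corner, outside the circle
]
print(within_radius(hospitals, center, 5000))
```

Row D shows why the refine step is needed: it passes the box test but lies about 6 km from the center.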

Nearest Neighbor Queries: K-NN Search

Nearest neighbor (k-NN) queries find the k closest objects to a reference point. This powers "find closest" features: nearest ATM, closest store, 5 nearest gas stations.

knn_queries.sql
-- Find 5 nearest gas stations to user location
-- Using the <-> distance operator (enables index-accelerated k-NN)
SELECT id, name, brand,
       location <-> ST_GeomFromText('POINT(-122.4194 37.7749)', 4326) AS distance
FROM gas_stations
ORDER BY location <-> ST_GeomFromText('POINT(-122.4194 37.7749)', 4326)
LIMIT 5;
 
-- K-NN with actual distance in meters
SELECT id, name, brand,
       ST_Distance(location::geography, 
                   ST_GeographyFromText('POINT(-122.4194 37.7749)')) AS distance_m
FROM gas_stations
ORDER BY location <-> ST_GeomFromText('POINT(-122.4194 37.7749)', 4326)
LIMIT 5;
 
-- K-NN with additional filter (careful with ordering!)
-- WRONG: LIMIT inside the subquery runs before the outer filter, so valid results are missed
SELECT id, name FROM (
    SELECT id, name, brand
    FROM gas_stations
    ORDER BY location <-> $1
    LIMIT 5
) nearest
WHERE brand = 'Shell';  -- Gets 5 closest overall, then filters to Shell (may return < 5)
 
-- CORRECT: WHERE is evaluated before ORDER BY and LIMIT in the same SELECT
SELECT id, name FROM gas_stations
WHERE brand = 'Shell'  -- Filter before ordering
ORDER BY location <-> $1
LIMIT 5;
 
-- K-NN for polygons: Distance to nearest edge
SELECT p.id, p.name,
       ST_Distance(p.boundary::geography, $1::geography) AS distance_to_edge_m
FROM parks p
ORDER BY p.boundary <-> $1
LIMIT 3;
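The effect of applying a brand filter before versus after taking the nearest k rows can be sketched in plain Python. The station data and the helper name `nearest` are made up for this illustration, and planar distance stands in for the `<->` operator.

```python
import math

# (id, brand, (x, y)): hypothetical stations at increasing distance from the origin
stations = [
    ("s1", "Chevron", (0.0, 1.0)),
    ("s2", "Shell",   (0.0, 2.0)),
    ("s3", "Arco",    (0.0, 3.0)),
    ("s4", "Shell",   (0.0, 4.0)),
    ("s5", "Shell",   (0.0, 5.0)),
    ("s6", "Chevron", (0.0, 6.0)),
]

def nearest(rows, q, k):
    """Return the k rows closest to q by planar distance."""
    return sorted(rows, key=lambda r: math.hypot(r[2][0] - q[0], r[2][1] - q[1]))[:k]

q = (0.0, 0.0)

# Take the 3 nearest stations of any brand, then keep only Shell
limit_then_filter = [r[0] for r in nearest(stations, q, 3) if r[1] == "Shell"]

# Keep only Shell stations, then take the 3 nearest of those
filter_then_limit = [r[0] for r in nearest([r for r in stations if r[1] == "Shell"], q, 3)]

print("limit then filter:", limit_then_filter)
print("filter then limit:", filter_then_limit)
```

Limiting first leaves a single Shell station, while filtering first returns all three requested.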

The <-> Operator is Magic

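The <-> operator in PostGIS triggers index-accelerated k-NN search. It uses the R-tree's priority-queue traversal to return the nearest objects first without computing every distance. Without LIMIT the query must still return every row in distance order, so the index saves little. With LIMIT k the scan stops after k rows: roughly O(log n + k) work in the typical case, instead of computing all n distances and keeping the best k, which is O(n log k). On geometry in SRID 4326 the ordering is by planar distance in degrees, which can differ slightly from true geodesic order.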
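The priority-queue idea can be sketched in a few lines of Python. This is a simplified stand-in rather than the GiST implementation: it uses a single level of leaf boxes instead of a full tree, planar distance, and made-up grid data. Function names such as `build_leaves` and `knn` are assumptions for illustration.

```python
import heapq
import math

def mbr(points):
    """Minimum bounding rectangle (min_x, min_y, max_x, max_y) of a point list."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs), max(ys))

def min_dist_to_box(q, box):
    """Lower bound on the distance from q to any point inside box."""
    dx = max(box[0] - q[0], 0.0, q[0] - box[2])
    dy = max(box[1] - q[1], 0.0, q[1] - box[3])
    return math.hypot(dx, dy)

def build_leaves(points, leaf_size):
    """Group sorted points into leaves with MBRs (a stand-in for R-tree leaf pages)."""
    ordered = sorted(points)
    chunks = [ordered[i:i + leaf_size] for i in range(0, len(ordered), leaf_size)]
    return [(mbr(chunk), chunk) for chunk in chunks]

def knn(leaves, q, k):
    """Best-first search: always expand the queue entry closest to q."""
    heap = []  # entries: (distance, unique tie-breaker, kind, payload)
    for i, (box, chunk) in enumerate(leaves):
        heapq.heappush(heap, (min_dist_to_box(q, box), i, "leaf", chunk))
    counter = len(leaves)
    result, distance_calls = [], 0
    while heap and len(result) < k:
        dist, _, kind, payload = heapq.heappop(heap)
        if kind == "point":
            # Every remaining entry is at least this far away, so this is the next nearest
            result.append((payload, dist))
        else:
            for p in payload:
                distance_calls += 1
                d = math.hypot(p[0] - q[0], p[1] - q[1])
                heapq.heappush(heap, (d, counter, "point", p))
                counter += 1
    return result, distance_calls

points = [(float(x), float(y)) for x in range(10) for y in range(10)]
leaves = build_leaves(points, leaf_size=10)
result, distance_calls = knn(leaves, (2.2, 3.1), k=3)
print([p for p, _ in result], "point distances computed:", distance_calls, "of", len(points))
```

Because a leaf's key is a lower bound on the distance to anything inside it, leaves far from the query point are never opened, which is where the savings over a full sort come from.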

K-NN Query Plan Analysis:

EXPLAIN (ANALYZE) SELECT * FROM gas_stations
ORDER BY location <-> ST_GeomFromText('POINT(-122.4194 37.7749)', 4326)
LIMIT 5;

Limit (cost=0.29..2.35 rows=5)
  ->  Index Scan using gas_stations_location_idx on gas_stations
        Order By: (location <-> '...'::geometry)

Note: Index Scan, not Bitmap Index Scan or Seq Scan. The index directly returns rows in distance order. This is the ideal k-NN plan.

Anti-Pattern Detection:

If you see Sort in the plan for a k-NN query, the index isn't being used:

Sort (cost=15234.12..15484.32 rows=100000)   -- BAD: Sorting all rows!
  Sort Key: (location <-> '...')
  ->  Seq Scan on gas_stations

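Causes: missing index, wrong SRID, an ORDER BY expression that does not match the indexed column (for example a cast such as location::geography when the index is on location), or old planner statistics.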

Spatial Predicates: Contains, Within, Intersects

Spatial predicates test topological relationships between geometries. Each has specific semantics based on the DE-9IM model we discussed earlier.

Common PostGIS Spatial Predicates
Predicate	Returns TRUE When	Use Case
ST_Intersects(A, B)	A and B share any point	"Do these areas overlap at all?"
ST_Contains(A, B)	B is completely inside A	"Is this point in the delivery zone?"
ST_Within(A, B)	A is completely inside B	Same as Contains with swapped args
ST_Covers(A, B)	No point of B is outside A	More robust than Contains for boundary cases
ST_Crosses(A, B)	Geometries cross each other	"Does this road cross the river?"
ST_Touches(A, B)	Share boundary but not interior	"Do these parcels share an edge?"
ST_Overlaps(A, B)	Same dimension, partial intersection	"Do these flood zones partially overlap?"
ST_Equals(A, B)	Geometrically identical	Deduplication, version comparison

spatial_predicates.sql
-- Point-in-polygon: Is delivery address in service area?
SELECT EXISTS(
    SELECT 1 FROM service_zones
    WHERE ST_Contains(boundary, ST_GeomFromText('POINT(-122.420 37.780)', 4326))
      AND zone_type = 'delivery'
) AS is_serviceable;
 
-- Find which congressional district contains an address
SELECT district_id, representative_name
FROM congressional_districts
WHERE ST_Contains(boundary, ST_GeomFromText('POINT(-122.420 37.780)', 4326));
 
-- Find all roads that cross a specific river
SELECT r.road_name, r.road_type
FROM roads r, rivers rv
WHERE rv.name = 'Mississippi River'
  AND ST_Crosses(r.geom, rv.geom);
 
-- Find neighboring parcels (share a boundary)
SELECT a.parcel_id AS parcel_a, b.parcel_id AS parcel_b
FROM parcels a, parcels b
WHERE a.parcel_id < b.parcel_id  -- Avoid duplicates
  AND ST_Touches(a.boundary, b.boundary);
 
-- Find overlapping claims (partial intersection)
SELECT c1.claim_id, c2.claim_id,
       ST_Area(ST_Intersection(c1.geom, c2.geom)) AS overlap_area
FROM mining_claims c1, mining_claims c2
WHERE c1.claim_id < c2.claim_id
  AND ST_Overlaps(c1.geom, c2.geom);

Contains vs. Covers

ST_Contains requires B's interior to have at least one point in A's interior—a point exactly on A's boundary is NOT considered contained. ST_Covers is more intuitive: any point of B that exists must be in A (including boundaries). For robustness, prefer ST_Covers/ST_CoveredBy over ST_Contains/ST_Within.
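For the common point-in-zone case, the boundary difference reduces to strict versus non-strict comparisons. The sketch below models a zone as an axis-aligned rectangle, which is a simplification chosen for brevity; the function names are assumptions and only mirror the semantics of ST_Contains and ST_Covers for a single point.

```python
def rect_contains_point(rect, pt):
    """ST_Contains-like: the point must lie in the rectangle's interior."""
    min_x, min_y, max_x, max_y = rect
    return min_x < pt[0] < max_x and min_y < pt[1] < max_y

def rect_covers_point(rect, pt):
    """ST_Covers-like: the point may lie in the interior or on the boundary."""
    min_x, min_y, max_x, max_y = rect
    return min_x <= pt[0] <= max_x and min_y <= pt[1] <= max_y

zone = (0.0, 0.0, 10.0, 10.0)
for label, pt in [("interior", (5.0, 5.0)),
                  ("on boundary", (10.0, 5.0)),
                  ("outside", (11.0, 5.0))]:
    print(f"{label:12} contains={rect_contains_point(zone, pt)} "
          f"covers={rect_covers_point(zone, pt)}")
```

Only the boundary point separates the two predicates, which is exactly the case that tends to surprise people with addresses geocoded onto a zone edge.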

Spatial Joins: Relating Datasets by Location

Spatial joins combine rows from two tables based on spatial relationships—analogous to regular JOINs but using geometry predicates instead of key equality. This is one of the most powerful yet computationally expensive spatial operations.

spatial_joins.sql
-- Assign each customer to their census tract
SELECT c.customer_id, c.name, ct.tract_id, ct.median_income
FROM customers c
JOIN census_tracts ct ON ST_Contains(ct.boundary, c.location);
 
-- Count restaurants per neighborhood  
SELECT n.name AS neighborhood, 
       COUNT(r.id) AS restaurant_count,
       COUNT(CASE WHEN r.rating >= 4.5 THEN 1 END) AS top_rated_count
FROM neighborhoods n
LEFT JOIN restaurants r ON ST_Contains(n.boundary, r.location)
GROUP BY n.name
ORDER BY restaurant_count DESC;
 
-- Find all parcels affected by flood zone expansion
SELECT p.parcel_id, p.owner_name, p.appraised_value,
       ST_Area(ST_Intersection(p.boundary, f.zone_geom)) AS affected_area
FROM parcels p
JOIN flood_zones f ON ST_Intersects(p.boundary, f.zone_geom)
WHERE f.zone_name = 'Proposed Expansion Zone 2024';
 
-- Distance-based join: Link each school to nearest fire station
-- (the LATERAL subquery already returns exactly one row per school)
SELECT
    s.name AS school_name,
    fs.name AS nearest_fire_station,
    ST_Distance(s.location::geography, fs.location::geography) AS distance_m
FROM schools s
CROSS JOIN LATERAL (
    SELECT name, location
    FROM fire_stations
    ORDER BY location <-> s.location
    LIMIT 1
) fs;
 
-- Many-to-many spatial join: Properties within 1km of train stations
SELECT p.property_id, p.address, p.price,
       s.station_name,
       ST_Distance(p.location::geography, s.location::geography) AS distance_m
FROM properties p
JOIN train_stations s 
    ON ST_DWithin(p.location::geography, s.location::geography, 1000);

Spatial Join Performance

Spatial joins can be extremely expensive. A naive nested-loop join between two tables of 100K rows each would require 10 billion geometry comparisons. Spatial indexes reduce this dramatically, but watch for: (1) missing indexes on either table, (2) complex geometries inflating refine costs, (3) high spatial overlap causing many pairs to pass the filter stage.
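The arithmetic behind that figure is 100,000 × 100,000 = 10^10 pairwise tests. The sketch below compares a nested-loop join with an indexed join on a much smaller made-up dataset. It uses a uniform grid as the index, which is a simpler technique swapped in for the R-tree, and counts how many exact tests each approach performs. All names, sizes, and the random data are assumptions for illustration.

```python
import random

def point_in_rect(pt, rect):
    """Exact test: is pt inside the closed rectangle (min_x, min_y, max_x, max_y)?"""
    return rect[0] <= pt[0] <= rect[2] and rect[1] <= pt[1] <= rect[3]

def nested_loop_join(points, rects):
    """Test every point against every rectangle."""
    pairs, tests = [], 0
    for i, pt in enumerate(points):
        for j, rect in enumerate(rects):
            tests += 1
            if point_in_rect(pt, rect):
                pairs.append((i, j))
    return pairs, tests

def grid_join(points, rects, cell=10.0):
    """Test each point only against rectangles registered in its grid cell."""
    grid = {}
    for j, rect in enumerate(rects):
        # Register the rectangle in every cell its extent touches
        for cx in range(int(rect[0] // cell), int(rect[2] // cell) + 1):
            for cy in range(int(rect[1] // cell), int(rect[3] // cell) + 1):
                grid.setdefault((cx, cy), []).append(j)
    pairs, tests = [], 0
    for i, pt in enumerate(points):
        key = (int(pt[0] // cell), int(pt[1] // cell))
        for j in grid.get(key, []):
            tests += 1
            if point_in_rect(pt, rects[j]):
                pairs.append((i, j))
    return pairs, tests

rng = random.Random(42)  # fixed seed so runs are repeatable
rects = []
for _ in range(200):
    x, y = rng.uniform(0, 95), rng.uniform(0, 95)
    rects.append((x, y, x + 5.0, y + 5.0))
points = [(rng.uniform(0, 100), rng.uniform(0, 100)) for _ in range(2000)]

naive_pairs, naive_tests = nested_loop_join(points, rects)
grid_pairs, grid_tests = grid_join(points, rects)
print("same result:", naive_pairs == grid_pairs)
print("exact tests, nested loop:", naive_tests)
print("exact tests, grid index: ", grid_tests)
```

Both joins return the same pairs, but the indexed version runs the exact test only for rectangles near each point. Larger or heavily overlapping rectangles would register in more cells and narrow the gap, which mirrors the third warning above.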

LATERAL Joins for K-NN Relationships:

When you need to find the nearest neighbor for EACH row in a table, use CROSS JOIN LATERAL:

-- For each customer, find their 3 nearest stores
SELECT c.customer_id, c.name, nearest_stores.*
FROM customers c
CROSS JOIN LATERAL (
    SELECT s.store_id, s.store_name,
           ST_Distance(c.location::geography, s.location::geography) AS dist
    FROM stores s
    ORDER BY s.location <-> c.location
    LIMIT 3
) AS nearest_stores;

The LATERAL keyword allows the subquery to reference the outer table (c.location), enabling per-row k-NN searches. The <-> operator ensures each subquery uses the index efficiently.

Query Optimization Techniques

Spatial query performance depends heavily on proper indexing and query structure. Here are critical optimization techniques for production systems.

Essential Optimization Strategies

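•Create appropriate indexes — Use GiST for most cases; SP-GiST is an alternative worth benchmarking on your data; BRIN suits very large tables whose rows are stored in spatially sorted order. PostGIS does not provide a GIN operator class for geometry.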
•Use ST_DWithin instead of ST_Distance < X — ST_DWithin uses index bounding box expansion; ST_Distance comparison cannot.
•Simplify complex geometries — Consider storing simplified versions for filtering; use exact geometries only in final refinement.
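•Index selective non-spatial filters too — The planner, not the query text, decides evaluation order; a B-tree index on a selective column lets it combine both indexes (for example with a bitmap AND) or skip the spatial index when the other filter is cheaper.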
•Use explicit && for MBR-only tests — When exact geometry testing isn't needed (e.g., point data), && is faster.
•LIMIT early for k-NN — Ensure LIMIT appears at the right level to enable index-accelerated ordering.
•ANALYZE after bulk changes — PostgreSQL needs current statistics for optimal spatial query planning.

optimization_examples.sql
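-- Create optimal spatial indexes
CREATE INDEX idx_restaurants_location 
ON restaurants USING GIST (location);
 
-- Expression index so predicates on location::geography can use an index too
CREATE INDEX idx_restaurants_location_geog
ON restaurants USING GIST ((location::geography));
 
-- Analyze to update planner statistics
ANALYZE restaurants;
 
-- Check if index is being used
EXPLAIN (ANALYZE, BUFFERS) 
SELECT * FROM restaurants
WHERE ST_DWithin(location::geography,
                 ST_GeographyFromText('POINT(-122.4194 37.7749)'), 5000);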
 
-- Optimize complex polygon queries with simplified versions
ALTER TABLE parks ADD COLUMN boundary_simplified geometry;
 
UPDATE parks 
SET boundary_simplified = ST_Simplify(boundary, 0.0001);
 
CREATE INDEX idx_parks_simplified ON parks USING GIST (boundary_simplified);
 
-- Index filter on the simplified geometry's bounding box, exact geometry for final check
-- ($1 is the query geometry; ST_Expand pads it because simplification can shrink a bounding box)
SELECT p.*
FROM parks p
WHERE p.boundary_simplified && ST_Expand($1, 0.01)  -- Fast filter
  AND ST_Intersects(p.boundary, $1);                -- Exact refine
 
-- Partition large tables spatially (PostgreSQL 11+)
CREATE TABLE spatial_data (
    id BIGINT,
    location GEOMETRY(Point, 4326),
    data JSONB
) PARTITION BY RANGE (ST_X(location));
 
-- Create partitions for longitude ranges
CREATE TABLE spatial_data_west PARTITION OF spatial_data
    FOR VALUES FROM (-180) TO (-100);
CREATE TABLE spatial_data_central PARTITION OF spatial_data
    FOR VALUES FROM (-100) TO (-80);
CREATE TABLE spatial_data_east PARTITION OF spatial_data  
    FOR VALUES FROM (-80) TO (180);

Cluster for Spatial Locality

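PostgreSQL's CLUSTER command physically reorders table rows to match index order. For spatial indexes: CLUSTER restaurants USING idx_restaurants_location; This can improve range query performance by reducing random I/O, because features that are close in space end up on nearby pages. CLUSTER takes an exclusive lock on the table while it runs, and the ordering is not maintained for later writes, so re-cluster periodically after bulk updates.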

Common Patterns and Anti-Patterns

Patterns (DO This)

•Use ST_DWithin(geom, point, dist) for radius queries
•Use ORDER BY geom <-> point LIMIT k for k-NN
•Cast to ::geography for accurate distance in meters
•Create GIST index on geometry columns
•Use && for bbox-only tests on simple geometries
•Apply LIMIT with ORDER BY <-> for index usage
•Store coordinates in consistent SRID (4326 for GPS)
•Use LATERAL joins for per-row k-NN queries

Anti-Patterns (DON'T Do This)

•WHERE ST_Distance(a, b) < X (can't use index)
•ORDER BY ST_Distance(a, b) without LIMIT
•Distance on geometry in degrees (meaningless)
•Missing spatial indexes on join columns
•ST_Buffer for radius queries (slow polygon creation)
•Mixing SRIDs without explicit ST_Transform
•Storing coordinates as separate lat/lng columns
•Complex functions in WHERE obscuring index usage

antipattern_fixes.sql
-- ANTI-PATTERN: Distance comparison (can't use index)
SELECT * FROM stores
WHERE ST_Distance(location::geography, $1) < 5000;
 
-- FIX: Use ST_DWithin (can use a GiST index on location::geography)
SELECT * FROM stores  
WHERE ST_DWithin(location::geography, $1, 5000);
 
 
-- ANTI-PATTERN: K-NN without LIMIT (computes all distances)
SELECT *, ST_Distance(location, $1) AS dist FROM stores
ORDER BY dist;
 
-- FIX: Use <-> operator with LIMIT
SELECT *, location <-> $1 AS dist FROM stores
ORDER BY location <-> $1
LIMIT 10;
 
 
-- ANTI-PATTERN: Buffer for radius (creates expensive polygon)
SELECT * FROM stores
WHERE ST_Intersects(location, ST_Buffer($1::geography, 5000));
 
-- FIX: ST_DWithin (optimized for this exact purpose)  
SELECT * FROM stores
WHERE ST_DWithin(location::geography, $1, 5000);
 
 
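-- ANTI-PATTERN: Correlated IN subquery hides the join and can be planned as a per-row subplan
SELECT * FROM stores
WHERE id IN (
    SELECT store_id FROM deliveries 
    WHERE ST_DWithin(delivery_point, stores.location::geography, 1000)  -- Correlated reference to the outer row
);
 
-- FIX: Use explicit JOIN or LATERAL (keeps the store_id match from the subquery;
-- delivery_point is assumed to be a geography column)
SELECT DISTINCT s.* FROM stores s
JOIN deliveries d
  ON d.store_id = s.id
 AND ST_DWithin(d.delivery_point, s.location::geography, 1000);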

Summary: Mastering Spatial Queries

We've bridged R-tree theory with practical spatial SQL. Let's consolidate the essential knowledge:

Key Takeaways

•Range queries use ST_Intersects or && — The && operator tests MBR overlap only; ST_Intersects tests exact geometry.
•Radius queries use ST_DWithin — Not ST_Distance comparison; ST_DWithin enables index-accelerated bounding box filtering.
•K-NN uses ORDER BY geom <-> point LIMIT k — The <-> operator triggers index-accelerated nearest-neighbor search.
•Spatial predicates express topological relationships — Contains, Within, Intersects, Crosses, Touches each have specific semantics.
•Spatial joins are powerful but expensive — Ensure both tables have spatial indexes; use LATERAL for per-row k-NN.
•Geography type for accurate distance — Cast to ::geography for distances in meters; geometry distances are in coordinate units.
•Anti-patterns destroy performance — Distance comparisons, missing LIMITs, and SRID mismatches are common pitfalls.

Module Complete:

You have now completed the Spatial Indexes module. You understand spatial data representation, R-tree structure and algorithms, MBR approximation strategies, and practical SQL query patterns. This knowledge enables you to design efficient location-based applications, diagnose spatial query performance issues, and make informed decisions about spatial database architecture.

Next Steps:

•Practice with real PostGIS installations
•Explore specialized indexes (SP-GiST, BRIN for spatial)
•Study 3D and temporal spatial extensions
•Investigate spatial analytics and clustering algorithms
