Loading content...
Every location-aware application—from food delivery to emergency response, from real estate search to fleet management—depends on efficient spatial queries. The R-tree algorithms we've studied are the engine; now we explore how applications drive that engine.
This page bridges theory and practice: you'll learn the spatial query vocabulary, master PostGIS/SQL syntax, understand query optimizer behavior, and develop intuition for writing queries that leverage indexes effectively. By the end, you'll be able to design spatial database schemas and write queries that scale to millions of objects.
By the end of this page, you will master range queries (window and radius), nearest-neighbor queries with proper index usage, containment and intersection predicates, spatial joins for relating two datasets, and performance optimization techniques. Real SQL examples using PostGIS demonstrate each concept.
Range queries (also called window queries) find all objects within a specified rectangular region. This is the most fundamental spatial query—used whenever a user views a map to show all visible features.
1234567891011121314151617181920212223242526272829303132333435363738394041
-- Basic range query: Find all restaurants in visible map area-- Using ST_Intersects (exact geometry test)SELECT id, name, cuisine, ST_AsText(location)FROM restaurantsWHERE ST_Intersects( location, ST_MakeEnvelope(-122.45, 37.75, -122.40, 37.80, 4326) -- MakeEnvelope(minX, minY, maxX, maxY, SRID)); -- Using && operator (MBR-only test, faster but may include false positives)-- Useful when geometries are simple pointsSELECT id, name, cuisineFROM restaurantsWHERE location && ST_MakeEnvelope(-122.45, 37.75, -122.40, 37.80, 4326); -- Combining with non-spatial filters (index assists spatial, then filters)SELECT id, name, rating, price_levelFROM restaurantsWHERE location && ST_MakeEnvelope(-122.45, 37.75, -122.40, 37.80, 4326) AND cuisine = 'Italian' AND rating >= 4.0 AND is_open = trueORDER BY rating DESCLIMIT 20; -- EXPLAIN output showing index usageEXPLAIN (ANALYZE, BUFFERS) SELECT * FROM restaurantsWHERE ST_Intersects(location, ST_MakeEnvelope(-122.45, 37.75, -122.40, 37.80, 4326)); /*Bitmap Heap Scan on restaurants (cost=4.39..358.72 rows=84 width=168) Recheck Cond: (location && '...'::geometry) Filter: st_intersects(location, '...'::geometry) Heap Blocks: exact=82 -> Bitmap Index Scan on restaurants_location_idx (cost=0.00..4.37 rows=84) Index Cond: (location && '...'::geometry)Planning Time: 0.285 msExecution Time: 2.341 ms*/In PostGIS, && tests bounding box overlap only (MBR intersection). It's faster than ST_Intersects because it skips exact geometry testing. For point data where MBR = point, && and ST_Intersects are equivalent. For polygons, && may return false positives that ST_Intersects filters out.
Query Pattern: Dynamic Map Bounds
Web mapping applications send the visible bounds with each pan/zoom:
// Frontend sends viewport bounds
const bounds = map.getBounds();
const query = {
minLng: bounds.getWest(),
maxLng: bounds.getEast(),
minLat: bounds.getSouth(),
maxLat: bounds.getNorth()
};
// API call
fetch(`/api/restaurants?bbox=${query.minLng},${query.minLat},${query.maxLng},${query.maxLat}`);
-- Backend query (parameterized)
SELECT id, name, ST_AsGeoJSON(location) as geojson
FROM restaurants
WHERE location && ST_MakeEnvelope($1, $2, $3, $4, 4326)
LIMIT 1000; -- Prevent overwhelming the client
Radius queries find all objects within a specified distance from a point. Unlike rectangular range queries, radius queries define a circular search region—essential for "find nearby" features.
12345678910111213141516171819202122232425262728293031323334353637
-- Find all hospitals within 5 km of user's location-- Method 1: ST_DWithin (optimized for spatial indexes)SELECT id, name, address, ST_Distance(location::geography, ST_GeographyFromText('POINT(-122.4194 37.7749)')) AS distance_mFROM hospitalsWHERE ST_DWithin( location::geography, -- Cast to geography for meters ST_GeographyFromText('POINT(-122.4194 37.7749)'), -- User location 5000 -- 5000 meters = 5 km)ORDER BY distance_m; -- Method 2: ST_Buffer + ST_Intersects (less efficient, avoid)-- Creates a circular polygon and tests intersectionSELECT id, nameFROM hospitalsWHERE ST_Intersects( location, ST_Buffer(ST_GeomFromText('POINT(-122.4194 37.7749)', 4326)::geography, 5000)::geometry);-- Slower: ST_Buffer is expensive, and the polygon has many vertices -- Method 3: Bounding box pre-filter + exact distance (manual optimization)-- Useful when you can't use geography type directlyWITH params AS ( SELECT ST_GeomFromText('POINT(-122.4194 37.7749)', 4326) AS center, 5000.0 AS radius_m, 5000.0 / 111320.0 AS radius_deg -- Approximate degrees (at this latitude))SELECT h.id, h.name, ST_Distance(h.location::geography, p.center::geography) AS distance_mFROM hospitals h, params pWHERE h.location && ST_Expand(p.center, p.radius_deg) -- Fast MBR filter AND ST_DWithin(h.location::geography, p.center::geography, p.radius_m) -- Exact filterORDER BY distance_m;ST_Distance on geometry type returns distance in the coordinate system's units—degrees for WGS84, which is meaningless as distance. Always cast to geography for distance in meters, or transform to a local projected coordinate system (like UTM). The geography cast adds computation cost but ensures correct results.
Performance Tip: Index Usage with ST_DWithin
ST_DWithin is specially optimized to use spatial indexes effectively. Internally it:
This is why ST_DWithin is much faster than ST_Distance(a,b) < radius—the latter cannot use the index and must calculate distance for every row.
-- SLOW: Can't use index efficiently
SELECT * FROM hospitals
WHERE ST_Distance(location::geography, $1::geography) < 5000;
-- FAST: Uses index via internal bounding box expansion
SELECT * FROM hospitals
WHERE ST_DWithin(location::geography, $1::geography, 5000);
Nearest neighbor (k-NN) queries find the k closest objects to a reference point. This powers "find closest" features: nearest ATM, closest store, 5 nearest gas stations.
1234567891011121314151617181920212223242526272829303132333435363738
-- Find 5 nearest gas stations to user location-- Using the <-> distance operator (enables index-accelerated k-NN)SELECT id, name, brand, location <-> ST_GeomFromText('POINT(-122.4194 37.7749)', 4326) AS distanceFROM gas_stationsORDER BY location <-> ST_GeomFromText('POINT(-122.4194 37.7749)', 4326)LIMIT 5; -- K-NN with actual distance in metersSELECT id, name, brand, ST_Distance(location::geography, ST_GeographyFromText('POINT(-122.4194 37.7749)')) AS distance_mFROM gas_stationsORDER BY location <-> ST_GeomFromText('POINT(-122.4194 37.7749)', 4326)LIMIT 5; -- K-NN with additional filter (careful with ordering!)-- WRONG: Filter applied after ORDER BY LIMIT, may miss valid resultsSELECT id, name FROM gas_stationsWHERE brand = 'Shell'ORDER BY location <-> $1LIMIT 5; -- Gets 5 closest overall, then filters to Shell (may return < 5) -- CORRECT: Filter within the sorting scopeSELECT id, name FROM ( SELECT id, name, brand, location <-> $1 AS dist FROM gas_stations WHERE brand = 'Shell' -- Filter before ordering ORDER BY dist LIMIT 5) sub; -- K-NN for polygons: Distance to nearest edgeSELECT p.id, p.name, ST_Distance(p.boundary::geography, $1::geography) AS distance_to_edge_mFROM parks pORDER BY p.boundary <-> $1LIMIT 3;The <-> operator in PostGIS triggers index-accelerated k-NN search. It uses the R-tree's priority-queue traversal to find nearest objects without computing all distances. Without LIMIT, it's useless—with LIMIT k, it's extremely efficient: O(log n + k) instead of O(n log k) for sorting all distances.
K-NN Query Plan Analysis:
EXPLAIN (ANALYZE) SELECT * FROM gas_stations
ORDER BY location <-> ST_GeomFromText('POINT(-122.4194 37.7749)', 4326)
LIMIT 5;
Limit (cost=0.29..2.35 rows=5)
-> Index Scan using gas_stations_location_idx on gas_stations
Order By: (location <-> '...'::geometry)
Note: Index Scan, not Bitmap Index Scan or Seq Scan. The index directly returns rows in distance order. This is the ideal k-NN plan.
Anti-Pattern Detection:
If you see Sort in the plan for a k-NN query, the index isn't being used:
Sort (cost=15234.12..15484.32 rows=100000) -- BAD: Sorting all rows!
Sort Key: (location <-> '...')
-> Seq Scan on gas_stations
Causes: missing index, wrong SRID, complex expression, or old planner statistics.
Spatial predicates test topological relationships between geometries. Each has specific semantics based on the DE-9IM model we discussed earlier.
| Predicate | Returns TRUE When | Use Case |
|---|---|---|
| ST_Intersects(A, B) | A and B share any point | "Do these areas overlap at all?" |
| ST_Contains(A, B) | B is completely inside A | "Is this point in the delivery zone?" |
| ST_Within(A, B) | A is completely inside B | Same as Contains with swapped args |
| ST_Covers(A, B) | No point of B is outside A | More robust than Contains for boundary cases |
| ST_Crosses(A, B) | Geometries cross each other | "Does this road cross the river?" |
| ST_Touches(A, B) | Share boundary but not interior | "Do these parcels share an edge?" |
| ST_Overlaps(A, B) | Same dimension, partial intersection | "Do these flood zones partially overlap?" |
| ST_Equals(A, B) | Geometrically identical | Deduplication, version comparison |
123456789101112131415161718192021222324252627282930
-- Point-in-polygon: Is delivery address in service area?SELECT EXISTS( SELECT 1 FROM service_zones WHERE ST_Contains(boundary, ST_GeomFromText('POINT(-122.420 37.780)', 4326)) AND zone_type = 'delivery') AS is_serviceable; -- Find which congressional district contains an addressSELECT district_id, representative_nameFROM congressional_districtsWHERE ST_Contains(boundary, ST_GeomFromText('POINT(-122.420 37.780)', 4326)); -- Find all roads that cross a specific riverSELECT r.road_name, r.road_typeFROM roads r, rivers rvWHERE rv.name = 'Mississippi River' AND ST_Crosses(r.geom, rv.geom); -- Find neighboring parcels (share a boundary)SELECT a.parcel_id AS parcel_a, b.parcel_id AS parcel_bFROM parcels a, parcels bWHERE a.parcel_id < b.parcel_id -- Avoid duplicates AND ST_Touches(a.boundary, b.boundary); -- Find overlapping claims (partial intersection)SELECT c1.claim_id, c2.claim_id, ST_Area(ST_Intersection(c1.geom, c2.geom)) AS overlap_areaFROM mining_claims c1, mining_claims c2WHERE c1.claim_id < c2.claim_id AND ST_Overlaps(c1.geom, c2.geom);ST_Contains requires B's interior to have at least one point in A's interior—a point exactly on A's boundary is NOT considered contained. ST_Covers is more intuitive: any point of B that exists must be in A (including boundaries). For robustness, prefer ST_Covers/ST_CoveredBy over ST_Contains/ST_Within.
Spatial joins combine rows from two tables based on spatial relationships—analogous to regular JOINs but using geometry predicates instead of key equality. This is one of the most powerful yet computationally expensive spatial operations.
1234567891011121314151617181920212223242526272829303132333435363738394041
-- Assign each customer to their census tractSELECT c.customer_id, c.name, ct.tract_id, ct.median_incomeFROM customers cJOIN census_tracts ct ON ST_Contains(ct.boundary, c.location); -- Count restaurants per neighborhood SELECT n.name AS neighborhood, COUNT(r.id) AS restaurant_count, COUNT(CASE WHEN r.rating >= 4.5 THEN 1 END) AS top_rated_countFROM neighborhoods nLEFT JOIN restaurants r ON ST_Contains(n.boundary, r.location)GROUP BY n.nameORDER BY restaurant_count DESC; -- Find all parcels affected by flood zone expansionSELECT p.parcel_id, p.owner_name, p.appraised_value, ST_Area(ST_Intersection(p.boundary, f.zone_geom)) AS affected_areaFROM parcels pJOIN flood_zones f ON ST_Intersects(p.boundary, f.zone_geom)WHERE f.zone_name = 'Proposed Expansion Zone 2024'; -- Distance-based join: Link each school to nearest fire stationSELECT DISTINCT ON (s.id) s.name AS school_name, fs.name AS nearest_fire_station, ST_Distance(s.location::geography, fs.location::geography) AS distance_mFROM schools sCROSS JOIN LATERAL ( SELECT name, location FROM fire_stations ORDER BY location <-> s.location LIMIT 1) fs; -- Many-to-many spatial join: Properties within 1km of train stationsSELECT p.property_id, p.address, p.price, s.station_name, ST_Distance(p.location::geography, s.location::geography) AS distance_mFROM properties pJOIN train_stations s ON ST_DWithin(p.location::geography, s.location::geography, 1000);Spatial joins can be extremely expensive. A naive nested-loop join between two tables of 100K rows each would require 10 billion geometry comparisons. Spatial indexes reduce this dramatically, but watch for: (1) missing indexes on either table, (2) complex geometries inflating refine costs, (3) high spatial overlap causing many pairs to pass the filter stage.
LATERAL Joins for K-NN Relationships:
When you need to find the nearest neighbor for EACH row in a table, use CROSS JOIN LATERAL:
-- For each customer, find their 3 nearest stores
SELECT c.customer_id, c.name, nearest_stores.*
FROM customers c
CROSS JOIN LATERAL (
SELECT s.store_id, s.store_name,
ST_Distance(c.location::geography, s.location::geography) AS dist
FROM stores s
ORDER BY s.location <-> c.location
LIMIT 3
) AS nearest_stores;
The LATERAL keyword allows the subquery to reference the outer table (c.location), enabling per-row k-NN searches. The <-> operator ensures each subquery uses the index efficiently.
Spatial query performance depends heavily on proper indexing and query structure. Here are critical optimization techniques for production systems.
12345678910111213141516171819202122232425262728293031323334353637383940
-- Create optimal spatial indexesCREATE INDEX idx_restaurants_location ON restaurants USING GIST (location); -- Analyze to update planner statisticsANALYZE restaurants; -- Check if index is being usedEXPLAIN (ANALYZE, BUFFERS) SELECT * FROM restaurantsWHERE ST_DWithin(location::geography, $1, 5000); -- Optimize complex polygon queries with simplified versionsALTER TABLE parks ADD COLUMN boundary_simplified geometry; UPDATE parks SET boundary_simplified = ST_Simplify(boundary, 0.0001); CREATE INDEX idx_parks_simplified ON parks USING GIST (boundary_simplified); -- Query uses simplified geometry for filtering, exact for final checkSELECT p.*FROM parks pWHERE p.boundary_simplified && ST_Expand($query, 0.01) -- Fast filter AND ST_Intersects(p.boundary, $query); -- Exact refine -- Partition large tables spatially (PostgreSQL 11+)CREATE TABLE spatial_data ( id BIGINT, location GEOMETRY(Point, 4326), data JSONB) PARTITION BY RANGE (ST_X(location)); -- Create partitions for longitude rangesCREATE TABLE spatial_data_west PARTITION OF spatial_data FOR VALUES FROM (-180) TO (-100);CREATE TABLE spatial_data_central PARTITION OF spatial_data FOR VALUES FROM (-100) TO (-80);CREATE TABLE spatial_data_east PARTITION OF spatial_data FOR VALUES FROM (-80) TO (180);PostgreSQL's CLUSTER command physically reorders table rows to match index order. For spatial indexes: CLUSTER restaurants USING idx_restaurants_location; This dramatically improves range query performance by reducing random I/O. Re-cluster periodically after bulk updates.
ST_DWithin(geom, point, dist) for radius queriesORDER BY geom <-> point LIMIT k for k-NN::geography for accurate distance in meters&& for bbox-only tests on simple geometriesWHERE ST_Distance(a, b) < X (can't use index)ORDER BY ST_Distance(a, b) without LIMITST_Buffer for radius queries (slow polygon creation)1234567891011121314151617181920212223242526272829303132333435363738
-- ANTI-PATTERN: Distance comparison (can't use index)SELECT * FROM storesWHERE ST_Distance(location::geography, $1) < 5000; -- FIX: Use ST_DWithin (uses index)SELECT * FROM stores WHERE ST_DWithin(location::geography, $1, 5000); -- ANTI-PATTERN: K-NN without LIMIT (computes all distances)SELECT *, ST_Distance(location, $1) AS dist FROM storesORDER BY dist; -- FIX: Use <-> operator with LIMITSELECT *, location <-> $1 AS dist FROM storesORDER BY location <-> $1LIMIT 10; -- ANTI-PATTERN: Buffer for radius (creates expensive polygon)SELECT * FROM storesWHERE ST_Intersects(location, ST_Buffer($1::geography, 5000)); -- FIX: ST_DWithin (optimized for this exact purpose) SELECT * FROM storesWHERE ST_DWithin(location::geography, $1, 5000); -- ANTI-PATTERN: Subquery hides optimization opportunitySELECT * FROM storesWHERE id IN ( SELECT store_id FROM deliveries WHERE ST_DWithin(delivery_point, stores.location, 1000) -- Can't reference outer!); -- FIX: Use explicit JOIN or LATERALSELECT DISTINCT s.* FROM stores sJOIN deliveries d ON ST_DWithin(d.delivery_point, s.location::geography, 1000);We've bridged R-tree theory with practical spatial SQL. Let's consolidate the essential knowledge:
Module Complete:
You have now completed the Spatial Indexes module. You understand spatial data representation, R-tree structure and algorithms, MBR approximation strategies, and practical SQL query patterns. This knowledge enables you to design efficient location-based applications, diagnose spatial query performance issues, and make informed decisions about spatial database architecture.
Next Steps:
Congratulations! You've mastered Spatial Indexes—from the theoretical foundations of multi-dimensional indexing through R-tree algorithms to production SQL patterns. You're now equipped to build and optimize location-based database systems at scale.