Real-world queries rarely filter on a single column. Most selection predicates combine multiple conditions: "Find orders placed by customer X in the last 30 days with status Y." These conjunctive selections—predicates connected by AND—present unique implementation challenges and optimization opportunities.
The challenge arises because multiple filtering conditions can be satisfied through various strategies: using a composite index that covers all predicates, using separate indexes and intersecting results, or using one index followed by in-memory filtering. Each approach has distinct performance characteristics, and the optimal choice depends on selectivity, available indexes, and data distribution.
Mastering conjunctive selection implementation is essential because it represents the most common pattern in production query workloads, and poor handling of multi-predicate filters is a leading cause of query performance problems.
By the end of this page, you will understand how databases evaluate conjunctive predicates, master the strategies for combining multiple index accesses, analyze cost models for index intersection operations, and learn optimal filter ordering for residual predicate evaluation.
A conjunctive selection is a selection operation where the predicate consists of multiple conditions combined with logical AND. Formally:
σ(P₁ ∧ P₂ ∧ ... ∧ Pₙ)(R)
where P₁, P₂, ..., Pₙ are individual predicates. A tuple satisfies the conjunctive selection if and only if it satisfies ALL individual predicates.
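This definition translates directly into code. Below is a minimal sketch in Python, assuming tuples are represented as dicts and predicates as boolean functions (all names illustrative):

```python
# Minimal sketch of conjunctive selection: a tuple qualifies
# iff it satisfies ALL predicates. Python's all() short-circuits,
# mirroring the short-circuit property discussed below.

def conjunctive_select(relation, predicates):
    """Return tuples satisfying every predicate (P1 AND P2 AND ... AND Pn)."""
    return [t for t in relation if all(p(t) for p in predicates)]

orders = [
    {"customer_id": 12345, "status": "shipped", "amount": 250},
    {"customer_id": 12345, "status": "pending", "amount": 900},
    {"customer_id": 99999, "status": "shipped", "amount": 120},
]

predicates = [
    lambda t: t["customer_id"] == 12345,
    lambda t: t["status"] == "shipped",
]

print(conjunctive_select(orders, predicates))
# [{'customer_id': 12345, 'status': 'shipped', 'amount': 250}]
```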
Mathematical Properties:
Conjunctive selections exhibit important properties that enable optimization:
Selectivity Multiplication: If predicates are independent, combined selectivity = s₁ × s₂ × ... × sₙ. For 10% selectivity on each of 3 predicates: 0.1 × 0.1 × 0.1 = 0.1% (1 in 1000 tuples match).
Commutativity and Associativity: Predicates can be reordered and regrouped freely without changing the result (though the order affects performance).
Short-Circuit Potential: Once a tuple fails any predicate, remaining predicates need not be evaluated.
Decomposition: σ(P₁ ∧ P₂)(R) ≡ σ(P₁)(σ(P₂)(R)) ≡ σ(P₂)(σ(P₁)(R)). We can evaluate predicates in any order.
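These properties are easy to check numerically. A small sketch of selectivity multiplication under the independence assumption (synthetic numbers, purely illustrative):

```python
from functools import reduce

# Combined selectivity under the independence assumption:
# s = s1 * s2 * ... * sn
def combined_selectivity(selectivities):
    return reduce(lambda a, b: a * b, selectivities, 1.0)

sels = [0.1, 0.1, 0.1]            # three predicates, 10% selectivity each
s = combined_selectivity(sels)
print(f"combined selectivity = {s:.4f}")                      # 0.0010 -> 0.1%
print(f"expected matches in 1M rows = {s * 1_000_000:.0f}")   # ~1000
```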
```sql
-- Conjunctive Selection Examples

-- Example 1: Three-predicate conjunction
SELECT * FROM orders
WHERE customer_id = 12345              -- P1: Customer filter
  AND order_date > '2024-01-01'        -- P2: Date range
  AND status = 'shipped';              -- P3: Status filter

-- Selectivity analysis (assume independence):
-- P1: 1 customer out of 100,000 = 0.001%
-- P2: Last year out of 10 years = 10%
-- P3: Shipped = 1 of 5 statuses = 20%
-- Combined: 0.001% × 10% × 20% = 0.00002%
-- On 100M orders: ~20 matching rows

-- Example 2: Range conjunctions
SELECT * FROM products
WHERE category = 'Electronics'
  AND price BETWEEN 100 AND 500
  AND rating >= 4.0
  AND in_stock = true;

-- Example 3: Expression conjunctions
SELECT * FROM employees
WHERE department_id = 10
  AND EXTRACT(YEAR FROM hire_date) = 2020
  AND salary * 1.1 > 50000;

-- Note: Expressions involving functions/calculations
-- typically cannot use indexes directly
```

The selectivity multiplication formula assumes predicate independence, which often doesn't hold. Correlated predicates (e.g., 'city' and 'zip_code') have actual selectivity different from the multiplicative estimate. Modern optimizers use multi-column histograms and correlation statistics to improve accuracy, but estimation errors for conjunctive predicates remain a significant source of poor query plans.
Database systems employ several strategies for implementing conjunctive selections, each with distinct trade-offs. The optimizer evaluates available options and chooses based on estimated cost.
Strategy 1: Composite Index Scan
The most efficient approach when a composite index exists on multiple predicate columns:
```sql
-- Strategy 1: Composite Index Scan

-- Composite index covering multiple predicates:
CREATE INDEX idx_cust_date_status
ON orders(customer_id, order_date, status);

-- Query with matching predicates:
SELECT * FROM orders
WHERE customer_id = 12345
  AND order_date > '2024-01-01'
  AND status = 'shipped';

-- Execution:
-- 1. Navigate B+tree to (customer_id=12345, order_date>'2024-01-01')
-- 2. Scan leaf entries, filtering on status='shipped' (ICP if supported)
-- 3. Fetch matching tuples from heap
-- 4. Return results

-- Cost analysis:
-- Index traversal: ~4 pages (tree height)
-- Leaf scan: Proportional to (customer's orders × date selectivity)
-- Heap access: Only for fully matching entries

-- This is optimal because:
-- + All predicates evaluated using index
-- + Minimal index pages scanned
-- + Minimal heap accesses

-- Limitation: Composite index must exist with columns in prefix-compatible order
```

Strategy 2: Single Index + Filter
When only one predicate has an index, use that index and filter the rest:
```sql
-- Strategy 2: Single Index + Filter

-- Only customer_id is indexed:
CREATE INDEX idx_customer ON orders(customer_id);

-- Query with multiple predicates:
SELECT * FROM orders
WHERE customer_id = 12345              -- Index condition
  AND order_date > '2024-01-01'        -- Filter condition 1
  AND status = 'shipped';              -- Filter condition 2

-- Execution:
-- 1. Index scan: Find all entries for customer_id=12345
--    (Returns all orders by this customer, maybe 500 rows)
-- 2. Heap access: Fetch each matching tuple
-- 3. Filter: Apply order_date and status predicates
-- 4. Return only tuples passing all filters

-- PostgreSQL EXPLAIN output:
-- Index Scan using idx_customer on orders
--   Index Cond: (customer_id = 12345)
--   Filter: (order_date > '2024-01-01' AND status = 'shipped')
--   Rows Removed by Filter: 450
-- (shows 500 fetched, 450 filtered out, 50 returned)

-- Cost analysis:
-- Index traversal: ~4 pages
-- Leaf scan: Small for single customer
-- Heap access: 500 random page reads (all customer orders)
-- Filter CPU: 500 predicate evaluations
-- Result: Only 50 rows (if ~10% match additional filters)
```

Strategy 3: Index Intersection (Bitmap AND)
When multiple single-column indexes exist, combine their results:
```sql
-- Strategy 3: Index Intersection / Bitmap AND

-- Separate indexes:
CREATE INDEX idx_customer ON orders(customer_id);
CREATE INDEX idx_date ON orders(order_date);
CREATE INDEX idx_status ON orders(status);

-- Query:
SELECT * FROM orders
WHERE customer_id = 12345
  AND order_date > '2024-01-01'
  AND status = 'shipped';

-- Execution (Bitmap AND approach):
-- 1. Bitmap Index Scan on idx_customer:
--    -> Build bitmap1: Pages containing customer 12345's orders
-- 2. Bitmap Index Scan on idx_date:
--    -> Build bitmap2: Pages with orders after 2024-01-01
-- 3. Bitmap Index Scan on idx_status:
--    -> Build bitmap3: Pages with shipped orders
-- 4. BitmapAnd: Intersect bitmap1 ∩ bitmap2 ∩ bitmap3
--    -> Result bitmap = pages containing potential matches
-- 5. Bitmap Heap Scan: Fetch pages in result bitmap
-- 6. Recheck: Verify all three conditions on each tuple

-- PostgreSQL EXPLAIN output:
-- Bitmap Heap Scan on orders
--   Recheck Cond: (customer_id = 12345 AND order_date > ... AND status = ...)
--   -> BitmapAnd
--        -> Bitmap Index Scan on idx_customer
--             Index Cond: (customer_id = 12345)
--        -> Bitmap Index Scan on idx_date
--             Index Cond: (order_date > '2024-01-01')
--        -> Bitmap Index Scan on idx_status
--             Index Cond: (status = 'shipped')

-- Cost analysis:
-- + Uses all three indexes to narrow candidates
-- + Heap access limited to intersection pages
-- + Sequential heap access (sorted bitmap)
-- - Three index scans required
-- - Bitmap operations have memory/CPU cost
```

| Strategy | When Applicable | Advantages | Disadvantages |
|---|---|---|---|
| Composite Index | Index exists on all/most columns | Single index traversal, minimal I/O | Requires specific index, column order matters |
| Single Index + Filter | Index on most selective column | Uses best available index | Heap access for all index matches |
| Index Intersection | Separate indexes on each column | Combines multiple indexes | Multiple scans, bitmap overhead |
| Sequential + Filter | No useful indexes exist | Predictable sequential I/O | Reads entire table |
Generally, prefer composite indexes when available. Use single index + filter when one predicate is highly selective. Consider index intersection when no single index is selective enough but combining two or more significantly narrows candidates. Fall back to sequential scan when combined selectivity is still high (>20% of table).
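That guidance can be summarized as a rough decision heuristic. The sketch below uses illustrative thresholds and inputs, not values taken from any real optimizer:

```python
def choose_strategy(has_composite_index, best_single_selectivity,
                    combined_selectivity, has_multiple_indexes):
    """Rough heuristic mirroring the guidance above (thresholds illustrative)."""
    if combined_selectivity > 0.20:
        return "sequential scan + filter"    # too many matches for indexes to help
    if has_composite_index:
        return "composite index scan"        # preferred when available
    if best_single_selectivity < 0.01:
        return "single index + filter"       # one predicate is highly selective
    if has_multiple_indexes:
        return "bitmap index intersection"   # combine several weak indexes
    return "sequential scan + filter"

# No composite index, weak single indexes, small combined selectivity:
print(choose_strategy(False, 0.20, 0.05, True))  # bitmap index intersection
```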
The column order in a composite index profoundly affects its usefulness for conjunctive selections. Understanding the principles of column ordering enables designing indexes that serve multiple query patterns efficiently.
The Prefix Rule:
A composite index on (A, B, C) can be used for predicates involving:

- A alone
- A and B
- A, B, and C
But NOT for:

- B alone
- C alone
- B and C (skipping the leading column A)
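The prefix rule itself is mechanical. A small sketch, assuming equality predicates only (range-predicate subtleties ignored):

```python
def usable_prefix(index_columns, predicate_columns):
    """Return the index-column prefix usable for a set of equality predicates.

    The index helps only for the longest leading run of columns that
    all appear in the predicate set (the prefix rule).
    """
    prefix = []
    for col in index_columns:
        if col not in predicate_columns:
            break  # gap in the prefix: remaining columns are unusable
        prefix.append(col)
    return prefix

index_cols = ["A", "B", "C"]
print(usable_prefix(index_cols, {"A", "B"}))  # ['A', 'B'] -> index usable
print(usable_prefix(index_cols, {"B", "C"}))  # []         -> index not usable
```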
Column Ordering Principles:
```sql
-- Column Ordering Impact Analysis

-- Scenario: Queries on (department_id, job_title, salary)

-- Query patterns:
-- Q1: WHERE department_id = 10 AND job_title = 'Engineer'
-- Q2: WHERE department_id = 10 AND salary > 80000
-- Q3: WHERE department_id = 10 AND job_title = 'Engineer' AND salary > 80000

-- Option A: Index on (department_id, job_title, salary)
-- Q1: ✓ Uses (dept, job_title) as equality prefix - excellent
-- Q2: ✓ Uses (dept) prefix, salary as filter - good
-- Q3: ✓ Uses full prefix (dept, job_title), range on salary - excellent

-- Option B: Index on (department_id, salary, job_title)
-- Q1: ✓ Uses (dept), job_title as filter - good (not great)
-- Q2: ✓ Uses (dept) prefix, range on salary - good
-- Q3: ✓ Uses (dept), salary range, job_title as filter - moderate

-- Analysis:
-- Q1 is most frequent -> Option A is better
-- Q1 has two equality predicates -> maximize prefix usage

-- Column ordering for Q3:
-- "dept = 10 AND job_title = 'Engineer' AND salary > 80000"
--
-- With index (dept, job_title, salary):
--   Navigate to (10, 'Engineer', _)
--   Scan forward where salary > 80000
--   Index entries are in salary order → efficient range scan
--
-- With index (dept, salary, job_title):
--   Navigate to (10, 80000, _)
--   Scan forward through all salaries > 80000
--   Filter on job_title = 'Engineer' during scan
--   Much larger scan range!

-- Rule: Equality columns BEFORE range columns in index
```

Covering Index Extensions:
When designing composite indexes for conjunctive selections, consider adding columns that don't participate in filtering but are needed in the SELECT clause. This creates a covering index that eliminates heap access entirely:
```sql
-- Covering Index Design for Conjunctive Selections

-- Query to optimize:
SELECT order_id, total, ship_date
FROM orders
WHERE customer_id = 12345
  AND order_date > '2024-01-01'
  AND status = 'shipped';

-- Basic composite index (filter columns only):
CREATE INDEX idx_filter ON orders(customer_id, order_date, status);
-- Requires heap access for order_id, total, ship_date

-- Covering composite index (includes SELECT columns):
CREATE INDEX idx_covering ON orders(
    customer_id,
    order_date,
    status,
    order_id,   -- Additional for SELECT
    total,      -- Additional for SELECT
    ship_date   -- Additional for SELECT
);
-- Result: Index-Only Scan - no heap access!

-- Tradeoff:
-- + Eliminates all heap I/O for this query
-- + Can be 10-100x faster for large result sets
-- - Larger index on disk
-- - Higher maintenance cost (more columns to update)
-- - May exceed index size limits (some DBs)

-- PostgreSQL INCLUDE syntax (cleaner approach):
CREATE INDEX idx_covering_v2 ON orders(customer_id, order_date, status)
    INCLUDE (order_id, total, ship_date);
-- INCLUDE columns are stored in index but not used for searching
```

Composite indexes accelerate specific query patterns but increase storage requirements and write overhead. Each additional index column increases the per-entry size, reducing entries per page. Write operations must update all affected indexes. The optimal design balances query performance against maintenance cost, typically prioritizing the most frequent and performance-critical queries.
When no single index is sufficiently selective but multiple indexes exist, index intersection combines their results. This technique is particularly powerful for conjunctive selections where each predicate independently matches many rows, but their intersection is small.
How Bitmap AND Works:
The bitmap approach represents matching tuples as bitmaps (one bit per tuple or page), then performs logical AND to find the intersection:
```sql
-- Bitmap AND Internals

-- Table: 1 million rows across 10,000 pages (~100 rows/page)
-- Index 1: idx_status (status = 'active' matches 200,000 rows = 20%)
-- Index 2: idx_region (region = 'EAST' matches 250,000 rows = 25%)

-- Query: WHERE status = 'active' AND region = 'EAST'
-- Expected matches: 20% × 25% = 5% = 50,000 rows (if independent)

-- Bitmap Index Scan 1 (idx_status):
--   Build bitmap where bit[i] = 1 if page i contains 'active' rows
--   Approximately 4,000 pages have at least one 'active' row

-- Bitmap Index Scan 2 (idx_region):
--   Build bitmap where bit[i] = 1 if page i contains 'EAST' rows
--   Approximately 5,000 pages have at least one 'EAST' row

-- BitmapAnd operation:
--   result[i] = bitmap1[i] AND bitmap2[i]
--   Result: ~2,500 pages may contain matching rows

-- Bitmap Heap Scan:
--   Fetch 2,500 pages (in physical order)
--   Recheck both conditions on all tuples in fetched pages
--   Return ~50,000 tuples that satisfy both conditions

-- Memory representation (lossy vs exact):
--   Exact bitmap: 1 bit per tuple (1M bits = 125KB)
--   Lossy bitmap: 1 bit per page (10K bits = 1.25KB)
--   PostgreSQL uses work_mem to decide exact vs lossy
```
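The walkthrough above can be simulated with page-level bitmaps. A sketch using Python sets as lossy page bitmaps (the page numbers are made up):

```python
# Page-level (lossy) bitmaps represented as sets of page ids.
# Each index scan yields the set of pages containing at least one
# entry matching its predicate; BitmapAnd is set intersection.

pages_status = {0, 2, 3, 5, 7, 8}   # pages with at least one 'active' row
pages_region = {1, 2, 3, 6, 7, 9}   # pages with at least one 'EAST' row

candidate_pages = pages_status & pages_region   # BitmapAnd
print(sorted(candidate_pages))                  # [2, 3, 7]

# Bitmap Heap Scan: fetch candidate pages in physical order, then
# RECHECK both predicates on every tuple in those pages, because a
# page-level bitmap only says "some row on this page matched".
for page_id in sorted(candidate_pages):
    pass  # fetch page, re-evaluate status AND region for each tuple
```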
Cost Model for Index Intersection:

| Component | Cost Formula | Notes |
|---|---|---|
| Index Scan 1 | Height₁ + LeafPages₁ | First index traversal + leaf scan |
| Index Scan 2 | Height₂ + LeafPages₂ | Second index traversal + leaf scan |
| Bitmap Construction | O(matches₁) + O(matches₂) | CPU cost to build bitmaps |
| Bitmap AND | O(pages) | Fast bitwise AND operation |
| Heap Scan | PagesInIntersection | Sequential read of matching pages |
| Tuple Recheck | TuplesInPages × PredicateCost | Re-evaluate all predicates |
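These components combine into a simple additive cost function. The sketch below is a toy model with illustrative constants and inputs, not a real optimizer's formula:

```python
def intersection_cost(height1, leaf_pages1, matches1,
                      height2, leaf_pages2, matches2,
                      pages_in_intersection, tuples_in_pages,
                      predicate_cost=0.01, bitmap_cpu_per_match=0.001):
    """Toy cost model for bitmap index intersection. Units are page reads,
    with CPU terms scaled to page-read equivalents (illustrative only)."""
    index_scans = (height1 + leaf_pages1) + (height2 + leaf_pages2)
    bitmap_build = bitmap_cpu_per_match * (matches1 + matches2)
    heap_scan = pages_in_intersection          # sequential fetch of candidates
    recheck = tuples_in_pages * predicate_cost # re-evaluate all predicates
    return index_scans + bitmap_build + heap_scan + recheck

# Numbers loosely based on the two-index scenario discussed below:
cost = intersection_cost(3, 47, 10_000, 3, 37, 8_000,
                         pages_in_intersection=200, tuples_in_pages=2_000)
print(f"estimated cost ≈ {cost:.0f} page-read equivalents")  # ≈ 328
```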
When Index Intersection Wins:
Index intersection is most beneficial when:
Intersection is significantly smaller than individual sets: If idx_A matches 20% and idx_B matches 25%, but intersection is 5%, we reduce heap access by 4-5×.
No composite index exists: If we only have single-column indexes, intersection combines them without requiring a new index.
Heap clustering is poor: Random heap access is expensive; reducing the number of pages fetched has high value.
Bitmaps fit in memory: Exact (tuple-level) bitmaps keep recheck cost low; when memory pressure forces bitmaps to become lossy (page-level rather than tuple-level), every tuple on each candidate page must be rechecked, increasing CPU cost.
```sql
-- When to use Intersection vs Single Index + Filter

-- Scenario: Two predicates, two single-column indexes
-- idx_A: covers predicate A (matches 10,000 rows)
-- idx_B: covers predicate B (matches 8,000 rows)
-- Intersection: ~800 rows (assuming some correlation)

-- Option 1: Use idx_A + Filter B
SELECT * FROM table WHERE A = ... AND B = ...;
-- Cost:
--   Index scan A: ~50 pages
--   Heap access: 10,000 rows → ~2,000 pages (scattered)
--   Filter B: 10,000 predicate evaluations
--   Result: ~800 rows

-- Option 2: Use idx_B + Filter A
-- Cost:
--   Index scan B: ~40 pages
--   Heap access: 8,000 rows → ~1,600 pages (scattered)
--   Filter A: 8,000 predicate evaluations
--   Result: ~800 rows

-- Option 3: Bitmap Index Intersection
-- Cost:
--   Index scan A: ~50 pages
--   Index scan B: ~40 pages
--   Bitmap AND: ~1 page worth of CPU
--   Heap access: ~200 pages (intersection)
--   Recheck: ~2,000 tuples on those pages
--   Result: ~800 rows

-- Analysis:
-- Option 1: 50 + 2000 = 2050 pages
-- Option 2: 40 + 1600 = 1640 pages
-- Option 3: 50 + 40 + 200 = 290 pages

-- Intersection wins dramatically when both indexes combined
-- narrow down to far fewer pages than either alone!
```

The optimizer automatically considers index intersection when it estimates lower cost than alternatives. You can hint toward intersection by setting enable_indexscan = off (forcing bitmap approach) or adjusting random_page_cost to reflect actual I/O characteristics. PostgreSQL tends to prefer bitmap intersection when matching rows are scattered across many pages.
When predicates cannot all be satisfied by indexes, the remaining filter predicates must be evaluated on each candidate tuple. The order in which these filters are evaluated affects CPU cost significantly through short-circuit evaluation.
The Short-Circuit Principle:
For a conjunction P₁ AND P₂ AND P₃: as soon as a tuple fails P₁, there is no need to evaluate P₂ or P₃—the tuple is rejected immediately. The expected evaluation cost per tuple therefore depends on which predicates run first, making filter order a genuine optimization decision.
Optimal Filter Ordering:
The optimal order balances selectivity and evaluation cost. The principle is to minimize expected evaluation cost per tuple:
```sql
-- Filter Predicate Ordering Optimization

-- Three filter predicates to apply:
-- P1: status = 'active'      (selectivity: 20%, cost: 1 unit)
-- P2: amount > 1000          (selectivity: 30%, cost: 1 unit)
-- P3: name LIKE '%smith%'    (selectivity: 5%, cost: 50 units)

-- Evaluation Order Analysis (1000 candidate tuples):

-- Order A: P1 → P2 → P3
--   P1: 1000 × 1 = 1000 cost, 200 pass
--   P2: 200 × 1 = 200 cost, 60 pass
--   P3: 60 × 50 = 3000 cost, 3 pass
--   Total: 4200 cost units

-- Order B: P3 → P1 → P2
--   P3: 1000 × 50 = 50000 cost, 50 pass
--   P1: 50 × 1 = 50 cost, 10 pass
--   P2: 10 × 1 = 10 cost, 3 pass
--   Total: 50060 cost units

-- Order C: P2 → P1 → P3
--   P2: 1000 × 1 = 1000 cost, 300 pass
--   P1: 300 × 1 = 300 cost, 60 pass
--   P3: 60 × 50 = 3000 cost, 3 pass
--   Total: 4300 cost units

-- Optimal ordering formula:
-- For predicates Pᵢ with selectivity sᵢ and cost cᵢ,
-- order by (1 - sᵢ) / cᵢ descending.
-- This maximizes "elimination power per unit cost"

-- P1: (1 - 0.2) / 1 = 0.8      ← First (most efficient)
-- P2: (1 - 0.3) / 1 = 0.7      ← Second
-- P3: (1 - 0.05) / 50 = 0.019  ← Last (expensive and selective)
```

The Ranking Formula:
Rank = (1 - selectivity) / cost
Evaluate predicates in descending rank order. This formula captures:

- Elimination power: (1 - selectivity) is the fraction of tuples the predicate rejects.
- Cost normalization: dividing by evaluation cost yields elimination per unit of CPU work.
A predicate that eliminates 80% of tuples (selectivity 0.2) with cost 1 has rank 0.8. A predicate that eliminates 95% (selectivity 0.05) but costs 50 has rank 0.019.
This evaluation order correctly places cheap eliminators before expensive ones.
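The rule is easy to apply programmatically. The sketch below reproduces the numbers from the example above (selectivities and costs as assumed there):

```python
def order_filters(predicates):
    """Sort predicates by rank = (1 - selectivity) / cost, descending."""
    return sorted(predicates,
                  key=lambda p: (1 - p["sel"]) / p["cost"],
                  reverse=True)

def expected_cost(ordered, n_tuples):
    """Expected CPU cost with short-circuit evaluation."""
    total, survivors = 0.0, n_tuples
    for p in ordered:
        total += survivors * p["cost"]   # evaluate only on surviving tuples
        survivors *= p["sel"]            # only this fraction passes
    return total

preds = [
    {"name": "P1 status='active'",     "sel": 0.20, "cost": 1},
    {"name": "P2 amount>1000",         "sel": 0.30, "cost": 1},
    {"name": "P3 name LIKE '%smith%'", "sel": 0.05, "cost": 50},
]

best = order_filters(preds)
print([p["name"] for p in best])   # P1, P2, P3 — matching the analysis
print(expected_cost(best, 1000))   # 4200.0, matching Order A
```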
| Predicate Type | Relative Cost | Ranking Impact |
|---|---|---|
| Integer equality/comparison | 1× | High rank (cheap + often selective) |
| String equality (short) | 2-3× | Good rank if selective |
| Date/timestamp comparison | 2-4× | Moderate rank |
| String prefix LIKE | 5-10× | Lower rank despite selectivity |
| String contains LIKE | 20-100× | Very low rank, evaluate last |
| Regular expression | 50-200× | Almost always evaluate last |
| User-defined function | 100-1000× | Last unless extremely selective |
Most query optimizers automatically reorder filter predicates based on estimated selectivity and cost. However, the optimizer's estimates may be inaccurate, especially for complex predicates or UDFs. If you observe slow query execution with filters, examine the predicate evaluation order in the execution plan and consider rewriting the query or providing better statistics.
Real-world queries often exhibit complex patterns that go beyond simple multi-predicate AND. Understanding these patterns enables both query writing and index design optimization.
Pattern 1: Mixed Equality and Range
```sql
-- Pattern 1: Mixed Equality and Range Predicates

-- Query: Find recent active premium users
SELECT * FROM users
WHERE account_type = 'premium'                  -- Equality
  AND status = 'active'                         -- Equality
  AND last_login > NOW() - INTERVAL '30 days'   -- Range
  AND subscription_end > NOW();                 -- Range

-- Optimal index design:
-- Put ALL equality columns first, then range columns
CREATE INDEX idx_type_status_login_sub ON users(
    account_type,       -- Equality (first)
    status,             -- Equality (second)
    last_login,         -- Range (third)
    subscription_end    -- Cannot use this column efficiently!
);

-- Why subscription_end isn't useful:
-- After scanning range on last_login, we're already reading
-- non-contiguous index entries. Adding another range column
-- doesn't further narrow the scan.

-- Solution: Evaluate subscription_end as filter, or use separate index:
CREATE INDEX idx_sub_end ON users(subscription_end)
    WHERE account_type = 'premium' AND status = 'active';
-- Partial index for the specific pattern
```

Pattern 2: Multi-Column Range (Spatial-like)
```sql
-- Pattern 2: Multiple Range Predicates (Non-Spatial)

-- Query: Products in price AND rating range
SELECT * FROM products
WHERE price BETWEEN 50 AND 200
  AND rating BETWEEN 4.0 AND 5.0;

-- B+tree limitation:
-- Can efficiently use ONE range predicate, not both
-- With index (price, rating):
--   Scan price range 50-200, filter rating in memory
-- With index (rating, price):
--   Scan rating range 4.0-5.0, filter price in memory
-- Neither perfectly indexes both ranges

-- Solution approaches:

-- 1. Choose index on more selective range:
--    If price range matches 20% and rating range matches 10%,
--    index on (rating) first reduces scan to 10% of table

-- 2. Use index intersection:
CREATE INDEX idx_price ON products(price);
CREATE INDEX idx_rating ON products(rating);
-- BitmapAnd combines both ranges

-- 3. For true 2D ranges, consider spatial index (R-tree):
--    Treat (price, rating) as 2D coordinates
-- CREATE INDEX idx_price_rating_gist ON products
--     USING GIST (point(price, rating));
-- SELECT * FROM products
--     WHERE point(price, rating) <@ box(point(50,4.0), point(200,5.0));
```

Pattern 3: Partial Index for Common Filters
```sql
-- Pattern 3: Partial Indexes for Frequent Conjunctions

-- Common query pattern: active customers only
SELECT * FROM orders
WHERE customer_id = 12345
  AND is_deleted = false;
-- 99% of rows have is_deleted = false

-- Problem with full index:
CREATE INDEX idx_cust ON orders(customer_id);
-- Index includes both deleted and non-deleted orders
-- Must filter is_deleted after index access

-- Solution: Partial index excluding deleted rows
CREATE INDEX idx_cust_active ON orders(customer_id)
    WHERE is_deleted = false;

-- Benefits:
-- + Smaller index (excludes the 1% of deleted rows)
-- + No filtering needed for common case
-- + Index condition automatically satisfied
-- + Query: WHERE customer_id = 12345 AND is_deleted = false
--   → Uses idx_cust_active directly

-- PostgreSQL EXPLAIN shows:
-- Index Scan using idx_cust_active on orders
--   Index Cond: (customer_id = 12345)
-- Note: is_deleted = false is implicit in index definition

-- Combine with additional columns:
CREATE INDEX idx_cust_date_active ON orders(customer_id, order_date)
    WHERE is_deleted = false AND status != 'cancelled';
```

Partial indexes are powerful when queries consistently apply the same filter condition AND that condition rarely changes (like soft-delete flags, status='active', or date ranges for recent data). They reduce index size and maintenance cost while perfectly serving the common query pattern.
Accurate cost estimation for conjunctive selections is challenging because it requires estimating the combined selectivity of multiple predicates. The optimizer must handle predicate correlation, complex expressions, and uncertain statistics.
The Independence Assumption Revisited:
```sql
-- Selectivity Estimation for Conjunctions

-- Given predicates:
-- P1: country = 'USA'        (selectivity s1 = 0.30)
-- P2: state = 'California'   (selectivity s2 = 0.03)

-- Independence assumption:
-- P(P1 AND P2) = s1 × s2 = 0.30 × 0.03 = 0.009 = 0.9%

-- Reality check:
-- IF state = 'California', THEN country MUST = 'USA'
-- P(P1 | P2) = 1.0 (California implies USA)
-- Actual P(P1 AND P2) = s2 = 0.03 = 3%

-- The independence estimate (0.9%) is 3× too low!
-- This causes the optimizer to underestimate result cardinality,
-- potentially choosing wrong access methods.

-- Multi-column statistics can capture correlation:
-- PostgreSQL:
CREATE STATISTICS stat_country_state ON country, state
    FROM customers;
ANALYZE customers;

-- Now the optimizer uses the joint distribution instead of independence

-- View estimated vs actual:
EXPLAIN ANALYZE
SELECT * FROM customers
WHERE country = 'USA' AND state = 'California';
-- Compare 'rows' (estimated) vs 'actual rows'
```
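The same effect can be reproduced on synthetic data. A sketch comparing the independence estimate to the true joint selectivity, using a made-up correlated distribution:

```python
import random

random.seed(42)

# Synthetic, deliberately correlated data: every 'California' row is 'USA'.
rows = []
for _ in range(100_000):
    if random.random() < 0.03:
        rows.append(("USA", "California"))
    elif random.random() < 0.28:            # remaining USA rows, other states
        rows.append(("USA", "Texas"))
    else:
        rows.append(("Canada", "Ontario"))

n = len(rows)
s1 = sum(1 for c, s in rows if c == "USA") / n          # ~0.30
s2 = sum(1 for c, s in rows if s == "California") / n   # ~0.03
actual = sum(1 for c, s in rows if c == "USA" and s == "California") / n

print(f"independence estimate: {s1 * s2:.4f}")   # ~0.009
print(f"actual selectivity:    {actual:.4f}")    # ~0.030 (about 3x higher)
```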
| Method | Description | Accuracy | Overhead |
|---|---|---|---|
| Independence | Multiply individual selectivities | Poor for correlated | None |
| Histogram intersection | Combine single-column histograms | Moderate | Low |
| Multi-column stats | Joint distribution histograms | Good | Storage/analyze cost |
| Sampling | Sample data to estimate combined selectivity | High | Query-time cost |
| Machine learning | Trained model predicts cardinality | Varies | Training cost |
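Of the methods in the table above, sampling is the most direct to illustrate: evaluate the whole conjunction on a random sample and use the observed match fraction. A sketch with synthetic data (all names illustrative):

```python
import random

def estimate_conjunctive_selectivity(table, predicates, sample_size=1000):
    """Sampling-based estimate: run the full conjunction on a random
    sample and return the observed match fraction (illustrative)."""
    sample = random.sample(table, min(sample_size, len(table)))
    matches = sum(1 for row in sample if all(p(row) for p in predicates))
    return matches / len(sample)

# Synthetic rows (dicts) with two independent ~50% attributes:
table = [{"status": random.choice(["active", "inactive"]),
          "region": random.choice(["EAST", "WEST"])} for _ in range(50_000)]

sel = estimate_conjunctive_selectivity(
    table,
    [lambda r: r["status"] == "active", lambda r: r["region"] == "EAST"],
)
print(f"estimated combined selectivity ≈ {sel:.3f}")   # ~0.25
```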
Practical Impact of Estimation Errors:
Poor selectivity estimates for conjunctions cause several problems:
Wrong access method: If estimated selectivity is too low, optimizer may choose index scan when sequential scan would be cheaper.
Wrong index: With multiple index options, incorrect estimates may favor a less efficient index.
Wrong join order: In multi-table queries, conjunction selectivity affects join cardinality estimates, cascading into poor join ordering.
Memory allocation: Hash operations size memory based on estimated cardinality; underestimates cause spilling to disk.
```sql
-- Debugging Cardinality Estimation Issues

-- PostgreSQL: Compare estimates to actuals
EXPLAIN (ANALYZE, VERBOSE)
SELECT * FROM orders
WHERE customer_id = 12345
  AND order_date > '2024-01-01'
  AND status = 'shipped';

-- Look for large differences:
-- Seq Scan on orders (cost=... rows=10 ...) (actual ... rows=500 ...)
--                              ↑                           ↑
--                          estimated              actual (50× difference!)

-- Possible causes:
-- 1. Stale statistics: Run ANALYZE
-- 2. Correlated predicates: Create extended statistics
-- 3. Expression predicates: Optimizer can't estimate functions
-- 4. Parameter sniffing: Plan was optimized for different values

-- Fix attempts:

-- 1. Fresh statistics
ANALYZE orders;

-- 2. Extended statistics for correlation
CREATE STATISTICS order_stats ON customer_id, order_date, status
    FROM orders;
ANALYZE orders;

-- 3. Check n_distinct accuracy
SELECT attname, n_distinct, most_common_vals, most_common_freqs
FROM pg_stats WHERE tablename = 'orders';

-- 4. If estimates still wrong, consider:
--    - More frequent ANALYZE (especially for volatile tables)
--    - Larger statistics target:
--      ALTER TABLE orders ALTER COLUMN customer_id SET STATISTICS 1000;
--    - Query hints to override optimizer (last resort)
```

Estimation errors compound in conjunctions. If each of 3 predicates has a 2× error, the combined error can be 8× or more. This is why conjunctive selections are particularly susceptible to optimization failures. Always validate estimates with EXPLAIN ANALYZE for critical queries.
We have explored conjunctive selection implementation comprehensively, covering implementation strategies, index design, intersection algorithms, and cost estimation. Let us consolidate the essential knowledge:

- Conjunctive selections (predicates joined by AND) are the most common filter pattern; under independence, selectivities multiply, so the combined result is often far smaller than any single predicate suggests.
- Four implementation strategies compete: composite index scan, single index + filter, bitmap index intersection, and sequential scan + filter. The optimizer chooses based on available indexes and estimated selectivity.
- Composite index design follows the prefix rule: place equality columns before range columns, and consider covering (INCLUDE) columns to eliminate heap access.
- Index intersection pays for multiple index scans to shrink heap access to the pages in the intersection, winning when no single index is selective enough.
- Residual filter predicates should be evaluated in descending rank = (1 - selectivity) / cost order to exploit short-circuit evaluation.
- Selectivity estimation for conjunctions is fragile under correlation; extended statistics and EXPLAIN ANALYZE validation catch the worst errors.
What's Next:
Having mastered conjunctive (AND) selections, we proceed to Disjunctive Selection—implementing selections with multiple OR-ed predicates. Disjunctions present fundamentally different optimization challenges, often requiring union strategies rather than intersection.
You now possess expert-level understanding of conjunctive selection implementation—the most common pattern in production query workloads. This knowledge enables designing effective indexes, diagnosing query plan issues, and understanding optimizer decision-making for complex predicates.