Loading learning content...
While conjunctive selections (AND) narrow result sets by requiring all conditions to be met, disjunctive selections (OR) expand results by accepting tuples matching any condition. This fundamental difference creates entirely different implementation challenges and optimization strategies.
Consider the query: "Find orders that are either from VIP customers OR have high value OR are flagged for review." Each condition independently qualifies rows, and the result is the union of all matching rows. Unlike AND operations where indexes naturally intersect, OR operations require combining—and typically deduplicating—results from multiple access paths.
Disjunctive selection mastery is essential because OR conditions are ubiquitous in real applications, poorly optimized disjunctions can severely degrade performance, and understanding OR processing reveals fundamental database execution strategies.
By the end of this page, you will understand the unique challenges of disjunctive predicate evaluation, master the bitmap OR and union-based strategies for efficient OR processing, analyze when index usage is beneficial versus harmful for disjunctions, and learn optimization techniques for IN lists and multi-valued predicates.
A disjunctive selection is a selection operation where the predicate consists of multiple conditions combined with logical OR. Formally:
σ(P₁ ∨ P₂ ∨ ... ∨ Pₙ)(R)
where P₁, P₂, ..., Pₙ are individual predicates. A tuple satisfies the disjunctive selection if it satisfies AT LEAST ONE of the predicates.
Mathematical Properties:
Disjunctive selections exhibit properties that differ significantly from conjunctions:
Selectivity Combination: Combined selectivity = 1 - (1-s₁)(1-s₂)...(1-sₙ). For two 10% predicates: 1 - (0.9 × 0.9) = 19% match (not 20% due to overlap).
Union Semantics: Results are the union of tuples matching each predicate. Duplicates must be eliminated if the same tuple matches multiple predicates.
Index Challenge: Unlike AND where multiple indexes narrow results, OR conditions expand results. Using an index for one predicate doesn't help evaluate others.
Short-Circuit Limitation: Finding a match on P₁ eliminates need to check P₂, P₃... for that tuple, but we still must find ALL matching tuples for each predicate.
123456789101112131415161718192021222324252627282930313233343536373839
-- Disjunctive Selection Examples -- Example 1: Simple OR on different columnsSELECT * FROM ordersWHERE customer_type = 'VIP' -- P1 OR total_amount > 10000 -- P2 OR priority = 'urgent'; -- P3 -- Selectivity analysis (with overlap):-- P1: VIP customers = 5%-- P2: High value orders = 3%-- P3: Urgent orders = 2%-- Naive sum: 5% + 3% + 2% = 10% -- But some overlap: VIP customers may have high-value orders-- Actual: 1 - (0.95 × 0.97 × 0.98) = 9.7% -- Example 2: OR on same column (IN list)SELECT * FROM employeesWHERE department IN ('Sales', 'Marketing', 'Support'); -- Equivalent to:SELECT * FROM employeesWHERE department = 'Sales' OR department = 'Marketing' OR department = 'Support'; -- Example 3: OR with different operatorsSELECT * FROM productsWHERE category = 'Electronics' -- Equality OR price < 10 -- Range OR name LIKE 'Premium%'; -- Pattern -- Example 4: Nested OR within ANDSELECT * FROM ordersWHERE status = 'active' AND (customer_region = 'EAST' OR customer_region = 'WEST'); -- The parenthesized OR creates a disjunction that must be-- fully evaluated for each active orderA single B+tree index cannot efficiently satisfy multiple OR predicates on different columns. An index on column A helps only the first predicate (A = x); we still need another strategy for (B = y) OR (C = z). This fundamental limitation shapes all disjunctive optimization strategies.
Database systems employ several strategies for implementing disjunctive selections, each suited to different scenarios. The choice depends on predicate structure, available indexes, and estimated selectivity.
Strategy 1: Sequential Scan with Combined Predicate
The simplest approach evaluates the entire disjunction for each tuple during a sequential scan:
12345678910111213141516171819202122232425262728
-- Strategy 1: Sequential Scan with OR Evaluation SELECT * FROM ordersWHERE customer_type = 'VIP' OR total_amount > 10000 OR priority = 'urgent'; -- Execution (sequential scan):FOR EACH tuple IN orders_table: IF tuple.customer_type = 'VIP' THEN EMIT tuple ELSE IF tuple.total_amount > 10000 THEN EMIT tuple ELSE IF tuple.priority = 'urgent' THEN EMIT tuple -- PostgreSQL EXPLAIN:-- Seq Scan on orders (cost=0.00..45847.00 rows=97000 width=102)-- Filter: (customer_type = 'VIP' OR total_amount > 10000 OR priority = 'urgent') -- When this wins:-- + Combined selectivity is high (many rows match)-- + No suitable indexes exist-- + Table is small (fits in memory)-- + Predicates span different columns without composites -- Cost: Reads all B pages, evaluates predicate on all T tuples-- CPU cost depends on predicate evaluation order (short-circuit)Strategy 2: Bitmap OR (Index Union)
When separate indexes exist on OR predicate columns, bitmap operations can combine their results:
12345678910111213141516171819202122232425262728293031323334353637383940414243
-- Strategy 2: Bitmap OR (Index Union) -- Indexes:CREATE INDEX idx_cust_type ON orders(customer_type);CREATE INDEX idx_amount ON orders(total_amount);CREATE INDEX idx_priority ON orders(priority); -- Query with OR:SELECT * FROM ordersWHERE customer_type = 'VIP' OR total_amount > 10000 OR priority = 'urgent'; -- Execution:-- 1. Bitmap Index Scan on idx_cust_type:-- bitmap1 = pages containing customer_type = 'VIP'-- 2. Bitmap Index Scan on idx_amount:-- bitmap2 = pages containing total_amount > 10000-- 3. Bitmap Index Scan on idx_priority:-- bitmap3 = pages containing priority = 'urgent'-- 4. BitmapOr:-- result_bitmap = bitmap1 OR bitmap2 OR bitmap3-- 5. Bitmap Heap Scan:-- Fetch pages in result_bitmap-- Recheck: Evaluate full predicate on each tuple -- PostgreSQL EXPLAIN:-- Bitmap Heap Scan on orders-- Recheck Cond: (customer_type = 'VIP' OR total_amount > 10000 OR ...)-- -> BitmapOr-- -> Bitmap Index Scan on idx_cust_type-- Index Cond: (customer_type = 'VIP')-- -> Bitmap Index Scan on idx_amount-- Index Cond: (total_amount > 10000)-- -> Bitmap Index Scan on idx_priority-- Index Cond: (priority = 'urgent') -- Key properties:-- + Uses all three indexes-- + Heap pages read once (union, not duplicated)-- + Sequential heap access (sorted bitmap)-- - Must scan all three index portions-- - Bitmap can become lossy for large result setsStrategy 3: UNION Query Transformation
The optimizer can transform an OR query into a UNION of separate queries:
12345678910111213141516171819202122232425262728293031323334
-- Strategy 3: UNION Query Transformation -- Original query with OR:SELECT * FROM ordersWHERE customer_type = 'VIP' OR total_amount > 10000; -- Transformed to UNION (semantically equivalent):SELECT * FROM orders WHERE customer_type = 'VIP'UNIONSELECT * FROM orders WHERE total_amount > 10000; -- Or with duplicate elimination avoided where possible:SELECT * FROM orders WHERE customer_type = 'VIP'UNION ALLSELECT * FROM orders WHERE total_amount > 10000 AND NOT (customer_type = 'VIP'); -- Exclude already-matched -- Execution:-- 1. Index scan on idx_cust_type for VIP orders-- 2. Index scan on idx_amount for high-value orders -- 3. Hash or sort to eliminate duplicates (for UNION) -- When this approach is used:-- + Each individual predicate is highly selective-- + Separate indexes exist and are efficient-- + Optimizer determines UNION cheaper than bitmap OR -- PostgreSQL may automatically do this transformation-- Some databases require manual rewriting -- Cost considerations:-- UNION: Duplicate elimination overhead-- UNION ALL with exclusion: Complex predicates but no dedup| Strategy | When Applicable | Advantages | Disadvantages |
|---|---|---|---|
| Sequential Scan | High combined selectivity, no indexes | Simple, predictable | Reads entire table |
| Bitmap OR | Multiple indexes exist | Combines indexes efficiently | Multiple index scans, bitmap overhead |
| UNION Transform | Each predicate highly selective | Uses best index per predicate | Duplicate handling cost |
| Single Index + Filter | One predicate dominates selectivity | Simple execution | Other predicates as filter only |
If combined OR selectivity > 30%, sequential scan often wins. If each predicate is <5% selective with good indexes, bitmap OR or UNION may be better. If one predicate is far more restrictive than others, consider using just that index and filtering the rest.
Bitmap OR is the primary mechanism for combining multiple indexes in disjunctive selections. Unlike Bitmap AND (which shrinks results), Bitmap OR expands the result set to include all tuples matching any predicate.
How Bitmap OR Works:
12345678910111213141516171819202122232425262728293031323334
-- Bitmap OR Internals -- Table: 1 million rows across 10,000 pages (~100 rows/page)-- idx_status: 'pending' matches 50,000 rows (5%)-- idx_region: 'EMEA' matches 200,000 rows (20%) -- Query: WHERE status = 'pending' OR region = 'EMEA' -- Bitmap Index Scan 1 (idx_status):-- For each entry with status='pending':-- Set bit for that page/tuple in bitmap1-- Result: bitmap1 has ~1,500 pages marked (some pages have multiple pending) -- Bitmap Index Scan 2 (idx_region):-- For each entry with region='EMEA':-- Set bit for that page/tuple in bitmap2-- Result: bitmap2 has ~4,000 pages marked -- BitmapOr operation:-- result_bitmap[i] = bitmap1[i] OR bitmap2[i]-- Result: ~4,800 pages (union, with some overlap) -- Bitmap Heap Scan:-- For each page in result_bitmap:-- Fetch page from heap-- For each tuple on page:-- Recheck: Does (status = 'pending' OR region = 'EMEA')?-- If yes, emit tuple -- Key insight: Unlike AND (intersection shrinks), -- OR (union) grows the result bitmap -- Total rows matched: 50,000 + 200,000 - overlap ≈ 230,000-- (Overlap = rows matching both predicates, removed by union)Lossy Bitmaps in OR Operations:
When bitmaps grow too large to fit in work_mem, they become "lossy"—storing only page numbers, not tuple offsets. This has different implications for OR versus AND:
| Aspect | Bitmap AND (Intersection) | Bitmap OR (Union) |
|---|---|---|
| Exactness | Still correct (intersect pages) | Still correct (union pages) |
| Recheck scope | Tuples on intersected pages | Tuples on union of pages |
| False positives | Tuples on page not matching all | Tuples on page not matching any |
| Typical overhead | Moderate (intersection is small) | Higher (union can be large) |
1234567891011121314151617181920212223242526
-- Bitmap Memory Configuration -- PostgreSQL work_mem controls bitmap size thresholdSET work_mem = '64MB'; -- Larger = more exact bitmaps -- Check if bitmap became lossy:EXPLAIN (ANALYZE, BUFFERS, VERBOSE)SELECT * FROM ordersWHERE status = 'pending' OR region = 'EMEA'; -- Look for:-- Heap Blocks: exact=1234 lossy=567-- ↑ ↑-- exact pages lossy pages (page-level only) -- Lossy pages require rechecking ALL tuples on the page,-- not just those in the bitmap -- For OR operations, lossy bitmaps are more costly because-- the union bitmap tends to be larger (more pages to recheck) -- Tuning approach:-- 1. Monitor lossy vs exact blocks in EXPLAIN ANALYZE-- 2. Increase work_mem if too many lossy blocks-- 3. Consider if bitmap approach is even beneficial-- (if union is most of the table, scan may be cheaper)Because OR combines result sets, the union bitmap grows with each predicate. For n predicates each matching m pages, the union size approaches n×m (minus overlaps). High-fanout OR conditions (many predicates) can create very large bitmaps, making sequential scan more attractive.
The IN predicate (column IN (v₁, v₂, ..., vₙ)) is semantically equivalent to a disjunction on the same column (column = v₁ OR column = v₂ OR ... OR column = vₙ). However, databases implement special optimizations for IN lists due to their frequency and regularity.
IN List Execution Strategies:
12345678910111213141516171819202122232425262728293031323334
-- IN List Execution Strategies -- Query:SELECT * FROM orders WHERE customer_id IN (101, 205, 307, 412, 599); -- Strategy 1: Multiple Index Probes-- With index on customer_id:-- - Probe index for customer_id = 101-- - Probe index for customer_id = 205-- - ... (5 separate index lookups)-- Cost: 5 × (tree_height + 1) page reads + heap accesses -- Strategy 2: Index Range Scan (if values clustered)-- If IN list values form a range-like pattern and correlation is high:-- - Single index scan from min to max value-- - Filter non-matching values-- Less common, optimizer must estimate benefit -- Strategy 3: Hash Match-- For very large IN lists:-- - Build hash set of IN values-- - Scan table/index, check each value against hash-- Efficient for IN lists with 100+ values -- PostgreSQL handles IN lists automatically:EXPLAIN SELECT * FROM orders WHERE customer_id IN (101, 205, 307); -- For small lists (typically < 10-20):-- -> Index Scan using idx_customer on orders-- Index Cond: (customer_id = ANY ('{101,205,307}'::integer[])) -- For large lists or when IN is not efficient:-- -> Seq Scan on orders-- Filter: (customer_id = ANY (...))Large IN List Handling:
In lists can contain hundreds or thousands of values, especially when generated programmatically. Special handling is needed:
12345678910111213141516171819202122232425262728293031323334353637383940
-- Large IN List Optimization -- Problem: 10,000 values in IN listSELECT * FROM products WHERE product_id IN (1, 2, 3, ..., 10000); -- Issues with naive approach:-- - 10,000 index probes (even if fast, overhead adds up)-- - Query parsing and planning overhead-- - Memory for storing the list -- Solution 1: VALUES join (PostgreSQL)SELECT p.* FROM products pJOIN (VALUES (1), (2), ..., (10000)) AS v(id) ON p.product_id = v.id; -- Better yet, use UNNEST with an array:SELECT p.* FROM products pJOIN UNNEST(ARRAY[1, 2, ..., 10000]) AS id ON p.product_id = id; -- Solution 2: Temporary tableCREATE TEMP TABLE lookup_ids (id INTEGER PRIMARY KEY);INSERT INTO lookup_ids VALUES (1), (2), ..., (10000); SELECT p.* FROM products pJOIN lookup_ids l ON p.product_id = l.id; -- Benefits:-- + Proper join optimization (hash/merge)-- + Statistics can be computed on temp table-- + Works for very large lists -- Solution 3: ANY array operator with GIN index-- For repeated queries on the same set pattern:CREATE INDEX idx_prod_id_gin ON products USING GIN (product_id);SELECT * FROM products WHERE product_id = ANY(ARRAY[1, 2, ..., 10000]); -- Anti-pattern: Dynamic SQL with embedded values-- BAD: "... WHERE id IN (" + str(ids) + ")"-- - SQL injection risk-- - No query plan caching-- - Parser overhead for each unique list| List Size | Recommended Approach | Notes |
|---|---|---|
| 1-10 values | Direct IN clause | Multiple index probes, efficient |
| 10-100 values | IN clause or ANY(array) | Optimizer handles well |
| 100-1000 values | ANY(array) or temp table join | Consider hash join |
| 1000+ values | Temp table join | Always prefer structured join |
Avoid generating extremely long IN lists in application code. Beyond ~1000 values, the query text itself becomes problematic (parsing cost, network transfer, query cache inefficiency). Use temp tables, table-valued parameters, or batch the query into smaller chunks.
Disjunctive predicates present unique optimization challenges. This section covers advanced techniques for improving OR query performance.
Technique 1: OR-to-UNION Rewriting
1234567891011121314151617181920212223
-- Technique 1: Manual OR-to-UNION Transformation -- Original (may not use indexes optimally):SELECT * FROM ordersWHERE (customer_region = 'EAST' AND total > 1000) OR (customer_region = 'WEST' AND total > 5000); -- Rewritten as UNION (each branch uses optimal access):SELECT * FROM orders WHERE customer_region = 'EAST' AND total > 1000UNION ALLSELECT * FROM orders WHERE customer_region = 'WEST' AND total > 5000; -- Why UNION ALL is safe here:-- The two conditions are mutually exclusive (EAST vs WEST)-- No duplicates possible, so UNION ALL avoids dedup cost -- With index on (customer_region, total):-- Each UNION branch uses optimal index range scan-- Combined cost < sequential scan with OR -- When optimizer doesn't auto-transform:-- Some databases (older versions, complex conditions) don't-- automatically rewrite OR to UNION. Manual rewriting helps.Technique 2: OR Expansion with EXISTS
123456789101112131415161718192021222324
-- Technique 2: OR Expansion Using EXISTS -- Complex OR with subquery:SELECT * FROM employees eWHERE e.department_id IN (SELECT d.id FROM departments d WHERE d.budget > 1000000) OR e.salary > 150000; -- Problem: Subquery OR with scalar condition - hard to optimize -- Rewrite using EXISTS for clarity:SELECT * FROM employees eWHERE EXISTS ( SELECT 1 FROM departments d WHERE d.id = e.department_id AND d.budget > 1000000)OR e.salary > 150000; -- Sometimes decomposing helps optimizer:SELECT * FROM employees WHERE salary > 150000UNIONSELECT e.* FROM employees eJOIN departments d ON e.department_id = d.idWHERE d.budget > 1000000 AND e.salary <= 150000; -- Exclude already matchedTechnique 3: Predicate Consolidation
123456789101112131415161718192021222324252627282930
-- Technique 3: Predicate Consolidation -- Inefficient: OR on same column with overlapping rangesSELECT * FROM productsWHERE price < 100 OR price < 200 OR price < 50; -- Consolidate to single predicate:SELECT * FROM products WHERE price < 200;-- (200 subsumes 100 and 50) -- More complex consolidation:SELECT * FROM ordersWHERE status = 'pending' OR status = 'processing' OR status = 'pending'; -- Duplicate! -- Consolidate:SELECT * FROM orders WHERE status IN ('pending', 'processing'); -- Optimizer usually does this, but not always for complex expressions:SELECT * FROM eventsWHERE (event_date > '2024-01-01' AND event_date < '2024-06-01') OR (event_date > '2024-03-01' AND event_date < '2024-09-01'); -- Human can see overlap and consolidate:SELECT * FROM eventsWHERE event_date > '2024-01-01' AND event_date < '2024-09-01';-- (Union of ranges forms single range)Technique 4: Partial Index for Common OR Patterns
12345678910111213141516171819202122232425262728
-- Technique 4: Partial Index for Frequent OR Patterns -- Frequent query pattern:SELECT * FROM ticketsWHERE status = 'open' OR status = 'in_progress'; -- Instead of:CREATE INDEX idx_status ON tickets(status);-- (Index must scan two separate key ranges) -- Create partial index matching the OR:CREATE INDEX idx_active_tickets ON tickets(created_at) WHERE status IN ('open', 'in_progress'); -- Benefits:-- + Smaller index (excludes closed/resolved tickets)-- + Predicate automatically satisfied-- + Additional columns (created_at) aid other filters -- Query automatically uses partial index:SELECT * FROM ticketsWHERE (status = 'open' OR status = 'in_progress') AND created_at > NOW() - INTERVAL '7 days'; -- EXPLAIN shows:-- Index Scan using idx_active_tickets on tickets-- Index Cond: (created_at > ...)-- (status predicate is implicit in partial index)Examine your application's most frequent OR patterns. If certain OR combinations appear repeatedly, consider creating partial indexes or materialized views for them. The key insight is that OR optimization often requires schema-level solutions, not just query-level rewrites.
Accurate cost estimation for disjunctive selections requires understanding how OR predicates interact. The cost model differs significantly from conjunctive selection.
Selectivity Estimation for OR:
For independent predicates:
s(P₁ OR P₂) = s(P₁) + s(P₂) - s(P₁) × s(P₂)
This is the inclusion-exclusion principle: add individual selectivities, subtract the overlap.
For n predicates:
s(P₁ OR P₂ OR ... OR Pₙ) = 1 - ∏(1 - sᵢ)
Equivalent to "probability at least one matches."
| P1 Selectivity | P2 Selectivity | Combined (OR) | Naive Sum | Error |
|---|---|---|---|---|
| 10% | 10% | 19% | 20% | 5% |
| 5% | 5% | 9.75% | 10% | 2.5% |
| 50% | 50% | 75% | 100% | 25% |
| 20% | 30% | 44% | 50% | 12% |
| 1% | 1% | 1.99% | 2% | 0.5% |
12345678910111213141516171819202122232425262728
-- OR Selectivity Calculation -- Given:-- P1: status = 'pending' (selectivity 0.15)-- P2: priority = 'high' (selectivity 0.10)-- P3: age > 30 (selectivity 0.20) -- Individual OR:-- s(P1 OR P2) = 0.15 + 0.10 - (0.15 × 0.10) = 0.235 = 23.5% -- Three-way OR (using complement):-- s(P1 OR P2 OR P3) = 1 - (1-0.15)(1-0.10)(1-0.20)-- = 1 - (0.85 × 0.90 × 0.80)-- = 1 - 0.612-- = 0.388 = 38.8% -- Naive sum: 0.15 + 0.10 + 0.20 = 0.45 = 45%-- Error: 45% - 38.8% = 6.2% overestimate -- The more predicates, the more overlap subtraction matters-- At extreme: 10 predicates each with 10% selectivity-- Naive: 100% (impossible!)-- Actual: 1 - 0.9^10 = 65% -- PostgreSQL validates with EXPLAIN ANALYZE:EXPLAIN ANALYZE SELECT * FROM ordersWHERE status = 'pending' OR priority = 'high' OR ...;-- Compare "rows" (estimated) vs "actual rows"Access Path Cost Comparison:
The critical decision point is whether bitmap OR or sequential scan is more efficient:
Sequential scan cost: B (total pages) + T × pred_eval_cost
Bitmap OR cost: Σ(index_scan_costᵢ) + union_pages × random_io_cost + union_tuples × recheck_cost
Where:
123456789101112131415161718192021222324252627282930
-- Cost Comparison: Bitmap OR vs Sequential Scan -- Given:-- Table: 1,000,000 rows in 10,000 pages-- P1: matches 100,000 rows (10%), spans 3,000 pages-- P2: matches 80,000 rows (8%), spans 2,500 pages-- Combined: matches ~170,000 rows, spans ~4,500 pages (some overlap) -- Sequential Scan Cost:-- Read: 10,000 pages-- Evaluate: 1,000,000 predicate evaluations (cheap)-- Total I/O: 10,000 page reads (sequential) -- Bitmap OR Cost:-- Index scan P1: ~100 pages (B+tree leaves for 100K entries)-- Index scan P2: ~80 pages-- Bitmap operations: negligible CPU-- Heap scan: 4,500 pages (sequential, sorted bitmap)-- Recheck: ~500,000 tuples on those pages-- Total I/O: 100 + 80 + 4,500 = 4,680 pages -- Analysis:-- Bitmap OR reads 4,680 pages-- Sequential reads 10,000 pages-- Bitmap OR wins! (2x fewer pages) -- But if combined selectivity were 60% (most pages):-- Bitmap OR: 100 + 80 + 9,000 = 9,180 pages-- Sequential: 10,000 pages-- Nearly equal - sequential may win due to simpler access patternHigh combined OR selectivity (>30-50%) often makes sequential scan the better choice. The optimizer may incorrectly estimate selectivity due to correlation between predicates (e.g., high-value orders are often from VIP customers), leading to poor access path selection. Always validate with EXPLAIN ANALYZE for critical OR queries.
Real-world queries often combine AND and OR in complex predicate structures. Understanding how databases process these mixed predicates is essential for optimization.
Conjunctive Normal Form (CNF) vs Disjunctive Normal Form (DNF):
Logical predicates can be normalized to standard forms:
123456789101112131415161718192021222324252627
-- Predicate Normal Forms -- Conjunctive Normal Form (CNF): AND of ORs-- (A OR B) AND (C OR D) AND (E OR F) -- Disjunctive Normal Form (DNF): OR of ANDs -- (A AND B) OR (C AND D) OR (E AND F) -- Example: Complex predicate-- Original:WHERE (status = 'active' AND region = 'US') OR (status = 'pending' AND priority = 'high') -- This is already in DNF: OR of two AND terms-- Can be processed as:-- 1. UNION of two index scans (if indexes suit each AND term)-- 2. Or sequential scan with combined predicate -- Convert to CNF (less intuitive for this case):-- Expanding: equivalent to-- (status = 'active' OR status = 'pending') -- Must be active or pending-- AND (status = 'active' OR priority = 'high') -- If not active, must be high priority-- AND (region = 'US' OR status = 'pending') -- If not pending, must be US-- AND (region = 'US' OR priority = 'high') -- If not US, must be high priority -- CNF has 4 clauses vs DNF has 2 terms-- For optimization, DNF often maps better to UNION strategiesProcessing Strategies for Mixed Predicates:
A AND (B OR C) → first filter by A using index, then check (B OR C) as filter.123456789101112131415161718192021222324252627282930313233343536373839404142434445
-- Mixed Predicate Optimization -- Pattern: Outer AND with inner ORSELECT * FROM ordersWHERE order_date > '2024-01-01' -- Outer AND (selective) AND (status = 'pending' OR status = 'processing'); -- Inner OR -- Strategy: Use outer AND with index, OR as filter-- With index on order_date:-- 1. Index scan for order_date > '2024-01-01'-- 2. Filter: status IN ('pending', 'processing') -- PostgreSQL EXPLAIN:-- Index Scan using idx_order_date on orders-- Index Cond: (order_date > '2024-01-01')-- Filter: (status = ANY ('{pending,processing}')) -- Pattern: OR of selective AND termsSELECT * FROM ordersWHERE (customer_id = 100 AND status = 'shipped') -- Term 1 OR (customer_id = 200 AND status = 'delivered'); -- Term 2 -- Strategy: Process as UNION if indexes exist-- With index on (customer_id, status):-- 1. Index scan for (100, 'shipped')-- 2. Index scan for (200, 'delivered') -- 3. Combine results (no dedup needed - different customers) -- PostgreSQL may choose BitmapOr:-- Bitmap Heap Scan on orders-- -> BitmapOr-- -> Bitmap Index Scan (customer_id = 100 AND status = 'shipped')-- -> Bitmap Index Scan (customer_id = 200 AND status = 'delivered') -- Pattern: Complex nestingSELECT * FROM ordersWHERE region = 'US' AND ((status = 'pending' AND total > 1000) OR (status = 'urgent' AND total > 500)); -- Rewrite for clarity and potential optimization:-- Factor out region, then UNION the inner terms:(SELECT * FROM orders WHERE region = 'US' AND status = 'pending' AND total > 1000)UNION ALL(SELECT * FROM orders WHERE region = 'US' AND status = 'urgent' AND total > 500);When you have AND surrounding OR, factor out the AND predicates and apply them first using the best available index. The OR then becomes a simple filter on a reduced result set. This works well when the outer AND is highly selective.
We have explored disjunctive selection implementation comprehensively, from fundamental union semantics through bitmap OR operations to complex mixed predicate handling. Let us consolidate the essential knowledge:
What's Next:
With comprehensive understanding of both conjunctive and disjunctive selection, we now proceed to Cost Comparison—synthesizing our knowledge to understand how the optimizer chooses among all available strategies for any given selection predicate.
You now possess expert-level understanding of disjunctive selection implementation—the technique for handling OR predicates that appear throughout real-world queries. This knowledge enables writing efficient OR queries, designing appropriate indexes, and understanding optimizer behavior for complex predicates.