Physical OperatorsSelection Implementation

Selection Implementation

LevelIntermediate

Duration90 mins

TopicSelection Implementation

4 / 5

Disjunctive Selection

Combining Multiple Conditions with OR

While conjunctive selections (AND) narrow result sets by requiring all conditions to be met, disjunctive selections (OR) expand results by accepting tuples matching any condition. This fundamental difference creates entirely different implementation challenges and optimization strategies.

Consider the query: "Find orders that are either from VIP customers OR have high value OR are flagged for review." Each condition independently qualifies rows, and the result is the union of all matching rows. Unlike AND operations where indexes naturally intersect, OR operations require combining—and typically deduplicating—results from multiple access paths.

Disjunctive selection mastery is essential because OR conditions are ubiquitous in real applications, poorly optimized disjunctions can severely degrade performance, and understanding OR processing reveals fundamental database execution strategies.

What You Will Learn

By the end of this page, you will understand the unique challenges of disjunctive predicate evaluation, master the bitmap OR and union-based strategies for efficient OR processing, analyze when index usage is beneficial versus harmful for disjunctions, and learn optimization techniques for IN lists and multi-valued predicates.

Disjunctive Selection Fundamentals

A disjunctive selection is a selection operation where the predicate consists of multiple conditions combined with logical OR. Formally:

σ(P₁ ∨ P₂ ∨ ... ∨ Pₙ)(R)

where P₁, P₂, ..., Pₙ are individual predicates. A tuple satisfies the disjunctive selection if it satisfies AT LEAST ONE of the predicates.

Mathematical Properties:

Disjunctive selections exhibit properties that differ significantly from conjunctions:

Selectivity Combination: Combined selectivity = 1 - (1-s₁)(1-s₂)...(1-sₙ). For two 10% predicates: 1 - (0.9 × 0.9) = 19% match (not 20% due to overlap).
Union Semantics: Results are the union of tuples matching each predicate. Duplicates must be eliminated if the same tuple matches multiple predicates.
Index Challenge: Unlike AND where multiple indexes narrow results, OR conditions expand results. Using an index for one predicate doesn't help evaluate others.
Short-Circuit Limitation: Finding a match on P₁ eliminates need to check P₂, P₃... for that tuple, but we still must find ALL matching tuples for each predicate.

disjunctive_examples.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
-- Disjunctive Selection Examples
 
-- Example 1: Simple OR on different columns
SELECT * FROM orders
WHERE customer_type = 'VIP'        -- P1
   OR total_amount > 10000         -- P2
   OR priority = 'urgent';         -- P3
 
-- Selectivity analysis (with overlap):
-- P1: VIP customers = 5%
-- P2: High value orders = 3%
-- P3: Urgent orders = 2%
-- Naive sum: 5% + 3% + 2% = 10% 
-- But some overlap: VIP customers may have high-value orders
-- Actual: 1 - (0.95 × 0.97 × 0.98) = 9.7%
 
-- Example 2: OR on same column (IN list)
SELECT * FROM employees
WHERE department IN ('Sales', 'Marketing', 'Support');
 
-- Equivalent to:
SELECT * FROM employees
WHERE department = 'Sales'
   OR department = 'Marketing'
   OR department = 'Support';
 
-- Example 3: OR with different operators
SELECT * FROM products
WHERE category = 'Electronics'     -- Equality
   OR price < 10                   -- Range
   OR name LIKE 'Premium%';        -- Pattern
 
-- Example 4: Nested OR within AND
SELECT * FROM orders
WHERE status = 'active'
  AND (customer_region = 'EAST' OR customer_region = 'WEST');
 
-- The parenthesized OR creates a disjunction that must be
-- fully evaluated for each active order

The OR Index Problem

A single B+tree index cannot efficiently satisfy multiple OR predicates on different columns. An index on column A helps only the first predicate (A = x); we still need another strategy for (B = y) OR (C = z). This fundamental limitation shapes all disjunctive optimization strategies.

Implementation Strategies for Disjunctive Selection

Database systems employ several strategies for implementing disjunctive selections, each suited to different scenarios. The choice depends on predicate structure, available indexes, and estimated selectivity.

Strategy 1: Sequential Scan with Combined Predicate

The simplest approach evaluates the entire disjunction for each tuple during a sequential scan:

sequential_scan_or.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
-- Strategy 1: Sequential Scan with OR Evaluation
 
SELECT * FROM orders
WHERE customer_type = 'VIP'
   OR total_amount > 10000
   OR priority = 'urgent';
 
-- Execution (sequential scan):
FOR EACH tuple IN orders_table:
    IF tuple.customer_type = 'VIP' THEN
        EMIT tuple
    ELSE IF tuple.total_amount > 10000 THEN
        EMIT tuple
    ELSE IF tuple.priority = 'urgent' THEN
        EMIT tuple
 
-- PostgreSQL EXPLAIN:
-- Seq Scan on orders  (cost=0.00..45847.00 rows=97000 width=102)
--   Filter: (customer_type = 'VIP' OR total_amount > 10000 OR priority = 'urgent')
 
-- When this wins:
-- + Combined selectivity is high (many rows match)
-- + No suitable indexes exist
-- + Table is small (fits in memory)
-- + Predicates span different columns without composites
 
-- Cost: Reads all B pages, evaluates predicate on all T tuples
-- CPU cost depends on predicate evaluation order (short-circuit)

Strategy 2: Bitmap OR (Index Union)

When separate indexes exist on OR predicate columns, bitmap operations can combine their results:

bitmap_or.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
-- Strategy 2: Bitmap OR (Index Union)
 
-- Indexes:
CREATE INDEX idx_cust_type ON orders(customer_type);
CREATE INDEX idx_amount ON orders(total_amount);
CREATE INDEX idx_priority ON orders(priority);
 
-- Query with OR:
SELECT * FROM orders
WHERE customer_type = 'VIP'
   OR total_amount > 10000
   OR priority = 'urgent';
 
-- Execution:
-- 1. Bitmap Index Scan on idx_cust_type:
--    bitmap1 = pages containing customer_type = 'VIP'
-- 2. Bitmap Index Scan on idx_amount:
--    bitmap2 = pages containing total_amount > 10000
-- 3. Bitmap Index Scan on idx_priority:
--    bitmap3 = pages containing priority = 'urgent'
-- 4. BitmapOr:
--    result_bitmap = bitmap1 OR bitmap2 OR bitmap3
-- 5. Bitmap Heap Scan:
--    Fetch pages in result_bitmap
--    Recheck: Evaluate full predicate on each tuple
 
-- PostgreSQL EXPLAIN:
-- Bitmap Heap Scan on orders
--   Recheck Cond: (customer_type = 'VIP' OR total_amount > 10000 OR ...)
--   -> BitmapOr
--        -> Bitmap Index Scan on idx_cust_type
--             Index Cond: (customer_type = 'VIP')
--        -> Bitmap Index Scan on idx_amount
--             Index Cond: (total_amount > 10000)
--        -> Bitmap Index Scan on idx_priority
--             Index Cond: (priority = 'urgent')
 
-- Key properties:
-- + Uses all three indexes
-- + Heap pages read once (union, not duplicated)
-- + Sequential heap access (sorted bitmap)
-- - Must scan all three index portions
-- - Bitmap can become lossy for large result sets

Strategy 3: UNION Query Transformation

The optimizer can transform an OR query into a UNION of separate queries:

union_transformation.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
-- Strategy 3: UNION Query Transformation
 
-- Original query with OR:
SELECT * FROM orders
WHERE customer_type = 'VIP'
   OR total_amount > 10000;
 
-- Transformed to UNION (semantically equivalent):
SELECT * FROM orders WHERE customer_type = 'VIP'
UNION
SELECT * FROM orders WHERE total_amount > 10000;
 
-- Or with duplicate elimination avoided where possible:
SELECT * FROM orders WHERE customer_type = 'VIP'
UNION ALL
SELECT * FROM orders WHERE total_amount > 10000
  AND NOT (customer_type = 'VIP');  -- Exclude already-matched
 
-- Execution:
-- 1. Index scan on idx_cust_type for VIP orders
-- 2. Index scan on idx_amount for high-value orders  
-- 3. Hash or sort to eliminate duplicates (for UNION)
 
-- When this approach is used:
-- + Each individual predicate is highly selective
-- + Separate indexes exist and are efficient
-- + Optimizer determines UNION cheaper than bitmap OR
 
-- PostgreSQL may automatically do this transformation
-- Some databases require manual rewriting
 
-- Cost considerations:
-- UNION: Duplicate elimination overhead
-- UNION ALL with exclusion: Complex predicates but no dedup

Disjunctive Selection Strategy Comparison
Strategy	When Applicable	Advantages	Disadvantages
Sequential Scan	High combined selectivity, no indexes	Simple, predictable	Reads entire table
Bitmap OR	Multiple indexes exist	Combines indexes efficiently	Multiple index scans, bitmap overhead
UNION Transform	Each predicate highly selective	Uses best index per predicate	Duplicate handling cost
Single Index + Filter	One predicate dominates selectivity	Simple execution	Other predicates as filter only

Strategy Selection Heuristic

If combined OR selectivity > 30%, sequential scan often wins. If each predicate is <5% selective with good indexes, bitmap OR or UNION may be better. If one predicate is far more restrictive than others, consider using just that index and filtering the rest.

Bitmap OR: The Index Union Technique

Bitmap OR is the primary mechanism for combining multiple indexes in disjunctive selections. Unlike Bitmap AND (which shrinks results), Bitmap OR expands the result set to include all tuples matching any predicate.

How Bitmap OR Works:

bitmap_or_internals.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
-- Bitmap OR Internals
 
-- Table: 1 million rows across 10,000 pages (~100 rows/page)
-- idx_status: 'pending' matches 50,000 rows (5%)
-- idx_region: 'EMEA' matches 200,000 rows (20%)
 
-- Query: WHERE status = 'pending' OR region = 'EMEA'
 
-- Bitmap Index Scan 1 (idx_status):
-- For each entry with status='pending':
--   Set bit for that page/tuple in bitmap1
-- Result: bitmap1 has ~1,500 pages marked (some pages have multiple pending)
 
-- Bitmap Index Scan 2 (idx_region):
-- For each entry with region='EMEA':
--   Set bit for that page/tuple in bitmap2
-- Result: bitmap2 has ~4,000 pages marked
 
-- BitmapOr operation:
-- result_bitmap[i] = bitmap1[i] OR bitmap2[i]
-- Result: ~4,800 pages (union, with some overlap)
 
-- Bitmap Heap Scan:
-- For each page in result_bitmap:
--   Fetch page from heap
--   For each tuple on page:
--     Recheck: Does (status = 'pending' OR region = 'EMEA')?
--     If yes, emit tuple
 
-- Key insight: Unlike AND (intersection shrinks), 
-- OR (union) grows the result bitmap
 
-- Total rows matched: 50,000 + 200,000 - overlap ≈ 230,000
-- (Overlap = rows matching both predicates, removed by union)

Lossy Bitmaps in OR Operations:

When bitmaps grow too large to fit in work_mem, they become "lossy"—storing only page numbers, not tuple offsets. This has different implications for OR versus AND:

Lossy Bitmap Impact: OR vs AND
Aspect	Bitmap AND (Intersection)	Bitmap OR (Union)
Exactness	Still correct (intersect pages)	Still correct (union pages)
Recheck scope	Tuples on intersected pages	Tuples on union of pages
False positives	Tuples on page not matching all	Tuples on page not matching any
Typical overhead	Moderate (intersection is small)	Higher (union can be large)

bitmap_memory.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
-- Bitmap Memory Configuration
 
-- PostgreSQL work_mem controls bitmap size threshold
SET work_mem = '64MB';  -- Larger = more exact bitmaps
 
-- Check if bitmap became lossy:
EXPLAIN (ANALYZE, BUFFERS, VERBOSE)
SELECT * FROM orders
WHERE status = 'pending' OR region = 'EMEA';
 
-- Look for:
--   Heap Blocks: exact=1234 lossy=567
--                     ↑            ↑
--              exact pages    lossy pages (page-level only)
 
-- Lossy pages require rechecking ALL tuples on the page,
-- not just those in the bitmap
 
-- For OR operations, lossy bitmaps are more costly because
-- the union bitmap tends to be larger (more pages to recheck)
 
-- Tuning approach:
-- 1. Monitor lossy vs exact blocks in EXPLAIN ANALYZE
-- 2. Increase work_mem if too many lossy blocks
-- 3. Consider if bitmap approach is even beneficial
--    (if union is most of the table, scan may be cheaper)

OR Bitmap Size Growth

Because OR combines result sets, the union bitmap grows with each predicate. For n predicates each matching m pages, the union size approaches n×m (minus overlaps). High-fanout OR conditions (many predicates) can create very large bitmaps, making sequential scan more attractive.

IN List Optimization

The IN predicate (column IN (v₁, v₂, ..., vₙ)) is semantically equivalent to a disjunction on the same column (column = v₁ OR column = v₂ OR ... OR column = vₙ). However, databases implement special optimizations for IN lists due to their frequency and regularity.

IN List Execution Strategies:

in_list_strategies.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
-- IN List Execution Strategies
 
-- Query:
SELECT * FROM orders WHERE customer_id IN (101, 205, 307, 412, 599);
 
-- Strategy 1: Multiple Index Probes
-- With index on customer_id:
-- - Probe index for customer_id = 101
-- - Probe index for customer_id = 205
-- - ... (5 separate index lookups)
-- Cost: 5 × (tree_height + 1) page reads + heap accesses
 
-- Strategy 2: Index Range Scan (if values clustered)
-- If IN list values form a range-like pattern and correlation is high:
-- - Single index scan from min to max value
-- - Filter non-matching values
-- Less common, optimizer must estimate benefit
 
-- Strategy 3: Hash Match
-- For very large IN lists:
-- - Build hash set of IN values
-- - Scan table/index, check each value against hash
-- Efficient for IN lists with 100+ values
 
-- PostgreSQL handles IN lists automatically:
EXPLAIN SELECT * FROM orders WHERE customer_id IN (101, 205, 307);
 
-- For small lists (typically < 10-20):
-- -> Index Scan using idx_customer on orders
--      Index Cond: (customer_id = ANY ('{101,205,307}'::integer[]))
 
-- For large lists or when IN is not efficient:
-- -> Seq Scan on orders
--      Filter: (customer_id = ANY (...))

Large IN List Handling:

In lists can contain hundreds or thousands of values, especially when generated programmatically. Special handling is needed:

large_in_list.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
-- Large IN List Optimization
 
-- Problem: 10,000 values in IN list
SELECT * FROM products WHERE product_id IN (1, 2, 3, ..., 10000);
 
-- Issues with naive approach:
-- - 10,000 index probes (even if fast, overhead adds up)
-- - Query parsing and planning overhead
-- - Memory for storing the list
 
-- Solution 1: VALUES join (PostgreSQL)
SELECT p.* FROM products p
JOIN (VALUES (1), (2), ..., (10000)) AS v(id) ON p.product_id = v.id;
 
-- Better yet, use UNNEST with an array:
SELECT p.* FROM products p
JOIN UNNEST(ARRAY[1, 2, ..., 10000]) AS id ON p.product_id = id;
 
-- Solution 2: Temporary table
CREATE TEMP TABLE lookup_ids (id INTEGER PRIMARY KEY);
INSERT INTO lookup_ids VALUES (1), (2), ..., (10000);
 
SELECT p.* FROM products p
JOIN lookup_ids l ON p.product_id = l.id;
 
-- Benefits:
-- + Proper join optimization (hash/merge)
-- + Statistics can be computed on temp table
-- + Works for very large lists
 
-- Solution 3: ANY array operator with GIN index
-- For repeated queries on the same set pattern:
CREATE INDEX idx_prod_id_gin ON products USING GIN (product_id);
SELECT * FROM products WHERE product_id = ANY(ARRAY[1, 2, ..., 10000]);
 
-- Anti-pattern: Dynamic SQL with embedded values
-- BAD: "... WHERE id IN (" + str(ids) + ")"
-- - SQL injection risk
-- - No query plan caching
-- - Parser overhead for each unique list

IN List Size Optimization Guidelines
List Size	Recommended Approach	Notes
1-10 values	Direct IN clause	Multiple index probes, efficient
10-100 values	IN clause or ANY(array)	Optimizer handles well
100-1000 values	ANY(array) or temp table join	Consider hash join
1000+ values	Temp table join	Always prefer structured join

IN List Anti-Pattern

Avoid generating extremely long IN lists in application code. Beyond ~1000 values, the query text itself becomes problematic (parsing cost, network transfer, query cache inefficiency). Use temp tables, table-valued parameters, or batch the query into smaller chunks.

OR Optimization Techniques

Disjunctive predicates present unique optimization challenges. This section covers advanced techniques for improving OR query performance.

Technique 1: OR-to-UNION Rewriting

or_to_union.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
-- Technique 1: Manual OR-to-UNION Transformation
 
-- Original (may not use indexes optimally):
SELECT * FROM orders
WHERE (customer_region = 'EAST' AND total > 1000)
   OR (customer_region = 'WEST' AND total > 5000);
 
-- Rewritten as UNION (each branch uses optimal access):
SELECT * FROM orders WHERE customer_region = 'EAST' AND total > 1000
UNION ALL
SELECT * FROM orders WHERE customer_region = 'WEST' AND total > 5000;
 
-- Why UNION ALL is safe here:
-- The two conditions are mutually exclusive (EAST vs WEST)
-- No duplicates possible, so UNION ALL avoids dedup cost
 
-- With index on (customer_region, total):
-- Each UNION branch uses optimal index range scan
-- Combined cost < sequential scan with OR
 
-- When optimizer doesn't auto-transform:
-- Some databases (older versions, complex conditions) don't
-- automatically rewrite OR to UNION. Manual rewriting helps.

Technique 2: OR Expansion with EXISTS

or_exists.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
-- Technique 2: OR Expansion Using EXISTS
 
-- Complex OR with subquery:
SELECT * FROM employees e
WHERE e.department_id IN (SELECT d.id FROM departments d WHERE d.budget > 1000000)
   OR e.salary > 150000;
 
-- Problem: Subquery OR with scalar condition - hard to optimize
 
-- Rewrite using EXISTS for clarity:
SELECT * FROM employees e
WHERE EXISTS (
    SELECT 1 FROM departments d 
    WHERE d.id = e.department_id AND d.budget > 1000000
)
OR e.salary > 150000;
 
-- Sometimes decomposing helps optimizer:
SELECT * FROM employees WHERE salary > 150000
UNION
SELECT e.* FROM employees e
JOIN departments d ON e.department_id = d.id
WHERE d.budget > 1000000
  AND e.salary <= 150000;  -- Exclude already matched

Technique 3: Predicate Consolidation

predicate_consolidation.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
-- Technique 3: Predicate Consolidation
 
-- Inefficient: OR on same column with overlapping ranges
SELECT * FROM products
WHERE price < 100
   OR price < 200
   OR price < 50;
 
-- Consolidate to single predicate:
SELECT * FROM products WHERE price < 200;
-- (200 subsumes 100 and 50)
 
-- More complex consolidation:
SELECT * FROM orders
WHERE status = 'pending'
   OR status = 'processing'
   OR status = 'pending';  -- Duplicate!
 
-- Consolidate:
SELECT * FROM orders WHERE status IN ('pending', 'processing');
 
-- Optimizer usually does this, but not always for complex expressions:
SELECT * FROM events
WHERE (event_date > '2024-01-01' AND event_date < '2024-06-01')
   OR (event_date > '2024-03-01' AND event_date < '2024-09-01');
 
-- Human can see overlap and consolidate:
SELECT * FROM events
WHERE event_date > '2024-01-01' AND event_date < '2024-09-01';
-- (Union of ranges forms single range)

Technique 4: Partial Index for Common OR Patterns

partial_index_or.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
-- Technique 4: Partial Index for Frequent OR Patterns
 
-- Frequent query pattern:
SELECT * FROM tickets
WHERE status = 'open' OR status = 'in_progress';
 
-- Instead of:
CREATE INDEX idx_status ON tickets(status);
-- (Index must scan two separate key ranges)
 
-- Create partial index matching the OR:
CREATE INDEX idx_active_tickets ON tickets(created_at)
    WHERE status IN ('open', 'in_progress');
 
-- Benefits:
-- + Smaller index (excludes closed/resolved tickets)
-- + Predicate automatically satisfied
-- + Additional columns (created_at) aid other filters
 
-- Query automatically uses partial index:
SELECT * FROM tickets
WHERE (status = 'open' OR status = 'in_progress')
  AND created_at > NOW() - INTERVAL '7 days';
 
-- EXPLAIN shows:
-- Index Scan using idx_active_tickets on tickets
--   Index Cond: (created_at > ...)
-- (status predicate is implicit in partial index)

Query Pattern Analysis

Examine your application's most frequent OR patterns. If certain OR combinations appear repeatedly, consider creating partial indexes or materialized views for them. The key insight is that OR optimization often requires schema-level solutions, not just query-level rewrites.

Cost Model for Disjunctive Selection

Accurate cost estimation for disjunctive selections requires understanding how OR predicates interact. The cost model differs significantly from conjunctive selection.

Selectivity Estimation for OR:

For independent predicates:

s(P₁ OR P₂) = s(P₁) + s(P₂) - s(P₁) × s(P₂)

This is the inclusion-exclusion principle: add individual selectivities, subtract the overlap.

For n predicates:

s(P₁ OR P₂ OR ... OR Pₙ) = 1 - ∏(1 - sᵢ)

Equivalent to "probability at least one matches."

OR Selectivity Examples
P1 Selectivity	P2 Selectivity	Combined (OR)	Naive Sum	Error
10%	10%	19%	20%	5%
5%	5%	9.75%	10%	2.5%
50%	50%	75%	100%	25%
20%	30%	44%	50%	12%
1%	1%	1.99%	2%	0.5%

or_selectivity.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
-- OR Selectivity Calculation
 
-- Given:
-- P1: status = 'pending' (selectivity 0.15)
-- P2: priority = 'high' (selectivity 0.10)
-- P3: age > 30 (selectivity 0.20)
 
-- Individual OR:
-- s(P1 OR P2) = 0.15 + 0.10 - (0.15 × 0.10) = 0.235 = 23.5%
 
-- Three-way OR (using complement):
-- s(P1 OR P2 OR P3) = 1 - (1-0.15)(1-0.10)(1-0.20)
--                    = 1 - (0.85 × 0.90 × 0.80)
--                    = 1 - 0.612
--                    = 0.388 = 38.8%
 
-- Naive sum: 0.15 + 0.10 + 0.20 = 0.45 = 45%
-- Error: 45% - 38.8% = 6.2% overestimate
 
-- The more predicates, the more overlap subtraction matters
-- At extreme: 10 predicates each with 10% selectivity
-- Naive: 100% (impossible!)
-- Actual: 1 - 0.9^10 = 65%
 
-- PostgreSQL validates with EXPLAIN ANALYZE:
EXPLAIN ANALYZE SELECT * FROM orders
WHERE status = 'pending' OR priority = 'high' OR ...;
-- Compare "rows" (estimated) vs "actual rows"

Access Path Cost Comparison:

The critical decision point is whether bitmap OR or sequential scan is more efficient:

Sequential scan cost: B (total pages) + T × pred_eval_cost

Bitmap OR cost: Σ(index_scan_costᵢ) + union_pages × random_io_cost + union_tuples × recheck_cost

Where:

Σ(index_scan_costᵢ) = sum of all individual index scans
union_pages = pages in bitmap union (can approach B for high selectivity)
recheck_cost = CPU cost to verify predicate

cost_comparison_or.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
-- Cost Comparison: Bitmap OR vs Sequential Scan
 
-- Given:
-- Table: 1,000,000 rows in 10,000 pages
-- P1: matches 100,000 rows (10%), spans 3,000 pages
-- P2: matches 80,000 rows (8%), spans 2,500 pages
-- Combined: matches ~170,000 rows, spans ~4,500 pages (some overlap)
 
-- Sequential Scan Cost:
-- Read: 10,000 pages
-- Evaluate: 1,000,000 predicate evaluations (cheap)
-- Total I/O: 10,000 page reads (sequential)
 
-- Bitmap OR Cost:
-- Index scan P1: ~100 pages (B+tree leaves for 100K entries)
-- Index scan P2: ~80 pages
-- Bitmap operations: negligible CPU
-- Heap scan: 4,500 pages (sequential, sorted bitmap)
-- Recheck: ~500,000 tuples on those pages
-- Total I/O: 100 + 80 + 4,500 = 4,680 pages
 
-- Analysis:
-- Bitmap OR reads 4,680 pages
-- Sequential reads 10,000 pages
-- Bitmap OR wins! (2x fewer pages)
 
-- But if combined selectivity were 60% (most pages):
-- Bitmap OR: 100 + 80 + 9,000 = 9,180 pages
-- Sequential: 10,000 pages
-- Nearly equal - sequential may win due to simpler access pattern

The OR Selectivity Trap

High combined OR selectivity (>30-50%) often makes sequential scan the better choice. The optimizer may incorrectly estimate selectivity due to correlation between predicates (e.g., high-value orders are often from VIP customers), leading to poor access path selection. Always validate with EXPLAIN ANALYZE for critical OR queries.

Mixed AND-OR Predicates

Real-world queries often combine AND and OR in complex predicate structures. Understanding how databases process these mixed predicates is essential for optimization.

Conjunctive Normal Form (CNF) vs Disjunctive Normal Form (DNF):

Logical predicates can be normalized to standard forms:

normal_forms.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
-- Predicate Normal Forms
 
-- Conjunctive Normal Form (CNF): AND of ORs
-- (A OR B) AND (C OR D) AND (E OR F)
 
-- Disjunctive Normal Form (DNF): OR of ANDs  
-- (A AND B) OR (C AND D) OR (E AND F)
 
-- Example: Complex predicate
-- Original:
WHERE (status = 'active' AND region = 'US')
   OR (status = 'pending' AND priority = 'high')
 
-- This is already in DNF: OR of two AND terms
-- Can be processed as:
-- 1. UNION of two index scans (if indexes suit each AND term)
-- 2. Or sequential scan with combined predicate
 
-- Convert to CNF (less intuitive for this case):
-- Expanding: equivalent to
-- (status = 'active' OR status = 'pending')       -- Must be active or pending
-- AND (status = 'active' OR priority = 'high')    -- If not active, must be high priority
-- AND (region = 'US' OR status = 'pending')       -- If not pending, must be US
-- AND (region = 'US' OR priority = 'high')        -- If not US, must be high priority
 
-- CNF has 4 clauses vs DNF has 2 terms
-- For optimization, DNF often maps better to UNION strategies

Processing Strategies for Mixed Predicates:

Mixed Predicate Processing Strategies

•DNF-based UNION: Convert to DNF, execute each AND-term as a separate subquery, UNION results. Works well when each term is selective.
•CNF-based bitmap intersection: Convert to CNF, use bitmap OR for each clause, then bitmap AND across clauses. Can be complex but sometimes optimal.
•Factor out common AND: If outer AND applies to all: A AND (B OR C) → first filter by A using index, then check (B OR C) as filter.
•Sequential scan with combined evaluation: For complex predicates or high overall selectivity, evaluate entire predicate during table scan.

mixed_optimization.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
-- Mixed Predicate Optimization
 
-- Pattern: Outer AND with inner OR
SELECT * FROM orders
WHERE order_date > '2024-01-01'              -- Outer AND (selective)
  AND (status = 'pending' OR status = 'processing');  -- Inner OR
 
-- Strategy: Use outer AND with index, OR as filter
-- With index on order_date:
-- 1. Index scan for order_date > '2024-01-01'
-- 2. Filter: status IN ('pending', 'processing')
 
-- PostgreSQL EXPLAIN:
-- Index Scan using idx_order_date on orders
--   Index Cond: (order_date > '2024-01-01')
--   Filter: (status = ANY ('{pending,processing}'))
 
-- Pattern: OR of selective AND terms
SELECT * FROM orders
WHERE (customer_id = 100 AND status = 'shipped')    -- Term 1
   OR (customer_id = 200 AND status = 'delivered'); -- Term 2
 
-- Strategy: Process as UNION if indexes exist
-- With index on (customer_id, status):
-- 1. Index scan for (100, 'shipped')
-- 2. Index scan for (200, 'delivered')  
-- 3. Combine results (no dedup needed - different customers)
 
-- PostgreSQL may choose BitmapOr:
-- Bitmap Heap Scan on orders
--   -> BitmapOr
--        -> Bitmap Index Scan (customer_id = 100 AND status = 'shipped')
--        -> Bitmap Index Scan (customer_id = 200 AND status = 'delivered')
 
-- Pattern: Complex nesting
SELECT * FROM orders
WHERE region = 'US'
  AND ((status = 'pending' AND total > 1000) 
       OR (status = 'urgent' AND total > 500));
 
-- Rewrite for clarity and potential optimization:
-- Factor out region, then UNION the inner terms:
(SELECT * FROM orders WHERE region = 'US' AND status = 'pending' AND total > 1000)
UNION ALL
(SELECT * FROM orders WHERE region = 'US' AND status = 'urgent' AND total > 500);

Factoring Heuristic

When you have AND surrounding OR, factor out the AND predicates and apply them first using the best available index. The OR then becomes a simple filter on a reduced result set. This works well when the outer AND is highly selective.

Summary: Mastering Disjunctive Selection

We have explored disjunctive selection implementation comprehensively, from fundamental union semantics through bitmap OR operations to complex mixed predicate handling. Let us consolidate the essential knowledge:

Key Takeaways

•Disjunctive selections combine predicates with OR. Unlike AND (intersection), OR creates union of results, expanding rather than narrowing the result set.
•Three main strategies exist: sequential scan with OR evaluation, bitmap OR to combine index results, and UNION transformation for processing each term independently.
•Bitmap OR unions page-level bitmaps from multiple index scans, enabling combined index usage for OR predicates on different columns.
•IN lists are a special case of disjunction on a single column, with specialized optimizations including direct index probing and hash-based matching.
•OR selectivity requires inclusion-exclusion calculation: s(A OR B) = s(A) + s(B) - s(A)×s(B). High combined selectivity often favors sequential scan.
•Optimization techniques include OR-to-UNION rewriting, predicate consolidation, and partial indexes for common OR patterns.
•Mixed AND-OR predicates benefit from factoring strategy: apply selective outer AND first, then evaluate inner OR as filter.

What's Next:

With comprehensive understanding of both conjunctive and disjunctive selection, we now proceed to Cost Comparison—synthesizing our knowledge to understand how the optimizer chooses among all available strategies for any given selection predicate.

Page Complete

You now possess expert-level understanding of disjunctive selection implementation—the technique for handling OR predicates that appear throughout real-world queries. This knowledge enables writing efficient OR queries, designing appropriate indexes, and understanding optimizer behavior for complex predicates.

4 / 5

Loading learning content...

Physical OperatorsSelection Implementation

Selection Implementation

LevelIntermediate

Duration90 mins

TopicSelection Implementation

4 / 5

Disjunctive Selection

Combining Multiple Conditions with OR

What You Will Learn

Disjunctive Selection Fundamentals

A disjunctive selection is a selection operation where the predicate consists of multiple conditions combined with logical OR. Formally:

σ(P₁ ∨ P₂ ∨ ... ∨ Pₙ)(R)

where P₁, P₂, ..., Pₙ are individual predicates. A tuple satisfies the disjunctive selection if it satisfies AT LEAST ONE of the predicates.

Mathematical Properties:

Disjunctive selections exhibit properties that differ significantly from conjunctions:

Selectivity Combination: Combined selectivity = 1 - (1-s₁)(1-s₂)...(1-sₙ). For two 10% predicates: 1 - (0.9 × 0.9) = 19% match (not 20% due to overlap).
Union Semantics: Results are the union of tuples matching each predicate. Duplicates must be eliminated if the same tuple matches multiple predicates.
Index Challenge: Unlike AND where multiple indexes narrow results, OR conditions expand results. Using an index for one predicate doesn't help evaluate others.
Short-Circuit Limitation: Finding a match on P₁ eliminates need to check P₂, P₃... for that tuple, but we still must find ALL matching tuples for each predicate.

disjunctive_examples.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
-- Disjunctive Selection Examples
 
-- Example 1: Simple OR on different columns
SELECT * FROM orders
WHERE customer_type = 'VIP'        -- P1
   OR total_amount > 10000         -- P2
   OR priority = 'urgent';         -- P3
 
-- Selectivity analysis (with overlap):
-- P1: VIP customers = 5%
-- P2: High value orders = 3%
-- P3: Urgent orders = 2%
-- Naive sum: 5% + 3% + 2% = 10% 
-- But some overlap: VIP customers may have high-value orders
-- Actual: 1 - (0.95 × 0.97 × 0.98) = 9.7%
 
-- Example 2: OR on same column (IN list)
SELECT * FROM employees
WHERE department IN ('Sales', 'Marketing', 'Support');
 
-- Equivalent to:
SELECT * FROM employees
WHERE department = 'Sales'
   OR department = 'Marketing'
   OR department = 'Support';
 
-- Example 3: OR with different operators
SELECT * FROM products
WHERE category = 'Electronics'     -- Equality
   OR price < 10                   -- Range
   OR name LIKE 'Premium%';        -- Pattern
 
-- Example 4: Nested OR within AND
SELECT * FROM orders
WHERE status = 'active'
  AND (customer_region = 'EAST' OR customer_region = 'WEST');
 
-- The parenthesized OR creates a disjunction that must be
-- fully evaluated for each active order

The OR Index Problem

Implementation Strategies for Disjunctive Selection

Strategy 1: Sequential Scan with Combined Predicate

The simplest approach evaluates the entire disjunction for each tuple during a sequential scan:

sequential_scan_or.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
-- Strategy 1: Sequential Scan with OR Evaluation
 
SELECT * FROM orders
WHERE customer_type = 'VIP'
   OR total_amount > 10000
   OR priority = 'urgent';
 
-- Execution (sequential scan):
FOR EACH tuple IN orders_table:
    IF tuple.customer_type = 'VIP' THEN
        EMIT tuple
    ELSE IF tuple.total_amount > 10000 THEN
        EMIT tuple
    ELSE IF tuple.priority = 'urgent' THEN
        EMIT tuple
 
-- PostgreSQL EXPLAIN:
-- Seq Scan on orders  (cost=0.00..45847.00 rows=97000 width=102)
--   Filter: (customer_type = 'VIP' OR total_amount > 10000 OR priority = 'urgent')
 
-- When this wins:
-- + Combined selectivity is high (many rows match)
-- + No suitable indexes exist
-- + Table is small (fits in memory)
-- + Predicates span different columns without composites
 
-- Cost: Reads all B pages, evaluates predicate on all T tuples
-- CPU cost depends on predicate evaluation order (short-circuit)

Strategy 2: Bitmap OR (Index Union)

When separate indexes exist on OR predicate columns, bitmap operations can combine their results:

bitmap_or.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
-- Strategy 2: Bitmap OR (Index Union)
 
-- Indexes:
CREATE INDEX idx_cust_type ON orders(customer_type);
CREATE INDEX idx_amount ON orders(total_amount);
CREATE INDEX idx_priority ON orders(priority);
 
-- Query with OR:
SELECT * FROM orders
WHERE customer_type = 'VIP'
   OR total_amount > 10000
   OR priority = 'urgent';
 
-- Execution:
-- 1. Bitmap Index Scan on idx_cust_type:
--    bitmap1 = pages containing customer_type = 'VIP'
-- 2. Bitmap Index Scan on idx_amount:
--    bitmap2 = pages containing total_amount > 10000
-- 3. Bitmap Index Scan on idx_priority:
--    bitmap3 = pages containing priority = 'urgent'
-- 4. BitmapOr:
--    result_bitmap = bitmap1 OR bitmap2 OR bitmap3
-- 5. Bitmap Heap Scan:
--    Fetch pages in result_bitmap
--    Recheck: Evaluate full predicate on each tuple
 
-- PostgreSQL EXPLAIN:
-- Bitmap Heap Scan on orders
--   Recheck Cond: (customer_type = 'VIP' OR total_amount > 10000 OR ...)
--   -> BitmapOr
--        -> Bitmap Index Scan on idx_cust_type
--             Index Cond: (customer_type = 'VIP')
--        -> Bitmap Index Scan on idx_amount
--             Index Cond: (total_amount > 10000)
--        -> Bitmap Index Scan on idx_priority
--             Index Cond: (priority = 'urgent')
 
-- Key properties:
-- + Uses all three indexes
-- + Heap pages read once (union, not duplicated)
-- + Sequential heap access (sorted bitmap)
-- - Must scan all three index portions
-- - Bitmap can become lossy for large result sets

Strategy 3: UNION Query Transformation

The optimizer can transform an OR query into a UNION of separate queries:

union_transformation.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
-- Strategy 3: UNION Query Transformation
 
-- Original query with OR:
SELECT * FROM orders
WHERE customer_type = 'VIP'
   OR total_amount > 10000;
 
-- Transformed to UNION (semantically equivalent):
SELECT * FROM orders WHERE customer_type = 'VIP'
UNION
SELECT * FROM orders WHERE total_amount > 10000;
 
-- Or with duplicate elimination avoided where possible:
SELECT * FROM orders WHERE customer_type = 'VIP'
UNION ALL
SELECT * FROM orders WHERE total_amount > 10000
  AND NOT (customer_type = 'VIP');  -- Exclude already-matched
 
-- Execution:
-- 1. Index scan on idx_cust_type for VIP orders
-- 2. Index scan on idx_amount for high-value orders  
-- 3. Hash or sort to eliminate duplicates (for UNION)
 
-- When this approach is used:
-- + Each individual predicate is highly selective
-- + Separate indexes exist and are efficient
-- + Optimizer determines UNION cheaper than bitmap OR
 
-- PostgreSQL may automatically do this transformation
-- Some databases require manual rewriting
 
-- Cost considerations:
-- UNION: Duplicate elimination overhead
-- UNION ALL with exclusion: Complex predicates but no dedup

Disjunctive Selection Strategy Comparison
Strategy	When Applicable	Advantages	Disadvantages
Sequential Scan	High combined selectivity, no indexes	Simple, predictable	Reads entire table
Bitmap OR	Multiple indexes exist	Combines indexes efficiently	Multiple index scans, bitmap overhead
UNION Transform	Each predicate highly selective	Uses best index per predicate	Duplicate handling cost
Single Index + Filter	One predicate dominates selectivity	Simple execution	Other predicates as filter only

Strategy Selection Heuristic

Bitmap OR: The Index Union Technique

How Bitmap OR Works:

bitmap_or_internals.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
-- Bitmap OR Internals
 
-- Table: 1 million rows across 10,000 pages (~100 rows/page)
-- idx_status: 'pending' matches 50,000 rows (5%)
-- idx_region: 'EMEA' matches 200,000 rows (20%)
 
-- Query: WHERE status = 'pending' OR region = 'EMEA'
 
-- Bitmap Index Scan 1 (idx_status):
-- For each entry with status='pending':
--   Set bit for that page/tuple in bitmap1
-- Result: bitmap1 has ~1,500 pages marked (some pages have multiple pending)
 
-- Bitmap Index Scan 2 (idx_region):
-- For each entry with region='EMEA':
--   Set bit for that page/tuple in bitmap2
-- Result: bitmap2 has ~4,000 pages marked
 
-- BitmapOr operation:
-- result_bitmap[i] = bitmap1[i] OR bitmap2[i]
-- Result: ~4,800 pages (union, with some overlap)
 
-- Bitmap Heap Scan:
-- For each page in result_bitmap:
--   Fetch page from heap
--   For each tuple on page:
--     Recheck: Does (status = 'pending' OR region = 'EMEA')?
--     If yes, emit tuple
 
-- Key insight: Unlike AND (intersection shrinks), 
-- OR (union) grows the result bitmap
 
-- Total rows matched: 50,000 + 200,000 - overlap ≈ 230,000
-- (Overlap = rows matching both predicates, removed by union)

Lossy Bitmaps in OR Operations:

When bitmaps grow too large to fit in work_mem, they become "lossy"—storing only page numbers, not tuple offsets. This has different implications for OR versus AND:

Lossy Bitmap Impact: OR vs AND
Aspect	Bitmap AND (Intersection)	Bitmap OR (Union)
Exactness	Still correct (intersect pages)	Still correct (union pages)
Recheck scope	Tuples on intersected pages	Tuples on union of pages
False positives	Tuples on page not matching all	Tuples on page not matching any
Typical overhead	Moderate (intersection is small)	Higher (union can be large)

bitmap_memory.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
-- Bitmap Memory Configuration
 
-- PostgreSQL work_mem controls bitmap size threshold
SET work_mem = '64MB';  -- Larger = more exact bitmaps
 
-- Check if bitmap became lossy:
EXPLAIN (ANALYZE, BUFFERS, VERBOSE)
SELECT * FROM orders
WHERE status = 'pending' OR region = 'EMEA';
 
-- Look for:
--   Heap Blocks: exact=1234 lossy=567
--                     ↑            ↑
--              exact pages    lossy pages (page-level only)
 
-- Lossy pages require rechecking ALL tuples on the page,
-- not just those in the bitmap
 
-- For OR operations, lossy bitmaps are more costly because
-- the union bitmap tends to be larger (more pages to recheck)
 
-- Tuning approach:
-- 1. Monitor lossy vs exact blocks in EXPLAIN ANALYZE
-- 2. Increase work_mem if too many lossy blocks
-- 3. Consider if bitmap approach is even beneficial
--    (if union is most of the table, scan may be cheaper)

OR Bitmap Size Growth

IN List Optimization

IN List Execution Strategies:

in_list_strategies.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
-- IN List Execution Strategies
 
-- Query:
SELECT * FROM orders WHERE customer_id IN (101, 205, 307, 412, 599);
 
-- Strategy 1: Multiple Index Probes
-- With index on customer_id:
-- - Probe index for customer_id = 101
-- - Probe index for customer_id = 205
-- - ... (5 separate index lookups)
-- Cost: 5 × (tree_height + 1) page reads + heap accesses
 
-- Strategy 2: Index Range Scan (if values clustered)
-- If IN list values form a range-like pattern and correlation is high:
-- - Single index scan from min to max value
-- - Filter non-matching values
-- Less common, optimizer must estimate benefit
 
-- Strategy 3: Hash Match
-- For very large IN lists:
-- - Build hash set of IN values
-- - Scan table/index, check each value against hash
-- Efficient for IN lists with 100+ values
 
-- PostgreSQL handles IN lists automatically:
EXPLAIN SELECT * FROM orders WHERE customer_id IN (101, 205, 307);
 
-- For small lists (typically < 10-20):
-- -> Index Scan using idx_customer on orders
--      Index Cond: (customer_id = ANY ('{101,205,307}'::integer[]))
 
-- For large lists or when IN is not efficient:
-- -> Seq Scan on orders
--      Filter: (customer_id = ANY (...))

Large IN List Handling:

In lists can contain hundreds or thousands of values, especially when generated programmatically. Special handling is needed:

large_in_list.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
-- Large IN List Optimization
 
-- Problem: 10,000 values in IN list
SELECT * FROM products WHERE product_id IN (1, 2, 3, ..., 10000);
 
-- Issues with naive approach:
-- - 10,000 index probes (even if fast, overhead adds up)
-- - Query parsing and planning overhead
-- - Memory for storing the list
 
-- Solution 1: VALUES join (PostgreSQL)
SELECT p.* FROM products p
JOIN (VALUES (1), (2), ..., (10000)) AS v(id) ON p.product_id = v.id;
 
-- Better yet, use UNNEST with an array:
SELECT p.* FROM products p
JOIN UNNEST(ARRAY[1, 2, ..., 10000]) AS id ON p.product_id = id;
 
-- Solution 2: Temporary table
CREATE TEMP TABLE lookup_ids (id INTEGER PRIMARY KEY);
INSERT INTO lookup_ids VALUES (1), (2), ..., (10000);
 
SELECT p.* FROM products p
JOIN lookup_ids l ON p.product_id = l.id;
 
-- Benefits:
-- + Proper join optimization (hash/merge)
-- + Statistics can be computed on temp table
-- + Works for very large lists
 
-- Solution 3: ANY array operator with GIN index
-- For repeated queries on the same set pattern:
CREATE INDEX idx_prod_id_gin ON products USING GIN (product_id);
SELECT * FROM products WHERE product_id = ANY(ARRAY[1, 2, ..., 10000]);
 
-- Anti-pattern: Dynamic SQL with embedded values
-- BAD: "... WHERE id IN (" + str(ids) + ")"
-- - SQL injection risk
-- - No query plan caching
-- - Parser overhead for each unique list

IN List Size Optimization Guidelines
List Size	Recommended Approach	Notes
1-10 values	Direct IN clause	Multiple index probes, efficient
10-100 values	IN clause or ANY(array)	Optimizer handles well
100-1000 values	ANY(array) or temp table join	Consider hash join
1000+ values	Temp table join	Always prefer structured join

IN List Anti-Pattern

OR Optimization Techniques

Disjunctive predicates present unique optimization challenges. This section covers advanced techniques for improving OR query performance.

Technique 1: OR-to-UNION Rewriting

or_to_union.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
-- Technique 1: Manual OR-to-UNION Transformation
 
-- Original (may not use indexes optimally):
SELECT * FROM orders
WHERE (customer_region = 'EAST' AND total > 1000)
   OR (customer_region = 'WEST' AND total > 5000);
 
-- Rewritten as UNION (each branch uses optimal access):
SELECT * FROM orders WHERE customer_region = 'EAST' AND total > 1000
UNION ALL
SELECT * FROM orders WHERE customer_region = 'WEST' AND total > 5000;
 
-- Why UNION ALL is safe here:
-- The two conditions are mutually exclusive (EAST vs WEST)
-- No duplicates possible, so UNION ALL avoids dedup cost
 
-- With index on (customer_region, total):
-- Each UNION branch uses optimal index range scan
-- Combined cost < sequential scan with OR
 
-- When optimizer doesn't auto-transform:
-- Some databases (older versions, complex conditions) don't
-- automatically rewrite OR to UNION. Manual rewriting helps.

Technique 2: OR Expansion with EXISTS

or_exists.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
-- Technique 2: OR Expansion Using EXISTS
 
-- Complex OR with subquery:
SELECT * FROM employees e
WHERE e.department_id IN (SELECT d.id FROM departments d WHERE d.budget > 1000000)
   OR e.salary > 150000;
 
-- Problem: Subquery OR with scalar condition - hard to optimize
 
-- Rewrite using EXISTS for clarity:
SELECT * FROM employees e
WHERE EXISTS (
    SELECT 1 FROM departments d 
    WHERE d.id = e.department_id AND d.budget > 1000000
)
OR e.salary > 150000;
 
-- Sometimes decomposing helps optimizer:
SELECT * FROM employees WHERE salary > 150000
UNION
SELECT e.* FROM employees e
JOIN departments d ON e.department_id = d.id
WHERE d.budget > 1000000
  AND e.salary <= 150000;  -- Exclude already matched

Technique 3: Predicate Consolidation

predicate_consolidation.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
-- Technique 3: Predicate Consolidation
 
-- Inefficient: OR on same column with overlapping ranges
SELECT * FROM products
WHERE price < 100
   OR price < 200
   OR price < 50;
 
-- Consolidate to single predicate:
SELECT * FROM products WHERE price < 200;
-- (200 subsumes 100 and 50)
 
-- More complex consolidation:
SELECT * FROM orders
WHERE status = 'pending'
   OR status = 'processing'
   OR status = 'pending';  -- Duplicate!
 
-- Consolidate:
SELECT * FROM orders WHERE status IN ('pending', 'processing');
 
-- Optimizer usually does this, but not always for complex expressions:
SELECT * FROM events
WHERE (event_date > '2024-01-01' AND event_date < '2024-06-01')
   OR (event_date > '2024-03-01' AND event_date < '2024-09-01');
 
-- Human can see overlap and consolidate:
SELECT * FROM events
WHERE event_date > '2024-01-01' AND event_date < '2024-09-01';
-- (Union of ranges forms single range)

Technique 4: Partial Index for Common OR Patterns

partial_index_or.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
-- Technique 4: Partial Index for Frequent OR Patterns
 
-- Frequent query pattern:
SELECT * FROM tickets
WHERE status = 'open' OR status = 'in_progress';
 
-- Instead of:
CREATE INDEX idx_status ON tickets(status);
-- (Index must scan two separate key ranges)
 
-- Create partial index matching the OR:
CREATE INDEX idx_active_tickets ON tickets(created_at)
    WHERE status IN ('open', 'in_progress');
 
-- Benefits:
-- + Smaller index (excludes closed/resolved tickets)
-- + Predicate automatically satisfied
-- + Additional columns (created_at) aid other filters
 
-- Query automatically uses partial index:
SELECT * FROM tickets
WHERE (status = 'open' OR status = 'in_progress')
  AND created_at > NOW() - INTERVAL '7 days';
 
-- EXPLAIN shows:
-- Index Scan using idx_active_tickets on tickets
--   Index Cond: (created_at > ...)
-- (status predicate is implicit in partial index)

Query Pattern Analysis

Cost Model for Disjunctive Selection

Accurate cost estimation for disjunctive selections requires understanding how OR predicates interact. The cost model differs significantly from conjunctive selection.

Selectivity Estimation for OR:

For independent predicates:

s(P₁ OR P₂) = s(P₁) + s(P₂) - s(P₁) × s(P₂)

This is the inclusion-exclusion principle: add individual selectivities, subtract the overlap.

For n predicates:

s(P₁ OR P₂ OR ... OR Pₙ) = 1 - ∏(1 - sᵢ)

Equivalent to "probability at least one matches."

OR Selectivity Examples
P1 Selectivity	P2 Selectivity	Combined (OR)	Naive Sum	Error
10%	10%	19%	20%	5%
5%	5%	9.75%	10%	2.5%
50%	50%	75%	100%	25%
20%	30%	44%	50%	12%
1%	1%	1.99%	2%	0.5%

or_selectivity.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
-- OR Selectivity Calculation
 
-- Given:
-- P1: status = 'pending' (selectivity 0.15)
-- P2: priority = 'high' (selectivity 0.10)
-- P3: age > 30 (selectivity 0.20)
 
-- Individual OR:
-- s(P1 OR P2) = 0.15 + 0.10 - (0.15 × 0.10) = 0.235 = 23.5%
 
-- Three-way OR (using complement):
-- s(P1 OR P2 OR P3) = 1 - (1-0.15)(1-0.10)(1-0.20)
--                    = 1 - (0.85 × 0.90 × 0.80)
--                    = 1 - 0.612
--                    = 0.388 = 38.8%
 
-- Naive sum: 0.15 + 0.10 + 0.20 = 0.45 = 45%
-- Error: 45% - 38.8% = 6.2% overestimate
 
-- The more predicates, the more overlap subtraction matters
-- At extreme: 10 predicates each with 10% selectivity
-- Naive: 100% (impossible!)
-- Actual: 1 - 0.9^10 = 65%
 
-- PostgreSQL validates with EXPLAIN ANALYZE:
EXPLAIN ANALYZE SELECT * FROM orders
WHERE status = 'pending' OR priority = 'high' OR ...;
-- Compare "rows" (estimated) vs "actual rows"

Access Path Cost Comparison:

The critical decision point is whether bitmap OR or sequential scan is more efficient:

Sequential scan cost: B (total pages) + T × pred_eval_cost

Bitmap OR cost: Σ(index_scan_costᵢ) + union_pages × random_io_cost + union_tuples × recheck_cost

Where:

Σ(index_scan_costᵢ) = sum of all individual index scans
union_pages = pages in bitmap union (can approach B for high selectivity)
recheck_cost = CPU cost to verify predicate

cost_comparison_or.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
-- Cost Comparison: Bitmap OR vs Sequential Scan
 
-- Given:
-- Table: 1,000,000 rows in 10,000 pages
-- P1: matches 100,000 rows (10%), spans 3,000 pages
-- P2: matches 80,000 rows (8%), spans 2,500 pages
-- Combined: matches ~170,000 rows, spans ~4,500 pages (some overlap)
 
-- Sequential Scan Cost:
-- Read: 10,000 pages
-- Evaluate: 1,000,000 predicate evaluations (cheap)
-- Total I/O: 10,000 page reads (sequential)
 
-- Bitmap OR Cost:
-- Index scan P1: ~100 pages (B+tree leaves for 100K entries)
-- Index scan P2: ~80 pages
-- Bitmap operations: negligible CPU
-- Heap scan: 4,500 pages (sequential, sorted bitmap)
-- Recheck: ~500,000 tuples on those pages
-- Total I/O: 100 + 80 + 4,500 = 4,680 pages
 
-- Analysis:
-- Bitmap OR reads 4,680 pages
-- Sequential reads 10,000 pages
-- Bitmap OR wins! (2x fewer pages)
 
-- But if combined selectivity were 60% (most pages):
-- Bitmap OR: 100 + 80 + 9,000 = 9,180 pages
-- Sequential: 10,000 pages
-- Nearly equal - sequential may win due to simpler access pattern

The OR Selectivity Trap

Mixed AND-OR Predicates

Real-world queries often combine AND and OR in complex predicate structures. Understanding how databases process these mixed predicates is essential for optimization.

Conjunctive Normal Form (CNF) vs Disjunctive Normal Form (DNF):

Logical predicates can be normalized to standard forms:

normal_forms.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
-- Predicate Normal Forms
 
-- Conjunctive Normal Form (CNF): AND of ORs
-- (A OR B) AND (C OR D) AND (E OR F)
 
-- Disjunctive Normal Form (DNF): OR of ANDs  
-- (A AND B) OR (C AND D) OR (E AND F)
 
-- Example: Complex predicate
-- Original:
WHERE (status = 'active' AND region = 'US')
   OR (status = 'pending' AND priority = 'high')
 
-- This is already in DNF: OR of two AND terms
-- Can be processed as:
-- 1. UNION of two index scans (if indexes suit each AND term)
-- 2. Or sequential scan with combined predicate
 
-- Convert to CNF (less intuitive for this case):
-- Expanding: equivalent to
-- (status = 'active' OR status = 'pending')       -- Must be active or pending
-- AND (status = 'active' OR priority = 'high')    -- If not active, must be high priority
-- AND (region = 'US' OR status = 'pending')       -- If not pending, must be US
-- AND (region = 'US' OR priority = 'high')        -- If not US, must be high priority
 
-- CNF has 4 clauses vs DNF has 2 terms
-- For optimization, DNF often maps better to UNION strategies

Processing Strategies for Mixed Predicates:

Mixed Predicate Processing Strategies

•DNF-based UNION: Convert to DNF, execute each AND-term as a separate subquery, UNION results. Works well when each term is selective.
•CNF-based bitmap intersection: Convert to CNF, use bitmap OR for each clause, then bitmap AND across clauses. Can be complex but sometimes optimal.
•Factor out common AND: If outer AND applies to all: A AND (B OR C) → first filter by A using index, then check (B OR C) as filter.
•Sequential scan with combined evaluation: For complex predicates or high overall selectivity, evaluate entire predicate during table scan.

mixed_optimization.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
-- Mixed Predicate Optimization
 
-- Pattern: Outer AND with inner OR
SELECT * FROM orders
WHERE order_date > '2024-01-01'              -- Outer AND (selective)
  AND (status = 'pending' OR status = 'processing');  -- Inner OR
 
-- Strategy: Use outer AND with index, OR as filter
-- With index on order_date:
-- 1. Index scan for order_date > '2024-01-01'
-- 2. Filter: status IN ('pending', 'processing')
 
-- PostgreSQL EXPLAIN:
-- Index Scan using idx_order_date on orders
--   Index Cond: (order_date > '2024-01-01')
--   Filter: (status = ANY ('{pending,processing}'))
 
-- Pattern: OR of selective AND terms
SELECT * FROM orders
WHERE (customer_id = 100 AND status = 'shipped')    -- Term 1
   OR (customer_id = 200 AND status = 'delivered'); -- Term 2
 
-- Strategy: Process as UNION if indexes exist
-- With index on (customer_id, status):
-- 1. Index scan for (100, 'shipped')
-- 2. Index scan for (200, 'delivered')  
-- 3. Combine results (no dedup needed - different customers)
 
-- PostgreSQL may choose BitmapOr:
-- Bitmap Heap Scan on orders
--   -> BitmapOr
--        -> Bitmap Index Scan (customer_id = 100 AND status = 'shipped')
--        -> Bitmap Index Scan (customer_id = 200 AND status = 'delivered')
 
-- Pattern: Complex nesting
SELECT * FROM orders
WHERE region = 'US'
  AND ((status = 'pending' AND total > 1000) 
       OR (status = 'urgent' AND total > 500));
 
-- Rewrite for clarity and potential optimization:
-- Factor out region, then UNION the inner terms:
(SELECT * FROM orders WHERE region = 'US' AND status = 'pending' AND total > 1000)
UNION ALL
(SELECT * FROM orders WHERE region = 'US' AND status = 'urgent' AND total > 500);

Factoring Heuristic

Summary: Mastering Disjunctive Selection

Key Takeaways

•Disjunctive selections combine predicates with OR. Unlike AND (intersection), OR creates union of results, expanding rather than narrowing the result set.
•Three main strategies exist: sequential scan with OR evaluation, bitmap OR to combine index results, and UNION transformation for processing each term independently.
•Bitmap OR unions page-level bitmaps from multiple index scans, enabling combined index usage for OR predicates on different columns.
•IN lists are a special case of disjunction on a single column, with specialized optimizations including direct index probing and hash-based matching.
•OR selectivity requires inclusion-exclusion calculation: s(A OR B) = s(A) + s(B) - s(A)×s(B). High combined selectivity often favors sequential scan.
•Optimization techniques include OR-to-UNION rewriting, predicate consolidation, and partial indexes for common OR patterns.
•Mixed AND-OR predicates benefit from factoring strategy: apply selective outer AND first, then evaluate inner OR as filter.

What's Next:

Page Complete

4 / 5