Database Management SystemsMulti-Column Indexes

Multi-Column Indexes

LevelIntermediate

Duration60 mins

TopicMulti-Column Indexes

4 / 5

Index Intersection

When One Index Isn't Enough

Sometimes, a single index—composite or otherwise—cannot efficiently satisfy a query. Consider a query with complex predicates across columns that don't share a common index prefix, or conditions that combine AND with OR in ways that defeat any single index's coverage.

In these situations, modern databases employ a technique called index intersection (also known as index merge)—combining results from multiple indexes to produce the final result set more efficiently than any single index could alone.

This page explores how index intersection works, when it's beneficial, its limitations, and how it interacts with composite index design decisions.

What You Will Learn

By the end of this page, you will understand the mechanics of index intersection, the different types (AND vs OR intersection), when the optimizer chooses intersection over single-index scans, and how to design your index strategy to leverage or avoid intersection appropriately.

What is Index Intersection?

Index intersection is a query execution strategy where the database uses multiple indexes on the same table and combines their results to satisfy a single query. Instead of relying on one index to find all matching rows, the optimizer uses two or more indexes, each filtering on different predicates, then merges the results.

The Two Main Types:

Index Intersection (AND) — Combines results from multiple indexes using set intersection. A row must appear in ALL index result sets to be included. Used for queries with multiple AND conditions.
Index Union (OR) — Combines results from multiple indexes using set union. A row appearing in ANY index result set is included. Used for queries with OR conditions.

Converting Mermaid diagram...

index_intersection_example.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
-- Table: products (with separate single-column indexes)
CREATE INDEX idx_category ON products (category_id);
CREATE INDEX idx_price ON products (price);
 
-- Query that benefits from index intersection:
SELECT * FROM products
WHERE category_id = 50 AND price < 100.00;
 
-- Without index intersection:
-- Option A: Use idx_category, then filter price → scans many rows
-- Option B: Use idx_price, then filter category → scans many rows
-- Option C: Full table scan → worst case
 
-- With index intersection:
-- Step 1: idx_category finds row IDs where category_id = 50
-- Step 2: idx_price finds row IDs where price < 100
-- Step 3: Intersect these row ID sets
-- Step 4: Fetch only the matching rows from the table
 
-- The intersection is efficient when:
-- - Each index is selective (few rows each)
-- - The intersection is significantly smaller than either set

How Index Intersection Works

The mechanics of index intersection depend on how indexes store row identifiers and how efficiently these can be combined.

Step-by-Step Process (AND Intersection):

Index Scan 1: Scan the first index, collecting row identifiers (RIDs) matching the predicate
Index Scan 2: Scan the second index, collecting RIDs matching its predicate
Intersection: Compute the set intersection of the two RID sets
Table Access: Fetch only the rows whose RIDs appear in the intersection

The key efficiency comes from Step 3—if both indexes are selective, the intersection is small, and Step 4 fetches very few rows.

Intersection Algorithms:

Databases use various algorithms to compute set intersection:

Algorithm	Time Complexity	Best When
Sort-Merge	O(n log n + m log m)	RID sets are unsorted or similar size
Hash Intersection	O(n + m)	Memory available, one set much smaller
Bitmap AND	O(n + m)	Bitmap indexes or RID bitmaps used
Zigzag Merge	O(min(n, m))	RIDs already sorted, sets overlap significantly

intersection_algorithms.pseudo
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
// Sort-Merge Intersection
function sortMergeIntersection(rids1, rids2):
    sort(rids1)  // O(n log n)
    sort(rids2)  // O(m log m)
    result = []
    i = 0, j = 0
    while i < len(rids1) and j < len(rids2):
        if rids1[i] == rids2[j]:
            result.append(rids1[i])
            i++, j++
        else if rids1[i] < rids2[j]:
            i++
        else:
            j++
    return result
 
// Hash Intersection
function hashIntersection(rids1, rids2):
    // Build hash set from smaller set
    smaller = (len(rids1) < len(rids2)) ? rids1 : rids2
    larger = (len(rids1) < len(rids2)) ? rids2 : rids1
    
    hashSet = new HashSet()
    for rid in smaller:
        hashSet.add(rid)     // O(n) for smaller set
    
    result = []
    for rid in larger:
        if hashSet.contains(rid):  // O(1) lookup
            result.append(rid)
    return result
 
// Zigzag Merge (already sorted, seek-based)
function zigzagMerge(index1, index2, pred1, pred2):
    cursor1 = index1.seek(pred1)  // Position at first match
    cursor2 = index2.seek(pred2)
    result = []
    
    while cursor1.valid() and cursor2.valid():
        rid1 = cursor1.rid()
        rid2 = cursor2.rid()
        
        if rid1 == rid2:
            result.append(rid1)
            cursor1.next()
            cursor2.next()
        else if rid1 < rid2:
            cursor1.seek(rid2)  // Skip ahead to rid2
        else:
            cursor2.seek(rid1)  // Skip ahead to rid1
    
    return result

Sorted Index Advantage

When indexes store RIDs in sorted order (common for clustering keys or RID-based organization), zigzag merge becomes highly efficient. The algorithm can skip large portions of each index, making intersection much faster than visiting every entry.

Index Union for OR Queries

Index union handles OR predicates, which are traditionally problematic for index usage. Without index union, OR conditions often force table scans because no single index can satisfy both branches.

Index Union Process:

Index Scan per OR branch: Each OR branch uses its best-fit index
Union RID sets: Combine all RID sets (with duplicate removal)
Table Access: Fetch rows for all RIDs in the union

Terminology Across Databases:

Database	Feature Name
MySQL	Index Merge (type: union, intersection, sort_union)
PostgreSQL	BitmapOr, BitmapAnd
SQL Server	Index Intersection, Index Union
Oracle	INDEX COMBINE, BITMAP CONVERSION

index_union_examples.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
-- Separate indexes
CREATE INDEX idx_status ON orders (status);
CREATE INDEX idx_priority ON orders (priority);
 
-- OR query that benefits from index union:
SELECT * FROM orders
WHERE status = 'pending' OR priority = 'high';
 
-- MySQL EXPLAIN output showing index merge:
-- Extra: Using union(idx_status,idx_priority); Using where
 
-- PostgreSQL equivalent using bitmap operations:
-- Bitmap Heap Scan on orders
--   Recheck Cond: ((status = 'pending'::text) OR (priority = 'high'::text))
--   ->  BitmapOr
--         ->  Bitmap Index Scan on idx_status
--               Index Cond: (status = 'pending'::text)
--         ->  Bitmap Index Scan on idx_priority
--               Index Cond: (priority = 'high'::text)
 
 
-- Complex OR with AND combinations:
SELECT * FROM orders
WHERE (customer_id = 100 AND status = 'pending')
   OR (priority = 'critical');
 
-- This may use:
-- - Composite index on (customer_id, status) for first branch
-- - Single index on priority for second branch
-- - Union of results

OR Query Performance

Index union for OR queries is generally less efficient than AND intersection because: (1) union produces more rows, not fewer; (2) deduplication adds overhead; (3) if any OR branch is non-selective, the union includes many rows. For high-performance OR queries, consider UNION ALL with separate queries.

or_rewrite_optimization.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
-- Original OR query
SELECT * FROM orders
WHERE status = 'pending' OR priority = 'high';
 
-- Manual UNION ALL rewrite (sometimes more efficient):
SELECT * FROM orders WHERE status = 'pending'
UNION
SELECT * FROM orders WHERE priority = 'high';
 
-- If results don't overlap (mutually exclusive conditions):
SELECT * FROM orders WHERE status = 'pending'
UNION ALL
SELECT * FROM orders WHERE priority = 'high' AND status != 'pending';
 
-- The rewrite can use optimal indexes for each branch independently
-- and avoids the index merge overhead in some optimizer implementations

When Does the Optimizer Choose Intersection?

The query optimizer uses cost-based analysis to decide whether index intersection is beneficial. Intersection is not always the best choice—it adds overhead that must be offset by reduced row access.

Cost Factors the Optimizer Considers:

Intersection Cost Components

•Index Scan Costs — Reading entries from each index (I/O and CPU)
•Intersection Computation — Memory and CPU for set intersection algorithm
•Result Set Size — Fewer rows after intersection = more benefit
•Table Access Pattern — Random I/O for fetching rows by RID (expensive)
•Alternative Costs — Full scan, single index scan + filter, or composite index options

Intersection Decision Matrix
Scenario	Likely Decision	Reason
Both indexes highly selective	✅ Use intersection	Intersection is small, few random I/Os
One index highly selective	❓ Maybe single index	More selective index + filter may be cheaper
Both indexes non-selective	❌ Full table scan	Intersection still large, random I/O expensive
Composite index exists	✅ Use composite	One index covers both predicates efficiently
Covering index available	✅ Use covering	No table access needed at all

intersection_cost_analysis.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
-- Analyze intersection decision with EXPLAIN
 
-- MySQL: Check for "Using intersect" or "Using union"
EXPLAIN SELECT * FROM products
WHERE category_id = 50 AND price < 100;
 
-- PostgreSQL: Look for BitmapAnd or BitmapOr nodes
EXPLAIN (ANALYZE, COSTS)
SELECT * FROM products
WHERE category_id = 50 AND price < 100;
 
-- Example output showing intersection choice:
/*
Bitmap Heap Scan on products  (cost=24.50..120.75 rows=50 ...)
  Recheck Cond: ((category_id = 50) AND (price < 100))
  ->  BitmapAnd
        ->  Bitmap Index Scan on idx_category  (cost=0.00..5.20 rows=200 ...)
              Index Cond: (category_id = 50)
        ->  Bitmap Index Scan on idx_price  (cost=0.00..18.80 rows=500 ...)
              Index Cond: (price < 100)
*/
 
-- Cost breakdown:
-- idx_category: 200 matching rows
-- idx_price: 500 matching rows
-- After intersection: ~50 rows (estimate)
-- Without intersection (using just idx_category): 200 rows to fetch
 
 
-- Force optimizer to use or avoid intersection (PostgreSQL):
SET enable_bitmapscan = OFF;  -- Disable bitmap/intersection
EXPLAIN SELECT * FROM products WHERE category_id = 50 AND price < 100;
SET enable_bitmapscan = ON;   -- Re-enable

The Selectivity Product Rule

Index intersection is typically beneficial when selectivity(Index1) × selectivity(Index2) < selectivity(either index alone). If multiplying the selectivities produces a significantly smaller result set, intersection pays off.

Intersection vs Composite Indexes

A natural question arises: if index intersection can combine multiple single-column indexes, why bother with composite indexes at all?

The answer lies in efficiency trade-offs:

Composite Index Advantages

•Single index scan — One B+tree traversal vs multiple
•No intersection overhead — No memory for RID sets, no merge operation
•ORDER BY support — Can satisfy multi-column sorts
•Covering potential — Can include all needed columns
•Predictable performance — No optimizer decision variance

Index Intersection Advantages

•Flexibility — Separate indexes serve diverse query patterns
•Lower storage — Two single-column indexes may be smaller
•Lower maintenance — Updates touch smaller index structures
•Ad-hoc query support — Unknown query patterns handled dynamically
•OR query capability — Union operation handles OR predicates

composite_vs_intersection.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
-- Scenario: orders table with queries on customer_id and order_date
 
-- Option A: Two single-column indexes
CREATE INDEX idx_customer ON orders (customer_id);
CREATE INDEX idx_date ON orders (order_date);
-- Uses intersection for: WHERE customer_id = ? AND order_date = ?
 
-- Option B: One composite index
CREATE INDEX idx_customer_date ON orders (customer_id, order_date);
-- Directly navigates to matching entries
 
-- Performance comparison for: 
-- SELECT * FROM orders WHERE customer_id = 100 AND order_date = '2023-01-15'
 
-- Option A (Intersection):
-- 1. Scan idx_customer: ~100 RIDs (1ms)
-- 2. Scan idx_date: ~1000 RIDs (3ms)
-- 3. Intersect: ~5 RIDs (0.1ms)
-- 4. Random I/O: 5 row accesses (0.5ms)
-- Total: ~4.6ms
 
-- Option B (Composite):
-- 1. Seek idx_customer_date: ~5 entries (0.3ms)
-- 2. Sequential I/O or direct pointer (0.2ms)
-- Total: ~0.5ms
 
-- Composite index is ~9x faster for this query pattern!
 
 
-- When intersection wins:
-- Diverse query patterns where composite prefix doesn't help
 
-- Query 1: WHERE customer_id = ?        -- Uses idx_customer ✅
-- Query 2: WHERE order_date = ?         -- Uses idx_date ✅  
-- Query 3: WHERE customer_id = ? AND order_date = ?  -- Uses intersection
 
-- With composite (customer_id, order_date):
-- Query 1: ✅ Uses composite prefix
-- Query 2: ❌ Cannot use (wrong prefix)
-- Query 3: ✅ Uses composite fully
 
-- Separate indexes provide more flexibility for Query 2

Design Guideline

For frequently-executed queries with known patterns, composite indexes are almost always superior. Reserve index intersection for flexibility with ad-hoc queries or when storage constraints prevent creating all needed composite indexes.

Bitmap Index Intersection

Bitmap indexes are particularly suited for intersection operations because bitmaps can be combined with simple bitwise AND/OR operations—extremely fast on modern CPUs.

How Bitmap Intersection Works:

Each index value maps to a bitmap where bit position N is 1 if row N contains that value:

value 'A': [1,0,1,0,0,1,0,1,0,0]  // Rows 0,2,5,7
value 'B': [0,1,1,0,1,0,0,1,0,0]  // Rows 1,2,4,7

AND operation:
          [0,0,1,0,0,0,0,1,0,0]  // Rows 2,7 (intersection)

OR operation:
          [1,1,1,0,1,1,0,1,0,0]  // Rows 0,1,2,4,5,7 (union)

Bitmap vs B-tree Index Intersection
Aspect	Bitmap Intersection	B-tree Intersection
Operation	Bitwise AND/OR (hardware optimized)	Sort-merge or hash join
Speed	Extremely fast	Moderate
Memory	Constant (bitmap size)	Proportional to RID count
Cardinality requirement	Low cardinality columns	Any cardinality
Available in	Oracle, PostgreSQL (runtime bitmaps)	All major databases
Update cost	High (full bitmap rebuild)	Incremental updates

bitmap_intersection.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
-- PostgreSQL uses runtime bitmaps for B-tree index intersection:
 
EXPLAIN (ANALYZE) SELECT * FROM products
WHERE category = 'Electronics' AND brand = 'Acme' AND price < 500;
 
/*
Bitmap Heap Scan on products 
  Recheck Cond: ((category = 'Electronics') AND (brand = 'Acme') AND (price < 500))
  ->  BitmapAnd
        ->  Bitmap Index Scan on idx_category
              Index Cond: (category = 'Electronics')
        ->  Bitmap Index Scan on idx_brand  
              Index Cond: (brand = 'Acme')
        ->  Bitmap Index Scan on idx_price
              Index Cond: (price < 500)
*/
 
-- Each B-tree index scan creates a runtime bitmap
-- Bitmaps are ANDed together
-- Result bitmap used for heap access
 
-- Oracle with actual bitmap indexes:
CREATE BITMAP INDEX bmp_category ON products (category);
CREATE BITMAP INDEX bmp_brand ON products (brand);
CREATE BITMAP INDEX bmp_region ON products (region);
 
-- Complex AND/OR query leverages bitmap operations natively:
SELECT * FROM products
WHERE (category = 'Electronics' OR category = 'Appliances')
  AND brand = 'Acme'
  AND region IN ('North', 'South');
 
-- Oracle execution: BITMAP AND + BITMAP OR operations

Runtime Bitmaps in PostgreSQL

PostgreSQL doesn't have dedicated bitmap indexes, but creates runtime bitmaps during query execution from regular B-tree indexes. This provides bitmap intersection benefits without the update overhead of maintaining persistent bitmap structures.

Practical Intersection Strategies

When should you rely on intersection versus creating composite indexes? Here are practical guidelines:

When to Rely on Index Intersection

•Ad-hoc query environments — Data warehouses or analytics systems where query patterns are unpredictable. Creating composite indexes for every combination is impractical.
•OR predicates — Queries with OR conditions across different columns. Composite indexes cannot help; intersection (union) is the only mechanism.
•Storage-constrained systems — When disk space is limited and multiple single-column indexes are smaller than multiple composite indexes.
•Write-heavy workloads — Maintaining fewer, smaller indexes reduces write amplification.
•Transitional periods — During performance tuning before optimal composite indexes are identified.

When to Avoid Index Intersection

•High-frequency, known query patterns — Create targeted composite indexes for queries that run thousands of times per minute.
•Latency-sensitive applications — Intersection adds overhead; composite indexes provide more predictable performance.
•OLTP systems — Transaction processing benefits from optimized, specific indexes rather than flexible, general ones.
•When covering indexes are possible — Covering indexes eliminate table access entirely, which intersection cannot do.
•ORDER BY requirements — Intersection cannot provide sorted output; composite indexes can.

intersection_strategy_checklist.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
-- Decision checklist for intersection vs composite:
 
-- 1. Identify the query pattern
--    Query: WHERE A = ? AND B = ?
 
-- 2. Check query frequency
SHOW STATUS LIKE 'Questions';  -- MySQL
-- If this query runs > 100/sec, favor composite index
 
-- 3. Analyze column selectivity
SELECT COUNT(DISTINCT A) / COUNT(*) AS selectivity_A,
       COUNT(DISTINCT B) / COUNT(*) AS selectivity_B
FROM table;
-- If both selective (< 5%), intersection may work well
 
-- 4. Check if OR predicates are involved
-- Query: WHERE A = ? OR B = ?
-- Composite index CAN'T help; need separate indexes + union
 
-- 5. Consider covering index opportunity
-- Query: SELECT C, D FROM t WHERE A = ? AND B = ?
CREATE INDEX idx_covering ON t (A, B, C, D);  -- Best solution
 
-- 6. Test both approaches
EXPLAIN ANALYZE SELECT ...;  -- With current indexes
-- Create composite, test again
CREATE INDEX idx_composite ON t (A, B);
EXPLAIN ANALYZE SELECT ...;  -- Compare execution times

Summary: Index Intersection Mastery

Index intersection provides flexibility in query processing by combining multiple indexes. Here are the key concepts:

Key Takeaways

•Index intersection combines results from multiple indexes using set operations (AND → intersection, OR → union).
•Highly selective indexes are required for intersection to be cost-effective. The combined selectivity must significantly reduce row access.
•Composite indexes are usually faster for known, frequent query patterns. They avoid intersection overhead and can support ORDER BY and covering.
•Intersection excels for flexibility — ad-hoc queries, OR predicates, and environments where predicting query patterns is difficult.
•Bitmap operations accelerate intersection — either through native bitmap indexes or runtime bitmaps from B-tree scans.
•The optimizer decides — cost-based analysis determines when intersection beats single-index scans or table scans.
•Monitor and tune — Use EXPLAIN to verify the optimizer's intersection decisions and adjust index strategy accordingly.

What's Next:

The final page of this module covers design guidelines—practical principles and best practices for designing multi-column indexes in real-world systems, synthesizing everything you've learned about composite indexes, column ordering, prefix matching, and intersection.

Page Complete

You now understand how index intersection works, when the optimizer chooses it, and how it compares to composite indexes. This knowledge helps you design index strategies that balance flexibility with performance for diverse query workloads.

4 / 5

Loading learning content...

Database Management SystemsMulti-Column Indexes

Multi-Column Indexes

LevelIntermediate

Duration60 mins

TopicMulti-Column Indexes

4 / 5

Index Intersection

When One Index Isn't Enough

This page explores how index intersection works, when it's beneficial, its limitations, and how it interacts with composite index design decisions.

What You Will Learn

What is Index Intersection?

The Two Main Types:

Index Intersection (AND) — Combines results from multiple indexes using set intersection. A row must appear in ALL index result sets to be included. Used for queries with multiple AND conditions.
Index Union (OR) — Combines results from multiple indexes using set union. A row appearing in ANY index result set is included. Used for queries with OR conditions.

Converting Mermaid diagram...

index_intersection_example.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
-- Table: products (with separate single-column indexes)
CREATE INDEX idx_category ON products (category_id);
CREATE INDEX idx_price ON products (price);
 
-- Query that benefits from index intersection:
SELECT * FROM products
WHERE category_id = 50 AND price < 100.00;
 
-- Without index intersection:
-- Option A: Use idx_category, then filter price → scans many rows
-- Option B: Use idx_price, then filter category → scans many rows
-- Option C: Full table scan → worst case
 
-- With index intersection:
-- Step 1: idx_category finds row IDs where category_id = 50
-- Step 2: idx_price finds row IDs where price < 100
-- Step 3: Intersect these row ID sets
-- Step 4: Fetch only the matching rows from the table
 
-- The intersection is efficient when:
-- - Each index is selective (few rows each)
-- - The intersection is significantly smaller than either set

How Index Intersection Works

The mechanics of index intersection depend on how indexes store row identifiers and how efficiently these can be combined.

Step-by-Step Process (AND Intersection):

Index Scan 1: Scan the first index, collecting row identifiers (RIDs) matching the predicate
Index Scan 2: Scan the second index, collecting RIDs matching its predicate
Intersection: Compute the set intersection of the two RID sets
Table Access: Fetch only the rows whose RIDs appear in the intersection

The key efficiency comes from Step 3—if both indexes are selective, the intersection is small, and Step 4 fetches very few rows.

Intersection Algorithms:

Databases use various algorithms to compute set intersection:

Algorithm	Time Complexity	Best When
Sort-Merge	O(n log n + m log m)	RID sets are unsorted or similar size
Hash Intersection	O(n + m)	Memory available, one set much smaller
Bitmap AND	O(n + m)	Bitmap indexes or RID bitmaps used
Zigzag Merge	O(min(n, m))	RIDs already sorted, sets overlap significantly

intersection_algorithms.pseudo
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
// Sort-Merge Intersection
function sortMergeIntersection(rids1, rids2):
    sort(rids1)  // O(n log n)
    sort(rids2)  // O(m log m)
    result = []
    i = 0, j = 0
    while i < len(rids1) and j < len(rids2):
        if rids1[i] == rids2[j]:
            result.append(rids1[i])
            i++, j++
        else if rids1[i] < rids2[j]:
            i++
        else:
            j++
    return result
 
// Hash Intersection
function hashIntersection(rids1, rids2):
    // Build hash set from smaller set
    smaller = (len(rids1) < len(rids2)) ? rids1 : rids2
    larger = (len(rids1) < len(rids2)) ? rids2 : rids1
    
    hashSet = new HashSet()
    for rid in smaller:
        hashSet.add(rid)     // O(n) for smaller set
    
    result = []
    for rid in larger:
        if hashSet.contains(rid):  // O(1) lookup
            result.append(rid)
    return result
 
// Zigzag Merge (already sorted, seek-based)
function zigzagMerge(index1, index2, pred1, pred2):
    cursor1 = index1.seek(pred1)  // Position at first match
    cursor2 = index2.seek(pred2)
    result = []
    
    while cursor1.valid() and cursor2.valid():
        rid1 = cursor1.rid()
        rid2 = cursor2.rid()
        
        if rid1 == rid2:
            result.append(rid1)
            cursor1.next()
            cursor2.next()
        else if rid1 < rid2:
            cursor1.seek(rid2)  // Skip ahead to rid2
        else:
            cursor2.seek(rid1)  // Skip ahead to rid1
    
    return result

Sorted Index Advantage

Index Union for OR Queries

Index union handles OR predicates, which are traditionally problematic for index usage. Without index union, OR conditions often force table scans because no single index can satisfy both branches.

Index Union Process:

Index Scan per OR branch: Each OR branch uses its best-fit index
Union RID sets: Combine all RID sets (with duplicate removal)
Table Access: Fetch rows for all RIDs in the union

Terminology Across Databases:

Database	Feature Name
MySQL	Index Merge (type: union, intersection, sort_union)
PostgreSQL	BitmapOr, BitmapAnd
SQL Server	Index Intersection, Index Union
Oracle	INDEX COMBINE, BITMAP CONVERSION

index_union_examples.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
-- Separate indexes
CREATE INDEX idx_status ON orders (status);
CREATE INDEX idx_priority ON orders (priority);
 
-- OR query that benefits from index union:
SELECT * FROM orders
WHERE status = 'pending' OR priority = 'high';
 
-- MySQL EXPLAIN output showing index merge:
-- Extra: Using union(idx_status,idx_priority); Using where
 
-- PostgreSQL equivalent using bitmap operations:
-- Bitmap Heap Scan on orders
--   Recheck Cond: ((status = 'pending'::text) OR (priority = 'high'::text))
--   ->  BitmapOr
--         ->  Bitmap Index Scan on idx_status
--               Index Cond: (status = 'pending'::text)
--         ->  Bitmap Index Scan on idx_priority
--               Index Cond: (priority = 'high'::text)
 
 
-- Complex OR with AND combinations:
SELECT * FROM orders
WHERE (customer_id = 100 AND status = 'pending')
   OR (priority = 'critical');
 
-- This may use:
-- - Composite index on (customer_id, status) for first branch
-- - Single index on priority for second branch
-- - Union of results

OR Query Performance

or_rewrite_optimization.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
-- Original OR query
SELECT * FROM orders
WHERE status = 'pending' OR priority = 'high';
 
-- Manual UNION ALL rewrite (sometimes more efficient):
SELECT * FROM orders WHERE status = 'pending'
UNION
SELECT * FROM orders WHERE priority = 'high';
 
-- If results don't overlap (mutually exclusive conditions):
SELECT * FROM orders WHERE status = 'pending'
UNION ALL
SELECT * FROM orders WHERE priority = 'high' AND status != 'pending';
 
-- The rewrite can use optimal indexes for each branch independently
-- and avoids the index merge overhead in some optimizer implementations

When Does the Optimizer Choose Intersection?

Cost Factors the Optimizer Considers:

Intersection Cost Components

•Index Scan Costs — Reading entries from each index (I/O and CPU)
•Intersection Computation — Memory and CPU for set intersection algorithm
•Result Set Size — Fewer rows after intersection = more benefit
•Table Access Pattern — Random I/O for fetching rows by RID (expensive)
•Alternative Costs — Full scan, single index scan + filter, or composite index options

Intersection Decision Matrix
Scenario	Likely Decision	Reason
Both indexes highly selective	✅ Use intersection	Intersection is small, few random I/Os
One index highly selective	❓ Maybe single index	More selective index + filter may be cheaper
Both indexes non-selective	❌ Full table scan	Intersection still large, random I/O expensive
Composite index exists	✅ Use composite	One index covers both predicates efficiently
Covering index available	✅ Use covering	No table access needed at all

intersection_cost_analysis.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
-- Analyze intersection decision with EXPLAIN
 
-- MySQL: Check for "Using intersect" or "Using union"
EXPLAIN SELECT * FROM products
WHERE category_id = 50 AND price < 100;
 
-- PostgreSQL: Look for BitmapAnd or BitmapOr nodes
EXPLAIN (ANALYZE, COSTS)
SELECT * FROM products
WHERE category_id = 50 AND price < 100;
 
-- Example output showing intersection choice:
/*
Bitmap Heap Scan on products  (cost=24.50..120.75 rows=50 ...)
  Recheck Cond: ((category_id = 50) AND (price < 100))
  ->  BitmapAnd
        ->  Bitmap Index Scan on idx_category  (cost=0.00..5.20 rows=200 ...)
              Index Cond: (category_id = 50)
        ->  Bitmap Index Scan on idx_price  (cost=0.00..18.80 rows=500 ...)
              Index Cond: (price < 100)
*/
 
-- Cost breakdown:
-- idx_category: 200 matching rows
-- idx_price: 500 matching rows
-- After intersection: ~50 rows (estimate)
-- Without intersection (using just idx_category): 200 rows to fetch
 
 
-- Force optimizer to use or avoid intersection (PostgreSQL):
SET enable_bitmapscan = OFF;  -- Disable bitmap/intersection
EXPLAIN SELECT * FROM products WHERE category_id = 50 AND price < 100;
SET enable_bitmapscan = ON;   -- Re-enable

The Selectivity Product Rule

Intersection vs Composite Indexes

A natural question arises: if index intersection can combine multiple single-column indexes, why bother with composite indexes at all?

The answer lies in efficiency trade-offs:

Composite Index Advantages

•Single index scan — One B+tree traversal vs multiple
•No intersection overhead — No memory for RID sets, no merge operation
•ORDER BY support — Can satisfy multi-column sorts
•Covering potential — Can include all needed columns
•Predictable performance — No optimizer decision variance

Index Intersection Advantages

•Flexibility — Separate indexes serve diverse query patterns
•Lower storage — Two single-column indexes may be smaller
•Lower maintenance — Updates touch smaller index structures
•Ad-hoc query support — Unknown query patterns handled dynamically
•OR query capability — Union operation handles OR predicates

composite_vs_intersection.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
-- Scenario: orders table with queries on customer_id and order_date
 
-- Option A: Two single-column indexes
CREATE INDEX idx_customer ON orders (customer_id);
CREATE INDEX idx_date ON orders (order_date);
-- Uses intersection for: WHERE customer_id = ? AND order_date = ?
 
-- Option B: One composite index
CREATE INDEX idx_customer_date ON orders (customer_id, order_date);
-- Directly navigates to matching entries
 
-- Performance comparison for: 
-- SELECT * FROM orders WHERE customer_id = 100 AND order_date = '2023-01-15'
 
-- Option A (Intersection):
-- 1. Scan idx_customer: ~100 RIDs (1ms)
-- 2. Scan idx_date: ~1000 RIDs (3ms)
-- 3. Intersect: ~5 RIDs (0.1ms)
-- 4. Random I/O: 5 row accesses (0.5ms)
-- Total: ~4.6ms
 
-- Option B (Composite):
-- 1. Seek idx_customer_date: ~5 entries (0.3ms)
-- 2. Sequential I/O or direct pointer (0.2ms)
-- Total: ~0.5ms
 
-- Composite index is ~9x faster for this query pattern!
 
 
-- When intersection wins:
-- Diverse query patterns where composite prefix doesn't help
 
-- Query 1: WHERE customer_id = ?        -- Uses idx_customer ✅
-- Query 2: WHERE order_date = ?         -- Uses idx_date ✅  
-- Query 3: WHERE customer_id = ? AND order_date = ?  -- Uses intersection
 
-- With composite (customer_id, order_date):
-- Query 1: ✅ Uses composite prefix
-- Query 2: ❌ Cannot use (wrong prefix)
-- Query 3: ✅ Uses composite fully
 
-- Separate indexes provide more flexibility for Query 2

Design Guideline

Bitmap Index Intersection

Bitmap indexes are particularly suited for intersection operations because bitmaps can be combined with simple bitwise AND/OR operations—extremely fast on modern CPUs.

How Bitmap Intersection Works:

Each index value maps to a bitmap where bit position N is 1 if row N contains that value:

value 'A': [1,0,1,0,0,1,0,1,0,0]  // Rows 0,2,5,7
value 'B': [0,1,1,0,1,0,0,1,0,0]  // Rows 1,2,4,7

AND operation:
          [0,0,1,0,0,0,0,1,0,0]  // Rows 2,7 (intersection)

OR operation:
          [1,1,1,0,1,1,0,1,0,0]  // Rows 0,1,2,4,5,7 (union)

Bitmap vs B-tree Index Intersection
Aspect	Bitmap Intersection	B-tree Intersection
Operation	Bitwise AND/OR (hardware optimized)	Sort-merge or hash join
Speed	Extremely fast	Moderate
Memory	Constant (bitmap size)	Proportional to RID count
Cardinality requirement	Low cardinality columns	Any cardinality
Available in	Oracle, PostgreSQL (runtime bitmaps)	All major databases
Update cost	High (full bitmap rebuild)	Incremental updates

bitmap_intersection.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
-- PostgreSQL uses runtime bitmaps for B-tree index intersection:
 
EXPLAIN (ANALYZE) SELECT * FROM products
WHERE category = 'Electronics' AND brand = 'Acme' AND price < 500;
 
/*
Bitmap Heap Scan on products 
  Recheck Cond: ((category = 'Electronics') AND (brand = 'Acme') AND (price < 500))
  ->  BitmapAnd
        ->  Bitmap Index Scan on idx_category
              Index Cond: (category = 'Electronics')
        ->  Bitmap Index Scan on idx_brand  
              Index Cond: (brand = 'Acme')
        ->  Bitmap Index Scan on idx_price
              Index Cond: (price < 500)
*/
 
-- Each B-tree index scan creates a runtime bitmap
-- Bitmaps are ANDed together
-- Result bitmap used for heap access
 
-- Oracle with actual bitmap indexes:
CREATE BITMAP INDEX bmp_category ON products (category);
CREATE BITMAP INDEX bmp_brand ON products (brand);
CREATE BITMAP INDEX bmp_region ON products (region);
 
-- Complex AND/OR query leverages bitmap operations natively:
SELECT * FROM products
WHERE (category = 'Electronics' OR category = 'Appliances')
  AND brand = 'Acme'
  AND region IN ('North', 'South');
 
-- Oracle execution: BITMAP AND + BITMAP OR operations

Runtime Bitmaps in PostgreSQL

Practical Intersection Strategies

When should you rely on intersection versus creating composite indexes? Here are practical guidelines:

When to Rely on Index Intersection

•Ad-hoc query environments — Data warehouses or analytics systems where query patterns are unpredictable. Creating composite indexes for every combination is impractical.
•OR predicates — Queries with OR conditions across different columns. Composite indexes cannot help; intersection (union) is the only mechanism.
•Storage-constrained systems — When disk space is limited and multiple single-column indexes are smaller than multiple composite indexes.
•Write-heavy workloads — Maintaining fewer, smaller indexes reduces write amplification.
•Transitional periods — During performance tuning before optimal composite indexes are identified.

When to Avoid Index Intersection

•High-frequency, known query patterns — Create targeted composite indexes for queries that run thousands of times per minute.
•Latency-sensitive applications — Intersection adds overhead; composite indexes provide more predictable performance.
•OLTP systems — Transaction processing benefits from optimized, specific indexes rather than flexible, general ones.
•When covering indexes are possible — Covering indexes eliminate table access entirely, which intersection cannot do.
•ORDER BY requirements — Intersection cannot provide sorted output; composite indexes can.

intersection_strategy_checklist.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
-- Decision checklist for intersection vs composite:
 
-- 1. Identify the query pattern
--    Query: WHERE A = ? AND B = ?
 
-- 2. Check query frequency
SHOW STATUS LIKE 'Questions';  -- MySQL
-- If this query runs > 100/sec, favor composite index
 
-- 3. Analyze column selectivity
SELECT COUNT(DISTINCT A) / COUNT(*) AS selectivity_A,
       COUNT(DISTINCT B) / COUNT(*) AS selectivity_B
FROM table;
-- If both selective (< 5%), intersection may work well
 
-- 4. Check if OR predicates are involved
-- Query: WHERE A = ? OR B = ?
-- Composite index CAN'T help; need separate indexes + union
 
-- 5. Consider covering index opportunity
-- Query: SELECT C, D FROM t WHERE A = ? AND B = ?
CREATE INDEX idx_covering ON t (A, B, C, D);  -- Best solution
 
-- 6. Test both approaches
EXPLAIN ANALYZE SELECT ...;  -- With current indexes
-- Create composite, test again
CREATE INDEX idx_composite ON t (A, B);
EXPLAIN ANALYZE SELECT ...;  -- Compare execution times

Summary: Index Intersection Mastery

Index intersection provides flexibility in query processing by combining multiple indexes. Here are the key concepts:

Key Takeaways

•Index intersection combines results from multiple indexes using set operations (AND → intersection, OR → union).
•Highly selective indexes are required for intersection to be cost-effective. The combined selectivity must significantly reduce row access.
•Composite indexes are usually faster for known, frequent query patterns. They avoid intersection overhead and can support ORDER BY and covering.
•Intersection excels for flexibility — ad-hoc queries, OR predicates, and environments where predicting query patterns is difficult.
•Bitmap operations accelerate intersection — either through native bitmap indexes or runtime bitmaps from B-tree scans.
•The optimizer decides — cost-based analysis determines when intersection beats single-index scans or table scans.
•Monitor and tune — Use EXPLAIN to verify the optimizer's intersection decisions and adjust index strategy accordingly.

What's Next:

Page Complete

4 / 5