Loading learning content...
We've seen that table scans offer efficient sequential I/O but examine all rows, while index scans provide selectivity but suffer from random I/O overhead. What if we could combine the benefits of both approaches?
Bitmap scans do exactly this. They use indexes to identify which rows match query conditions, but instead of immediately fetching each row, they build an in-memory bitmap indicating which pages (or rows) contain matches. Then, they scan those pages in physical order, converting random I/O into sequential I/O.
Even more powerfully, bitmaps can be combined with logical operations (AND, OR), enabling efficient execution of queries that would otherwise require multiple separate index scans.
By the end of this page, you will understand how bitmap scans work, when they outperform both table scans and index scans, how bitmaps enable combining multiple indexes for complex queries, and the internal mechanics of bitmap creation and execution.
A bitmap scan operates in three distinct phases:
Phase 1: Bitmap Index Scan
First, the database scans the index(es) to identify matching index entries. Instead of immediately fetching rows, it builds an in-memory bitmap where each bit corresponds to a table page (or row, depending on implementation):
Phase 2: Bitmap Combination (Optional)
If multiple conditions exist, multiple bitmaps can be created and combined:
Phase 3: Bitmap Heap Scan
Finally, the database reads table pages in physical order, but only pages where the bitmap bit is set. All rows on each fetched page are checked against the original predicate (recheck condition), and matching rows are returned.
123456789101112131415161718192021222324252627282930313233
function BitmapScan(table, indexes, predicates): // Phase 1: Build bitmap from index scans combined_bitmap = None for each (index, predicate) in (indexes, predicates): bitmap = new Bitmap(table.page_count) // One bit per page // Scan index and set bits for matching pages for each entry in index.scan(predicate): page_id = entry.rid.page_id bitmap.set(page_id, 1) // Phase 2: Combine with previous bitmap if combined_bitmap is None: combined_bitmap = bitmap else if predicate_is_AND: combined_bitmap = combined_bitmap AND bitmap else: // OR combined_bitmap = combined_bitmap OR bitmap // Phase 3: Scan table pages in order, only where bitmap is set results = [] for page_id in 0 to table.page_count: if combined_bitmap.get(page_id) == 1: page = buffer_pool.fetch(page_id) for each row in page.rows: // Recheck: page-level bitmap may include non-matching rows if full_predicate_matches(row, predicates): results.append(row) return resultsThe bitmap typically tracks pages, not individual rows. When a page contains both matching and non-matching rows, we must 'recheck' each row after fetching the page. The recheck is visible in PostgreSQL EXPLAIN output as 'Recheck Cond'. Some implementations use row-level bitmaps to avoid recheck, at the cost of more memory.
Bitmap scans occupy a sweet spot between index scans and table scans. They're optimal in several scenarios:
Scenario 1: Medium Selectivity Queries
When selectivity is too high for pure index scan (random I/O overhead too great) but too low for table scan (too many wasted page reads), bitmap scans shine:
| Selectivity | Matching Rows | Best Access Method | Reason |
|---|---|---|---|
| < 1% | Few rows | Index Scan | Low random I/O count |
| 1% - 20% | Moderate rows | Bitmap Scan | Sequential page access with selectivity |
20% | Many rows | Table Scan | Nearly all pages needed anyway |
Scenario 2: Multiple Conditions on Different Columns
When a query has multiple conditions and separate indexes exist for each, bitmap scans can combine them:
12345678910111213141516171819202122232425
-- Indexes: idx_products_category (category_id)-- idx_products_price (price) -- Query with two conditions on different indexed columnsSELECT * FROM products WHERE category_id = 5 AND price < 100; -- Without bitmap: Must choose ONE index, then filter-- Option A: Use category index, filter by price → may read many prices > 100-- Option B: Use price index, filter by category → may read many wrong categories -- With bitmap:EXPLAIN SELECT * FROM products WHERE category_id = 5 AND price < 100; /*Bitmap Heap Scan on products Recheck Cond: ((category_id = 5) AND (price < 100)) -> BitmapAnd -> Bitmap Index Scan on idx_products_category Index Cond: (category_id = 5) -> Bitmap Index Scan on idx_products_price Index Cond: (price < 100)*/ -- Both indexes are used! Only pages satisfying BOTH conditions are fetched.Scenario 3: IN Lists and OR Conditions
Bitmap scans efficiently handle multiple values by combining separate index lookups:
123456789101112131415161718192021222324
-- Query with IN listSELECT * FROM orders WHERE status IN ('pending', 'processing', 'review'); /*Bitmap Heap Scan on orders Recheck Cond: (status = ANY ('{pending,processing,review}')) -> Bitmap Index Scan on idx_orders_status Index Cond: (status = ANY ('{pending,processing,review}'))*/ -- Each status value produces a bitmap, combined with OR -- Similar for explicit OR:SELECT * FROM orders WHERE customer_id = 123 OR customer_id = 456; /*Bitmap Heap Scan on orders Recheck Cond: ((customer_id = 123) OR (customer_id = 456)) -> BitmapOr -> Bitmap Index Scan on idx_orders_customer Index Cond: (customer_id = 123) -> Bitmap Index Scan on idx_orders_customer Index Cond: (customer_id = 456)*/The key insight is that bitmap scans decouple 'finding matching entries' from 'fetching matching rows'. By collecting all matching page IDs first, then fetching in order, we convert random I/O into more sequential I/O, which is 10-50× faster on HDDs and still beneficial on SSDs.
Understanding the cost model for bitmap scans helps predict when they'll be chosen by the optimizer.
Component Costs:
Bitmap Index Scan Cost: Scanning index leaf pages to build the bitmap
C_bitmap_build = IndexPages × t_seq
Bitmap Combination Cost: CPU time to AND/OR bitmaps (very fast, typically negligible)
C_combine ≈ TablePages / 8 (number of bytes in bitmap)
Bitmap Heap Scan Cost: Reading table pages where bits are set
C_heap = PagesWithMatches × t_seq
Total Bitmap Scan Cost:
C_bitmap = C_bitmap_build + C_combine + C_heap
| Component | Index Scan | Bitmap Scan |
|---|---|---|
| Index traversal | 3 pages | 3 pages (to start) |
| Index leaf scan | 1,000 pages | 1,000 pages |
| Bitmap operation | N/A | ~1KB bitmap, negligible |
| Row fetches (random) | 100,000 × random | N/A |
| Page fetches (sequential) | N/A | ~8,000 pages × sequential |
| Total I/O cost (HDD) | 100,000 × 10ms = 1000 sec | 9,000 × 0.5ms = 4.5 sec |
The Crossover Point
The bitmap scan becomes preferable to index scan when the random I/O savings outweigh the extra pages read due to page-level (rather than row-level) granularity.
Factors favoring bitmap scan:
Factors favoring index scan:
Bitmap scans require memory to hold the bitmap. For very large tables, the bitmap itself can become large (1 bit per page × millions of pages = megabytes). If memory is limited, the bitmap may become 'lossy' (tracking page ranges instead of individual pages), requiring more recheck work.
The power of bitmap scans comes from the ability to combine multiple bitmaps using logical operations. This enables efficient processing of complex predicates.
BitmapAnd (Intersection)
For AND predicates, the resulting bitmap has 1 only where ALL source bitmaps have 1:
Result[page] = 1 iff Bitmap1[page] = 1 AND Bitmap2[page] = 1 AND ...
This operation is extremely efficient: bitwise AND on entire bytes, processing 8 pages at a time.
1234567891011121314151617181920212223242526
-- Query: Find electronics under $50 from verified sellersSELECT * FROM productsWHERE category = 'electronics' AND price < 50 AND seller_verified = true; -- Assume indexes on all three columns /*Execution Plan:Bitmap Heap Scan on products Recheck Cond: (...) -> BitmapAnd -> Bitmap Index Scan on idx_category Index Cond: (category = 'electronics') -- Bitmap 1: [0,1,1,0,1,1,0,0,1,0,...] (pages with electronics) -> Bitmap Index Scan on idx_price Index Cond: (price < 50) -- Bitmap 2: [1,1,0,1,0,1,0,1,1,0,...] (pages with low prices) -> Bitmap Index Scan on idx_seller_verified Index Cond: (seller_verified = true) -- Bitmap 3: [1,1,1,0,1,0,0,0,1,1,...] (pages with verified sellers) Result Bitmap (AND of all three): [0,1,0,0,0,0,0,0,1,0,...]-- Only pages 1 and 8 need to be fetched!*/BitmapOr (Union)
For OR predicates, the resulting bitmap has 1 where ANY source bitmap has 1:
Result[page] = 1 iff Bitmap1[page] = 1 OR Bitmap2[page] = 1 OR ...
12345678910111213141516171819
-- Query: Find products that are either on sale OR highly ratedSELECT * FROM productsWHERE on_sale = true OR rating >= 4.5; /*Execution Plan:Bitmap Heap Scan on products Recheck Cond: ((on_sale = true) OR (rating >= 4.5)) -> BitmapOr -> Bitmap Index Scan on idx_on_sale Index Cond: (on_sale = true) -- Bitmap 1: [0,1,0,0,1,0,0,0,1,0,...] (pages with sale items) -> Bitmap Index Scan on idx_rating Index Cond: (rating >= 4.5) -- Bitmap 2: [1,0,0,1,1,0,0,1,0,0,...] (pages with high ratings) Result Bitmap (OR of both): [1,1,0,1,1,0,0,1,1,0,...]-- Pages 0,1,3,4,7,8 need to be fetched*/Complex Predicate Transformation
The optimizer transforms complex boolean expressions into bitmap operation trees:
123456789101112131415
-- Complex querySELECT * FROM ordersWHERE (status = 'pending' AND priority = 'high') OR (status = 'processing' AND created_at > '2024-01-01'); /*Bitmap Operation Tree: BitmapOr / \ BitmapAnd BitmapAnd / \ / \ status priority status created_at ='pending' ='high' ='processing' >'2024-01-01'*/Without bitmap scans, complex predicates often can't fully utilize multiple indexes. Bitmap operations allow the database to leverage all available indexes, combining their selectivity. This is why sometimes adding a 'redundant' index on a single column can dramatically speed up queries with complex WHERE clauses.
For very large tables, maintaining an exact bitmap (one bit per page) can require substantial memory. PostgreSQL addresses this with lossy bitmaps that trade precision for memory efficiency.
Exact vs Lossy Bitmaps:
With lossy bitmaps, the recheck phase must filter out more non-matching rows because the granularity is coarser.
123456789101112131415161718192021222324252627
-- Check current work_mem settingSHOW work_mem; -- Default: 4MB -- For a table with 1 million pages, exact bitmap needs:-- 1,000,000 bits = 125 KB (easily fits in 4MB) -- For a table with 100 million pages:-- 100,000,000 bits = 12.5 MB (exceeds default work_mem)-- PostgreSQL would create a lossy bitmap -- Observe bitmap behavior in EXPLAINEXPLAIN (ANALYZE, BUFFERS) SELECT * FROM huge_table WHERE indexed_column < 1000; /*Bitmap Heap Scan on huge_table Recheck Cond: (indexed_column < 1000) Rows Removed by Index Recheck: 15234 -- <-- High number indicates lossy bitmap Heap Blocks: exact=12345 lossy=67890 -- <-- Shows exact vs lossy blocks -> Bitmap Index Scan on idx_huge Index Cond: (indexed_column < 1000)*/ -- Increase work_mem to enable exact bitmaps (for this session)SET work_mem = '128MB'; -- Re-run query - should show fewer lossy blocksLossy Bitmap Tradeoffs:
| Aspect | Exact Bitmap | Lossy Bitmap |
|---|---|---|
| Memory | Higher | Lower |
| Precision | Exact pages | Page ranges |
| Recheck overhead | Minimal | Higher (more false positives) |
| I/O efficiency | Optimal | Slightly worse |
| Suitable for | Normal tables | Very large tables |
When Lossy Bitmaps Cause Problems:
If you see high 'Rows Removed by Index Recheck' counts, the lossy bitmap is fetching many pages that don't actually contain matches. Consider:
work_mem to allow exact bitmapsIncreasing work_mem helps bitmap scans but also affects other operations (sorts, hash joins). Each query can use work_mem for each operation, so very high values can lead to memory pressure under concurrent load. Tune carefully based on your workload.
Different database systems implement bitmap-style operations with varying terminology and capabilities.
| Database | Feature Name | Index Combination | Notes |
|---|---|---|---|
| PostgreSQL | Bitmap Heap Scan / Bitmap Index Scan | BitmapAnd, BitmapOr | Full-featured; supports lossy bitmaps for memory management |
| Oracle | BITMAP CONVERSION, INDEX COMBINE | BITMAP AND, BITMAP OR | Automatic bitmap conversion from B-tree indexes; native bitmap indexes also available |
| SQL Server | No direct equivalent | Index Intersection (similar concept) | Uses RID-based intersection rather than page-level bitmaps |
| MySQL | Index Merge | Intersection, Union | Limited capability; often creates temporary table instead of true bitmap |
123456789101112131415161718192021222324252627282930313233343536
=== PostgreSQL ===-- Bitmap scan clearly shown in EXPLAINEXPLAIN SELECT * FROM orders WHERE status = 'pending' AND amount > 100;/*Bitmap Heap Scan on orders -> BitmapAnd -> Bitmap Index Scan on idx_status -> Bitmap Index Scan on idx_amount*/ === Oracle ===-- Oracle can convert B-tree indexes to bitmaps internallyEXPLAIN PLAN FORSELECT * FROM orders WHERE status = 'pending' AND amount > 100; -- Look for operations like:-- BITMAP CONVERSION FROM ROWIDS-- BITMAP AND-- TABLE ACCESS BY INDEX ROWID BATCHED === SQL Server ===-- SQL Server uses "Index Intersection" for similar effectSET SHOWPLAN_TEXT ON;SELECT * FROM orders WHERE status = 'pending' AND amount > 100; -- May show:-- Index Seek (idx_status)-- Index Seek (idx_amount) -- Hash Match (Inner Join on RID) -- RID intersection === MySQL ===-- MySQL "Index Merge" is the closest featureEXPLAIN SELECT * FROM orders WHERE status = 'pending' AND amount > 100; -- May show type: index_merge-- Extra: Using intersect(idx_status,idx_amount); Using wherePostgreSQL's bitmap scan is particularly powerful due to its ability to gracefully degrade to lossy mode, combine arbitrary numbers of indexes, and handle complex boolean expressions efficiently. Other databases offer similar capabilities but often with more restrictions.
Let's examine real-world scenarios where bitmap scans provide significant performance benefits.
1234567891011121314151617181920212223242526272829303132333435363738394041424344454647484950515253545556
-- Scenario: User applies multiple filters on product catalog-- Table: products (10 million rows)-- Indexes: category, brand, price_range, rating, in_stock -- User's filter: Electronics, Sony or Samsung, Under $500, 4+ starsSELECT product_id, name, price, ratingFROM productsWHERE category = 'electronics' AND brand IN ('sony', 'samsung') AND price < 500 AND rating >= 4.0 AND in_stock = true; -- Without bitmap scans: Pick one index, filter the rest-- With bitmap scans: EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)SELECT product_id, name, price, ratingFROM productsWHERE category = 'electronics' AND brand IN ('sony', 'samsung') AND price < 500 AND rating >= 4.0 AND in_stock = true; /*Bitmap Heap Scan on products (actual time=12.345..89.123 rows=2,847 loops=1) Recheck Cond: ((category = 'electronics') AND (brand = ANY (...)) AND (price < 500) AND (rating >= 4.0) AND in_stock) Heap Blocks: exact=2134 Buffers: shared hit=2567 -> BitmapAnd (actual time=11.234..11.234 rows=0 loops=1) -> Bitmap Index Scan on idx_category (actual time=1.234..1.234 rows=0 loops=1) Index Cond: (category = 'electronics') -- 800,000 products are electronics (8% selectivity) -> BitmapOr (actual time=2.345..2.345 rows=0 loops=1) -> Bitmap Index Scan on idx_brand Index Cond: (brand = 'sony') -> Bitmap Index Scan on idx_brand Index Cond: (brand = 'samsung') -- 150,000 products are Sony or Samsung -> Bitmap Index Scan on idx_price (actual time=1.567..1.567 rows=0 loops=1) Index Cond: (price < 500) -- 3,000,000 products under $500 -> Bitmap Index Scan on idx_rating (actual time=1.234..1.234 rows=0 loops=1) Index Cond: (rating >= 4.0) -- 2,500,000 products have 4+ rating -> Bitmap Index Scan on idx_stock (actual time=0.567..0.567 rows=0 loops=1) Index Cond: (in_stock = true) -- 6,000,000 products in stock Planning Time: 0.456 msExecution Time: 89.567 ms*/ -- Combined selectivity: only 2,847 matching rows across 2,134 pages-- Without bitmaps: would need to pick ONE index, then filter millions of rows123456789101112131415161718192021222324252627282930
-- Scenario: Find error logs in specific time window from multiple sources-- Table: application_logs (500 million rows, heavily partitioned) SELECT log_id, timestamp, message, sourceFROM application_logsWHERE timestamp BETWEEN '2024-01-15 00:00' AND '2024-01-15 23:59' AND log_level = 'ERROR' AND source IN ('auth-service', 'payment-service', 'order-service'); /*Bitmap Heap Scan on application_logs (actual time=45.123..234.567 rows=8,923 loops=1) Recheck Cond: ((timestamp >= ...) AND (timestamp <= ...) AND (log_level = 'ERROR') AND (source = ANY (...))) Heap Blocks: exact=7234 Buffers: shared hit=8234 read=567 -> BitmapAnd (actual time=42.123..42.123 rows=0 loops=1) -> Bitmap Index Scan on idx_logs_timestamp Index Cond: ((timestamp >= ...) AND (timestamp <= ...)) -- ~2M entries for that day -> Bitmap Index Scan on idx_logs_level Index Cond: (log_level = 'ERROR') -- ~1% of all entries are errors -> BitmapOr -> Bitmap Index Scan on idx_logs_source (source='auth-service') -> Bitmap Index Scan on idx_logs_source (source='payment-service') -> Bitmap Index Scan on idx_logs_source (source='order-service') Execution Time: 234.567 ms*/ -- Without bitmap: Would need table partitioned by time, or very slow scanYou now have comprehensive knowledge of bitmap scans—the hybrid access method that combines index selectivity with sequential access efficiency. Here are the essential takeaways:
What's Next:
The final page of this module covers Access Method Selection—how query optimizers decide which access method to use for each query. We'll explore the decision process, cost-based comparisons, and how to influence access method choices through query hints and statistics.
You now understand bitmap scans at a deep level—their mechanics, optimal use cases, combination operations, and memory considerations. This knowledge enables you to recognize when bitmap scans are the right choice and design indexes and queries that leverage their power.