Bitmap Indexes - Learning Module

Loading content...

0/252

Low Cardinality Columns: The Bitmap Sweet Spot

Why Cardinality Determines Bitmap Viability

Throughout our exploration of bitmap indexes, we've repeatedly mentioned that they excel for low-cardinality columns. But what exactly does "low cardinality" mean? How low is low enough? And what happens when cardinality increases?

This page provides definitive answers. We'll quantify the relationship between cardinality and bitmap effectiveness, establish practical thresholds for real-world decision-making, and understand the mathematical basis for why cardinality matters so profoundly.

Cardinality refers to the number of distinct values in a column. A column with cardinality 3 has three unique values (like Status = {'Active', 'Pending', 'Closed'}). A column with cardinality 1 million has a million unique values (like UserID in a large system).

What You Will Learn

By the end of this page, you will understand: (1) The mathematical relationship between cardinality and bitmap space consumption, (2) Practical cardinality thresholds for bitmap index decisions, (3) How selectivity interacts with cardinality, (4) The crossover point where B+-trees become more efficient, and (5) Strategies for handling medium-cardinality columns.

The Cardinality-Space Relationship

The space consumed by a bitmap index is directly proportional to the cardinality of the indexed column. Understanding this relationship is crucial for capacity planning and index selection.

Uncompressed bitmap space formula:

For a table with N rows and a column with C distinct values (cardinality C):

Total bits = N × C
Total bytes = (N × C) / 8
Total megabytes = (N × C) / 8,000,000

Example calculations:

Table Size (N)	Cardinality (C)	Bitmap Size
1 million	2 (boolean)	250 KB
1 million	10 (status codes)	1.25 MB
1 million	100 (categories)	12.5 MB
1 million	1,000 (products)	125 MB
1 million	10,000 (customers)	1.25 GB
1 million	1,000,000 (unique IDs)	125 GB

The growth is linear in both N and C, but notice how dramatically space increases with cardinality:

From C=10 to C=10,000 (1000× increase): space grows from 1.25 MB to 1.25 GB
At C=N (every value unique), bitmap size equals N²/8—utterly impractical

The Quadratic Trap

When cardinality approaches row count (C ≈ N), uncompressed bitmap space becomes O(N²). For a billion-row table with a billion unique values, you'd need over 100 petabytes—physically impossible. This is why bitmap indexes are strictly limited to low-cardinality scenarios without compression.

Space comparison with B+-tree:

A B+-tree index on the same column requires approximately:

B+-tree size ≈ N × (key_size + pointer_size) / fill_factor
           ≈ N × (8 + 8) / 0.67
           ≈ 24 × N bytes

For 1 million rows: ~24 MB (constant regardless of cardinality)

Crossover analysis:

Bitmaps are smaller when:

(N × C) / 8 < 24 × N
C / 8 < 24
C < 192

For uncompressed bitmaps, the break-even point is roughly C < 200. Above this cardinality, B+-trees use less space than uncompressed bitmaps.

With compression, bitmaps remain competitive to higher cardinalities (often C < 10,000 or more), as we'll explore in the compression section.

What Qualifies as Low Cardinality?

The term "low cardinality" is context-dependent. A column with 1,000 distinct values might be low-cardinality in a billion-row table but high-cardinality in a thousand-row table. The key metric is the cardinality ratio:

Cardinality Ratio = C / N

Where C = number of distinct values and N = number of rows

Cardinality Ratio	Classification	Bitmap Suitability
< 0.01% (C/N < 0.0001)	Very Low	Excellent
0.01% - 1%	Low	Good
1% - 10%	Medium	Consider with compression
10% - 50%	High	Generally not suitable
50%	Very High	Avoid bitmaps entirely

Practical examples:

Very Low Cardinality (< 0.01%):

Gender in a 100 million row table (2 values, ratio = 0.000002%)
Boolean flags (Active/Inactive, Verified/Unverified)
Status codes in large processing tables (5-10 statuses, millions of rows)

Low Cardinality (0.01% - 1%):

Country codes in a global user table (200 countries, millions of users)
Product categories in an e-commerce catalog (100 categories, 100K products)
Payment methods (10-20 methods, millions of transactions)

Medium Cardinality (1% - 10%):

US state codes in a regional dataset (50 states, 5,000 records)
Day of year (365 values, 10,000-50,000 rows)
Age values (0-120, 5,000 person records)

High Cardinality (> 10%):

Email domains in a small user table (1,000 domains, 2,000 users)
Product SKUs (50,000 SKUs, 100,000 orders)
User IDs (almost always unique)

The 1% Rule of Thumb

A practical guideline: consider bitmap indexes when cardinality is less than 1% of row count, and strongly favor them when cardinality is less than 0.1%. Above 1%, carefully analyze compression benefits and query patterns before committing to bitmaps.

Why Low Cardinality Matters: The Three Factors

Low cardinality benefits bitmap indexes in three interconnected ways: space efficiency, query efficiency, and compression effectiveness. Let's examine each:

Factor 1: Space Efficiency

•Fewer bitmaps needed — With only 10 distinct values, you only store 10 bitmaps. With 10,000 distinct values, you store 10,000 bitmaps.
•Manageable total size — Even for billion-row tables, 10 bitmaps × 1 billion bits = 1.25 GB (manageable) vs. 10,000 bitmaps × 1 billion bits = 1.25 TB (problematic).
•Fits in memory — Low-cardinality bitmap indexes often fit entirely in RAM, enabling in-memory operations without I/O.

Factor 2: Query Efficiency

•Dense bitmaps — With low cardinality, each bitmap has many 1-bits (high density). A 'Status' bitmap with 3 values has roughly 33% 1-bits each.
•Faster popcount — Dense bitmaps make population count (counting matching rows) more cache-efficient—fewer sparse regions to skip.
•Meaningful intersections — ANDing two dense bitmaps likely produces non-empty results. ANDing two sparse bitmaps often yields zero matches.

Factor 3: Compression Effectiveness

•Long runs of 0s or 1s — Low cardinality means each value appears in many consecutive rows (if data is clustered), creating compressible patterns.
•Higher compression ratios — Run-length encoding can compress 11111111111111111111 (20 ones) to just 'run of 20 ones' regardless of length.
•Predictable patterns — Low-cardinality columns often have regular patterns (e.g., status changes systematically) that compress well.

The density insight:

For a column with cardinality C and N rows (assuming uniform distribution):

Each value appears in approximately N/C rows
Each bitmap has approximately N/C bits set to 1
Bitmap density = (N/C) / N = 1/C

Cardinality	Bitmap Density	Bits set per million rows
2	50%	500,000
10	10%	100,000
100	1%	10,000
1,000	0.1%	1,000
10,000	0.01%	100

At high cardinality, bitmaps become extremely sparse—mostly zeros with rare ones. Operations become overhead-dominated, and compression becomes essential just to achieve reasonable sizes.

Selectivity and Cardinality Interaction

Selectivity measures what fraction of rows a predicate selects. It interacts with cardinality in important ways that affect query planning:

Selectivity formulas:

For equality predicates on uniformly distributed data:

Selectivity(col = value) = 1 / C

Where C is cardinality.

Cardinality	Selectivity per value	Rows selected (1M table)
2	50%	500,000
10	10%	100,000
100	1%	10,000
1,000	0.1%	1,000
10,000	0.01%	100
100,000	0.001%	10

The selectivity-cardinality paradox:

High-cardinality columns have high selectivity (each value selects few rows)—which is traditionally good for indexing. But for bitmaps, this is a problem:

Low cardinality (good for bitmaps):

Few bitmaps to store
Each bitmap is dense
But each equality query matches many rows (low selectivity)

High cardinality (bad for bitmaps):

Many bitmaps to store
Each bitmap is sparse
But each equality query matches few rows (high selectivity)

This is the opposite of B+-tree behavior! B+-trees excel when selectivity is high (quickly navigating to specific rows). Bitmaps excel when selectivity is low but multiple predicates combine to create high overall selectivity.

The Bitmap Sweet Spot

Bitmaps shine when individual predicates have low selectivity (match many rows) but combining multiple predicates via AND produces high selectivity (match few rows). This is the quintessential analytical query pattern: 'Sales in Q4 AND Region=West AND Category=Electronics AND Channel=Online'.

Combined selectivity example:

A sales data warehouse query:

SELECT COUNT(*) FROM Sales
WHERE Region = 'West'           -- 20% selectivity (5 regions)
  AND Quarter = 'Q4'            -- 25% selectivity (4 quarters)  
  AND Category = 'Electronics'  -- 10% selectivity (10 categories)
  AND Channel = 'Online'        -- 33% selectivity (3 channels)

Combined selectivity (assuming independence):

0.20 × 0.25 × 0.10 × 0.33 = 0.00165 = 0.165%

On a 100 million row table, this matches ~165,000 rows.

With B+-tree indexes:

Use one index to find 20 million Region='West' rows
Fetch all 20 million to apply remaining predicates
Or use intersection of multiple index results (expensive)

With bitmap indexes:

AND four bitmaps together (microseconds of CPU work)
Result bitmap directly identifies the 165,000 matching rows
No row fetches until the final count/aggregation

Real-World Column Analysis

Let's analyze typical columns in real database schemas to assess bitmap suitability:

Bitmap Suitability Analysis for Common Columns
Column Type	Typical Cardinality	Bitmap Verdict	Reasoning
Boolean flags	2	✅ Excellent	Maximum density, minimal storage
Gender	2-3	✅ Excellent	Very low cardinality, high frequency
Status/State	3-20	✅ Excellent	Common in workflow systems
Priority levels	3-5	✅ Excellent	Low, high, urgent, critical, etc.
Day of week	7	✅ Excellent	Fixed, low cardinality
Month	12	✅ Excellent	Fixed, low cardinality
Quarter	4	✅ Excellent	Very low cardinality
Year	5-50	✅ Good	Depends on data range
Country code	200-250	✅ Good	Manageable cardinality
US State	50	✅ Good	Low cardinality
Product category	10-500	⚠️ Consider	Depends on catalog size
Age	0-120	⚠️ Consider	Medium cardinality, consider ranges
Hour of day	24	✅ Good	Fixed, manageable
Minute of hour	60	⚠️ Consider	Borderline, evaluate queries
Department	10-100	✅ Good	Typically low cardinality
Brand	100-10,000	⚠️ Consider	Varies widely by industry
City	1,000-100,000	❌ Avoid	Usually too high
ZIP code	~42,000 (US)	❌ Avoid	High cardinality
Date (exact)	365-36,500	❌ Avoid	Use range encoding or partition
Timestamp	Very high	❌ Avoid	Near-unique values
Email	~unique	❌ Avoid	Cardinality ≈ row count
User ID	~unique	❌ Avoid	Primary key, use B+-tree
Phone number	~unique	❌ Avoid	Near-unique values

Query Pattern Override

The analysis above assumes equality queries. If your workload heavily uses range queries on dates or numeric columns, consider range-encoded bitmaps instead of equality bitmaps, or use B+-trees. Always profile with actual query patterns.

The Medium-Cardinality Challenge

Columns with medium cardinality (roughly 100-10,000 distinct values) present a dilemma. They're too high for simple bitmap efficiency but may still benefit from bitmap operations. Several strategies address this challenge:

Strategy 1: Bitmap Compression

Compression dramatically extends bitmap viability. Run-length encoding (RLE) and Word-Aligned Hybrid (WAH) can reduce bitmap sizes by 10-100x for sparse bitmaps.

Example:

Column: ProductID with 5,000 distinct values
Table: 10 million rows
Uncompressed: 5,000 × 10M bits = 6.25 GB
With WAH compression (typical 20:1 ratio for sparse bitmaps): ~300 MB

We'll cover compression in detail in the next page.

Strategy 2: Value Grouping

Group individual values into logical categories and index the groups:

Instead of:

Bitmap for each of 5,000 products

Use:

Bitmap for each of 50 product categories
Filter: Category = 'Electronics' (bitmap operation)
Then scan matching rows for specific ProductID if needed

This hybrid approach uses bitmaps for coarse filtering and secondary mechanisms for fine filtering.

Strategy 3: Range/Interval Encoding

For numeric columns with medium cardinality, use interval bitmaps:

Instead of:

100 age bitmaps (age 0, 1, 2, ..., 99)

Use:

10 interval bitmaps (0-9, 10-19, ..., 90-99)
Query age BETWEEN 25 AND 45: OR intervals [20-29], [30-39], [40-49]
Then apply refined filter if exact age needed

Strategy 4: Hybrid Indexing

Maintain both bitmap and B+-tree indexes on the same column:

Bitmap index: For analytical queries combining this column with others
B+-tree index: For point lookups and range scans on just this column

The query optimizer chooses the appropriate index based on query structure.

Cost: Double index maintenance overhead Benefit: Optimal performance for both query types

Modern Database Solutions

Many modern analytical databases (Vertica, ClickHouse, SingleStore) automatically handle medium-cardinality columns with adaptive encoding—starting with dictionary encoding, switching to bitmaps for frequent values, and using other structures for rare values. Manual optimization is becoming less necessary.

Cardinality-Based Decision Framework

When deciding whether to use bitmap indexes, follow this systematic framework:

Step 1: Calculate cardinality ratio

ratio = COUNT(DISTINCT column) / COUNT(*)

Step 2: Classify the cardinality

IF ratio < 0.0001 (0.01%)  → Very Low  → STRONG bitmap candidate
IF ratio < 0.01 (1%)       → Low       → Good bitmap candidate  
IF ratio < 0.1 (10%)       → Medium    → Consider with compression
IF ratio >= 0.1            → High      → Avoid bitmaps

Step 3: Evaluate query patterns

IF queries heavily use multi-column AND → +1 for bitmaps
IF queries use range predicates         → Consider range encoding
IF queries are point lookups            → B+-tree likely better
IF queries are aggregations/counts      → +1 for bitmaps

Step 4: Assess update patterns

IF table is append-only or batch-loaded → +1 for bitmaps
IF table has frequent random updates    → -1 for bitmaps
IF table is partitioned by time         → Bitmaps on historical partitions

Decision matrix:

Cardinality	Query Pattern	Update Pattern	Recommendation
Very Low	Multi-column AND	Batch load	✅ Bitmap (ideal case)
Very Low	Point lookup	Batch load	⚠️ Bitmap or B+-tree
Very Low	Any	Frequent updates	⚠️ Consider update cost
Low	Multi-column AND	Batch load	✅ Bitmap
Low	Multi-column AND	Frequent updates	⚠️ Bitmap with caution
Low	Range queries	Any	⚠️ Range-encoded bitmap
Medium	Multi-column AND	Batch load	⚠️ Bitmap with compression
Medium	Any	Frequent updates	❌ B+-tree
High	Any	Any	❌ B+-tree or hash

The Golden Rule

Use bitmap indexes when: (1) Cardinality ratio is below 1%, (2) Queries combine multiple low-cardinality predicates, (3) The table is read-heavy or batch-loaded, and (4) Workload is analytical rather than transactional.

Summary: Cardinality as the Gatekeeper

We've established why cardinality is the critical factor in bitmap index suitability. Let's consolidate the key insights:

Key Takeaways

•Bitmap space grows linearly with cardinality — Total size = N × C bits. High cardinality leads to prohibitive storage.
•The cardinality ratio (C/N) is the key metric — Below 1% is generally suitable; above 10% is generally unsuitable.
•Low cardinality creates dense bitmaps — Dense bitmaps compress better, operate faster, and have more meaningful intersections.
•Selectivity works differently for bitmaps — Individual low selectivity (matching many rows) plus combined high selectivity is the sweet spot.
•Medium cardinality requires strategies — Compression, grouping, interval encoding, or hybrid approaches.
•Real-world analysis is essential — Evaluate your specific columns, table sizes, query patterns, and update frequencies.
•The crossover with B+-trees is around C=200 — Without compression, B+-trees become more space-efficient above this threshold.

What's next:

With cardinality fundamentals established, we'll explore bitmap operations in depth—the AND, OR, NOT, and XOR operations that make bitmaps powerful, including SIMD optimizations and operation ordering strategies.

Page Complete

You now understand why cardinality is the essential factor in bitmap index decisions—affecting space, performance, and compression effectiveness. Next, we'll explore the bit-level operations that give bitmap indexes their remarkable query performance.

Low Cardinality Columns: The Bitmap Sweet Spot

Why Cardinality Determines Bitmap Viability

What You Will Learn

The Cardinality-Space Relationship

The space consumed by a bitmap index is directly proportional to the cardinality of the indexed column. Understanding this relationship is crucial for capacity planning and index selection.

Uncompressed bitmap space formula:

For a table with N rows and a column with C distinct values (cardinality C):

Total bits = N × C
Total bytes = (N × C) / 8
Total megabytes = (N × C) / 8,000,000

Example calculations:

Table Size (N)	Cardinality (C)	Bitmap Size
1 million	2 (boolean)	250 KB
1 million	10 (status codes)	1.25 MB
1 million	100 (categories)	12.5 MB
1 million	1,000 (products)	125 MB
1 million	10,000 (customers)	1.25 GB
1 million	1,000,000 (unique IDs)	125 GB

The growth is linear in both N and C, but notice how dramatically space increases with cardinality:

From C=10 to C=10,000 (1000× increase): space grows from 1.25 MB to 1.25 GB
At C=N (every value unique), bitmap size equals N²/8—utterly impractical

The Quadratic Trap

Space comparison with B+-tree:

A B+-tree index on the same column requires approximately:

B+-tree size ≈ N × (key_size + pointer_size) / fill_factor
           ≈ N × (8 + 8) / 0.67
           ≈ 24 × N bytes

For 1 million rows: ~24 MB (constant regardless of cardinality)

Crossover analysis:

Bitmaps are smaller when:

(N × C) / 8 < 24 × N
C / 8 < 24
C < 192

For uncompressed bitmaps, the break-even point is roughly C < 200. Above this cardinality, B+-trees use less space than uncompressed bitmaps.

With compression, bitmaps remain competitive to higher cardinalities (often C < 10,000 or more), as we'll explore in the compression section.

What Qualifies as Low Cardinality?

Cardinality Ratio = C / N

Where C = number of distinct values and N = number of rows

Cardinality Ratio	Classification	Bitmap Suitability
< 0.01% (C/N < 0.0001)	Very Low	Excellent
0.01% - 1%	Low	Good
1% - 10%	Medium	Consider with compression
10% - 50%	High	Generally not suitable
50%	Very High	Avoid bitmaps entirely

Practical examples:

Very Low Cardinality (< 0.01%):

Gender in a 100 million row table (2 values, ratio = 0.000002%)
Boolean flags (Active/Inactive, Verified/Unverified)
Status codes in large processing tables (5-10 statuses, millions of rows)

Low Cardinality (0.01% - 1%):

Country codes in a global user table (200 countries, millions of users)
Product categories in an e-commerce catalog (100 categories, 100K products)
Payment methods (10-20 methods, millions of transactions)

Medium Cardinality (1% - 10%):

US state codes in a regional dataset (50 states, 5,000 records)
Day of year (365 values, 10,000-50,000 rows)
Age values (0-120, 5,000 person records)

High Cardinality (> 10%):

Email domains in a small user table (1,000 domains, 2,000 users)
Product SKUs (50,000 SKUs, 100,000 orders)
User IDs (almost always unique)

The 1% Rule of Thumb

Why Low Cardinality Matters: The Three Factors

Low cardinality benefits bitmap indexes in three interconnected ways: space efficiency, query efficiency, and compression effectiveness. Let's examine each:

Factor 1: Space Efficiency

•Fewer bitmaps needed — With only 10 distinct values, you only store 10 bitmaps. With 10,000 distinct values, you store 10,000 bitmaps.
•Manageable total size — Even for billion-row tables, 10 bitmaps × 1 billion bits = 1.25 GB (manageable) vs. 10,000 bitmaps × 1 billion bits = 1.25 TB (problematic).
•Fits in memory — Low-cardinality bitmap indexes often fit entirely in RAM, enabling in-memory operations without I/O.

Factor 2: Query Efficiency

•Dense bitmaps — With low cardinality, each bitmap has many 1-bits (high density). A 'Status' bitmap with 3 values has roughly 33% 1-bits each.
•Faster popcount — Dense bitmaps make population count (counting matching rows) more cache-efficient—fewer sparse regions to skip.
•Meaningful intersections — ANDing two dense bitmaps likely produces non-empty results. ANDing two sparse bitmaps often yields zero matches.

Factor 3: Compression Effectiveness

•Long runs of 0s or 1s — Low cardinality means each value appears in many consecutive rows (if data is clustered), creating compressible patterns.
•Higher compression ratios — Run-length encoding can compress 11111111111111111111 (20 ones) to just 'run of 20 ones' regardless of length.
•Predictable patterns — Low-cardinality columns often have regular patterns (e.g., status changes systematically) that compress well.

The density insight:

For a column with cardinality C and N rows (assuming uniform distribution):

Each value appears in approximately N/C rows
Each bitmap has approximately N/C bits set to 1
Bitmap density = (N/C) / N = 1/C

Cardinality	Bitmap Density	Bits set per million rows
2	50%	500,000
10	10%	100,000
100	1%	10,000
1,000	0.1%	1,000
10,000	0.01%	100

At high cardinality, bitmaps become extremely sparse—mostly zeros with rare ones. Operations become overhead-dominated, and compression becomes essential just to achieve reasonable sizes.

Selectivity and Cardinality Interaction

Selectivity measures what fraction of rows a predicate selects. It interacts with cardinality in important ways that affect query planning:

Selectivity formulas:

For equality predicates on uniformly distributed data:

Selectivity(col = value) = 1 / C

Where C is cardinality.

Cardinality	Selectivity per value	Rows selected (1M table)
2	50%	500,000
10	10%	100,000
100	1%	10,000
1,000	0.1%	1,000
10,000	0.01%	100
100,000	0.001%	10

The selectivity-cardinality paradox:

High-cardinality columns have high selectivity (each value selects few rows)—which is traditionally good for indexing. But for bitmaps, this is a problem:

Low cardinality (good for bitmaps):

Few bitmaps to store
Each bitmap is dense
But each equality query matches many rows (low selectivity)

High cardinality (bad for bitmaps):

Many bitmaps to store
Each bitmap is sparse
But each equality query matches few rows (high selectivity)

The Bitmap Sweet Spot

Combined selectivity example:

A sales data warehouse query:

SELECT COUNT(*) FROM Sales
WHERE Region = 'West'           -- 20% selectivity (5 regions)
  AND Quarter = 'Q4'            -- 25% selectivity (4 quarters)  
  AND Category = 'Electronics'  -- 10% selectivity (10 categories)
  AND Channel = 'Online'        -- 33% selectivity (3 channels)

Combined selectivity (assuming independence):

0.20 × 0.25 × 0.10 × 0.33 = 0.00165 = 0.165%

On a 100 million row table, this matches ~165,000 rows.

With B+-tree indexes:

Use one index to find 20 million Region='West' rows
Fetch all 20 million to apply remaining predicates
Or use intersection of multiple index results (expensive)

With bitmap indexes:

AND four bitmaps together (microseconds of CPU work)
Result bitmap directly identifies the 165,000 matching rows
No row fetches until the final count/aggregation

Real-World Column Analysis

Let's analyze typical columns in real database schemas to assess bitmap suitability:

Bitmap Suitability Analysis for Common Columns
Column Type	Typical Cardinality	Bitmap Verdict	Reasoning
Boolean flags	2	✅ Excellent	Maximum density, minimal storage
Gender	2-3	✅ Excellent	Very low cardinality, high frequency
Status/State	3-20	✅ Excellent	Common in workflow systems
Priority levels	3-5	✅ Excellent	Low, high, urgent, critical, etc.
Day of week	7	✅ Excellent	Fixed, low cardinality
Month	12	✅ Excellent	Fixed, low cardinality
Quarter	4	✅ Excellent	Very low cardinality
Year	5-50	✅ Good	Depends on data range
Country code	200-250	✅ Good	Manageable cardinality
US State	50	✅ Good	Low cardinality
Product category	10-500	⚠️ Consider	Depends on catalog size
Age	0-120	⚠️ Consider	Medium cardinality, consider ranges
Hour of day	24	✅ Good	Fixed, manageable
Minute of hour	60	⚠️ Consider	Borderline, evaluate queries
Department	10-100	✅ Good	Typically low cardinality
Brand	100-10,000	⚠️ Consider	Varies widely by industry
City	1,000-100,000	❌ Avoid	Usually too high
ZIP code	~42,000 (US)	❌ Avoid	High cardinality
Date (exact)	365-36,500	❌ Avoid	Use range encoding or partition
Timestamp	Very high	❌ Avoid	Near-unique values
Email	~unique	❌ Avoid	Cardinality ≈ row count
User ID	~unique	❌ Avoid	Primary key, use B+-tree
Phone number	~unique	❌ Avoid	Near-unique values

Query Pattern Override

The Medium-Cardinality Challenge

Strategy 1: Bitmap Compression

Compression dramatically extends bitmap viability. Run-length encoding (RLE) and Word-Aligned Hybrid (WAH) can reduce bitmap sizes by 10-100x for sparse bitmaps.

Example:

Column: ProductID with 5,000 distinct values
Table: 10 million rows
Uncompressed: 5,000 × 10M bits = 6.25 GB
With WAH compression (typical 20:1 ratio for sparse bitmaps): ~300 MB

We'll cover compression in detail in the next page.

Strategy 2: Value Grouping

Group individual values into logical categories and index the groups:

Instead of:

Bitmap for each of 5,000 products

Use:

Bitmap for each of 50 product categories
Filter: Category = 'Electronics' (bitmap operation)
Then scan matching rows for specific ProductID if needed

This hybrid approach uses bitmaps for coarse filtering and secondary mechanisms for fine filtering.

Strategy 3: Range/Interval Encoding

For numeric columns with medium cardinality, use interval bitmaps:

Instead of:

100 age bitmaps (age 0, 1, 2, ..., 99)

Use:

10 interval bitmaps (0-9, 10-19, ..., 90-99)
Query age BETWEEN 25 AND 45: OR intervals [20-29], [30-39], [40-49]
Then apply refined filter if exact age needed

Strategy 4: Hybrid Indexing

Maintain both bitmap and B+-tree indexes on the same column:

Bitmap index: For analytical queries combining this column with others
B+-tree index: For point lookups and range scans on just this column

The query optimizer chooses the appropriate index based on query structure.

Cost: Double index maintenance overhead Benefit: Optimal performance for both query types

Modern Database Solutions

Cardinality-Based Decision Framework

When deciding whether to use bitmap indexes, follow this systematic framework:

Step 1: Calculate cardinality ratio

ratio = COUNT(DISTINCT column) / COUNT(*)

Step 2: Classify the cardinality

IF ratio < 0.0001 (0.01%)  → Very Low  → STRONG bitmap candidate
IF ratio < 0.01 (1%)       → Low       → Good bitmap candidate  
IF ratio < 0.1 (10%)       → Medium    → Consider with compression
IF ratio >= 0.1            → High      → Avoid bitmaps

Step 3: Evaluate query patterns

IF queries heavily use multi-column AND → +1 for bitmaps
IF queries use range predicates         → Consider range encoding
IF queries are point lookups            → B+-tree likely better
IF queries are aggregations/counts      → +1 for bitmaps

Step 4: Assess update patterns

IF table is append-only or batch-loaded → +1 for bitmaps
IF table has frequent random updates    → -1 for bitmaps
IF table is partitioned by time         → Bitmaps on historical partitions

Decision matrix:

Cardinality	Query Pattern	Update Pattern	Recommendation
Very Low	Multi-column AND	Batch load	✅ Bitmap (ideal case)
Very Low	Point lookup	Batch load	⚠️ Bitmap or B+-tree
Very Low	Any	Frequent updates	⚠️ Consider update cost
Low	Multi-column AND	Batch load	✅ Bitmap
Low	Multi-column AND	Frequent updates	⚠️ Bitmap with caution
Low	Range queries	Any	⚠️ Range-encoded bitmap
Medium	Multi-column AND	Batch load	⚠️ Bitmap with compression
Medium	Any	Frequent updates	❌ B+-tree
High	Any	Any	❌ B+-tree or hash

The Golden Rule

Summary: Cardinality as the Gatekeeper

We've established why cardinality is the critical factor in bitmap index suitability. Let's consolidate the key insights:

Key Takeaways

•Bitmap space grows linearly with cardinality — Total size = N × C bits. High cardinality leads to prohibitive storage.
•The cardinality ratio (C/N) is the key metric — Below 1% is generally suitable; above 10% is generally unsuitable.
•Low cardinality creates dense bitmaps — Dense bitmaps compress better, operate faster, and have more meaningful intersections.
•Selectivity works differently for bitmaps — Individual low selectivity (matching many rows) plus combined high selectivity is the sweet spot.
•Medium cardinality requires strategies — Compression, grouping, interval encoding, or hybrid approaches.
•Real-world analysis is essential — Evaluate your specific columns, table sizes, query patterns, and update frequencies.
•The crossover with B+-trees is around C=200 — Without compression, B+-trees become more space-efficient above this threshold.

What's next:

Page Complete