Loading learning content...
The true power of bitmap indexes lies not in storage but in operations. The ability to combine bitmaps using logical operations—AND, OR, NOT, XOR—transforms complex multi-predicate queries into simple, blazingly fast bit manipulations.
This page provides a comprehensive treatment of bitmap operations: from the fundamental logic to SIMD-accelerated implementations to advanced optimization strategies. By the end, you'll understand how a modern database can evaluate complex analytical queries on billions of rows in milliseconds.
By the end of this page, you will understand: (1) All fundamental bitmap operations and their query semantics, (2) How modern CPUs execute bitmap operations with SIMD instructions, (3) Operation ordering and short-circuit optimization techniques, (4) Population count (POPCNT) for result processing, and (5) Advanced operations like ANDNOT and bitmap arithmetic.
Four fundamental operations form the basis of all bitmap query processing. Each corresponds directly to SQL predicate semantics:
AND Operation (Intersection)
The AND operation finds rows that satisfy all conditions simultaneously.
Bitmap A: 1 0 1 1 0 1 0 0
Bitmap B: 1 1 0 1 0 0 0 1
──────────────────────────────
A AND B: 1 0 0 1 0 0 0 0
SQL correspondence:
WHERE Region = 'North' AND Status = 'Active'
Becomes: Bitmap_North AND Bitmap_Active
Truth table:
| A | B | A AND B |
|---|---|---|
| 0 | 0 | 0 |
| 0 | 1 | 0 |
| 1 | 0 | 0 |
| 1 | 1 | 1 |
Only rows with both conditions produce a 1-bit in the result.
OR Operation (Union)
The OR operation finds rows that satisfy any of the conditions.
Bitmap A: 1 0 1 1 0 1 0 0
Bitmap B: 1 1 0 1 0 0 0 1
──────────────────────────────
A OR B: 1 1 1 1 0 1 0 1
SQL correspondence:
WHERE Region = 'North' OR Region = 'South'
Becomes: Bitmap_North OR Bitmap_South
Truth table:
| A | B | A OR B |
|---|---|---|
| 0 | 0 | 0 |
| 0 | 1 | 1 |
| 1 | 0 | 1 |
| 1 | 1 | 1 |
Rows with either condition (or both) produce a 1-bit.
NOT Operation (Complement)
The NOT operation inverts all bits, finding rows that don't satisfy a condition.
Bitmap A: 1 0 1 1 0 1 0 0
──────────────────────────────
NOT A: 0 1 0 0 1 0 1 1
SQL correspondence:
WHERE Status != 'Cancelled'
Becomes: NOT(Bitmap_Cancelled)
Important caveat: NOT alone can include NULL values. Safe NOT usually requires:
NOT(Bitmap_Value) AND NOT(Bitmap_NULL)
Truth table:
| A | NOT A |
|---|---|
| 0 | 1 |
| 1 | 0 |
XOR Operation (Exclusive Or)
The XOR operation finds rows where exactly one condition is true (not both).
Bitmap A: 1 0 1 1 0 1 0 0
Bitmap B: 1 1 0 1 0 0 0 1
──────────────────────────────
A XOR B: 0 1 1 0 0 1 0 1
SQL correspondence:
-- Rows where exactly one of two conditions is true
WHERE (A AND NOT B) OR (B AND NOT A)
XOR is less common in SQL queries but useful for:
Truth table:
| A | B | A XOR B |
|---|---|---|
| 0 | 0 | 0 |
| 0 | 1 | 1 |
| 1 | 0 | 1 |
| 1 | 1 | 0 |
Many bitmap libraries include ANDNOT as a primitive: A ANDNOT B = A AND (NOT B). This is common enough (e.g., 'Region = North AND Status != Cancelled') that a single-pass implementation is worthwhile. It avoids creating the intermediate NOT(B) bitmap.
Bitmap operations are remarkable because they map directly to CPU instructions. A single instruction can process 64 bits (64 row comparisons) simultaneously, and with SIMD extensions, hundreds of bits per instruction.
Basic 64-bit operations:
On a 64-bit CPU, bitmaps are processed in 64-bit words:
uint64_t word_a = bitmap_a[i]; // 64 bits from bitmap A
uint64_t word_b = bitmap_b[i]; // 64 bits from bitmap B
uint64_t result_and = word_a & word_b; // 64 AND operations in 1 instruction
uint64_t result_or = word_a | word_b; // 64 OR operations in 1 instruction
uint64_t result_xor = word_a ^ word_b; // 64 XOR operations in 1 instruction
uint64_t result_not = ~word_a; // 64 NOT operations in 1 instruction
Instructions and latency:
| Operation | x86-64 Instruction | Cycles (typical) | Bits processed |
|---|---|---|---|
| AND | AND | 1 | 64 |
| OR | OR | 1 | 64 |
| XOR | XOR | 1 | 64 |
| NOT | NOT | 1 | 64 |
| POPCNT | POPCNT | 3 | 64 |
At 3 GHz, AND operations process: 3×10⁹ × 64 = 192 billion row comparisons per second on a single core.
SIMD Acceleration:
Modern CPUs include SIMD (Single Instruction, Multiple Data) extensions that process even wider vectors:
SSE2 (128-bit vectors):
__m128i vec_a = _mm_load_si128((__m128i*)&bitmap_a[i]);
__m128i vec_b = _mm_load_si128((__m128i*)&bitmap_b[i]);
__m128i result = _mm_and_si128(vec_a, vec_b); // 128 bits in 1 instruction
_mm_store_si128((__m128i*)&result_bitmap[i], result);
AVX2 (256-bit vectors):
__m256i vec_a = _mm256_load_si256((__m256i*)&bitmap_a[i]);
__m256i vec_b = _mm256_load_si256((__m256i*)&bitmap_b[i]);
__m256i result = _mm256_and_si256(vec_a, vec_b); // 256 bits in 1 instruction
AVX-512 (512-bit vectors):
__m512i vec_a = _mm512_load_si512((__m512i*)&bitmap_a[i]);
__m512i vec_b = _mm512_load_si512((__m512i*)&bitmap_b[i]);
__m512i result = _mm512_and_si512(vec_a, vec_b); // 512 bits in 1 instruction!
With AVX-512, a single instruction compares 512 rows. At 3 GHz: 1.5 trillion row comparisons per second per core.
CPUs can process bits faster than memory can supply them. For large bitmaps that don't fit in cache, memory bandwidth becomes the bottleneck. Optimization shifts to compression (reducing bytes transferred), prefetching, and cache-conscious access patterns.
After bitmap operations, we often need to count the matching rows rather than enumerate them. The population count (popcount) operation counts the number of 1-bits in a word.
Hardware POPCNT instruction:
Modern x86-64 CPUs have a dedicated POPCNT instruction:
uint64_t word = 0b1011010100110101; // Example 64-bit word
int count = __builtin_popcountll(word); // Returns 9 (nine 1-bits)
Counting across a full bitmap:
int64_t count_bitmap(uint64_t* bitmap, size_t word_count) {
int64_t total = 0;
for (size_t i = 0; i < word_count; i++) {
total += __builtin_popcountll(bitmap[i]);
}
return total;
}
For a 1-billion-row bitmap (about 125 MB), counting all 1-bits takes:
With loop unrolling and SIMD, this drops to ~2-5 milliseconds.
SIMD-accelerated counting:
// AVX2 version with horizontal add
int64_t count_bitmap_avx2(uint64_t* bitmap, size_t word_count) {
__m256i total = _mm256_setzero_si256();
for (size_t i = 0; i < word_count; i += 4) {
__m256i vec = _mm256_load_si256((__m256i*)&bitmap[i]);
// Use lookup table technique for popcount
// (AVX2 doesn't have native popcount, but AVX-512 VPOPCNT does)
__m256i lookup = _mm256_setr_epi8(
0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4,
0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4);
// Process 4 bits at a time using shuffle lookups
// ... (detailed implementation omitted)
total = _mm256_add_epi64(total, partial_count);
}
// Horizontal sum of the 4 64-bit counters
return horizontal_sum_256(total);
}
AVX-512 VPOPCNT:
The newest CPUs have VPOPCNT instructions that count bits in 512-bit vectors directly—the ultimate in counting performance.
Bitmap indexes enable COUNT(*) queries without accessing any table data. The entire result comes from bitmap operations and population counts. This is why analytical databases can count billions of filtered rows in milliseconds.
Real SQL queries have complex predicate structures. Let's trace how various query patterns translate to bitmap operation sequences:
Query 1: Simple conjunction (AND)
SELECT COUNT(*) FROM Sales
WHERE Region = 'West' AND Year = 2024 AND Status = 'Completed';
Bitmap operations:
temp1 = Bitmap_Region_West AND Bitmap_Year_2024
result = temp1 AND Bitmap_Status_Completed
count = POPCOUNT(result)
Order doesn't matter for correctness, but it does for performance (we'll explore this).
Query 2: Disjunction (OR)
SELECT COUNT(*) FROM Sales
WHERE Region = 'West' OR Region = 'South';
Bitmap operations:
result = Bitmap_Region_West OR Bitmap_Region_South
count = POPCOUNT(result)
Query 3: IN clause
SELECT COUNT(*) FROM Sales
WHERE Status IN ('Pending', 'Processing', 'Shipped');
Bitmap operations:
temp1 = Bitmap_Status_Pending OR Bitmap_Status_Processing
result = temp1 OR Bitmap_Status_Shipped
count = POPCOUNT(result)
Query 4: NOT IN clause
SELECT COUNT(*) FROM Sales
WHERE Status NOT IN ('Cancelled', 'Returned');
Bitmap operations:
excluded = Bitmap_Status_Cancelled OR Bitmap_Status_Returned
result = NOT(excluded) ANDNOT Bitmap_Status_NULL -- Handle NULLs
count = POPCOUNT(result)
-- Or equivalently (avoiding NOT):
result = Bitmap_Status_Pending OR Bitmap_Status_Processing
OR Bitmap_Status_Shipped OR Bitmap_Status_Completed
Query 5: Mixed AND/OR with parentheses
SELECT COUNT(*) FROM Sales
WHERE Year = 2024
AND (Region = 'West' OR Region = 'South')
AND (Status = 'Completed' OR Status = 'Shipped');
Bitmap operations (following precedence):
region_filter = Bitmap_Region_West OR Bitmap_Region_South
status_filter = Bitmap_Status_Completed OR Bitmap_Status_Shipped
temp1 = Bitmap_Year_2024 AND region_filter
result = temp1 AND status_filter
count = POPCOUNT(result)
Query 6: BETWEEN (range)
SELECT COUNT(*) FROM Sales
WHERE Amount BETWEEN 100 AND 500;
With equality-encoded bitmaps (expensive):
result = Bitmap_Amount_100 OR Bitmap_Amount_101 OR ... OR Bitmap_Amount_500
-- 401 OR operations!
count = POPCOUNT(result)
With range-encoded bitmaps (efficient):
result = Bitmap_Amount_≤500 ANDNOT Bitmap_Amount_≤99
count = POPCOUNT(result)
-- Only 2 operations!
This illustrates why encoding strategy matters for query patterns.
Database query optimizers translate SQL predicates to optimal bitmap operation sequences. They consider available indexes, encoding schemes, and estimated selectivities to generate efficient operation plans. Understanding this translation helps you write more optimizer-friendly queries.
While the result of multiple ANDs and ORs is the same regardless of order, the intermediate bitmap sizes differ—affecting memory usage and cache performance. Smart ordering can significantly improve performance.
Principle: Minimize intermediate bitmap density
For AND operations, start with the smallest (sparsest) bitmaps:
Example:
Order 1: A AND B AND C
Order 2: C AND B AND A
Both produce the same final result (~4M rows), but Order 1's intermediate is 8× smaller.
For OR operations, order also matters:
Start with the largest (densest) bitmaps when ORing:
Example:
Efficient: C OR B OR A
Reasoning: OR never decreases density. Starting with dense bitmaps means fewer new bits are added in subsequent operations, improving cache locality.
Short-circuit optimization:
For AND, if an intermediate result becomes all zeros, remaining operations can be skipped:
bitmap result = bitmap_a;
for (int i = 0; i < num_bitmaps; i++) {
result = result AND bitmaps[i];
// Early termination check
if (is_all_zeros(result)) {
break; // No rows can match; stop early
}
}
Selective check: Checking for all-zeros after every AND is expensive. Instead:
Modern databases use column statistics (selectivity estimates) to order operations before execution. The optimizer estimates which predicates are most selective and places those first in AND chains. Accurate statistics are essential for optimal performance.
Compressed bitmaps present a challenge: can we perform AND/OR without decompressing? Fortunately, most bitmap compression schemes support direct operations on compressed representations.
Run-Length Encoding (RLE) operations:
RLE represents bitmaps as sequences of runs:
Original: 11111111 00000000 00000000 11111111
RLE: (8 ones) (16 zeros) (8 ones)
AND on RLE:
Bitmap A: (10 ones) (5 zeros) (8 ones)
Bitmap B: (6 ones) (12 zeros) (5 ones)
A AND B: Process runs position-by-position:
- Position 0-5: (6 ones from A) AND (6 ones from B) = (6 ones)
- Position 6-9: (4 ones from A) AND (4 zeros from B) = (4 zeros)
- Position 10-14: (5 zeros from A) AND (5 zeros from B) = (5 zeros)
- Position 15-17: (3 ones from A) AND (3 zeros from B) = (3 zeros)
- Position 18-22: (5 ones from A) AND (5 ones from B) = (5 ones)
Result: (6 ones) (12 zeros) (5 ones)
No decompression required—operations work directly on run descriptors.
Word-Aligned Hybrid (WAH) operations:
WAH uses two word types:
WAH AND algorithm:
WHILE both bitmaps have words remaining:
IF both current words are fill words:
result_fill = min_length_fill(A_fill AND B_fill)
output result_fill
ELSE IF one is fill, one is literal:
IF fill is all-zeros:
output zeros (no data needed from literal)
ELSE IF fill is all-ones:
output the literal word
ELSE both are literals:
output (literal_a AND literal_b)
WAH operations are typically 2-5x faster than decompressing first, especially for sparse bitmaps where fill words are common.
Roaring Bitmap operations:
Roaring divides the bitmap into 65,536 chunks (one per 16-bit high-value range). Each chunk uses the best representation for its density:
Roaring AND:
FOR each chunk position:
IF A and B both have the chunk:
IF both are arrays:
result = sorted_intersection(A_array, B_array)
ELSE IF both are bitmaps:
result = bitwise_AND(A_bitmap, B_bitmap)
ELSE IF one is array, one is bitmap:
result = lookup_array_in_bitmap(array, bitmap)
// Similar for run containers
ELSE:
// Chunk missing in one input = empty result for this chunk
Roaring achieves the best of all worlds—fast operations regardless of density.
Roaring bitmaps have become the de facto standard for compressed bitmap operations. They're used in Apache Spark, Apache Lucene/Elasticsearch, InfluxDB, and many other systems. When implementing bitmap indexes, Roaring should be the default choice unless you have specific constraints.
Beyond basic AND/OR/NOT, several advanced operation patterns enable sophisticated query processing:
Threshold AND (T-AND):
Find rows where at least K of N conditions are true:
-- Rows with at least 3 of 5 specified features
SELECT * FROM Products
WHERE (Feature1 = 'Y') + (Feature2 = 'Y') + (Feature3 = 'Y')
+ (Feature4 = 'Y') + (Feature5 = 'Y') >= 3;
Bitmap approach:
bitmap_sum = zeros
for each feature_bitmap:
bitmap_sum = bitmap_add(bitmap_sum, feature_bitmap)
// bitmap_add adds corresponding bits, treating each bit position
// as a multi-bit counter
result = threshold_filter(bitmap_sum, threshold=3)
This requires multi-bit counters but is still much faster than row-by-row evaluation.
Symmetric Difference (XOR applications):
-- Find rows that changed between two snapshots
SELECT * FROM Data
WHERE (snapshot1.flag = 'Y') XOR (snapshot2.flag = 'Y');
Bitmap operation:
changed_rows = Bitmap_snapshot1_flag_Y XOR Bitmap_snapshot2_flag_Y
Rows where the flag changed (was Y, now not Y, or vice versa) are identified.
Existence testing:
-- Does ANY row match these conditions? (no count needed)
SELECT EXISTS(
SELECT 1 FROM Sales
WHERE Region = 'West' AND Status = 'Rush'
);
Optimized bitmap approach:
bool exists = false;
for each word position i:
uint64_t result = bitmap_west[i] & bitmap_rush[i];
if (result != 0) {
exists = true;
break; // Early exit on first match!
}
No need to process the entire bitmap—exit as soon as any 1-bit is found.
Rank and Select operations:
For some advanced queries, we need:
These support queries like:
// Select: Find position of k-th 1-bit
size_t select(uint64_t* bitmap, size_t k) {
size_t count = 0;
for (size_t i = 0; ; i++) {
size_t word_popcount = __builtin_popcountll(bitmap[i]);
if (count + word_popcount >= k) {
// The k-th bit is in this word
return find_kth_set_bit(bitmap[i], k - count) + (i * 64);
}
count += word_popcount;
}
}
Efficient Rank and Select require auxiliary data structures (succinct data structures) that precompute partial counts. These add ~3-6% space overhead but enable O(1) Rank and O(log n) Select operations. Libraries like SDSL (Succinct Data Structure Library) provide industrial-strength implementations.
We've thoroughly explored bitmap operations—the heart of bitmap index performance. Let's consolidate the key concepts:
What's next:
We've covered how bitmaps work and how operations execute. In the final page, we'll explore data warehouse usage—the real-world context where bitmap indexes deliver their greatest value, including star schema optimization and integration with modern analytical systems.
You now understand the complete operational model of bitmap indexes—from basic logical operations to SIMD optimization to advanced query patterns. Next, we'll see how these capabilities are applied in data warehouse environments.