If you've ever typed a search query and received results in under a second, you've benefited from an inverted index. This data structure—elegant in concept, sophisticated in implementation—is the foundation of every modern search engine, from Google to Elasticsearch to the search bar in your email client.
The inverted index is to search what the B-tree is to databases: a fundamental data structure that makes the impossible possible. Without it, searching billions of documents would take days instead of milliseconds. With it, search becomes so fast that we take it for granted.
Yet despite its ubiquity, many engineers have only a surface-level understanding of how inverted indexes work. This page will change that. We'll explore not just the concept, but the implementation details that separate a toy index from a production-grade search engine.
By the end of this page, you will understand: (1) The fundamental structure of inverted indexes and why they enable fast search, (2) The detailed components: term dictionaries, posting lists, skip lists, and stored fields, (3) How indexes are built, updated, and maintained, (4) Advanced optimizations: compression, block encoding, and skip pointers, (5) Trade-offs in index design and how they impact performance.
To understand inverted indexes, we must first understand what we're inverting. Let's start with the natural way documents are stored.
The Forward Index (How Documents Are Naturally Organized):
When you store documents, they're naturally organized by document ID:
Document 1: "The quick brown fox jumps over the lazy dog"
Document 2: "Quick brown foxes are faster than lazy dogs"
Document 3: "The dog chased the fox around the park"
The Search Problem:
If a user searches for "fox", we need to find all documents containing that word. With a forward index, we must scan every document:
Time complexity: O(N × M), where N = number of documents, M = average document length.
This doesn't scale. Scanning 1 billion documents × 1000 words each = 1 trillion comparisons.
Instead of asking 'what words are in this document?', we invert the question: 'what documents contain this word?' By pre-computing the answer to this inverted question for every word, we can answer search queries with a simple lookup instead of a scan.
The Inverted Index (Flipped Relationship):
An inverted index maps each term to the list of documents containing it:
"the" → [Doc1, Doc3, Doc3] (positions matter for phrases)
"quick" → [Doc1, Doc2]
"brown" → [Doc1, Doc2]
"fox" → [Doc1, Doc2*, Doc3] (*"foxes" stemmed to "fox")
"jump" → [Doc1] ("jumps" stemmed to "jump")
"over" → [Doc1]
"lazy" → [Doc1, Doc2]
"dog" → [Doc1, Doc2*, Doc3] (*"dogs" stemmed to "dog")
"fast" → [Doc2] ("faster" stemmed to "fast")
"chase" → [Doc3] ("chased" stemmed to "chase")
"around" → [Doc3]
"park" → [Doc3]
The Search Solution:
Now searching for "fox" is a simple dictionary lookup:
Time complexity: O(log V + P), where V = vocabulary size, P = posting list length.
For most queries, this is dramatically faster than scanning all documents.
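To make the contrast concrete, here is a minimal Python sketch using the three example documents above, with plain whitespace tokenization and no stemming (an illustration, not a production analyzer):

```python
# Minimal sketch: scan vs. lookup on a toy corpus.
docs = {
    1: "the quick brown fox jumps over the lazy dog",
    2: "quick brown foxes are faster than lazy dogs",
    3: "the dog chased the fox around the park",
}

# Forward-index search: O(N x M) -- visit every word of every document.
def scan_search(term):
    return [doc_id for doc_id, text in docs.items() if term in text.split()]

# Inverted-index search: one dictionary lookup once the index is built.
inverted = {}
for doc_id, text in docs.items():
    for word in set(text.split()):
        inverted.setdefault(word, []).append(doc_id)

print(scan_search("fox"))        # [1, 3]  (no stemming, so "foxes" is missed)
print(inverted.get("fox", []))   # [1, 3]
```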
| Aspect | Forward Index | Inverted Index |
|---|---|---|
| Organization | Document → Terms | Term → Documents |
| Search Time | O(N × M) - scan all docs | O(log V + P) - dictionary lookup |
| Update Time | O(1) - append doc | O(T) - update T term postings |
| Storage | Compact (original docs) | Larger (redundant term storage) |
| Query Type | Full document retrieval | Term-based search |
A production inverted index consists of several interconnected components. Understanding each component is essential for optimizing search performance.
Term Dictionary Deep Dive:
The term dictionary is the entry point for every search. It must support fast exact-term lookup, prefix matching (for wildcard and autocomplete queries), and memory-efficient storage of potentially millions of terms.
Common implementations:
| Structure | Lookup | Prefix | Memory | Notes |
|---|---|---|---|---|
| Sorted Array | O(log n) | O(log n + k) | Low | Simple but slow updates |
| B-tree | O(log n) | O(log n + k) | Medium | Good all-around performance |
| Trie | O(m) | O(m + k) | High | Fast prefix, high memory |
| FST (Lucene) | O(m) | O(m + k) | Very Low | Best compression, complex |
Lucene uses a Finite State Transducer (FST) for its term dictionary—a remarkable data structure that provides both memory efficiency and fast lookups. An FST is like a trie but shares suffixes as well as prefixes, dramatically reducing memory usage. For a 1 million term vocabulary, an FST might use 2-5MB versus 50-100MB for a naive trie.
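To make the table concrete, here is a small Python sketch of the simplest option above, a sorted-array term dictionary with binary search for exact and prefix lookups (a toy illustration of the idea, not how Lucene's FST works):

```python
import bisect

# Sorted-array term dictionary: exact lookup in O(log n),
# prefix lookup in O(log n + k) for k matching terms.
terms = sorted(["around", "brown", "chase", "dog", "fast", "fox", "jump",
                "lazy", "over", "park", "quick", "the"])

def exact_lookup(term):
    i = bisect.bisect_left(terms, term)
    return i if i < len(terms) and terms[i] == term else -1

def prefix_lookup(prefix):
    start = bisect.bisect_left(terms, prefix)
    results = []
    for t in terms[start:]:
        if not t.startswith(prefix):
            break
        results.append(t)
    return results

print(exact_lookup("fox"))   # position of "fox" in the sorted dictionary
print(prefix_lookup("fa"))   # ['fast']
```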
Posting lists contain the actual document references for each term. Their structure and encoding significantly impact both storage efficiency and query performance.
Basic Posting List Structure:
At minimum, a posting list contains document IDs:
term "fox" → [1, 2, 3, 17, 42, 100, 105, 1000, 1001, 1005]
Enhanced Posting List with Metadata:
Production systems store additional information per posting:
term "fox" → [
{ docId: 1, freq: 2, positions: [5, 42] },
{ docId: 2, freq: 1, positions: [10] },
{ docId: 3, freq: 3, positions: [1, 15, 88] },
...
]
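As an illustration of why positions are worth their cost, here is a minimal Python sketch that answers the phrase query "quick brown" by checking for adjacent positions (the positions are 0-indexed offsets taken from the example documents earlier on this page):

```python
# Using per-document positions to answer a phrase query ("quick brown").
# A phrase matches when some position p of "quick" has p + 1 among "brown"'s positions.
postings = {
    "quick": {1: [1], 2: [0]},      # docId -> positions of the term
    "brown": {1: [2], 2: [1]},
}

def phrase_match(first, second):
    hits = []
    for doc_id, first_positions in postings[first].items():
        second_positions = set(postings[second].get(doc_id, []))
        if any(p + 1 in second_positions for p in first_positions):
            hits.append(doc_id)
    return sorted(hits)

print(phrase_match("quick", "brown"))   # [1, 2]
```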
| Level | Data Stored | Enables | Storage Cost |
|---|---|---|---|
| Docs only | Document IDs | Boolean search | Lowest |
| + Frequencies | Doc IDs + term freq | TF-IDF ranking | +20-40% |
| + Positions | Doc IDs + freq + positions | Phrase queries, proximity | +100-200% |
| + Offsets | Doc IDs + freq + positions + char offsets | Highlighting | +50-100% |
| + Payloads | Custom per-position data | Advanced scoring | Variable |
Why Sorted Document IDs Matter:
Posting lists are sorted by document ID. This enables efficient intersection via a linear merge (for AND queries), delta encoding of document IDs for compression, and skip pointers for fast traversal.
Query: "quick" AND "fox"
"quick" posting: [1, 2, 4, 7, 10]
"fox" posting: [1, 3, 4, 6, 10, 15]
Merge intersection:
Compare 1, 1 → Match! Add 1
Compare 2, 3 → Advance first
Compare 4, 3 → Advance second
Compare 4, 4 → Match! Add 4
Compare 7, 6 → Advance second
Compare 7, 10 → Advance first
Compare 10, 10 → Match! Add 10
Result: [1, 4, 10]
Time complexity: O(m + n) for two lists of length m and n—dramatically better than O(m × n) for unsorted lists.
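The same two-pointer merge intersection as a short Python function:

```python
# Merge intersection of two sorted posting lists in O(m + n).
def intersect(a, b):
    i = j = 0
    result = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            result.append(a[i])     # match: keep the docId
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1                  # advance the list with the smaller docId
        else:
            j += 1
    return result

quick = [1, 2, 4, 7, 10]
fox = [1, 3, 4, 6, 10, 15]
print(intersect(quick, fox))   # [1, 4, 10]
```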
Original: [1, 2, 3, 17, 42, 100, 105, 1000, 1001, 1005]
Delta encoded: [1, 1, 1, 14, 25, 58, 5, 895, 1, 4]
Smaller values compress better using variable-byte encoding.
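As an illustrative sketch of these two steps together (production engines use block codecs such as PFOR-Delta instead), here is delta encoding combined with the variable-byte encoding mentioned above:

```python
# Delta + variable-byte encoding: each gap is written 7 bits at a time;
# the high bit marks the last byte of a gap.
def vbyte_encode(doc_ids):
    out = bytearray()
    prev = 0
    for doc_id in doc_ids:
        gap = doc_id - prev
        prev = doc_id
        while gap >= 128:
            out.append(gap & 0x7F)   # lower 7 bits, more bytes follow
            gap >>= 7
        out.append(gap | 0x80)       # high bit set = final byte of this gap
    return bytes(out)

def vbyte_decode(data):
    doc_ids, gap, shift, prev = [], 0, 0, 0
    for byte in data:
        gap |= (byte & 0x7F) << shift
        if byte & 0x80:              # final byte of the current gap
            prev += gap
            doc_ids.append(prev)
            gap, shift = 0, 0
        else:
            shift += 7
    return doc_ids

ids = [1, 2, 3, 17, 42, 100, 105, 1000, 1001, 1005]
encoded = vbyte_encode(ids)
print(len(encoded), vbyte_decode(encoded) == ids)   # 11 True (vs 40 bytes uncompressed)
```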
Some terms appear in millions of documents (e.g., 'the', 'is', 'a'). Their posting lists are enormous and expensive to traverse. Solutions: (1) Stop word removal (don't index them), (2) Frequency-based skipping (skip documents unlikely to score highly), (3) Term caching (keep hot posting lists in memory).
Raw posting list traversal is O(n) for a list of n documents. For long posting lists, this is too slow. Two key optimizations accelerate traversal: skip lists and block encoding.
Skip Lists: Jumping Ahead
A skip list adds express lanes that allow jumping over multiple postings:
Level 2: 1─────────────────────────42────────────────────────►1000
Level 1: 1─────────17──────42────────────105─────────────────►1000
Level 0: 1──2──3──17──42──100──105──1000──1001──1005
To find doc 105:
Level 2: 1 → 42 (skip, 42 < 105) → 1000 (too far, go down)
Level 1: 42 → 105 (found!)
Steps: 4 instead of 8
With skip lists, we can answer "is document X in this posting?" in O(log n) instead of O(n).
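Here is a simplified single-level version of the idea in Python; real skip data is multi-level and embedded in the on-disk posting format:

```python
import bisect

# Single-level skip pointers over a posting list (illustrative only).
postings = [1, 2, 3, 17, 42, 100, 105, 1000, 1001, 1005]
SKIP = 4
skips = postings[::SKIP]          # every 4th docId acts as a skip entry: [1, 42, 1001]

def contains(doc_id):
    # 1. Binary-search the skip level to find the stretch that could hold doc_id.
    s = bisect.bisect_right(skips, doc_id) - 1
    if s < 0:
        return False
    start = s * SKIP
    # 2. Scan at most SKIP postings instead of the whole list.
    return doc_id in postings[start:start + SKIP]

print(contains(105))   # True  (skip to 42's stretch, scan four entries)
print(contains(50))    # False
```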
When skip lists help: intersecting a short posting list with a much longer one (the long list can be skipped through rather than scanned), and conjunctive (AND) queries that repeatedly ask "does this posting contain document X?". For a full scan of a single list, skip pointers add overhead without benefit.
Block Encoding: Compression + Speed
Modern indexes (Lucene, Elasticsearch) use block encoding:
Posting list: [1, 2, 3, ..., 128, 129, 130, ..., 256, ...]
Block 0: docs 1-128, minId=1, maxId=128, compressed data
Block 1: docs 129-256, minId=129, maxId=256, compressed data
...
To find doc 200:
Scan block metadata: Block 0 max=128 (skip), Block 1 max=256 (contains 200)
Decompress Block 1, scan for 200
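A minimal Python sketch of block-level skipping, assuming uncompressed blocks of 128 document IDs with a max-ID header (real codecs pack each block with PFOR-Delta):

```python
# Block-level skipping: consult per-block metadata first, then read one block.
BLOCK_SIZE = 128

def build_blocks(doc_ids):
    blocks = []
    for i in range(0, len(doc_ids), BLOCK_SIZE):
        chunk = doc_ids[i:i + BLOCK_SIZE]
        blocks.append({"max_id": chunk[-1], "data": chunk})
    return blocks

def contains(blocks, doc_id):
    for block in blocks:
        if block["max_id"] >= doc_id:          # first block that could contain it
            return doc_id in block["data"]     # decode/scan only this block
    return False

blocks = build_blocks(list(range(1, 1001)))    # docIds 1..1000 in 8 blocks
print(contains(blocks, 200))    # True: block 0 (max 128) skipped, block 1 scanned
print(contains(blocks, 2000))   # False: no block's metadata matches
```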
Benefits of block encoding: block metadata (min/max doc ID) allows skipping entire blocks without decompressing them, fixed-size blocks decode efficiently with SIMD instructions, and per-block compression keeps decompression cost proportional to the data actually read.
Lucene 9.x Posting List Structure (Simplified):

┌─────────────────────────────────────────────────────────────┐
│ Term Dictionary (FST)                                        │
│   "fox" → Pointer to posting list                            │
└──────────────────────────┬──────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────────────┐
│ Skip Data (Optional)                                         │
│   Every 128 docs: [skip docId, skip offset, ...]             │
└──────────────────────────┬──────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────────────┐
│ Doc Blocks                                                   │
│   Block 0: [128 doc IDs, delta + PFOR encoded]               │
│   Block 1: [128 doc IDs, delta + PFOR encoded]               │
│   ...                                                        │
│   Final Block: [remaining doc IDs, vInt encoded]             │
└──────────────────────────┬──────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────────────┐
│ Frequency Blocks                                             │
│   Block 0: [128 frequencies, packed]                         │
│   Block 1: [128 frequencies, packed]                         │
│   ...                                                        │
└──────────────────────────┬──────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────────────┐
│ Position Blocks                                              │
│   Per-doc positions: [position deltas, encoded]              │
└─────────────────────────────────────────────────────────────┘

Lucene uses PFOR-Delta (Patched Frame-of-Reference Delta) encoding for posting lists. It delta-encodes document IDs (small gaps), then bit-packs the deltas using the minimum bits needed for most values, with 'exceptions' for outliers. This achieves 2-4 bits per integer for typical posting lists while maintaining fast decompression.
Building an inverted index from a document collection is a multi-stage process. Understanding this process illuminates why indexing can be slow and how it's optimized.
The Basic Algorithm:
In-Memory Indexing (Simple but Limited):
```python
from collections import defaultdict

def analyze(document):
    return document.lower().split()  # simplified: real analyzers also stem, remove stop words, etc.

index = defaultdict(list)
for doc_id, document in enumerate(documents):
    terms = analyze(document)   # tokenize, stem, etc.
    for term in terms:
        index[term].append(doc_id)

# Now index["fox"] = [1, 2, 3, ...]
```
This works for small collections but fails when the index doesn't fit in memory.
External Merge Sort for Large Collections:
For collections larger than memory, we use external sorting:
Phase 1: Build and Write Sorted Runs
┌─────────────────────────────────────────────────────┐
│ Read batch of docs → Build in-memory index │
│ → Write sorted run to disk → Repeat │
└─────────────────────────────────────────────────────┘
Run 1: [("apple", 1), ("apple", 3), ("banana", 2), ...]
Run 2: [("apple", 5), ("banana", 4), ("cherry", 6), ...]
Run 3: [("banana", 7), ("dog", 8), ("dog", 9), ...]
Phase 2: Merge Runs
┌─────────────────────────────────────────────────────┐
│ K-way merge: Read from all runs simultaneously │
│ Output: Final sorted posting list per term │
└─────────────────────────────────────────────────────┘
Final: "apple" → [1, 3, 5], "banana" → [2, 4, 7], ...
Time complexity: O(n log n) comparisons and O(n) disk I/O. Space: bounded by available memory at any point, regardless of collection size.
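A compact Python sketch of the two phases, using pickled runs on disk and heapq.merge for the k-way merge (illustrative only; real index builders write binary segment files):

```python
import heapq
import os
import pickle
import tempfile
from itertools import groupby

def write_run(pairs, run_dir, run_no):
    # Phase 1: sort one in-memory batch of (term, docId) pairs and write it as a run.
    path = os.path.join(run_dir, f"run{run_no}.pkl")
    with open(path, "wb") as f:
        pickle.dump(sorted(pairs), f)
    return path

def merge_runs(paths):
    # Phase 2: k-way merge of all sorted runs into final posting lists per term.
    runs = []
    for path in paths:
        with open(path, "rb") as f:
            runs.append(pickle.load(f))
    merged = heapq.merge(*runs)
    return {term: [doc for _, doc in group]
            for term, group in groupby(merged, key=lambda p: p[0])}

run_dir = tempfile.mkdtemp()
paths = [
    write_run([("apple", 1), ("banana", 2), ("apple", 3)], run_dir, 1),
    write_run([("apple", 5), ("banana", 4), ("cherry", 6)], run_dir, 2),
]
print(merge_runs(paths))   # {'apple': [1, 3, 5], 'banana': [2, 4], 'cherry': [6]}
```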
| Scale | Strategy | Time | Notes |
|---|---|---|---|
| < 1M docs | In-memory | Seconds-minutes | Simple, single machine |
| 1M - 100M docs | External sort | Minutes-hours | Single machine, streaming |
| 100M - 1B docs | Distributed (Spark/MR) | Hours | Cluster, parallel merge |
| > 1B docs | Continuous incremental | Ongoing | Never 'done', always updating |
Lucene doesn't build one giant index. Instead, it creates small "segments" as documents arrive, each a self-contained mini-index. Background merge processes combine small segments into larger ones. This allows near-real-time indexing (new docs are searchable within seconds) while maintaining query efficiency (older docs in large, optimized segments).
Inverted indexes face a fundamental challenge: they're optimized for immutability, but real systems need updates and deletes. Understanding how search engines handle mutability is crucial for schema design and performance tuning.
The Immutability Problem:
Consider updating a document with a new term:
Original: Doc 5: "the quick brown fox"
Updated: Doc 5: "the slow brown dog"
Required changes:
- Remove Doc 5 from "quick" posting list
- Remove Doc 5 from "fox" posting list
- Add Doc 5 to "slow" posting list
- Add Doc 5 to "dog" posting list
If posting lists are stored as compressed, sorted arrays in the middle of a file, these updates require rewriting entire lists—potentially the entire index.
Solution 1: Delete + Re-add
Most search engines treat updates as delete + add:
Before update:
"quick" → [1, 3, 5, 7] Doc 5: active
After update:
"quick" → [1, 3, 5, 7] Doc 5: marked deleted (but still in list)
"slow" → [5'] Doc 5': new version (new internal ID)
Query for "quick":
Get postings [1, 3, 5, 7] → Filter deleted → Return [1, 3, 7]
Pros: Simple, no posting list rewrites.
Cons: Deleted docs waste space until merge; extra filtering at query time.
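A minimal Python sketch of this delete + re-add behavior, with a set of deleted IDs standing in for the delete bitmap:

```python
# Updates as delete + re-add: posting lists are never rewritten;
# deleted internal IDs are filtered at query time until a merge removes them.
postings = {"quick": [1, 3, 5, 7], "slow": []}
deleted = set()
next_internal_id = 8

def update_doc(old_id, new_terms):
    global next_internal_id
    deleted.add(old_id)                  # mark the old version as deleted
    new_id = next_internal_id            # the new version gets a new internal ID
    next_internal_id += 1
    for term in new_terms:
        postings.setdefault(term, []).append(new_id)
    return new_id

def search(term):
    return [d for d in postings.get(term, []) if d not in deleted]

update_doc(5, ["the", "slow", "brown", "dog"])
print(search("quick"))   # [1, 3, 7] -- doc 5 filtered out, list never rewritten
print(search("slow"))    # [8]       -- the new version of doc 5
```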
Solution 2: Segment-Based Architecture
Lucene's approach uses immutable segments:
Timeline:
T0: Segment 1 created [Doc1, Doc2, Doc3]
T1: Segment 2 created [Doc4, Doc5, Doc6]
T2: Doc2 deleted → Segment 1 delete bitmap set for Doc2
T3: Doc5 updated → Segment 2 delete bitmap set for Doc5
→ Segment 3 created [Doc5']
T4: Merge (1, 2, 3) → Segment 4 [Doc1, Doc3, Doc4, Doc5', Doc6]
(Doc2 and old Doc5 physically removed)
This approach balances write efficiency with eventual compaction.
| Operation | Immediate Effect | Full Effect |
|---|---|---|
| Add document | Indexed in new segment | Visible after refresh |
| Delete document | Marked in delete bitmap | Removed at merge |
| Update document | Old deleted, new added | Converges at merge |
| Update single field | Delete + full re-add | Same as update |
Most search engines don't support partial field updates at the index level. Updating one field requires re-indexing the entire document. This has schema design implications: frequently-updated fields might be separated from stable content, or stored outside the search index entirely.
Compression is essential for inverted indexes. Uncompressed indexes would be larger than the original documents and too slow to traverse. Effective compression reduces storage by 5-10x while often improving query speed through better cache utilization.
Delta Encoding: The Foundation
Posting lists are sorted by document ID. Consecutive IDs are often close together. Delta encoding stores differences instead of absolute values:
Original IDs: [100, 102, 107, 108, 150, 151, 152, 200]
Deltas (d-gaps): [100, 2, 5, 1, 42, 1, 1, 48]
Deltas are typically much smaller than absolute IDs, requiring fewer bits.
Integer Compression Benchmark (Realistic Posting List):

Posting list: 1 million document IDs (gaps average 100). Uncompressed: 4 bytes × 1M = 4 MB.

| Algorithm | Compressed Size | Decode Speed (M ints/sec) | Notes |
|---|---|---|---|
| Variable Byte | 1.8 MB | 800 | |
| Simple-16 | 1.4 MB | 600 | |
| PForDelta | 1.0 MB | 2000 | Lucene default |
| Binary Interpolative | 0.8 MB | 400 | |
| Roaring Bitmap | 1.2 MB | 3000+ | Set operations |

Note: PForDelta is often fastest despite good compression because it decodes blocks of 128/256 integers at once, enabling SIMD.

Term Dictionary Compression:
Term dictionaries also benefit from compression:
Front coding: Store common prefixes once. "abstract", "abstraction", "abstracts" → "abstract" + "|ion" + "|s"
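A small Python sketch of front coding for a sorted block of terms (real dictionaries apply it per block and keep each block's first term uncompressed for seeking):

```python
# Front coding: store each term as (shared-prefix length, remaining suffix).
def front_encode(sorted_terms):
    encoded = []
    prev = ""
    for term in sorted_terms:
        shared = 0
        while shared < min(len(prev), len(term)) and prev[shared] == term[shared]:
            shared += 1
        encoded.append((shared, term[shared:]))
        prev = term
    return encoded

def front_decode(encoded):
    terms, prev = [], ""
    for shared, suffix in encoded:
        prev = prev[:shared] + suffix
        terms.append(prev)
    return terms

block = ["abstract", "abstraction", "abstracts"]
enc = front_encode(block)
print(enc)                          # [(0, 'abstract'), (8, 'ion'), (8, 's')]
print(front_decode(enc) == block)   # True
```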
Block compression: Group terms into blocks, compress each block. Store first term of each block for seeking.
Finite State Transducers (FST): Share both prefixes AND suffixes. Achieves remarkable compression for natural language dictionaries.
Stored Field Compression:
Document content (for retrieval) is compressed using general-purpose algorithms: LZ4 (fast, moderate compression) or Zstandard (slower, better compression).
Counter-intuitively, compression often makes search faster. Why? Modern CPUs are fast; I/O is slow. A 5x smaller index means 5x fewer bytes read from disk or cache. Even after decompression overhead, total time is often lower. This is why modern search engines use aggressive compression by default.
Real documents have multiple fields with different characteristics. Understanding how fields map to index structures is essential for schema design.
Per-Field Inverted Indexes:
Each searchable field has its own inverted index:
Document:
{
"title": "Quick Brown Fox Guide",
"body": "This comprehensive guide covers quick brown foxes...",
"tags": ["animals", "nature"],
"author": "Jane Smith",
"date": "2024-01-15"
}
Resulting indexes:
title_index: "quick" → [doc1], "brown" → [doc1], "fox" → [doc1], ...
body_index: "quick" → [doc1], "comprehensive" → [doc1], "guide" → [doc1], ...
tags_index: "animals" → [doc1], "nature" → [doc1]
author_index: "jane" → [doc1], "smith" → [doc1]
date_index: (Stored as BKD tree for range queries)
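A minimal Python sketch of per-field indexing for a document like the one above, with analysis simplified to lowercasing and whitespace splitting (numeric and date fields would go into separate structures such as BKD trees):

```python
# One inverted index per searchable field.
doc = {
    "title": "Quick Brown Fox Guide",
    "body": "This comprehensive guide covers quick brown foxes",
    "tags": ["animals", "nature"],
    "author": "Jane Smith",
}

def analyze(value):
    values = value if isinstance(value, list) else [value]
    return [token for v in values for token in v.lower().split()]

field_indexes = {field: {} for field in doc}

doc_id = 1
for field, value in doc.items():
    for token in analyze(value):
        field_indexes[field].setdefault(token, []).append(doc_id)

print(field_indexes["title"].get("fox"))   # [1]  -- a title:fox query matches
print(field_indexes["body"].get("fox"))    # None -- the body only has "foxes"
```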
Why separate indexes?
- Different fields need different analysis (a short title is analyzed differently from a long body)
- Queries can target a specific field (title:fox vs body:fox)
- Field-level statistics enable field-specific scoring and boosting

| Field Type | Index Structure | Typical Analysis | Query Types |
|---|---|---|---|
| Title / Name | Inverted index | Tokenize, lowercase, light stemming | Full-text, phrase |
| Body / Content | Inverted index | Tokenize, stem, stop words | Full-text |
| Exact keyword | Inverted index | None (exact value) | Term match, aggregation |
| Numeric | BKD tree / Points | None | Range, sort |
| Date/time | BKD tree | Timestamp conversion | Range, sort |
| Geo location | BKD tree / Geohash | Coordinates | Distance, bounding box |
| Vector / Embedding | HNSW / IVF | Normalization | Similarity search |
The _all Field Pattern:
Some systems create a synthetic "_all" field containing tokens from all other fields:
_all_index: Contains tokens from title + body + tags + author
This allows simple queries (q=fox) without specifying which field to search. The trade-off is increased index size and loss of field-specific weighting.
Modern alternative: copy_to
Elasticsearch's copy_to allows selecting which fields contribute to multi-field search, avoiding the overhead of indexing everything twice.
Cross-Field Scoring Challenges:
When search spans multiple fields, scoring becomes complex: term statistics such as document frequency are computed per field, so the same term can score very differently in a short title than in a long body, and the per-field scores must somehow be combined into a single document score.
Solutions include: taking the best-scoring field (dis-max), summing field scores with per-field boosts, or blending term statistics across fields (as in BM25F or Elasticsearch's cross_fields queries).
Every field you index consumes disk and memory. Every analysis chain adds indexing latency. Every field-specific query requires its own index lookup. Design schemas with query patterns in mind: index only what you'll search, analyze appropriately, and avoid over-indexing 'just in case'.
We've explored the inverted index from concept to implementation detail. Let's consolidate the essential knowledge:
The Mental Model:
Think of an inverted index as a massive concordance—like the index at the back of a book, but for every word in millions of documents. The term dictionary is the alphabetical list of terms; the posting lists are the page numbers; skip lists and blocks are like sub-sections that let you jump around quickly.
What's Next:
Now that you understand the data structure powering search, we'll compare search systems to traditional databases. When should you use a search engine versus a database query? What are the fundamental differences in their capabilities? The next page explores Search vs Database Queries—clarifying when to reach for each tool.
You now have deep knowledge of inverted indexes—the data structure that makes modern search possible. You understand term dictionaries, posting lists, skip lists, compression, and per-field design. This knowledge enables informed decisions about search schema design, performance optimization, and system architecture.