If you've ever typed a search query and received results in under a second, you've benefited from an inverted index. This data structure—elegant in concept, sophisticated in implementation—is the foundation of every modern search engine, from Google to Elasticsearch to the search bar in your email client.
The inverted index is to search what the B-tree is to databases: a fundamental data structure that makes the impossible possible. Without it, searching billions of documents would take days instead of milliseconds. With it, search becomes so fast that we take it for granted.
Yet despite its ubiquity, many engineers have only a surface-level understanding of how inverted indexes work. This page will change that. We'll explore not just the concept, but the implementation details that separate a toy index from a production-grade search engine.
By the end of this page, you will understand: (1) The fundamental structure of inverted indexes and why they enable fast search, (2) The detailed components: term dictionaries, posting lists, skip lists, and stored fields, (3) How indexes are built, updated, and maintained, (4) Advanced optimizations: compression, block encoding, and skip pointers, (5) Trade-offs in index design and how they impact performance.
To understand inverted indexes, we must first understand what we're inverting. Let's start with the natural way documents are stored.
The Forward Index (How Documents Are Naturally Organized):
When you store documents, they're naturally organized by document ID:
Document 1: "The quick brown fox jumps over the lazy dog"
Document 2: "Quick brown foxes are faster than lazy dogs"
Document 3: "The dog chased the fox around the park"
The Search Problem:
If a user searches for "fox", we need to find all documents containing that word. With a forward index, we must scan every document:
Time complexity: O(N × M), where N = number of documents, M = average document length.
This doesn't scale. Scanning 1 billion documents × 1000 words each = 1 trillion comparisons.
Instead of asking 'what words are in this document?', we invert the question: 'what documents contain this word?' By pre-computing the answer to this inverted question for every word, we can answer search queries with a simple lookup instead of a scan.
The Inverted Index (Flipped Relationship):
An inverted index maps each term to the list of documents containing it:
"the" → [Doc1, Doc3, Doc3] (positions matter for phrases)
"quick" → [Doc1, Doc2]
"brown" → [Doc1, Doc2]
"fox" → [Doc1, Doc2*, Doc3] (*"foxes" stemmed to "fox")
"jump" → [Doc1] ("jumps" stemmed to "jump")
"over" → [Doc1]
"lazy" → [Doc1, Doc2]
"dog" → [Doc1, Doc2*, Doc3] (*"dogs" stemmed to "dog")
"fast" → [Doc2] ("faster" stemmed to "fast")
"chase" → [Doc3] ("chased" stemmed to "chase")
"around" → [Doc3]
"park" → [Doc3]
The Search Solution:
Now searching for "fox" is a simple dictionary lookup:
Time complexity: O(log V + P), where V = vocabulary size, P = posting list length.
For most queries, this is dramatically faster than scanning all documents.
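To make the contrast concrete, here is a minimal Python sketch using the three example documents above, with plain whitespace tokenization and no stemming (an illustration, not a production analyzer):

```python
# Minimal sketch: scan vs. lookup on a toy corpus.
docs = {
    1: "the quick brown fox jumps over the lazy dog",
    2: "quick brown foxes are faster than lazy dogs",
    3: "the dog chased the fox around the park",
}

# Forward-index search: O(N x M) -- visit every word of every document.
def scan_search(term):
    return [doc_id for doc_id, text in docs.items() if term in text.split()]

# Inverted-index search: one dictionary lookup once the index is built.
inverted = {}
for doc_id, text in docs.items():
    for word in set(text.split()):
        inverted.setdefault(word, []).append(doc_id)

print(scan_search("fox"))        # [1, 3]  (no stemming, so "foxes" is missed)
print(inverted.get("fox", []))   # [1, 3]
```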
| Aspect | Forward Index | Inverted Index |
|---|---|---|
| Organization | Document → Terms | Term → Documents |
| Search Time | O(N × M) - scan all docs | O(log V + P) - dictionary lookup |
| Update Time | O(1) - append doc | O(T) - update T term postings |
| Storage | Compact (original docs) | Larger (redundant term storage) |
| Query Type | Full document retrieval | Term-based search |
A production inverted index consists of several interconnected components. Understanding each component is essential for optimizing search performance.
Term Dictionary Deep Dive:
The term dictionary is the entry point for every search. It must support fast exact-term lookup, prefix matching (for wildcard and autocomplete queries), and memory-efficient storage of potentially millions of terms.
Common implementations:
| Structure | Lookup | Prefix | Memory | Notes |
|---|---|---|---|---|
| Sorted Array | O(log n) | O(log n + k) | Low | Simple but slow updates |
| B-tree | O(log n) | O(log n + k) | Medium | Good all-around performance |
| Trie | O(m) | O(m + k) | High | Fast prefix, high memory |
| FST (Lucene) | O(m) | O(m + k) | Very Low | Best compression, complex |
Lucene uses a Finite State Transducer (FST) for its term dictionary—a remarkable data structure that provides both memory efficiency and fast lookups. An FST is like a trie but shares suffixes as well as prefixes, dramatically reducing memory usage. For a 1 million term vocabulary, an FST might use 2-5MB versus 50-100MB for a naive trie.
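To make the table concrete, here is a small Python sketch of the simplest option above, a sorted-array term dictionary with binary search for exact and prefix lookups (a toy illustration of the idea, not how Lucene's FST works):

```python
import bisect

# Sorted-array term dictionary: exact lookup in O(log n),
# prefix lookup in O(log n + k) for k matching terms.
terms = sorted(["around", "brown", "chase", "dog", "fast", "fox", "jump",
                "lazy", "over", "park", "quick", "the"])

def exact_lookup(term):
    i = bisect.bisect_left(terms, term)
    return i if i < len(terms) and terms[i] == term else -1

def prefix_lookup(prefix):
    start = bisect.bisect_left(terms, prefix)
    results = []
    for t in terms[start:]:
        if not t.startswith(prefix):
            break
        results.append(t)
    return results

print(exact_lookup("fox"))   # position of "fox" in the sorted dictionary
print(prefix_lookup("fa"))   # ['fast']
```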
Posting lists contain the actual document references for each term. Their structure and encoding significantly impact both storage efficiency and query performance.
Basic Posting List Structure:
At minimum, a posting list contains document IDs:
term "fox" → [1, 2, 3, 17, 42, 100, 105, 1000, 1001, 1005]
Enhanced Posting List with Metadata:
Production systems store additional information per posting:
term "fox" → [
{ docId: 1, freq: 2, positions: [5, 42] },
{ docId: 2, freq: 1, positions: [10] },
{ docId: 3, freq: 3, positions: [1, 15, 88] },
...
]
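As an illustration of why positions are worth their cost, here is a minimal Python sketch that answers the phrase query "quick brown" by checking for adjacent positions (the positions are 0-indexed offsets taken from the example documents earlier on this page):

```python
# Using per-document positions to answer a phrase query ("quick brown").
# A phrase matches when some position p of "quick" has p + 1 among "brown"'s positions.
postings = {
    "quick": {1: [1], 2: [0]},      # docId -> positions of the term
    "brown": {1: [2], 2: [1]},
}

def phrase_match(first, second):
    hits = []
    for doc_id, first_positions in postings[first].items():
        second_positions = set(postings[second].get(doc_id, []))
        if any(p + 1 in second_positions for p in first_positions):
            hits.append(doc_id)
    return sorted(hits)

print(phrase_match("quick", "brown"))   # [1, 2]
```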
| Level | Data Stored | Enables | Storage Cost |
|---|---|---|---|
| Docs only | Document IDs | Boolean search | Lowest |
| + Frequencies | Doc IDs + term freq | TF-IDF ranking | +20-40% |
| + Positions | Doc IDs + freq + positions | Phrase queries, proximity | +100-200% |
| + Offsets | Doc IDs + freq + positions + char offsets | Highlighting | +50-100% |
| + Payloads | Custom per-position data | Advanced scoring | Variable |
Why Sorted Document IDs Matter:
Posting lists are sorted by document ID. This enables efficient intersection via a linear merge (for AND queries), delta encoding of document IDs for compression, and skip pointers for fast traversal.
Query: "quick" AND "fox"
"quick" posting: [1, 2, 4, 7, 10]
"fox" posting: [1, 3, 4, 6, 10, 15]
Merge intersection:
Compare 1, 1 → Match! Add 1
Compare 2, 3 → Advance first
Compare 4, 3 → Advance second
Compare 4, 4 → Match! Add 4
Compare 7, 6 → Advance second
Compare 7, 10 → Advance first
Compare 10, 10 → Match! Add 10
Result: [1, 4, 10]
Time complexity: O(m + n) for two lists of length m and n—dramatically better than O(m × n) for unsorted lists.
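The same two-pointer merge intersection as a short Python function:

```python
# Merge intersection of two sorted posting lists in O(m + n).
def intersect(a, b):
    i = j = 0
    result = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            result.append(a[i])     # match: keep the docId
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1                  # advance the list with the smaller docId
        else:
            j += 1
    return result

quick = [1, 2, 4, 7, 10]
fox = [1, 3, 4, 6, 10, 15]
print(intersect(quick, fox))   # [1, 4, 10]
```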
Original: [1, 2, 3, 17, 42, 100, 105, 1000, 1001, 1005]
Delta encoded: [1, 1, 1, 14, 25, 58, 5, 895, 1, 4]
Smaller values compress better using variable-byte encoding.
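As an illustrative sketch of these two steps together (production engines use block codecs such as PFOR-Delta instead), here is delta encoding combined with the variable-byte encoding mentioned above:

```python
# Delta + variable-byte encoding: each gap is written 7 bits at a time;
# the high bit marks the last byte of a gap.
def vbyte_encode(doc_ids):
    out = bytearray()
    prev = 0
    for doc_id in doc_ids:
        gap = doc_id - prev
        prev = doc_id
        while gap >= 128:
            out.append(gap & 0x7F)   # lower 7 bits, more bytes follow
            gap >>= 7
        out.append(gap | 0x80)       # high bit set = final byte of this gap
    return bytes(out)

def vbyte_decode(data):
    doc_ids, gap, shift, prev = [], 0, 0, 0
    for byte in data:
        gap |= (byte & 0x7F) << shift
        if byte & 0x80:              # final byte of the current gap
            prev += gap
            doc_ids.append(prev)
            gap, shift = 0, 0
        else:
            shift += 7
    return doc_ids

ids = [1, 2, 3, 17, 42, 100, 105, 1000, 1001, 1005]
encoded = vbyte_encode(ids)
print(len(encoded), vbyte_decode(encoded) == ids)   # 11 True (vs 40 bytes uncompressed)
```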
Some terms appear in millions of documents (e.g., 'the', 'is', 'a'). Their posting lists are enormous and expensive to traverse. Solutions: (1) Stop word removal (don't index them), (2) Frequency-based skipping (skip documents unlikely to score highly), (3) Term caching (keep hot posting lists in memory).
Raw posting list traversal is O(n) for a list of n documents. For long posting lists, this is too slow. Two key optimizations accelerate traversal: skip lists and block encoding.
Skip Lists: Jumping Ahead
A skip list adds express lanes that allow jumping over multiple postings:
Level 2: 1─────────────────────────42────────────────────────►1000
Level 1: 1─────────17──────42────────────105─────────────────►1000
Level 0: 1──2──3──17──42──100──105──1000──1001──1005
To find doc 105:
Level 2: 1 → 42 (skip, 42 < 105) → 1000 (too far, go down)
Level 1: 42 → 105 (found!)
Steps: 4 instead of 8
With skip lists, we can answer "is document X in this posting?" in O(log n) instead of O(n).
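Here is a simplified single-level version of the idea in Python; real skip data is multi-level and embedded in the on-disk posting format:

```python
import bisect

# Single-level skip pointers over a posting list (illustrative only).
postings = [1, 2, 3, 17, 42, 100, 105, 1000, 1001, 1005]
SKIP = 4
skips = postings[::SKIP]          # every 4th docId acts as a skip entry: [1, 42, 1001]

def contains(doc_id):
    # 1. Binary-search the skip level to find the stretch that could hold doc_id.
    s = bisect.bisect_right(skips, doc_id) - 1
    if s < 0:
        return False
    start = s * SKIP
    # 2. Scan at most SKIP postings instead of the whole list.
    return doc_id in postings[start:start + SKIP]

print(contains(105))   # True  (skip to 42's stretch, scan four entries)
print(contains(50))    # False
```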
When skip lists help: intersecting a short posting list with a much longer one (the long list can be skipped through rather than scanned), and conjunctive (AND) queries that repeatedly ask "does this posting contain document X?". For a full scan of a single list, skip pointers add overhead without benefit.
Block Encoding: Compression + Speed
Modern indexes (Lucene, Elasticsearch) use block encoding:
Posting list: [1, 2, 3, ..., 128, 129, 130, ..., 256, ...]
Block 0: docs 1-128, minId=1, maxId=128, compressed data
Block 1: docs 129-256, minId=129, maxId=256, compressed data
...
To find doc 200:
Scan block metadata: Block 0 max=128 (skip), Block 1 max=256 (contains 200)
Decompress Block 1, scan for 200
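A minimal Python sketch of block-level skipping, assuming uncompressed blocks of 128 document IDs with a max-ID header (real codecs pack each block with PFOR-Delta):

```python
# Block-level skipping: consult per-block metadata first, then read one block.
BLOCK_SIZE = 128

def build_blocks(doc_ids):
    blocks = []
    for i in range(0, len(doc_ids), BLOCK_SIZE):
        chunk = doc_ids[i:i + BLOCK_SIZE]
        blocks.append({"max_id": chunk[-1], "data": chunk})
    return blocks

def contains(blocks, doc_id):
    for block in blocks:
        if block["max_id"] >= doc_id:          # first block that could contain it
            return doc_id in block["data"]     # decode/scan only this block
    return False

blocks = build_blocks(list(range(1, 1001)))    # docIds 1..1000 in 8 blocks
print(contains(blocks, 200))    # True: block 0 (max 128) skipped, block 1 scanned
print(contains(blocks, 2000))   # False: no block's metadata matches
```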
Benefits of block encoding: block metadata (min/max doc ID) allows skipping entire blocks without decompressing them, fixed-size blocks decode efficiently with SIMD instructions, and per-block compression keeps decompression cost proportional to the data actually read.
Lucene 9.x Posting List Structure (Simplified):

┌─────────────────────────────────────────────────────────────┐
│ Term Dictionary (FST)                                        │
│   "fox" → Pointer to posting list                            │
└──────────────────────────┬──────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────────────┐
│ Skip Data (Optional)                                         │
│   Every 128 docs: [skip docId, skip offset, ...]             │
└──────────────────────────┬──────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────────────┐
│ Doc Blocks                                                   │
│   Block 0: [128 doc IDs, delta + PFOR encoded]               │
│   Block 1: [128 doc IDs, delta + PFOR encoded]               │
│   ...                                                        │
│   Final Block: [remaining doc IDs, vInt encoded]             │
└──────────────────────────┬──────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────────────┐
│ Frequency Blocks                                             │
│   Block 0: [128 frequencies, packed]                         │
│   Block 1: [128 frequencies, packed]                         │
│   ...                                                        │
└──────────────────────────┬──────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────────────┐
│ Position Blocks                                              │
│   Per-doc positions: [position deltas, encoded]              │
└─────────────────────────────────────────────────────────────┘

Lucene uses PFOR-Delta (Patched Frame-of-Reference Delta) encoding for posting lists. It delta-encodes document IDs (small gaps), then bit-packs the deltas using the minimum bits needed for most values, with 'exceptions' for outliers. This achieves 2-4 bits per integer for typical posting lists while maintaining fast decompression.
Building an inverted index from a document collection is a multi-stage process. Understanding this process illuminates why indexing can be slow and how it's optimized.
The Basic Algorithm:
In-Memory Indexing (Simple but Limited):
```python
from collections import defaultdict

def analyze(document):
    return document.lower().split()  # simplified: real analyzers also stem, remove stop words, etc.

index = defaultdict(list)
for doc_id, document in enumerate(documents):
    terms = analyze(document)   # tokenize, stem, etc.
    for term in terms:
        index[term].append(doc_id)

# Now index["fox"] = [1, 2, 3, ...]
```
This works for small collections but fails when the index doesn't fit in memory.
External Merge Sort for Large Collections:
For collections larger than memory, we use external sorting:
Phase 1: Build and Write Sorted Runs
┌─────────────────────────────────────────────────────┐
│ Read batch of docs → Build in-memory index │
│ → Write sorted run to disk → Repeat │
└─────────────────────────────────────────────────────┘
Run 1: [("apple", 1), ("apple", 3), ("banana", 2), ...]
Run 2: [("apple", 5), ("banana", 4), ("cherry", 6), ...]
Run 3: [("banana", 7), ("dog", 8), ("dog", 9), ...]
Phase 2: Merge Runs
┌─────────────────────────────────────────────────────┐
│ K-way merge: Read from all runs simultaneously │
│ Output: Final sorted posting list per term │
└─────────────────────────────────────────────────────┘
Final: "apple" → [1, 3, 5], "banana" → [2, 4, 7], ...
Time complexity: O(n log n) comparisons and O(n) disk I/O. Space: bounded by available memory at any point, regardless of collection size.
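A compact Python sketch of the two phases, using pickled runs on disk and heapq.merge for the k-way merge (illustrative only; real index builders write binary segment files):

```python
import heapq
import os
import pickle
import tempfile
from itertools import groupby

def write_run(pairs, run_dir, run_no):
    # Phase 1: sort one in-memory batch of (term, docId) pairs and write it as a run.
    path = os.path.join(run_dir, f"run{run_no}.pkl")
    with open(path, "wb") as f:
        pickle.dump(sorted(pairs), f)
    return path

def merge_runs(paths):
    # Phase 2: k-way merge of all sorted runs into final posting lists per term.
    runs = []
    for path in paths:
        with open(path, "rb") as f:
            runs.append(pickle.load(f))
    merged = heapq.merge(*runs)
    return {term: [doc for _, doc in group]
            for term, group in groupby(merged, key=lambda p: p[0])}

run_dir = tempfile.mkdtemp()
paths = [
    write_run([("apple", 1), ("banana", 2), ("apple", 3)], run_dir, 1),
    write_run([("apple", 5), ("banana", 4), ("cherry", 6)], run_dir, 2),
]
print(merge_runs(paths))   # {'apple': [1, 3, 5], 'banana': [2, 4], 'cherry': [6]}
```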
| Scale | Strategy | Time | Notes |
|---|---|---|---|
| < 1M docs | In-memory | Seconds-minutes | Simple, single machine |
| 1M - 100M docs | External sort | Minutes-hours | Single machine, streaming |
| 100M - 1B docs | Distributed (Spark/MR) | Hours | Cluster, parallel merge |
| > 1B docs | Continuous incremental | Ongoing | Never 'done', always updating |
Lucene doesn't build one giant index. Instead, it creates small "segments" as documents arrive, each a self-contained mini-index. Background merge processes combine small segments into larger ones. This allows near-real-time indexing (new docs are searchable within seconds) while maintaining query efficiency (older docs in large, optimized segments).
Inverted indexes face a fundamental challenge: they're optimized for immutability, but real systems need updates and deletes. Understanding how search engines handle mutability is crucial for schema design and performance tuning.
The Immutability Problem:
Consider updating a document with a new term:
Original: Doc 5: "the quick brown fox"
Updated: Doc 5: "the slow brown dog"
Required changes:
- Remove Doc 5 from "quick" posting list
- Remove Doc 5 from "fox" posting list
- Add Doc 5 to "slow" posting list
- Add Doc 5 to "dog" posting list
If posting lists are stored as compressed, sorted arrays in the middle of a file, these updates require rewriting entire lists—potentially the entire index.
Solution 1: Delete + Re-add
Most search engines treat updates as delete + add:
Before update:
"quick" → [1, 3, 5, 7] Doc 5: active
After update:
"quick" → [1, 3, 5, 7] Doc 5: marked deleted (but still in list)
"slow" → [5'] Doc 5': new version (new internal ID)
Query for "quick":
Get postings [1, 3, 5, 7] → Filter deleted → Return [1, 3, 7]
Pros: Simple, no posting list rewrites.
Cons: Deleted docs waste space until merge; extra filtering at query time.
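A minimal Python sketch of this delete + re-add behavior, with a set of deleted IDs standing in for the delete bitmap:

```python
# Updates as delete + re-add: posting lists are never rewritten;
# deleted internal IDs are filtered at query time until a merge removes them.
postings = {"quick": [1, 3, 5, 7], "slow": []}
deleted = set()
next_internal_id = 8

def update_doc(old_id, new_terms):
    global next_internal_id
    deleted.add(old_id)                  # mark the old version as deleted
    new_id = next_internal_id            # the new version gets a new internal ID
    next_internal_id += 1
    for term in new_terms:
        postings.setdefault(term, []).append(new_id)
    return new_id

def search(term):
    return [d for d in postings.get(term, []) if d not in deleted]

update_doc(5, ["the", "slow", "brown", "dog"])
print(search("quick"))   # [1, 3, 7] -- doc 5 filtered out, list never rewritten
print(search("slow"))    # [8]       -- the new version of doc 5
```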
Solution 2: Segment-Based Architecture
Lucene's approach uses immutable segments:
Timeline:
T0: Segment 1 created [Doc1, Doc2, Doc3]
T1: Segment 2 created [Doc4, Doc5, Doc6]
T2: Doc2 deleted → Segment 1 delete bitmap set for Doc2
T3: Doc5 updated → Segment 2 delete bitmap set for Doc5
→ Segment 3 created [Doc5']
T4: Merge (1, 2, 3) → Segment 4 [Doc1, Doc3, Doc4, Doc5', Doc6]
(Doc2 and old Doc5 physically removed)
This approach balances write efficiency with eventual compaction.
| Operation | Immediate Effect | Full Effect |
|---|---|---|
| Add document | Indexed in new segment | Visible after refresh |
| Delete document | Marked in delete bitmap | Removed at merge |
| Update document | Old deleted, new added | Converges at merge |
| Update single field | Delete + full re-add | Same as update |
Most search engines don't support partial field updates at the index level. Updating one field requires re-indexing the entire document. This has schema design implications: frequently-updated fields might be separated from stable content, or stored outside the search index entirely.
Compression is essential for inverted indexes. Uncompressed indexes would be larger than the original documents and too slow to traverse. Effective compression reduces storage by 5-10x while often improving query speed through better cache utilization.
Delta Encoding: The Foundation
Posting lists are sorted by document ID. Consecutive IDs are often close together. Delta encoding stores differences instead of absolute values:
Original IDs: [100, 102, 107, 108, 150, 151, 152, 200]
Deltas (d-gaps): [100, 2, 5, 1, 42, 1, 1, 48]
Deltas are typically much smaller than absolute IDs, requiring fewer bits.
Integer Compression Benchmark (Realistic Posting List):

Posting list: 1 million document IDs (gaps average 100). Uncompressed: 4 bytes × 1M = 4 MB.

| Algorithm | Compressed Size | Decode Speed (M ints/sec) | Notes |
|---|---|---|---|
| Variable Byte | 1.8 MB | 800 | |
| Simple-16 | 1.4 MB | 600 | |
| PForDelta | 1.0 MB | 2000 | Lucene default |
| Binary Interpolative | 0.8 MB | 400 | |
| Roaring Bitmap | 1.2 MB | 3000+ | Set operations |

Note: PForDelta is often fastest despite good compression because it decodes blocks of 128/256 integers at once, enabling SIMD.

Term Dictionary Compression:
Term dictionaries also benefit from compression:
Front coding: Store common prefixes once. "abstract", "abstraction", "abstracts" → "abstract" + "|ion" + "|s"
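A small Python sketch of front coding for a sorted block of terms (real dictionaries apply it per block and keep each block's first term uncompressed for seeking):

```python
# Front coding: store each term as (shared-prefix length, remaining suffix).
def front_encode(sorted_terms):
    encoded = []
    prev = ""
    for term in sorted_terms:
        shared = 0
        while shared < min(len(prev), len(term)) and prev[shared] == term[shared]:
            shared += 1
        encoded.append((shared, term[shared:]))
        prev = term
    return encoded

def front_decode(encoded):
    terms, prev = [], ""
    for shared, suffix in encoded:
        prev = prev[:shared] + suffix
        terms.append(prev)
    return terms

block = ["abstract", "abstraction", "abstracts"]
enc = front_encode(block)
print(enc)                          # [(0, 'abstract'), (8, 'ion'), (8, 's')]
print(front_decode(enc) == block)   # True
```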
Block compression: Group terms into blocks, compress each block. Store first term of each block for seeking.
Finite State Transducers (FST): Share both prefixes AND suffixes. Achieves remarkable compression for natural language dictionaries.
Stored Field Compression:
Document content (for retrieval) is compressed using general-purpose algorithms: LZ4 (fast, moderate compression) or Zstandard (slower, better compression).
Counter-intuitively, compression often makes search faster. Why? Modern CPUs are fast; I/O is slow. A 5x smaller index means 5x fewer bytes read from disk or cache. Even after decompression overhead, total time is often lower. This is why modern search engines use aggressive compression by default.
Real documents have multiple fields with different characteristics. Understanding how fields map to index structures is essential for schema design.
Per-Field Inverted Indexes:
Each searchable field has its own inverted index:
Document:
{
"title": "Quick Brown Fox Guide",
"body": "This comprehensive guide covers quick brown foxes...",
"tags": ["animals", "nature"],
"author": "Jane Smith",
"date": "2024-01-15"
}
Resulting indexes:
title_index: "quick" → [doc1], "brown" → [doc1], "fox" → [doc1], ...
body_index: "quick" → [doc1], "comprehensive" → [doc1], "guide" → [doc1], ...
tags_index: "animals" → [doc1], "nature" → [doc1]
author_index: "jane" → [doc1], "smith" → [doc1]
date_index: (Stored as BKD tree for range queries)
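A minimal Python sketch of per-field indexing for a document like the one above, with analysis simplified to lowercasing and whitespace splitting (numeric and date fields would go into separate structures such as BKD trees):

```python
# One inverted index per searchable field.
doc = {
    "title": "Quick Brown Fox Guide",
    "body": "This comprehensive guide covers quick brown foxes",
    "tags": ["animals", "nature"],
    "author": "Jane Smith",
}

def analyze(value):
    values = value if isinstance(value, list) else [value]
    return [token for v in values for token in v.lower().split()]

field_indexes = {field: {} for field in doc}

doc_id = 1
for field, value in doc.items():
    for token in analyze(value):
        field_indexes[field].setdefault(token, []).append(doc_id)

print(field_indexes["title"].get("fox"))   # [1]  -- a title:fox query matches
print(field_indexes["body"].get("fox"))    # None -- the body only has "foxes"
```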
Why separate indexes?
- Different fields need different analysis (a short title is analyzed differently from a long body)
- Queries can target a specific field (title:fox vs body:fox)
- Field-level statistics enable field-specific scoring and boosting

| Field Type | Index Structure | Typical Analysis | Query Types |
|---|---|---|---|
| Title / Name | Inverted index | Tokenize, lowercase, light stemming | Full-text, phrase |
| Body / Content | Inverted index | Tokenize, stem, stop words | Full-text |
| Exact keyword | Inverted index | None (exact value) | Term match, aggregation |
| Numeric | BKD tree / Points | None | Range, sort |
| Date/time | BKD tree | Timestamp conversion | Range, sort |
| Geo location | BKD tree / Geohash | Coordinates | Distance, bounding box |
| Vector / Embedding | HNSW / IVF | Normalization | Similarity search |
The _all Field Pattern:
Some systems create a synthetic "_all" field containing tokens from all other fields:
_all_index: Contains tokens from title + body + tags + author
This allows simple queries (q=fox) without specifying which field to search. The trade-off is increased index size and loss of field-specific weighting.
Modern alternative: copy_to
Elasticsearch's copy_to allows selecting which fields contribute to multi-field search, avoiding the overhead of indexing everything twice.
Cross-Field Scoring Challenges:
When search spans multiple fields, scoring becomes complex: term statistics such as document frequency are computed per field, so the same term can score very differently in a short title than in a long body, and the per-field scores must somehow be combined into a single document score.
Solutions include: taking the best-scoring field (dis-max), summing field scores with per-field boosts, or blending term statistics across fields (as in BM25F or Elasticsearch's cross_fields queries).
Every field you index consumes disk and memory. Every analysis chain adds indexing latency. Every field-specific query requires its own index lookup. Design schemas with query patterns in mind: index only what you'll search, analyze appropriately, and avoid over-indexing 'just in case'.
We've explored the inverted index from concept to implementation detail. Let's consolidate the essential knowledge:
The Mental Model:
Think of an inverted index as a massive concordance—like the index at the back of a book, but for every word in millions of documents. The term dictionary is the alphabetical list of terms; the posting lists are the page numbers; skip lists and blocks are like sub-sections that let you jump around quickly.
What's Next:
Now that you understand the data structure powering search, we'll compare search systems to traditional databases. When should you use a search engine versus a database query? What are the fundamental differences in their capabilities? The next page explores Search vs Database Queries—clarifying when to reach for each tool.
You now have deep knowledge of inverted indexes—the data structure that makes modern search possible. You understand term dictionaries, posting lists, skip lists, compression, and per-field design. This knowledge enables informed decisions about search schema design, performance optimization, and system architecture.