One of the most common architectural mistakes is using the wrong tool for data retrieval. Engineers reach for a database when they need search, or bolt search features onto a database that wasn't designed for them. The result is either slow queries, poor relevance, or complex workarounds that become maintenance nightmares.
The confusion is understandable. Both databases and search engines store data, build indexes over it, accept queries with filters, and return results quickly.
But they are optimized for fundamentally different access patterns. Understanding these differences—deeply, not superficially—is essential for making correct architectural decisions.
This page will give you the clarity to answer: "Should I use a database query or a search engine?" with confidence.
By the end of this page, you will understand: (1) The fundamental architectural differences between search engines and databases, (2) The types of queries each excels at, (3) When to use search, when to use a database, and when to use both, (4) How to design systems that leverage the strengths of each, (5) Common anti-patterns and how to avoid them.
Before comparing specific capabilities, let's understand the foundational differences in how search engines and databases are designed.
| Aspect | Relational Database (e.g., PostgreSQL) | Search Engine (e.g., Elasticsearch) |
|---|---|---|
| Primary purpose | Store and retrieve structured data with ACID guarantees | Find relevant documents quickly via text matching |
| Data model | Tables with rows, columns, relationships, schemas | Documents (JSON), denormalized, schema-flexible |
| Index strategy | B-trees for sorted access, hash indexes for equality | Inverted indexes for term lookup, BKD for ranges |
| Consistency model | Strong consistency (ACID transactions) | Eventual consistency (near-real-time) |
| Query language | SQL (declarative, relational algebra) | Query DSL (JSON-based, search-oriented) |
| Primary optimization | Minimize disk I/O, maximize transaction throughput | Maximize query speed, relevance ranking |
| Scaling approach | Vertical first, horizontal with complexity | Horizontal by design (sharding + replication) |
The Index Structure Divide:
The most fundamental difference is the underlying index structure:
B-tree (Database):
- Excellent for range queries (WHERE price BETWEEN 10 AND 50) and exact lookups (WHERE id = 123)
- Poor for text search: WHERE content LIKE '%machine learning%' requires a full scan
Inverted Index (Search Engine):
- Maps each term to the list of documents that contain it, so a text query becomes a handful of term lookups plus a merge of posting lists
- Less suited to transactional updates: documents are reindexed rather than modified in place
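To make the divide concrete, here is a minimal, illustrative sketch of an inverted index as a plain Python mapping from terms to document IDs (nothing like Lucene's on-disk format, but the same lookup idea):

from collections import defaultdict

# Toy corpus: document ID -> text
docs = {
    1: "wireless bluetooth headphones",
    2: "wired gaming headphones",
    3: "bluetooth speaker with noise cancelling",
}

# Build the inverted index: term -> set of document IDs containing it
inverted_index = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.lower().split():
        inverted_index[term].add(doc_id)

def search(query: str) -> set[int]:
    """AND-semantics search: documents containing every query term."""
    postings = [inverted_index.get(t, set()) for t in query.lower().split()]
    return set.intersection(*postings) if postings else set()

print(search("bluetooth headphones"))  # {1}

Answering the query never touches documents that lack the terms; that is why text search cost tracks the number of matches, not the corpus size.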
Databases optimize for transactional integrity and structured queries—they ensure your data is correct and let you ask precise questions. Search engines optimize for relevance and speed—they help users find what they're looking for even when the query is imprecise. These are complementary, not competing, goals.
Let's compare specific query types and how each system handles them:
Full-Text Search Comparison:
Query: Find products matching "wireless bluetooth headphones noise cancelling"
Database (PostgreSQL with full-text):
SELECT * FROM products
WHERE to_tsvector('english', description) @@
plainto_tsquery('english', 'wireless bluetooth headphones noise cancelling')
ORDER BY ts_rank(to_tsvector('english', description),
         plainto_tsquery('english', 'wireless bluetooth headphones noise cancelling')) DESC
LIMIT 20;
✓ Works for basic full-text search
✗ Limited relevance tuning
✗ No fuzzy matching without extensions
✗ Analysis options are limited
✗ Performance degrades with scale
Search Engine (Elasticsearch):
{
"query": {
"multi_match": {
"query": "wireless bluetooth headphones noise cancelling",
"fields": ["title^3", "description", "features"],
"fuzziness": "AUTO",
"operator": "and"
}
}
}
✓ Rich relevance tuning (field boosting, function scores)
✓ Fuzzy matching built-in
✓ Synonyms, stemming, language-specific analysis
✓ Scales horizontally
✓ Sub-100ms at billion-document scale
Search engines require denormalized data because they don't support joins. This means: (1) Duplicated data across documents, (2) Update complexity (changing a customer name requires updating all their orders), (3) Potential inconsistency during updates. This is acceptable when read performance matters more than write simplicity—which is true for most search use cases.
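As an illustration (hypothetical schema, plain Python dicts standing in for rows and documents), here is what denormalization and its update fan-out look like:

# Relational model: customer data lives in one row, referenced by orders.
customer_row = {"customer_id": 42, "name": "Acme Corp"}
order_rows = [
    {"order_id": 1001, "customer_id": 42, "total": 250.00},
    {"order_id": 1002, "customer_id": 42, "total": 99.50},
]

# Denormalized search documents: customer fields are copied into each order
# so one document answers the query without a join.
order_documents = [
    {"order_id": 1001, "customer_name": "Acme Corp", "total": 250.00},
    {"order_id": 1002, "customer_name": "Acme Corp", "total": 99.50},
]

# Renaming the customer touches ONE row in the database...
customer_row["name"] = "Acme Corporation"

# ...but requires rewriting EVERY order document in the search index.
for doc in order_documents:
    doc["customer_name"] = "Acme Corporation"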
The most significant capability gap between search engines and databases is relevance ranking. Search engines are fundamentally designed to return results ordered by relevance—a concept foreign to traditional databases.
What is Relevance?
Relevance is the degree to which a result satisfies the user's information need. It's subjective, context-dependent, and multidimensional: textual similarity, freshness, popularity, and personalization signals such as location or past behavior can all matter.
Search engines compute relevance scores by combining these signals using sophisticated ranking algorithms.
Database Approach: Boolean Matching
-- Database returns unranked matches (or ordered by arbitrary column)
SELECT * FROM products
WHERE title ILIKE '%headphones%'
ORDER BY created_at DESC; -- Ordered by date, NOT relevance
Databases return rows that match predicates, but they don't inherently understand "which match is better." You can add weights manually, but it's clunky:
-- Attempt at relevance scoring in SQL
SELECT *,
(CASE WHEN title ILIKE '%headphones%' THEN 2 ELSE 0 END +
CASE WHEN description ILIKE '%headphones%' THEN 1 ELSE 0 END) AS score
FROM products
WHERE title ILIKE '%headphones%' OR description ILIKE '%headphones%'
ORDER BY score DESC;
This approach ignores term frequency and document length, grows unwieldy as fields and query terms multiply, and offers no fuzzy matching or phrase awareness.
Search Engine Approach: Scoring Functions
Search engines use well-researched scoring functions:
BM25 (Best Match 25):
score(D, Q) = Σ IDF(qi) × (tf(qi, D) × (k1 + 1)) / (tf(qi, D) + k1 × (1 - b + b × |D|/avgdl))
This considers term frequency (how often the term appears in the document), inverse document frequency (how rare the term is across the corpus), and document length relative to the corpus average, with tunable parameters k1 and b. A runnable sketch of the term score follows below.
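A minimal Python sketch of the per-term BM25 contribution (single field, no text analysis; the formula above leaves the IDF variant unspecified, so this uses the Lucene-style one):

import math

def bm25_term_score(tf: float, df: int, n_docs: int, doc_len: float,
                    avg_doc_len: float, k1: float = 1.2, b: float = 0.75) -> float:
    """Score contribution of one query term for one document.
    tf: term frequency in the document; df: documents containing the term;
    n_docs: corpus size; doc_len / avg_doc_len: document length vs. corpus average."""
    idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))   # Lucene-style IDF
    norm = k1 * (1 - b + b * doc_len / avg_doc_len)          # length normalization
    return idf * (tf * (k1 + 1)) / (tf + norm)

# Example: a term appearing 3 times in a 180-term document, in a corpus of
# 100,000 documents (average length 200) where 120 documents contain the term.
print(bm25_term_score(tf=3, df=120, n_docs=100_000, doc_len=180, avg_doc_len=200))

The full score for a query is the sum of this value over all query terms, exactly as the Σ in the formula indicates.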
Beyond BM25, search engines layer on further signals: field boosting, function scores (recency, popularity, distance), and, increasingly, machine-learned ranking models.
In search results, position 1 gets ~30% of clicks, position 2 gets ~15%, position 10 gets ~2%. The difference between a good and great ranking algorithm can mean millions of dollars in e-commerce, or the difference between users finding answers or abandoning your product. This is why search engines invest so heavily in ranking—it's existential.
Understanding performance profiles helps you predict system behavior and make capacity decisions.
| Metric | Database (Typical) | Search Engine (Typical) |
|---|---|---|
| Point lookup by ID | < 1ms | 1-5ms |
| Simple filtered query | 1-10ms | 5-20ms |
| Full-text search (1M docs) | 100ms-seconds | 5-20ms |
| Full-text search (1B docs) | Not practical | 50-200ms |
| Complex join (5 tables) | 10-100ms | N/A (no joins) |
| Write/insert latency | < 10ms (durable) | < 100ms (indexed) |
| Write throughput | 10K-100K writes/sec | 5K-50K docs/sec |
| Time to visibility | Immediate (committed) | 1-5 seconds (refresh) |
Key Performance Insights:
1. Search engines win at text search, especially at scale
Inverted indexes make text search roughly O(log n + k), where k is the number of matching documents, so latency grows very slowly with corpus size. A database LIKE '%term%' query is O(n): it scans every row, so it slows down linearly as the data grows.
1 million products:
Database LIKE: 500ms
Elasticsearch: 10ms
100 million products:
Database LIKE: 50 seconds
Elasticsearch: 15ms
2. Databases win at transactional consistency
Read-after-write consistency is trivial in databases. In search engines, a document indexed now may not be visible for 1-5 seconds (configurable, but with trade-offs; a configuration sketch follows after this list).
3. Databases win at complex relational queries
Multi-table joins, subqueries, window functions—databases have mature query planners optimized for these. Search engines don't even attempt them.
4. Search engines scale horizontally more naturally
Adding shards and replicas in Elasticsearch is configuration. Adding nodes to a PostgreSQL cluster requires careful architectural work (read replicas, partitioning, connection pooling).
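On insight 2 above: the visibility delay is Elasticsearch's refresh behavior, which you can tune per index or per request. A rough sketch using the REST API via Python's requests library (URL, index name, and document are placeholders):

import requests

ES = "http://localhost:9200"

# Lengthen the refresh interval to favor indexing throughput over visibility.
requests.put(f"{ES}/products/_settings",
             json={"index": {"refresh_interval": "30s"}})

# Or make a single write block until it is searchable (slower, but read-your-write).
requests.post(f"{ES}/products/_doc?refresh=wait_for",
              json={"title": "wireless headphones", "price": 79.99})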
PostgreSQL's full-text search is often 'good enough' for small datasets (<100K documents) with simple requirements. Beyond that, performance degrades, and feature gaps (fuzzy matching, language analysis, relevance tuning) become painful. Many teams start with database full-text, then migrate to a search engine when scale or requirements exceed its capabilities.
One of the most significant differences is how each system handles data consistency and durability.
Practical Implications:
Scenario 1: E-commerce order
1. User places order (database INSERT)
2. Order appears in 'My Orders' immediately (database SELECT)
3. Order appears in search after refresh (1-5 seconds)
Acceptable: The critical transactional data is in the database. Search visibility lag is tolerable.
Scenario 2: Chat message
1. User sends message (database INSERT)
2. Message visible to recipient (database SELECT or real-time push)
3. Message searchable in history (search, after refresh)
Acceptable: Real-time delivery doesn't depend on search index.
Scenario 3: Financial trading
1. Trade executed (database, requires ACID)
2. Trade visible in portfolio (database, must be immediate and consistent)
3. Trade searchable in history (secondary concern)
Search is tertiary here. The database handles all critical paths.
Never use a search engine as your primary data store for transactional data. The database is the source of truth; the search index is a derived, optimized view for search queries. If the search index is lost, you should be able to rebuild it from the database. This separation protects data integrity.
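A practical consequence is that a full reindex from the source of truth should always be possible. A rough sketch, assuming a products table and an injected bulk-indexing helper (both hypothetical):

def rebuild_search_index(db_conn, es_bulk_index, batch_size: int = 1000):
    """Stream every product from the database into a fresh search index."""
    cursor = db_conn.cursor()
    cursor.execute("SELECT id, title, description, price FROM products")
    batch = []
    for row in cursor:
        batch.append({"id": row[0], "title": row[1],
                      "description": row[2], "price": row[3]})
        if len(batch) >= batch_size:
            es_bulk_index(batch)   # assumed thin wrapper over the bulk API
            batch = []
    if batch:
        es_bulk_index(batch)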
Let's crystallize the decision into clear guidelines:
| Use Case | Primary Tool | Why |
|---|---|---|
| User authentication | Database | Transactional, exact match, strong consistency |
| Order management | Database | ACID transactions, relational data |
| Product search | Search engine | Full-text, relevance ranking, facets |
| Log aggregation | Search engine | High volume, text analysis, time-series |
| User profile storage | Database | Structured data, foreign keys, updates |
| Document repository search | Search engine | Full-text, file content, relevance |
| Shopping cart | Database | Transactions, immediate consistency |
| Autocomplete suggestions | Search engine | Prefix matching, speed |
| Analytics dashboards | Database or Search | Depends on data freshness, query complexity |
| Content recommendations | Search engine (or ML) | Similarity matching, personalization |
Most production systems use BOTH. The database is the source of truth for transactional operations. The search engine is a derived, denormalized projection optimized for search queries. Changes flow: Database → Change Data Capture → Search Index. This architecture leverages the strengths of each.
In practice, most systems combine databases and search engines. Let's explore common patterns for integrating them.
Pattern 1: Database as Source of Truth, Search for Queries
┌─────────────┐ Write ┌─────────────┐
│ Service │ ──────────────► │ Database │
└─────────────┘ └──────┬──────┘
▲ │
│ Search Query CDC / ETL
│ │
┌─────┴─────┐ ┌──────▼──────┐
│ Search │ ◄──── Sync ────── │ Search │
│ Results │ │ Engine │
└───────────┘ └─────────────┘
How it works: the service writes only to the database; a change-data-capture (CDC) or ETL pipeline propagates those changes into the search engine; search queries go to the search engine, while transactional reads and writes stay on the database.
Pros: Clear separation of concerns, each system does what it's best at
Cons: Sync latency (seconds to minutes), sync complexity
Implementation options include log-based CDC (e.g., Debezium streaming into Kafka), application-published change events on a message queue, or periodic batch re-indexing. A simplified sync consumer is sketched below.
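A minimal sketch of the Pattern 1 sync consumer (the change stream and the two indexing helpers are assumptions, injected so the shape works with any CDC source):

def sync_changes_to_search(change_stream, index_document, delete_document):
    """Mirror database change events into the search index.
    change_stream yields dicts like {"op": "update", "id": 1, "after": {...}}."""
    for change in change_stream:
        if change["op"] in ("create", "update"):
            # Upsert the denormalized document into the search index.
            index_document(index="products", doc_id=change["id"],
                           document=change["after"])
        elif change["op"] == "delete":
            delete_document(index="products", doc_id=change["id"])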
Pattern 2: Search for Discovery, Database for Details
User searches "blue Nike running shoes" → Elasticsearch
↓ Returns [product_id: 123, 456, 789] with scores
User clicks product 123 → PostgreSQL
↓ Fetch full product details, inventory, reviews
User adds to cart → PostgreSQL
↓ Transaction: update cart, check inventory
Search handles the "needle in haystack" problem; database handles transactional operations and detailed retrieval.
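A sketch of the two-step read in Pattern 2 (the search-client wrapper, table, and columns are assumptions):

def search_then_hydrate(es_search, db_conn, query: str, size: int = 20):
    """Step 1: the search engine returns relevant product IDs.
    Step 2: the database supplies authoritative details for those IDs."""
    hits = es_search(index="products", size=size, query={
        "multi_match": {"query": query, "fields": ["title^3", "description"]}
    })
    ids = [int(hit["_id"]) for hit in hits]          # relevance-ordered IDs

    cursor = db_conn.cursor()
    cursor.execute("SELECT id, title, price, stock FROM products WHERE id = ANY(%s)",
                   (ids,))
    rows_by_id = {row[0]: row for row in cursor.fetchall()}

    # Preserve the search engine's relevance ordering in the response.
    return [rows_by_id[i] for i in ids if i in rows_by_id]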
Pattern 3: Search IDs, Database for Consistency
Search: "available flights SFO → NYC" → Elasticsearch
↓ Returns candidate flight IDs based on text/facet match
Filter: Check real-time availability → PostgreSQL
↓ Database has authoritative seat counts
Book: Reserve seat → PostgreSQL
↓ Transaction with row locks, ACID guarantee
Search narrows candidates; database provides consistent state for booking.
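A sketch of the Pattern 3 consistency check, reusing the same assumed search-client wrapper: search proposes candidates, the database has the final word on availability.

def bookable_flights(es_search, db_conn, origin: str, dest: str):
    """Search narrows to candidate flights; the database confirms seats remain."""
    hits = es_search(index="flights", query={
        "bool": {"filter": [{"term": {"origin": origin}},
                            {"term": {"destination": dest}}]}
    })
    candidate_ids = [int(hit["_id"]) for hit in hits]

    cursor = db_conn.cursor()
    cursor.execute("SELECT id FROM flights WHERE id = ANY(%s) AND seats_available > 0",
                   (candidate_ids,))
    return [row[0] for row in cursor.fetchall()]      # only truly bookable flights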
Writing to both database AND search engine in the same operation is risky. If one fails, they diverge. Solutions: (1) Write to database, sync to search via CDC (eventual consistency), (2) Write to message queue, consume by both (eventual, but ordered), (3) Accept that search may lag and design for it.
Pattern 4: Polyglot Persistence
Different data stores for different access patterns:
┌─────────────────────────────────────────────────────────────────┐
│ Application Layer │
└───────────────────────────────┬─────────────────────────────────┘
│
┌───────────────────────┼───────────────────────┐
▼ ▼ ▼
┌───────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ PostgreSQL │ │ Elasticsearch │ │ Redis │
│ Transactions │ │ Search │ │ Caching │
│ OLTP │ │ Full-text │ │ Sessions │
└───────────────┘ └─────────────────┘ └─────────────────┘
Each store is optimized for its access pattern. The application orchestrates.
Learning from common mistakes helps you avoid costly architectural decisions:
Case Study: The Stale Search Problem
Scenario: E-commerce site. User updates product inventory to 0 (sold out). Customer searches for product, sees it, tries to buy, gets error "out of stock."
Root cause: Database updated, but search index hasn't refreshed yet. Customer sees stale data.
Solutions: re-check stock against the database before rendering results or at add-to-cart time, push inventory-critical changes to the index immediately via a fast path, or design the UX so availability is confirmed at purchase rather than promised at search time.
Accept that search and database WILL have a sync gap. Design your UX and application logic to handle this gracefully. The gap isn't a bug—it's a fundamental trade-off for search performance. Work with it, not against it.
We've explored the fundamental differences between search engines and databases. Let's consolidate the essential guidance:
The Decision Heuristic:
Does the query involve:
├─ Free-form text matching? → Search Engine
├─ Relevance ranking? → Search Engine
├─ Faceted filtering + aggregation? → Search Engine
├─ Transactions / ACID? → Database
├─ Multi-table joins? → Database
├─ Strong consistency? → Database
└─ Both categories? → Hybrid (both)
Module Complete:
You've now completed the Search Architecture Overview module. You understand search system components, the indexing vs querying paths, inverted index internals, and how search engines differ from databases. This foundation equips you for the subsequent modules: full-text search techniques, Elasticsearch deep dive, indexing strategies, relevance tuning, and autocomplete implementation.
Congratulations! You've mastered Search Architecture Overview. You can now: identify search system components and their roles, explain the indexing vs querying trade-off, describe inverted index structure and optimizations, and choose between search engines and databases for different use cases. You're ready to dive deeper into full-text search techniques.