System Design (HLD)Why Databases Matter

Why Databases Matter in System Design

LevelBeginner

Duration60 mins

TopicWhy Databases Matter

3 / 4

Query and Access Patterns

The Questions Your Data Must Answer

Every database stores data. But the questions you ask of that data—and how frequently you ask them—determine whether your system performs brilliantly or struggles pathetically.

Consider two systems storing identical user data:

System A primarily answers: "What is this user's profile?"

System B primarily answers: "Which users are online in this geographic region?"

Same data. Radically different optimal storage strategies. System A wants user data grouped by user ID (document or row per user). System B wants user data indexed by status and geography (requires different indexing, potentially different storage model).

The access pattern is the architecture.

The Central Insight

You cannot evaluate a database without knowing how you'll query it. A database that's perfect for one access pattern may be disastrous for another. This page teaches you to identify, analyze, and design for access patterns—the skill that separates thoughtful system designers from those who pick databases based on hype.

What Are Access Patterns?

An access pattern describes how your application interacts with data: what data is read or written, how it's queried, how often, by whom, and under what conditions.

Components of an access pattern:

Access Pattern Dimensions

•Read vs Write Ratio — Is the workload read-heavy (90% reads), write-heavy (90% writes), or balanced? This affects caching strategy, replication topology, and consistency requirements.
•Query Shape — Point lookups by ID? Range scans? Full-text search? Aggregations? Graph traversals? Each query shape has different optimal data structures.
•Data Access Locality — Do queries typically access related data together (temporal or spatial locality)? Or randomly across the dataset?
•Concurrency Pattern — Many users reading same data? Many users writing to different data? Many users writing to same data (hotspots)?
•Latency Requirements — Must queries return in 10ms? 100ms? 1 second? This determines indexing aggressiveness and caching requirements.
•Throughput Requirements — How many queries per second? Thousands? Millions? This affects scaling strategy.
•Freshness Requirements — Must reads see the latest write? Or is eventual consistency acceptable?

Documenting access patterns:

Before selecting a database or designing a schema, enumerate your access patterns explicitly. For each pattern, document:

Field	Description	Example
Name	Short identifier	"GetUserProfile"
Query	What question is being asked	"Get full profile for user ID X"
Input	What parameters/filters	User ID
Output	What data is returned	User object with all fields
Frequency	How often per second/day	10,000 requests/second
Latency SLA	Required response time	p99 < 50ms
Freshness	Staleness tolerance	< 5 seconds
Write/Read	Is this a read or write	Read

This exercise forces clarity. It also provides the foundation for capacity planning and performance testing.

The 80/20 Rule

Typically, 3-5 access patterns dominate 80%+ of traffic. Identify these dominant patterns first and optimize ruthlessly for them. Secondary patterns can often be served less efficiently without significant user impact.

Common Access Pattern Categories

Access patterns cluster into recognizable categories, each with well-known optimization strategies.

Category 1: Key-Value Lookup

Pattern: Given a known key (user ID, product ID, session token), retrieve the associated value.

Characteristics:

O(1) lookup complexity required
High throughput, low latency
Often cacheable

Optimal for: Redis, DynamoDB, Memcached, or any database with hash index on key.

Anti-pattern for: Full table scans, complex joins, ad-hoc querying.

Category 2: Range Queries

Pattern: Retrieve records within a range (orders from last 7 days, products priced $10-$50, users whose names start with 'A').

Characteristics:

O(log n + k) complexity (find start, scan k results)
Requires sorted index on range column
Often combined with filters

Optimal for: B-tree indexes, sorted sets, time-series databases.

Anti-pattern for: Hash indexes (no ordering), heap scans.

Category 3: Full-Text Search

Pattern: Find documents containing words, phrases, or matching semantic concepts.

Characteristics:

Inverted index required
Relevance scoring matters
Often includes faceting, highlighting, suggestions

Optimal for: Elasticsearch, Solr, PostgreSQL full-text search, Algolia.

Anti-pattern for: Relational LIKE '%term%' queries (O(n) scan).

Category 4: Graph Traversal

Pattern: Navigate relationships (friends of friends, product recommendations based on purchase history, dependency graphs).

Characteristics:

Relationship-centric queries
Variable-depth traversals
Path finding, connectivity analysis

Optimal for: Neo4j, Neptune, JanusGraph, or GraphQL over relational with careful modeling.

Anti-pattern for: Repeated JOINs in relational databases (exploding complexity).

Category 5: Aggregations and Analytics

Pattern: Compute summaries over large datasets (total revenue by region, average session duration, top products by sales).

Characteristics:

Often scans large portions of data
Compute-intensive (SUM, COUNT, AVG, GROUP BY)
Can tolerate higher latency

Optimal for: Column-oriented databases (ClickHouse, BigQuery, Redshift), materialized views.

Anti-pattern for: Row-oriented transactional databases (PostgreSQL can do it, but inefficiently at scale).

Category 6: Time-Series Access

Pattern: Query data organized by time (metrics, logs, IoT sensor readings).

Characteristics:

High write throughput (continuous ingestion)
Queries typically by time range
Recent data accessed more than old data
Often downsampled or rolled up over time

Optimal for: InfluxDB, TimescaleDB, Prometheus, time-partitioned tables.

Anti-pattern for: Generic relational databases without time-based partitioning.

Hybrid Systems

Real applications often need multiple pattern categories. An e-commerce system might need key-value (session), range queries (order history), full-text (product search), and analytics (sales dashboards). This is why polyglot persistence—using multiple databases—is common in mature architectures.

Read Patterns Deep Dive

Most systems are read-heavy, making read optimization crucial. Let's examine read patterns in detail.

Single-Record Reads (Point Queries)

The simplest pattern: fetch one record by primary key.

SELECT * FROM users WHERE user_id = 123;

Optimization strategies:

Index on lookup key (obvious but essential)
Caching layer (Redis, Memcached) for hot records
Read replicas to distribute load
Connection pooling to minimize overhead

Expected performance: Sub-millisecond with proper indexing and caching.

Multi-Record Reads (Batch Queries)

Fetching multiple records by a list of keys:

SELECT * FROM users WHERE user_id IN (1, 2, 3, 4, 5);

Optimization strategies:

Batch requests to reduce round-trips
Parallel fetching across shards/replicas
Multi-get operations in key-value stores

Anti-patterns:

Issuing N separate queries instead of one batch query
Unbounded IN clauses (thousands of IDs)

Range Scans

Fetching records within a range:

SELECT * FROM orders 
WHERE user_id = 123 
  AND created_at >= '2024-01-01' 
  AND created_at < '2024-02-01';

Optimization strategies:

Composite index on (user_id, created_at)
LIMIT clause to bound result size
Pagination with cursors for large result sets

Key insight: Index column order matters. For this query, (user_id, created_at) is optimal. (created_at, user_id) would be suboptimal—it narrows by date first, then filters by user.

Joins and Denormalization Trade-off

Join-heavy reads:

SELECT o.*, u.name, p.title 
FROM orders o
JOIN users u ON o.user_id = u.id
JOIN products p ON o.product_id = p.id
WHERE o.created_at >= '2024-01-01';

The trade-off:

Normalized (with JOINs): Smaller storage, no data duplication, complex reads
Denormalized (pre-joined): Faster reads, storage overhead, write complexity

Decision framework:

Read-heavy and joins are bottleneck → Denormalize
Write-heavy and consistency critical → Keep normalized
Mixed → Consider materialized views or CQRS pattern

The N+1 Query Problem

A classic anti-pattern: fetching a list of N parents, then issuing N separate queries for children. If you have 100 orders and issue 100 queries to get order items, you've made 101 queries instead of 2. This is visible in application code, especially with ORMs. Always look for batch fetch or eager loading options.

Write Patterns Deep Dive

Write patterns are often more challenging than reads because writes must maintain durability and consistency guarantees.

Single-Record Writes

Inserting or updating one record:

INSERT INTO orders (user_id, product_id, amount) VALUES (123, 456, 99.99);
UPDATE users SET last_active = NOW() WHERE user_id = 123;

Optimization strategies:

Batch commits (group multiple writes in one transaction)
Asynchronous writes where durability requirements allow
Write-behind caching for frequent updates to same records

Watch for: Lock contention on frequently-updated rows.

Bulk/Batch Writes

Loading large amounts of data:

INSERT INTO events (event_type, data, timestamp) VALUES 
  ('click', '...', NOW()),
  ('view', '...', NOW()),
  ('purchase', '...', NOW()),
  ... (thousands more);

Optimization strategies:

Disable indexes during bulk load, rebuild after
Use COPY (PostgreSQL) or LOAD DATA (MySQL) for fastest ingestion
Partition target tables to isolate write impact
Use staging tables for validation before merge

Anti-patterns:

Individual INSERT statements in a loop
Auto-commit on every row
Writing to heavily-indexed tables without optimization

High-Frequency Updates (Hotspots)

Many concurrent writes to the same record:

-- 1000 concurrent requests
UPDATE counters SET value = value + 1 WHERE counter_id = 'global_page_views';

The problem: Row-level locks serialize updates, creating a bottleneck. Even with optimistic locking, retry storms occur.

Solutions:

Sharded counters: Spread across multiple rows, sum when reading
Write-behind accumulation: Buffer in memory, periodically flush to database
Eventually consistent counters: Use Redis INCR, periodically persist
Probabilistic counting: HyperLogLog for approximate unique counts

Transactional Writes

Multiple writes that must succeed or fail together:

BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 'from_account';
UPDATE accounts SET balance = balance + 100 WHERE id = 'to_account';
INSERT INTO transfers (from_id, to_id, amount) VALUES ('from_account', 'to_account', 100);
COMMIT;

Key considerations:

Lock ordering to prevent deadlocks
Transaction duration (shorter is better for concurrency)
Retry strategy for failed transactions
Read-your-writes consistency within the transaction

Write Amplification

Every index is updated on every relevant write. A table with 10 indexes requires 11 write operations (1 table + 10 indexes) for each INSERT. This is write amplification. Balance index coverage for reads against write overhead.

Read/Write Ratio and Its Implications

The ratio of reads to writes profoundly affects architecture decisions.

Workload Classification:

Workload Types and Optimization Strategies
Workload Type	Read/Write Ratio	Examples	Optimization Focus
Read-heavy	100:1	Content sites, product catalogs, documentation	Caching, read replicas, denormalization
Balanced	10:1 to 100:1	Social media, e-commerce	Mixed strategies, CQRS consideration
Write-heavy	< 10:1	Logging, analytics ingestion, IoT	Write throughput, partitioning, async processing
Write-intensive	< 1:1	Event sourcing, append-only logs	Append optimization, compaction, streaming

Architecture implications by ratio:

Read-heavy (100:1+):

Aggressive caching (cache hit ratio > 95% achievable)
Multiple read replicas
CDN for static content
Denormalized read models
Consider eventual consistency (most reads serve cached/replicated data)

Balanced (10:1 to 100:1):

Moderate caching with careful invalidation
Primary + few replicas
Materialized views for complex queries
Consider CQRS (Command Query Responsibility Segregation)

Write-heavy (< 10:1):

Minimal indexing (only essential indexes)
Write partitioning/sharding
Async replication acceptable
Batch writes where possible
Consider append-only structures

Write-intensive (< 1:1):

Streaming architecture (Kafka, Kinesis)
Append-only storage
Separate write path from query path entirely
Batch processing for analytics on written data

Read Optimization Tactics

•Add read replicas to distribute load
•Implement multi-layer caching
•Denormalize for common query patterns
•Pre-compute expensive aggregations
•Use content delivery networks

Write Optimization Tactics

•Minimize indexes on write-heavy tables
•Batch writes for efficiency
•Use asynchronous writes where possible
•Partition data across nodes
•Consider append-only patterns

Measure, Don't Assume

Don't guess your read/write ratio—measure it. Production workloads often differ from expectations. A 'definitely read-heavy' application might have a high-volume logging path that makes it write-heavy overall. Instrument early and measure continuously.

Access Patterns Drive Database Selection

The most important database selection criterion isn't popularity, brand, or what your last job used. It's fit with your access patterns.

Database-Pattern Fit Matrix:

Database Selection by Primary Access Pattern
Primary Access Pattern	Strong Fit	Acceptable	Poor Fit
Key-value lookups	Redis, DynamoDB, Aerospike	PostgreSQL, MongoDB	Cassandra (overkill)
Complex relational queries	PostgreSQL, MySQL	MongoDB (with care)	Redis, DynamoDB
Full-text search	Elasticsearch, Algolia	PostgreSQL FTS	MySQL, DynamoDB
Graph traversals	Neo4j, Neptune	PostgreSQL (recursive CTEs)	MySQL, DynamoDB
Time-series metrics	InfluxDB, TimescaleDB	PostgreSQL + partitioning	Generic SQL
High-volume analytics	ClickHouse, BigQuery	PostgreSQL (small scale)	OLTP databases
Document storage	MongoDB, CouchDB	PostgreSQL JSONB	MySQL JSON
High write throughput	Cassandra, ScyllaDB	PostgreSQL (partitioned)	Replicated MySQL

Case study: Choosing for an e-commerce platform

Let's walk through database selection for a fictional e-commerce system:

Identified access patterns:

GetProductDetails — Key-value by product ID, very high volume
SearchProducts — Full-text search with facets (price, category, brand)
GetUserCart — Key-value by session/user, mutable but ephemeral
GetOrderHistory — Range query by user + date
CreateOrder — Transactional write across multiple tables
GetRecommendations — Graph traversal (purchased together, viewed together)
SalesDashboard — Aggregations by region, time, product category

Database selection:

Pattern	Database Choice	Reasoning
GetProductDetails	Redis cache + PostgreSQL	Cache for speed, PostgreSQL for authority
SearchProducts	Elasticsearch	Built for full-text + facets
GetUserCart	Redis	High-speed mutable state, ephemeral OK
GetOrderHistory	PostgreSQL	ACID needed, range queries well-supported
CreateOrder	PostgreSQL	Transactional integrity essential
GetRecommendations	Precomputed in Redis	Could be Neo4j if graph queries are complex
SalesDashboard	ClickHouse or BigQuery	Analytics-optimized columnar storage

Result: Polyglot persistence — PostgreSQL for core transactional data, Redis for caching and sessions, Elasticsearch for search, ClickHouse for analytics. This is common in mature systems.

Start Simple, Add Complexity

Don't begin with 7 databases. Start with PostgreSQL for most needs (it's surprisingly capable), add Redis for caching early, and introduce specialized databases only when PostgreSQL demonstrably can't handle specific patterns. Every database added is operational complexity added.

Schema Design for Access Patterns

Once you've chosen a database, your schema design should align with access patterns. This is especially critical for NoSQL databases where you often can't add new query patterns without restructuring data.

Relational Schema Design:

For SQL databases, start normalized and denormalize based on measured pain:

Third Normal Form (3NF) — Start here. No redundant data, clear relationships.
Identify hot paths — Which queries run most often?
Denormalize selectively — Add redundant columns or materialized views for hot queries.
Index strategically — Cover hot queries with composite indexes.

-- Normalized: Requires JOIN for every order display
SELECT o.*, u.name, u.email 
FROM orders o JOIN users u ON o.user_id = u.id 
WHERE o.id = 123;

-- Denormalized: Add user_name directly to orders table
-- Faster query, but now must update user_name in two places
SELECT id, user_name, total, status FROM orders WHERE id = 123;

Document Schema Design (MongoDB, DynamoDB):

Document databases require thinking in aggregates—what data is accessed together?

Embedding (Denormalized):

{
  "orderId": "12345",
  "user": {
    "id": "u123",
    "name": "Jane Doe",
    "email": "jane@example.com"
  },
  "items": [
    {"productId": "p1", "name": "Widget", "quantity": 2},
    {"productId": "p2", "name": "Gadget", "quantity": 1}
  ]
}

Use when: Data is accessed together, updates are infrequent, or embedded data is bounded.

Referencing (Normalized):

{
  "orderId": "12345",
  "userId": "u123",
  "itemIds": ["i1", "i2"]
}

Use when: Referenced data is large, frequently updated independently, or unbounded.

Key-Value Schema Design (DynamoDB, Cassandra):

With key-value stores, your access patterns define your key structure:

Access Pattern	Key Structure	Example Key
Get user by ID	USER#<id>	USER#12345
Get orders by user	USER#<id>#ORDER#<orderId>	USER#12345#ORDER#99
Get orders by date	ORDER#<date>#<orderId>	ORDER#2024-01-15#99
Get order items	ORDER#<orderId>#ITEM#<itemId>	ORDER#99#ITEM#1

The critical constraint: You must know how you'll query data BEFORE you store it. Adding a new access pattern may require migrating all existing data with new key structures. This is the trade-off for horizontal scalability.

NoSQL Access Pattern Lock-In

In relational databases, adding a new query often means adding an index. In key-value stores, adding a new query pattern may require restructuring data or maintaining redundant copies with different keys. Design access patterns upfront with care.

Summary: Access Patterns as Architecture

Access patterns are not an afterthought—they are the foundation of database architecture. Understanding how your application will read and write data determines almost every downstream decision: database selection, schema design, indexing strategy, and scaling approach.

Key Takeaways

•Document access patterns explicitly — Name, query shape, frequency, latency requirements, and freshness needs for each pattern.
•Access patterns cluster into recognizable categories: key-value, range, full-text, graph, aggregation, time-series.
•Read/write ratio fundamentally shapes architecture—read-heavy systems optimize differently than write-heavy systems.
•Database selection should be driven by access pattern fit, not popularity or familiarity.
•Schema design must align with access patterns, especially in NoSQL where new patterns may require data restructuring.
•The 80/20 rule applies: optimize ruthlessly for dominant patterns; accept less efficiency for rare patterns.
•Polyglot persistence is common: different data types and access patterns often warrant different storage systems.

What's next:

Now that we understand persistence requirements and access patterns, we'll examine the database as a potential bottleneck. Every optimization has limits, and databases are often where systems first struggle under load. Understanding how and why databases become bottlenecks—and what to do about it—is essential for designing systems that scale.

Page Complete

You now understand how access patterns shape database architecture. You can identify, document, and classify access patterns, select databases based on pattern fit, and design schemas that serve your workload efficiently. Next, we'll explore the database as a common system bottleneck.

3 / 4

Loading learning content...

System Design (HLD)Why Databases Matter

Why Databases Matter in System Design

LevelBeginner

Duration60 mins

TopicWhy Databases Matter

3 / 4

Query and Access Patterns

The Questions Your Data Must Answer

Every database stores data. But the questions you ask of that data—and how frequently you ask them—determine whether your system performs brilliantly or struggles pathetically.

Consider two systems storing identical user data:

System A primarily answers: "What is this user's profile?"

System B primarily answers: "Which users are online in this geographic region?"

The access pattern is the architecture.

The Central Insight

What Are Access Patterns?

An access pattern describes how your application interacts with data: what data is read or written, how it's queried, how often, by whom, and under what conditions.

Components of an access pattern:

Access Pattern Dimensions

•Read vs Write Ratio — Is the workload read-heavy (90% reads), write-heavy (90% writes), or balanced? This affects caching strategy, replication topology, and consistency requirements.
•Query Shape — Point lookups by ID? Range scans? Full-text search? Aggregations? Graph traversals? Each query shape has different optimal data structures.
•Data Access Locality — Do queries typically access related data together (temporal or spatial locality)? Or randomly across the dataset?
•Concurrency Pattern — Many users reading same data? Many users writing to different data? Many users writing to same data (hotspots)?
•Latency Requirements — Must queries return in 10ms? 100ms? 1 second? This determines indexing aggressiveness and caching requirements.
•Throughput Requirements — How many queries per second? Thousands? Millions? This affects scaling strategy.
•Freshness Requirements — Must reads see the latest write? Or is eventual consistency acceptable?

Documenting access patterns:

Before selecting a database or designing a schema, enumerate your access patterns explicitly. For each pattern, document:

Field	Description	Example
Name	Short identifier	"GetUserProfile"
Query	What question is being asked	"Get full profile for user ID X"
Input	What parameters/filters	User ID
Output	What data is returned	User object with all fields
Frequency	How often per second/day	10,000 requests/second
Latency SLA	Required response time	p99 < 50ms
Freshness	Staleness tolerance	< 5 seconds
Write/Read	Is this a read or write	Read

This exercise forces clarity. It also provides the foundation for capacity planning and performance testing.

The 80/20 Rule

Common Access Pattern Categories

Access patterns cluster into recognizable categories, each with well-known optimization strategies.

Category 1: Key-Value Lookup

Pattern: Given a known key (user ID, product ID, session token), retrieve the associated value.

Characteristics:

O(1) lookup complexity required
High throughput, low latency
Often cacheable

Optimal for: Redis, DynamoDB, Memcached, or any database with hash index on key.

Anti-pattern for: Full table scans, complex joins, ad-hoc querying.

Category 2: Range Queries

Pattern: Retrieve records within a range (orders from last 7 days, products priced $10-$50, users whose names start with 'A').

Characteristics:

O(log n + k) complexity (find start, scan k results)
Requires sorted index on range column
Often combined with filters

Optimal for: B-tree indexes, sorted sets, time-series databases.

Anti-pattern for: Hash indexes (no ordering), heap scans.

Category 3: Full-Text Search

Pattern: Find documents containing words, phrases, or matching semantic concepts.

Characteristics:

Inverted index required
Relevance scoring matters
Often includes faceting, highlighting, suggestions

Optimal for: Elasticsearch, Solr, PostgreSQL full-text search, Algolia.

Anti-pattern for: Relational LIKE '%term%' queries (O(n) scan).

Category 4: Graph Traversal

Pattern: Navigate relationships (friends of friends, product recommendations based on purchase history, dependency graphs).

Characteristics:

Relationship-centric queries
Variable-depth traversals
Path finding, connectivity analysis

Optimal for: Neo4j, Neptune, JanusGraph, or GraphQL over relational with careful modeling.

Anti-pattern for: Repeated JOINs in relational databases (exploding complexity).

Category 5: Aggregations and Analytics

Pattern: Compute summaries over large datasets (total revenue by region, average session duration, top products by sales).

Characteristics:

Often scans large portions of data
Compute-intensive (SUM, COUNT, AVG, GROUP BY)
Can tolerate higher latency

Optimal for: Column-oriented databases (ClickHouse, BigQuery, Redshift), materialized views.

Anti-pattern for: Row-oriented transactional databases (PostgreSQL can do it, but inefficiently at scale).

Category 6: Time-Series Access

Pattern: Query data organized by time (metrics, logs, IoT sensor readings).

Characteristics:

High write throughput (continuous ingestion)
Queries typically by time range
Recent data accessed more than old data
Often downsampled or rolled up over time

Optimal for: InfluxDB, TimescaleDB, Prometheus, time-partitioned tables.

Anti-pattern for: Generic relational databases without time-based partitioning.

Hybrid Systems

Read Patterns Deep Dive

Most systems are read-heavy, making read optimization crucial. Let's examine read patterns in detail.

Single-Record Reads (Point Queries)

The simplest pattern: fetch one record by primary key.

SELECT * FROM users WHERE user_id = 123;

Optimization strategies:

Index on lookup key (obvious but essential)
Caching layer (Redis, Memcached) for hot records
Read replicas to distribute load
Connection pooling to minimize overhead

Expected performance: Sub-millisecond with proper indexing and caching.

Multi-Record Reads (Batch Queries)

Fetching multiple records by a list of keys:

SELECT * FROM users WHERE user_id IN (1, 2, 3, 4, 5);

Optimization strategies:

Batch requests to reduce round-trips
Parallel fetching across shards/replicas
Multi-get operations in key-value stores

Anti-patterns:

Issuing N separate queries instead of one batch query
Unbounded IN clauses (thousands of IDs)

Range Scans

Fetching records within a range:

SELECT * FROM orders 
WHERE user_id = 123 
  AND created_at >= '2024-01-01' 
  AND created_at < '2024-02-01';

Optimization strategies:

Composite index on (user_id, created_at)
LIMIT clause to bound result size
Pagination with cursors for large result sets

Key insight: Index column order matters. For this query, (user_id, created_at) is optimal. (created_at, user_id) would be suboptimal—it narrows by date first, then filters by user.

Joins and Denormalization Trade-off

Join-heavy reads:

SELECT o.*, u.name, p.title 
FROM orders o
JOIN users u ON o.user_id = u.id
JOIN products p ON o.product_id = p.id
WHERE o.created_at >= '2024-01-01';

The trade-off:

Normalized (with JOINs): Smaller storage, no data duplication, complex reads
Denormalized (pre-joined): Faster reads, storage overhead, write complexity

Decision framework:

Read-heavy and joins are bottleneck → Denormalize
Write-heavy and consistency critical → Keep normalized
Mixed → Consider materialized views or CQRS pattern

The N+1 Query Problem

Write Patterns Deep Dive

Write patterns are often more challenging than reads because writes must maintain durability and consistency guarantees.

Single-Record Writes

Inserting or updating one record:

INSERT INTO orders (user_id, product_id, amount) VALUES (123, 456, 99.99);
UPDATE users SET last_active = NOW() WHERE user_id = 123;

Optimization strategies:

Batch commits (group multiple writes in one transaction)
Asynchronous writes where durability requirements allow
Write-behind caching for frequent updates to same records

Watch for: Lock contention on frequently-updated rows.

Bulk/Batch Writes

Loading large amounts of data:

INSERT INTO events (event_type, data, timestamp) VALUES 
  ('click', '...', NOW()),
  ('view', '...', NOW()),
  ('purchase', '...', NOW()),
  ... (thousands more);

Optimization strategies:

Disable indexes during bulk load, rebuild after
Use COPY (PostgreSQL) or LOAD DATA (MySQL) for fastest ingestion
Partition target tables to isolate write impact
Use staging tables for validation before merge

Anti-patterns:

Individual INSERT statements in a loop
Auto-commit on every row
Writing to heavily-indexed tables without optimization

High-Frequency Updates (Hotspots)

Many concurrent writes to the same record:

-- 1000 concurrent requests
UPDATE counters SET value = value + 1 WHERE counter_id = 'global_page_views';

The problem: Row-level locks serialize updates, creating a bottleneck. Even with optimistic locking, retry storms occur.

Solutions:

Sharded counters: Spread across multiple rows, sum when reading
Write-behind accumulation: Buffer in memory, periodically flush to database
Eventually consistent counters: Use Redis INCR, periodically persist
Probabilistic counting: HyperLogLog for approximate unique counts

Transactional Writes

Multiple writes that must succeed or fail together:

BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 'from_account';
UPDATE accounts SET balance = balance + 100 WHERE id = 'to_account';
INSERT INTO transfers (from_id, to_id, amount) VALUES ('from_account', 'to_account', 100);
COMMIT;

Key considerations:

Lock ordering to prevent deadlocks
Transaction duration (shorter is better for concurrency)
Retry strategy for failed transactions
Read-your-writes consistency within the transaction

Write Amplification

Read/Write Ratio and Its Implications

The ratio of reads to writes profoundly affects architecture decisions.

Workload Classification:

Workload Types and Optimization Strategies
Workload Type	Read/Write Ratio	Examples	Optimization Focus
Read-heavy	100:1	Content sites, product catalogs, documentation	Caching, read replicas, denormalization
Balanced	10:1 to 100:1	Social media, e-commerce	Mixed strategies, CQRS consideration
Write-heavy	< 10:1	Logging, analytics ingestion, IoT	Write throughput, partitioning, async processing
Write-intensive	< 1:1	Event sourcing, append-only logs	Append optimization, compaction, streaming

Architecture implications by ratio:

Read-heavy (100:1+):

Aggressive caching (cache hit ratio > 95% achievable)
Multiple read replicas
CDN for static content
Denormalized read models
Consider eventual consistency (most reads serve cached/replicated data)

Balanced (10:1 to 100:1):

Moderate caching with careful invalidation
Primary + few replicas
Materialized views for complex queries
Consider CQRS (Command Query Responsibility Segregation)

Write-heavy (< 10:1):

Minimal indexing (only essential indexes)
Write partitioning/sharding
Async replication acceptable
Batch writes where possible
Consider append-only structures

Write-intensive (< 1:1):

Streaming architecture (Kafka, Kinesis)
Append-only storage
Separate write path from query path entirely
Batch processing for analytics on written data

Read Optimization Tactics

•Add read replicas to distribute load
•Implement multi-layer caching
•Denormalize for common query patterns
•Pre-compute expensive aggregations
•Use content delivery networks

Write Optimization Tactics

•Minimize indexes on write-heavy tables
•Batch writes for efficiency
•Use asynchronous writes where possible
•Partition data across nodes
•Consider append-only patterns

Measure, Don't Assume

Access Patterns Drive Database Selection

The most important database selection criterion isn't popularity, brand, or what your last job used. It's fit with your access patterns.

Database-Pattern Fit Matrix:

Database Selection by Primary Access Pattern
Primary Access Pattern	Strong Fit	Acceptable	Poor Fit
Key-value lookups	Redis, DynamoDB, Aerospike	PostgreSQL, MongoDB	Cassandra (overkill)
Complex relational queries	PostgreSQL, MySQL	MongoDB (with care)	Redis, DynamoDB
Full-text search	Elasticsearch, Algolia	PostgreSQL FTS	MySQL, DynamoDB
Graph traversals	Neo4j, Neptune	PostgreSQL (recursive CTEs)	MySQL, DynamoDB
Time-series metrics	InfluxDB, TimescaleDB	PostgreSQL + partitioning	Generic SQL
High-volume analytics	ClickHouse, BigQuery	PostgreSQL (small scale)	OLTP databases
Document storage	MongoDB, CouchDB	PostgreSQL JSONB	MySQL JSON
High write throughput	Cassandra, ScyllaDB	PostgreSQL (partitioned)	Replicated MySQL

Case study: Choosing for an e-commerce platform

Let's walk through database selection for a fictional e-commerce system:

Identified access patterns:

GetProductDetails — Key-value by product ID, very high volume
SearchProducts — Full-text search with facets (price, category, brand)
GetUserCart — Key-value by session/user, mutable but ephemeral
GetOrderHistory — Range query by user + date
CreateOrder — Transactional write across multiple tables
GetRecommendations — Graph traversal (purchased together, viewed together)
SalesDashboard — Aggregations by region, time, product category

Database selection:

Pattern	Database Choice	Reasoning
GetProductDetails	Redis cache + PostgreSQL	Cache for speed, PostgreSQL for authority
SearchProducts	Elasticsearch	Built for full-text + facets
GetUserCart	Redis	High-speed mutable state, ephemeral OK
GetOrderHistory	PostgreSQL	ACID needed, range queries well-supported
CreateOrder	PostgreSQL	Transactional integrity essential
GetRecommendations	Precomputed in Redis	Could be Neo4j if graph queries are complex
SalesDashboard	ClickHouse or BigQuery	Analytics-optimized columnar storage

Result: Polyglot persistence — PostgreSQL for core transactional data, Redis for caching and sessions, Elasticsearch for search, ClickHouse for analytics. This is common in mature systems.

Start Simple, Add Complexity

Schema Design for Access Patterns

Relational Schema Design:

For SQL databases, start normalized and denormalize based on measured pain:

Third Normal Form (3NF) — Start here. No redundant data, clear relationships.
Identify hot paths — Which queries run most often?
Denormalize selectively — Add redundant columns or materialized views for hot queries.
Index strategically — Cover hot queries with composite indexes.

-- Normalized: Requires JOIN for every order display
SELECT o.*, u.name, u.email 
FROM orders o JOIN users u ON o.user_id = u.id 
WHERE o.id = 123;

-- Denormalized: Add user_name directly to orders table
-- Faster query, but now must update user_name in two places
SELECT id, user_name, total, status FROM orders WHERE id = 123;

Document Schema Design (MongoDB, DynamoDB):

Document databases require thinking in aggregates—what data is accessed together?

Embedding (Denormalized):

{
  "orderId": "12345",
  "user": {
    "id": "u123",
    "name": "Jane Doe",
    "email": "jane@example.com"
  },
  "items": [
    {"productId": "p1", "name": "Widget", "quantity": 2},
    {"productId": "p2", "name": "Gadget", "quantity": 1}
  ]
}

Use when: Data is accessed together, updates are infrequent, or embedded data is bounded.

Referencing (Normalized):

{
  "orderId": "12345",
  "userId": "u123",
  "itemIds": ["i1", "i2"]
}

Use when: Referenced data is large, frequently updated independently, or unbounded.

Key-Value Schema Design (DynamoDB, Cassandra):

With key-value stores, your access patterns define your key structure:

Access Pattern	Key Structure	Example Key
Get user by ID	USER#<id>	USER#12345
Get orders by user	USER#<id>#ORDER#<orderId>	USER#12345#ORDER#99
Get orders by date	ORDER#<date>#<orderId>	ORDER#2024-01-15#99
Get order items	ORDER#<orderId>#ITEM#<itemId>	ORDER#99#ITEM#1

NoSQL Access Pattern Lock-In

Summary: Access Patterns as Architecture

Key Takeaways

•Document access patterns explicitly — Name, query shape, frequency, latency requirements, and freshness needs for each pattern.
•Access patterns cluster into recognizable categories: key-value, range, full-text, graph, aggregation, time-series.
•Read/write ratio fundamentally shapes architecture—read-heavy systems optimize differently than write-heavy systems.
•Database selection should be driven by access pattern fit, not popularity or familiarity.
•Schema design must align with access patterns, especially in NoSQL where new patterns may require data restructuring.
•The 80/20 rule applies: optimize ruthlessly for dominant patterns; accept less efficiency for rare patterns.
•Polyglot persistence is common: different data types and access patterns often warrant different storage systems.

What's next:

Page Complete

3 / 4