Every database stores data. But the questions you ask of that data—and how frequently you ask them—determine whether your system performs brilliantly or struggles pathetically.
Consider two systems storing identical user data:
System A primarily answers: "What is this user's profile?"
System B primarily answers: "Which users are online in this geographic region?"
Same data. Radically different optimal storage strategies. System A wants user data grouped by user ID (document or row per user). System B wants user data indexed by status and geography (requires different indexing, potentially different storage model).
The access pattern is the architecture.
You cannot evaluate a database without knowing how you'll query it. A database that's perfect for one access pattern may be disastrous for another. This page teaches you to identify, analyze, and design for access patterns—the skill that separates thoughtful system designers from those who pick databases based on hype.
An access pattern describes how your application interacts with data: what data is read or written, how it's queried, how often, by whom, and under what conditions.
Components of an access pattern, and how to document them:
Before selecting a database or designing a schema, enumerate your access patterns explicitly. For each pattern, document:
| Field | Description | Example |
|---|---|---|
| Name | Short identifier | "GetUserProfile" |
| Query | What question is being asked | "Get full profile for user ID X" |
| Input | What parameters/filters | User ID |
| Output | What data is returned | User object with all fields |
| Frequency | How often per second/day | 10,000 requests/second |
| Latency SLA | Required response time | p99 < 50ms |
| Freshness | Staleness tolerance | < 5 seconds |
| Write/Read | Is this a read or write | Read |
This exercise forces clarity. It also provides the foundation for capacity planning and performance testing.
Typically, 3-5 access patterns dominate 80%+ of traffic. Identify these dominant patterns first and optimize ruthlessly for them. Secondary patterns can often be served less efficiently without significant user impact.
Access patterns cluster into recognizable categories, each with well-known optimization strategies.
Category 1: Key-Value Lookup
Pattern: Given a known key (user ID, product ID, session token), retrieve the associated value.
Characteristics:
- Exact-match lookup on a known key; no scanning or filtering
- Extremely high frequency, often the hottest path in the system
- Strict low-latency expectations (single-digit milliseconds or less)
Optimal for: Redis, DynamoDB, Memcached, or any database with hash index on key.
Anti-pattern for: Full table scans, complex joins, ad-hoc querying.
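For illustration, a PostgreSQL hash index serves exact-match lookups without maintaining any ordering (a minimal sketch; the sessions table and its columns are hypothetical):
-- Hash index: equality lookups only, no range support needed or provided
CREATE INDEX idx_sessions_token ON sessions USING HASH (token);
-- Served directly from the hash index
SELECT user_id FROM sessions WHERE token = 'abc123';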
Category 2: Range Queries
Pattern: Retrieve records within a range (orders from last 7 days, products priced $10-$50, users whose names start with 'A').
Characteristics:
- Predicates over an ordered attribute (time, price, lexicographic order)
- Result sets of variable size, often paginated
- Performance depends on sorted storage or ordered indexes
Optimal for: B-tree indexes, sorted sets, time-series databases.
Anti-pattern for: Hash indexes (no ordering), heap scans.
Category 3: Full-Text Search
Pattern: Find documents containing words, phrases, or matching semantic concepts.
Characteristics:
- Matches on words and phrases rather than exact values
- Requires tokenization, stemming, and relevance ranking
- Results are ranked by score, not uniquely identified by a key
Optimal for: Elasticsearch, Solr, PostgreSQL full-text search, Algolia.
Anti-pattern for: Relational LIKE '%term%' queries (O(n) scan).
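As a concrete example, PostgreSQL's built-in full-text search replaces the O(n) LIKE scan with an inverted (GIN) index (a sketch assuming a hypothetical documents table with a body column):
-- Inverted index over tokenized document bodies
CREATE INDEX idx_documents_fts ON documents USING GIN (to_tsvector('english', body));
-- Finds documents containing both terms via the index, not a scan
SELECT id, title FROM documents
WHERE to_tsvector('english', body) @@ to_tsquery('english', 'database & index');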
Category 4: Graph Traversal
Pattern: Navigate relationships (friends of friends, product recommendations based on purchase history, dependency graphs).
Characteristics:
- Queries follow relationships to variable, often unknown depth
- In a graph store, cost grows with the number of hops traversed, not with total dataset size
- Poorly expressed as a fixed number of joins in relational schemas
Optimal for: Neo4j, Neptune, JanusGraph, or relational databases using recursive CTEs with careful modeling.
Anti-pattern for: Repeated JOINs in relational databases (exploding complexity).
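For reference, here is what a bounded friends-of-friends traversal looks like as a recursive CTE (a sketch assuming a hypothetical friendships(user_id, friend_id) table); each additional hop adds another round of self-joins, which is why deep traversals strain relational engines:
WITH RECURSIVE reachable AS (
    -- Hop 1: direct friends
    SELECT friend_id, 1 AS depth
    FROM friendships
    WHERE user_id = 123
  UNION ALL
    -- Hop 2+: friends of already-reached users
    SELECT f.friend_id, r.depth + 1
    FROM friendships f
    JOIN reachable r ON f.user_id = r.friend_id
    WHERE r.depth < 2  -- stop at friends-of-friends
)
SELECT DISTINCT friend_id
FROM reachable
WHERE friend_id <> 123;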
Category 5: Aggregations and Analytics
Pattern: Compute summaries over large datasets (total revenue by region, average session duration, top products by sales).
Characteristics:
- Scans large numbers of rows but reads only a few columns
- Results are summaries (sums, counts, averages), not individual records
- Latency tolerance is usually higher (seconds, not milliseconds)
Optimal for: Column-oriented databases (ClickHouse, BigQuery, Redshift), materialized views.
Anti-pattern for: Row-oriented transactional databases (PostgreSQL can do it, but inefficiently at scale).
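In a row-oriented database, a materialized view is the standard way to keep such an aggregation from rescanning the raw table on every dashboard load (PostgreSQL sketch; the orders table with region and amount columns is hypothetical):
-- Precompute the summary once
CREATE MATERIALIZED VIEW revenue_by_region AS
SELECT region, SUM(amount) AS total_revenue, COUNT(*) AS order_count
FROM orders
GROUP BY region;
-- Refresh on a schedule; dashboard reads hit the small precomputed view
REFRESH MATERIALIZED VIEW revenue_by_region;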
Category 6: Time-Series Access
Pattern: Query data organized by time (metrics, logs, IoT sensor readings).
Characteristics:
- Writes are append-heavy and arrive roughly in time order
- Reads are windowed by time range and often aggregated
- Old data is downsampled or expired on a retention schedule
Optimal for: InfluxDB, TimescaleDB, Prometheus, time-partitioned tables.
Anti-pattern for: Generic relational databases without time-based partitioning.
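Time-based partitioning is what makes a generic relational database workable here; a PostgreSQL sketch (the metrics table is hypothetical):
CREATE TABLE metrics (
    recorded_at timestamptz NOT NULL,
    sensor_id   int         NOT NULL,
    value       double precision
) PARTITION BY RANGE (recorded_at);
-- One partition per month: time-windowed queries touch only matching partitions,
-- and expiring old data is a cheap DROP of an old partition
CREATE TABLE metrics_2024_01 PARTITION OF metrics
    FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');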
Real applications often need multiple pattern categories. An e-commerce system might need key-value (session), range queries (order history), full-text (product search), and analytics (sales dashboards). This is why polyglot persistence—using multiple databases—is common in mature architectures.
Most systems are read-heavy, making read optimization crucial. Let's examine read patterns in detail.
Single-Record Reads (Point Queries)
The simplest pattern: fetch one record by primary key.
SELECT * FROM users WHERE user_id = 123;
Optimization strategies:
- Ensure the lookup column is the primary key or carries a unique index
- Cache hot records in an application cache or Redis to absorb repeated reads
- Keep rows compact so each lookup touches minimal I/O
Expected performance: Sub-millisecond with proper indexing and caching.
Multi-Record Reads (Batch Queries)
Fetching multiple records by a list of keys:
SELECT * FROM users WHERE user_id IN (1, 2, 3, 4, 5);
Optimization strategies:
- Fetch all keys in a single round trip (IN clause or the database's batch API)
- Keep batch sizes bounded; very large key lists strain the planner and memory
- Reuse the same key index as point queries; a batch is just many key lookups
Anti-patterns:
- Issuing one query per key in a loop (the N+1 problem, covered below)
- Letting user input determine batch size without an upper bound
Range Scans
Fetching records within a range:
SELECT * FROM orders
WHERE user_id = 123
AND created_at >= '2024-01-01'
AND created_at < '2024-02-01';
Optimization strategies:
- Create a composite B-tree index matching the filter columns (see the index sketch below)
- Put equality-filtered columns before range-filtered columns in the index
- Paginate large ranges instead of materializing them in one response
Key insight: Index column order matters. For this query, (user_id, created_at) is optimal. (created_at, user_id) would be suboptimal—it narrows by date first, then filters by user.
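In PostgreSQL terms, the optimal index for the query above would be (a one-line sketch):
-- Equality column first, range column second: the index seeks straight
-- to one user's rows and scans only the requested date range
CREATE INDEX idx_orders_user_created ON orders (user_id, created_at);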
Joins and Denormalization Trade-off
Join-heavy reads:
SELECT o.*, u.name, p.title
FROM orders o
JOIN users u ON o.user_id = u.id
JOIN products p ON o.product_id = p.id
WHERE o.created_at >= '2024-01-01';
The trade-off:
- Normalized with joins: one source of truth and consistent updates, but every read pays the join cost
- Denormalized: fast, simple reads, but duplicated data and updates that must touch multiple places
Decision framework:
- Start normalized; correctness problems cost more than slow queries
- Denormalize only the specific fields that measured hot paths need
- When you do denormalize, decide explicitly how and when the copies are kept in sync
A classic anti-pattern: fetching a list of N parents, then issuing N separate queries for children. If you have 100 orders and issue 100 queries to get order items, you've made 101 queries instead of 2. This pattern usually hides in application code, especially behind ORMs that lazy-load relations. Always look for batch fetch or eager loading options.
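In SQL terms, the fix collapses the N child queries into one (a sketch assuming hypothetical orders and order_items tables):
-- Query 1: fetch the parent rows
SELECT * FROM orders WHERE user_id = 123;
-- Query 2: fetch all children in a single round trip
-- (the subquery stands in for the list of IDs returned by query 1)
SELECT * FROM order_items
WHERE order_id IN (SELECT id FROM orders WHERE user_id = 123);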
Write patterns are often more challenging than reads because writes must maintain durability and consistency guarantees.
Single-Record Writes
Inserting or updating one record:
INSERT INTO orders (user_id, product_id, amount) VALUES (123, 456, 99.99);
UPDATE users SET last_active = NOW() WHERE user_id = 123;
Optimization strategies:
- Index only what reads require; every extra index adds write cost
- Use upserts (e.g., INSERT ... ON CONFLICT in PostgreSQL) instead of read-then-write round trips
- For low-value updates like last_active, coalesce updates or write asynchronously
Watch for: Lock contention on frequently-updated rows.
Bulk/Batch Writes
Loading large amounts of data:
INSERT INTO events (event_type, data, timestamp) VALUES
('click', '...', NOW()),
('view', '...', NOW()),
('purchase', '...', NOW()),
... (thousands more);
Optimization strategies:
- Use multi-row INSERTs or the database's bulk-load path (e.g., COPY in PostgreSQL; see the sketch below)
- Wrap batches in a single transaction rather than committing per row
- For very large loads, defer secondary index creation and build indexes afterward
Anti-patterns:
- Row-by-row inserts in a loop, each in its own transaction
- Pushing bulk loads through the same code path as interactive writes
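For example, PostgreSQL's bulk-load path looks like this (a sketch; the file path is hypothetical):
-- COPY bypasses per-statement overhead: one command, one transaction
COPY events (event_type, data, timestamp)
FROM '/tmp/events.csv' WITH (FORMAT csv);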
High-Frequency Updates (Hotspots)
Many concurrent writes to the same record:
-- 1000 concurrent requests
UPDATE counters SET value = value + 1 WHERE counter_id = 'global_page_views';
The problem: Row-level locks serialize updates, creating a bottleneck. Even with optimistic locking, retry storms occur.
Solutions:
- Shard the counter across N rows and sum them on read (see the sketch below)
- Buffer increments in memory or a queue and flush them periodically
- Accept approximate counts where exactness isn't required
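A sharded counter in plain SQL might look like this (a sketch assuming a counters table with a shard column, pre-seeded with 16 rows per counter; the shard count is arbitrary):
-- Write path: each request increments one of 16 rows at random,
-- spreading lock contention across shards
UPDATE counters
SET value = value + 1
WHERE counter_id = 'global_page_views'
  AND shard = floor(random() * 16);
-- Read path: sum the shards to recover the total
SELECT SUM(value) AS total
FROM counters
WHERE counter_id = 'global_page_views';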
Transactional Writes
Multiple writes that must succeed or fail together:
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 'from_account';
UPDATE accounts SET balance = balance + 100 WHERE id = 'to_account';
INSERT INTO transfers (from_id, to_id, amount) VALUES ('from_account', 'to_account', 100);
COMMIT;
Key considerations:
- Keep transactions short; locks held across user think-time block other writers
- Acquire locks in a consistent order to avoid deadlocks
- Know your isolation level; defaults permit anomalies that may surprise you
Every index is updated on every relevant write. A table with 10 indexes requires 11 write operations (1 table + 10 indexes) for each INSERT. This is write amplification. Balance index coverage for reads against write overhead.
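You can check your exposure directly; in PostgreSQL, this catalog query counts indexes per table (a sketch using the standard system catalogs):
-- Every index listed here is an extra write on each INSERT into its table
SELECT c.relname AS table_name, COUNT(i.indexrelid) AS index_count
FROM pg_class c
JOIN pg_index i ON i.indrelid = c.oid
WHERE c.relkind = 'r'
GROUP BY c.relname
ORDER BY index_count DESC;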
The ratio of reads to writes profoundly affects architecture decisions.
Workload Classification:
| Workload Type | Read/Write Ratio | Examples | Optimization Focus |
|---|---|---|---|
| Read-heavy | 100:1 | Content sites, product catalogs, documentation | Caching, read replicas, denormalization |
| Balanced | 10:1 to 100:1 | Social media, e-commerce | Mixed strategies, CQRS consideration |
| Write-heavy | < 10:1 | Logging, analytics ingestion, IoT | Write throughput, partitioning, async processing |
| Write-intensive | < 1:1 | Event sourcing, append-only logs | Append optimization, compaction, streaming |
Architecture implications by ratio:
Read-heavy (100:1+):
Invest in caching layers, read replicas, and denormalized read models; the write path can afford to be slower and simpler.
Balanced (10:1 to 100:1):
Mix strategies; consider separating read and write models (CQRS) when the two paths have conflicting optimization needs.
Write-heavy (< 10:1):
Prioritize write throughput: partitioning, append-friendly storage layouts, and asynchronous downstream processing.
Write-intensive (< 1:1):
Treat storage as an append-only log: optimize for sequential appends, compaction, and streaming consumers.
Don't guess your read/write ratio—measure it. Production workloads often differ from expectations. A 'definitely read-heavy' application might have a high-volume logging path that makes it write-heavy overall. Instrument early and measure continuously.
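In PostgreSQL, a rough per-table read/write ratio is available from the statistics views (a sketch; counts are cumulative since the last stats reset):
-- rows_read approximates read volume; rows_written approximates write volume
SELECT relname,
       seq_tup_read + idx_tup_fetch      AS rows_read,
       n_tup_ins + n_tup_upd + n_tup_del AS rows_written
FROM pg_stat_user_tables
ORDER BY rows_written DESC;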
The most important database selection criterion isn't popularity, brand, or what your last job used. It's fit with your access patterns.
Database-Pattern Fit Matrix:
| Primary Access Pattern | Strong Fit | Acceptable | Poor Fit |
|---|---|---|---|
| Key-value lookups | Redis, DynamoDB, Aerospike | PostgreSQL, MongoDB | Cassandra (overkill) |
| Complex relational queries | PostgreSQL, MySQL | MongoDB (with care) | Redis, DynamoDB |
| Full-text search | Elasticsearch, Algolia | PostgreSQL FTS | MySQL, DynamoDB |
| Graph traversals | Neo4j, Neptune | PostgreSQL (recursive CTEs) | MySQL, DynamoDB |
| Time-series metrics | InfluxDB, TimescaleDB | PostgreSQL + partitioning | Generic SQL |
| High-volume analytics | ClickHouse, BigQuery | PostgreSQL (small scale) | OLTP databases |
| Document storage | MongoDB, CouchDB | PostgreSQL JSONB | MySQL JSON |
| High write throughput | Cassandra, ScyllaDB | PostgreSQL (partitioned) | Replicated MySQL |
Case study: Choosing for an e-commerce platform
Let's walk through database selection for a fictional e-commerce system:
Identified access patterns:
- GetProductDetails: fetch a product page by ID (highest-frequency read)
- SearchProducts: full-text product search with facets
- GetUserCart: read and update the active shopping cart
- GetOrderHistory: range query over a user's past orders
- CreateOrder: multi-row transactional write at checkout
- GetRecommendations: related-product lookups per user or product
- SalesDashboard: aggregate sales analytics for internal reporting
Database selection:
| Pattern | Database Choice | Reasoning |
|---|---|---|
| GetProductDetails | Redis cache + PostgreSQL | Cache for speed, PostgreSQL for authority |
| SearchProducts | Elasticsearch | Built for full-text + facets |
| GetUserCart | Redis | High-speed mutable state, ephemeral OK |
| GetOrderHistory | PostgreSQL | ACID needed, range queries well-supported |
| CreateOrder | PostgreSQL | Transactional integrity essential |
| GetRecommendations | Precomputed in Redis | Could be Neo4j if graph queries are complex |
| SalesDashboard | ClickHouse or BigQuery | Analytics-optimized columnar storage |
Result: Polyglot persistence — PostgreSQL for core transactional data, Redis for caching and sessions, Elasticsearch for search, ClickHouse for analytics. This is common in mature systems.
Don't begin with 7 databases. Start with PostgreSQL for most needs (it's surprisingly capable), add Redis for caching early, and introduce specialized databases only when PostgreSQL demonstrably can't handle specific patterns. Every database added is operational complexity added.
Once you've chosen a database, your schema design should align with access patterns. This is especially critical for NoSQL databases where you often can't add new query patterns without restructuring data.
Relational Schema Design:
For SQL databases, start normalized and denormalize based on measured pain:
-- Normalized: Requires JOIN for every order display
SELECT o.*, u.name, u.email
FROM orders o JOIN users u ON o.user_id = u.id
WHERE o.id = 123;
-- Denormalized: Add user_name directly to orders table
-- Faster query, but now must update user_name in two places
SELECT id, user_name, total, status FROM orders WHERE id = 123;
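Keeping the duplicated field consistent is the hidden cost; one hedged sketch of doing it with a trigger (PostgreSQL syntax, assuming the denormalized user_name column above):
-- Propagate name changes into the denormalized copy on orders
CREATE FUNCTION sync_order_user_name() RETURNS trigger AS $$
BEGIN
    UPDATE orders SET user_name = NEW.name WHERE user_id = NEW.id;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_sync_user_name
AFTER UPDATE OF name ON users
FOR EACH ROW EXECUTE FUNCTION sync_order_user_name();
The trigger keeps reads fast and correct, but every user rename now fans out to all of that user's orders.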
Document Schema Design (MongoDB, DynamoDB):
Document databases require thinking in aggregates—what data is accessed together?
Embedding (Denormalized):
{
"orderId": "12345",
"user": {
"id": "u123",
"name": "Jane Doe",
"email": "jane@example.com"
},
"items": [
{"productId": "p1", "name": "Widget", "quantity": 2},
{"productId": "p2", "name": "Gadget", "quantity": 1}
]
}
Use when: Data is accessed together, updates are infrequent, or embedded data is bounded.
Referencing (Normalized):
{
"orderId": "12345",
"userId": "u123",
"itemIds": ["i1", "i2"]
}
Use when: Referenced data is large, frequently updated independently, or unbounded.
Key-Value Schema Design (DynamoDB, Cassandra):
With key-value stores, your access patterns define your key structure:
| Access Pattern | Key Structure | Example Key |
|---|---|---|
| Get user by ID | USER#<id> | USER#12345 |
| Get orders by user | USER#<id>#ORDER#<orderId> | USER#12345#ORDER#99 |
| Get orders by date | ORDER#<date>#<orderId> | ORDER#2024-01-15#99 |
| Get order items | ORDER#<orderId>#ITEM#<itemId> | ORDER#99#ITEM#1 |
The critical constraint: You must know how you'll query data BEFORE you store it. Adding a new access pattern may require migrating all existing data with new key structures. This is the trade-off for horizontal scalability.
In relational databases, adding a new query often means adding an index. In key-value stores, adding a new query pattern may require restructuring data or maintaining redundant copies with different keys. Design access patterns upfront with care.
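To make the constraint concrete, the same single-table layout can be emulated in SQL (a sketch with a hypothetical kv_items table; real key-value stores enforce this shape natively):
CREATE TABLE kv_items (
    pk      text NOT NULL,  -- partition key, e.g. 'USER#12345'
    sk      text NOT NULL,  -- sort key, e.g. 'ORDER#99'
    payload jsonb,
    PRIMARY KEY (pk, sk)
);
-- "Get orders by user" works only because the keys were designed for it;
-- a query the keys weren't designed for means a full scan or a second copy
SELECT payload FROM kv_items
WHERE pk = 'USER#12345' AND sk LIKE 'ORDER#%';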
Access patterns are not an afterthought—they are the foundation of database architecture. Understanding how your application will read and write data determines almost every downstream decision: database selection, schema design, indexing strategy, and scaling approach.
What's next:
Now that we understand persistence requirements and access patterns, we'll examine the database as a potential bottleneck. Every optimization has limits, and databases are often where systems first struggle under load. Understanding how and why databases become bottlenecks—and what to do about it—is essential for designing systems that scale.
You now understand how access patterns shape database architecture. You can identify, document, and classify access patterns, select databases based on pattern fit, and design schemas that serve your workload efficiently. Next, we'll explore the database as a common system bottleneck.