Traditional data modeling teaches us to start with entities and their logical relationships. Customer has Orders. Order contains Products. This entity-first approach produces clean, normalized schemas that accurately represent the domain.
But there's a problem: the database doesn't care about your domain model. The database cares about the queries it must execute. A beautifully normalized schema that requires 8-table JOINs for your most common query is a poorly designed schema.
Access pattern-driven design flips the methodology. Instead of asking 'What entities exist?', we ask 'What questions will we ask of this data?' The queries come first; the schema follows.
This isn't a rejection of proper modeling—it's an evolution. We still understand entities and relationships, but we optimize the physical structure for the access patterns that matter most.
By the end of this page, you will understand how to identify and document access patterns, how to design schemas that optimize for specific query types, how to balance competing access patterns, and how to apply these principles across both SQL and NoSQL databases.
Entity-first design feels intuitive. You examine the domain, identify entities, define their attributes, map relationships, and normalize. The result is a schema that accurately represents the business domain.
But accuracy and performance are different goals.
The Reality of Production Queries:
Consider an e-commerce application. The entity-first schema might look like:
customers, orders, order_items, products, categories, addresses, payments.

This is logically correct. But what happens when you need to display a customer's order history page? You need data from orders, customers, order_items, products, and addresses, all at once.
That's a 5-table JOIN for every page load, on one of your most common user flows.
```sql
-- Entity-first schema: Logically correct, operationally expensive
SELECT o.order_id, o.status, o.created_at,
       c.name AS customer_name, c.email,
       oi.quantity, oi.unit_price,
       p.name AS product_name, p.image_url,
       a.street, a.city, a.state, a.zip_code
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
JOIN order_items oi ON o.order_id = oi.order_id
JOIN products p ON oi.product_id = p.product_id
JOIN addresses a ON o.shipping_address_id = a.address_id
WHERE c.customer_id = 'cust-uuid'
ORDER BY o.created_at DESC
LIMIT 50;

-- Query plan analysis might show:
-- - Seq Scan on orders (if customer_id index missing)
-- - Hash Join on order_items
-- - Hash Join on products
-- - Nested Loop on addresses and customers
-- - Total: 200ms+ on moderate data volumes
```

The entity-first trap is insidious because the schema looks 'clean' and passes all normalization checks. The problem only surfaces in production when query latency climbs. By then, you're dealing with migrations on live data, which is far more expensive than designing for access patterns upfront.
Before designing a schema, you must systematically identify and document your access patterns. This is not guesswork—it requires careful analysis of how the system will be used.
Access Pattern Documentation:
For each major operation, document:
| Operation | Frequency | Latency | Pattern Type | Priority |
|---|---|---|---|---|
| Get product details by ID | 10K/sec | <50ms | Point lookup | Critical |
| List products by category | 5K/sec | <100ms | Range scan + pagination | Critical |
| Search products by keyword | 2K/sec | <200ms | Full-text search | High |
| Get customer order history | 500/sec | <150ms | Range scan | High |
| Calculate daily revenue | 10/day | <30sec | Aggregation | Medium |
| Find products below restock level | 1/hour | <5sec | Range scan | Low |
Priority-Based Design:
Not all access patterns are equal. A query that runs 10,000 times per second with a 50ms SLA demands schema optimization. A nightly batch report can tolerate suboptimal query plans.
Rank access patterns by how frequently they run and how latency-sensitive they are, weighted by their business impact.
Optimize ruthlessly for the top 3-5 patterns. Accept that lower-priority patterns may require more expensive queries—this is an intentional trade-off, not a failure.
```typescript
// Formal access pattern documentation
interface AccessPattern {
  id: string;
  description: string;
  frequency: {
    value: number;
    unit: 'per_second' | 'per_minute' | 'per_hour' | 'per_day';
  };
  latencySLA: {
    p50: number; // milliseconds
    p99: number; // milliseconds
  };
  inputs: string[];
  outputs: string[];
  patternType: 'point_lookup' | 'range_scan' | 'aggregation' | 'search';
  priority: 'critical' | 'high' | 'medium' | 'low';
}

const accessPatterns: AccessPattern[] = [
  {
    id: 'get-product-by-id',
    description: 'Fetch product details for product display page',
    frequency: { value: 10000, unit: 'per_second' },
    latencySLA: { p50: 20, p99: 50 },
    inputs: ['product_id'],
    outputs: ['name', 'description', 'price', 'images', 'category', 'inventory_count'],
    patternType: 'point_lookup',
    priority: 'critical',
  },
  {
    id: 'customer-order-history',
    description: 'List orders with details for customer account page',
    frequency: { value: 500, unit: 'per_second' },
    latencySLA: { p50: 100, p99: 200 },
    inputs: ['customer_id', 'page_number', 'page_size'],
    outputs: ['order_id', 'date', 'status', 'total', 'item_summaries'],
    patternType: 'range_scan',
    priority: 'high',
  },
  // ... more patterns
];
```

Access patterns emerge from use cases, which emerge from product requirements. Don't design in isolation. Work with product managers to understand user flows. Work with frontend engineers to understand what data each page or screen needs. These conversations reveal the true access patterns, not the ones you imagine.
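Once patterns are documented in this form, prioritization can be made mechanical. Below is a minimal sketch that builds on the AccessPattern interface and accessPatterns list above; the specific weighting (requests per second divided by the p99 budget) is an illustrative heuristic of mine, not a standard formula, so adjust it to your own context:

```typescript
// Hypothetical helper: rank documented access patterns by importance.
// "Importance" here is a rough frequency x latency-sensitivity score.

const PER_SECOND: Record<AccessPattern['frequency']['unit'], number> = {
  per_second: 1,
  per_minute: 1 / 60,
  per_hour: 1 / 3600,
  per_day: 1 / 86400,
};

function importanceScore(pattern: AccessPattern): number {
  // Normalize frequency to requests per second
  const requestsPerSecond =
    pattern.frequency.value * PER_SECOND[pattern.frequency.unit];
  // Tighter p99 SLAs count as more latency-sensitive; 1000ms is treated as relaxed
  const latencySensitivity = 1000 / Math.max(pattern.latencySLA.p99, 1);
  return requestsPerSecond * latencySensitivity;
}

// Rank patterns and focus schema work on the top few
const ranked = [...accessPatterns].sort(
  (a, b) => importanceScore(b) - importanceScore(a)
);
console.log(ranked.map((p) => `${p.id}: ${importanceScore(p).toFixed(1)}`));
```

The exact weights matter less than the exercise itself: making the top handful of patterns explicit, and agreed upon, before any schema work begins.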
Each access pattern type suggests specific schema design strategies. Let's examine how to optimize for the most common patterns.
Pattern 1: Point Lookups (Get by ID)
The most common pattern: fetch a single record by its identifier. This should be O(1) or O(log n).
Design principles:

- Key the table on the lookup identifier (primary key or unique index) so retrieval is a single index traversal.
- Denormalize the data the page actually needs (category info, aggregates, main image) into the row, so one query serves the display path.
```sql
-- POINT LOOKUP: Get product by ID
-- Optimize by denormalizing category and aggregates into product row

CREATE TABLE products (
    product_id UUID PRIMARY KEY,
    sku VARCHAR(100) NOT NULL UNIQUE,
    name VARCHAR(255) NOT NULL,
    description TEXT,
    price DECIMAL(10, 2) NOT NULL,

    -- Denormalized: category info embedded
    category_id UUID NOT NULL,
    category_name VARCHAR(100) NOT NULL,
    category_path TEXT,  -- '/electronics/phones/smartphones'

    -- Denormalized: aggregates
    average_rating DECIMAL(3, 2),
    review_count INTEGER DEFAULT 0,
    inventory_count INTEGER DEFAULT 0,

    -- Denormalized: main image for listing pages
    main_image_url TEXT,

    created_at TIMESTAMPTZ DEFAULT NOW(),
    updated_at TIMESTAMPTZ DEFAULT NOW()
);

-- Single query returns everything needed for product page
SELECT * FROM products WHERE product_id = 'uuid';

-- No JOINs needed for the critical product display path
```

Pattern 2: Range Scans (List by Filter)
Fetch multiple records matching criteria, often with pagination. Examples: orders by customer, products by category, messages by thread.
Design principles:

- Build a composite index that matches the query's filter and sort exactly (equality columns first, then the sort column).
- Denormalize the summary fields the list view displays, so pagination never fans out into JOINs.
- Bound result size with LIMIT-based pagination.
```sql
-- RANGE SCAN: List customer orders with pagination
-- Design index to match access pattern exactly

CREATE TABLE orders (
    order_id UUID PRIMARY KEY,
    customer_id UUID NOT NULL,
    status VARCHAR(50) NOT NULL,
    total_amount DECIMAL(12, 2) NOT NULL,
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),

    -- Denormalized for order list display
    item_count INTEGER NOT NULL,
    first_item_name VARCHAR(255),  -- "MacBook Pro and 2 more items"
    first_item_image TEXT
);

-- Composite index matching the exact access pattern
-- customer_id (equality) + created_at (range/sort)
CREATE INDEX idx_orders_customer_date
ON orders (customer_id, created_at DESC);

-- Query uses the index efficiently
SELECT order_id, status, total_amount, created_at,
       item_count, first_item_name, first_item_image
FROM orders
WHERE customer_id = 'cust-uuid'
ORDER BY created_at DESC
LIMIT 20 OFFSET 0;

-- Index-only scan possible if all columns in index or few enough
```

Pattern 3: Aggregations (Analytics Queries)
Compute sums, averages, counts across many records. Examples: daily revenue, user growth, conversion rates.
Design principles:

- Partition large fact tables by time so date-range scans touch only the relevant partitions.
- Precompute aggregates into materialized views or summary tables and refresh them on a schedule.
- Keep heavy analytical work off the transactional hot path.
```sql
-- AGGREGATION: Daily revenue reporting
-- Use partitioning + materialized views for efficient aggregation

-- Partition orders by month for efficient date-range queries
CREATE TABLE orders (
    order_id UUID NOT NULL,
    customer_id UUID NOT NULL,
    total_amount DECIMAL(12, 2) NOT NULL,
    created_at TIMESTAMPTZ NOT NULL,
    PRIMARY KEY (order_id, created_at)
) PARTITION BY RANGE (created_at);

-- Create monthly partitions
CREATE TABLE orders_2024_01 PARTITION OF orders
    FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');
CREATE TABLE orders_2024_02 PARTITION OF orders
    FOR VALUES FROM ('2024-02-01') TO ('2024-03-01');
-- etc.

-- Materialized view for daily aggregates
CREATE MATERIALIZED VIEW daily_revenue AS
SELECT
    DATE(created_at) AS date,
    COUNT(*) AS order_count,
    SUM(total_amount) AS revenue,
    AVG(total_amount) AS average_order_value
FROM orders
WHERE created_at >= CURRENT_DATE - INTERVAL '90 days'
GROUP BY DATE(created_at);

-- CONCURRENTLY refresh requires a unique index on the view
CREATE UNIQUE INDEX idx_daily_revenue_date ON daily_revenue (date);

-- Refresh nightly
REFRESH MATERIALIZED VIEW CONCURRENTLY daily_revenue;

-- Fast analytics queries
SELECT * FROM daily_revenue WHERE date >= CURRENT_DATE - INTERVAL '30 days';
```

If your system requires both transactional (OLTP) and analytical (OLAP) query patterns, consider separating them. Keep a normalized OLTP database for writes and real-time reads, and replicate to a denormalized data warehouse (Snowflake, BigQuery, Redshift) for analytics, with each store optimized for its own access patterns.
Indexes are the primary mechanism for optimizing access patterns without changing table structure. Expert index design can often eliminate the need for denormalization.
The Compound Index Ordering Rule:
For composite indexes, column order matters dramatically. The rule:
Place columns tested with equality (=) first, and columns used for range conditions (<, >, BETWEEN) or sorting last.

This order allows the database to use the index for the maximum number of conditions.
```sql
-- ACCESS PATTERN: Find active products in category, sorted by price
-- Query: WHERE category_id = ? AND status = 'active' ORDER BY price ASC

-- WRONG: Range column (price) before equality columns
CREATE INDEX idx_bad ON products (price, category_id, status);
-- Index can't be used efficiently for equality on category_id

-- CORRECT: Equality first, then sort
CREATE INDEX idx_good ON products (category_id, status, price);
-- Index scan: Find category_id, filter status, already sorted by price

-- QUERY USES INDEX PERFECTLY
EXPLAIN ANALYZE
SELECT product_id, name, price
FROM products
WHERE category_id = 'cat-uuid' AND status = 'active'
ORDER BY price ASC
LIMIT 20;

-- Result: Index Scan using idx_good, no sort operation needed
```

Covering Indexes:
A covering index contains all columns needed by a query, eliminating the need to access the table itself (heap/clustered index). This dramatically improves performance for frequently-run queries.
```sql
-- ACCESS PATTERN: List order summaries for customer
-- Query needs: order_id, status, total, created_at
-- Filter: customer_id = ?

-- Non-covering index: requires heap lookup for each row
CREATE INDEX idx_non_covering ON orders (customer_id);
-- Scan index → get row IDs → lookup heap for each row

-- Covering index: INCLUDE columns in the index leaf nodes
CREATE INDEX idx_covering ON orders (customer_id, created_at DESC)
INCLUDE (order_id, status, total_amount);
-- Scan index → return data directly, no heap access

-- PostgreSQL EXPLAIN shows "Index Only Scan" when covering works
EXPLAIN ANALYZE
SELECT order_id, status, total_amount, created_at
FROM orders
WHERE customer_id = 'cust-uuid'
ORDER BY created_at DESC
LIMIT 10;

-- "Index Only Scan using idx_covering"
-- Heap Fetches: 0 ← This is the goal
```

Never guess about index effectiveness. Run EXPLAIN ANALYZE on your queries to see exactly how the database executes them. Look for 'Seq Scan' on large tables (bad), 'Index Scan' or 'Index Only Scan' (good), and check the actual versus estimated row counts.
Access pattern-driven design becomes even more critical in NoSQL databases, where query flexibility is limited by design. Unlike SQL databases, which can JOIN anything at runtime (at a performance cost), NoSQL databases must have data pre-arranged for the queries you need.
DynamoDB: The Extreme Case
Amazon DynamoDB exemplifies access pattern-driven design. You must define your access patterns before writing a single line of code, because:

- Queries can only target a partition key (optionally filtered and sorted by the sort key); there are no JOINs.
- Every additional query shape needs a secondary index designed around that exact pattern.
- Anything else falls back to a full-table Scan, which is slow and expensive at scale.
This forces rigorous upfront analysis but delivers consistent single-digit millisecond performance at any scale.
```typescript
// DynamoDB single-table design for e-commerce
// All access patterns served by one table with composite keys

interface DynamoDBRecord {
  PK: string;      // Partition key
  SK: string;      // Sort key
  GSI1PK?: string; // Global Secondary Index 1 partition key
  GSI1SK?: string; // Global Secondary Index 1 sort key
  // ... entity-specific attributes
}

// ACCESS PATTERNS AND KEY DESIGN:

// 1. Get customer by ID
// PK: CUSTOMER#<customer_id>   SK: PROFILE
const customerProfile = {
  PK: 'CUSTOMER#cust-123',
  SK: 'PROFILE',
  name: 'John Doe',
  email: 'john@example.com',
  entityType: 'Customer',
};

// 2. List orders for customer (sorted by date)
// PK: CUSTOMER#<customer_id>   SK: ORDER#<timestamp>
const customerOrder = {
  PK: 'CUSTOMER#cust-123',
  SK: 'ORDER#2024-01-15T10:30:00Z#ord-456',
  orderId: 'ord-456',
  status: 'shipped',
  total: 129.99,
  entityType: 'Order',
};

// Query: Get all orders for customer
// Key condition: PK = 'CUSTOMER#cust-123' AND begins_with(SK, 'ORDER#')
// Returns orders sorted by date (SK is sortable)

// 3. Get order details (including items)
// PK: ORDER#<order_id>   SK: METADATA or ITEM#<product_id>
const orderMetadata = {
  PK: 'ORDER#ord-456',
  SK: 'METADATA',
  customerId: 'cust-123',
  status: 'shipped',
  shippingAddress: { /* embedded */ },
  entityType: 'OrderMetadata',
};
const orderItem = {
  PK: 'ORDER#ord-456',
  SK: 'ITEM#prod-789',
  productName: 'Widget Pro',
  quantity: 2,
  unitPrice: 49.99,
  entityType: 'OrderItem',
};

// Query: Get order with all items
// Key condition: PK = 'ORDER#ord-456'
// Returns metadata + all items in one query
```

MongoDB: Embedded Documents
MongoDB offers more flexibility than DynamoDB but still rewards access pattern-driven design. The key question: should related data be embedded (denormalized) or referenced (normalized)?
```typescript
// MongoDB document design based on access patterns
import { ObjectId } from 'mongodb';

// PATTERN: Always fetch user with their preferences
// DESIGN: Embed preferences in user document
interface UserDocument {
  _id: ObjectId;
  email: string;
  name: string;

  // Embedded (1:1, always accessed together)
  preferences: {
    theme: 'light' | 'dark';
    notifications: boolean;
    language: string;
  };

  // Embedded (1:few, bounded growth)
  addresses: Array<{
    type: 'home' | 'work' | 'shipping';
    street: string;
    city: string;
    zipCode: string;
  }>;
}

// PATTERN: Fetch blog post with author name (not full profile)
// DESIGN: Embed author summary, reference full author
interface BlogPostDocument {
  _id: ObjectId;
  title: string;
  content: string;

  // Denormalized author summary for display
  author: {
    _id: ObjectId; // Reference for full profile link
    name: string;
    avatarUrl: string;
  };

  // Referenced (unbounded, accessed separately)
  commentIds: ObjectId[]; // Only IDs, fetch comments separately

  // Or for small numbers, embed recent comments
  recentComments: Array<{
    _id: ObjectId;
    authorName: string;
    content: string;
    createdAt: Date;
  }>;
}

// PATTERN: Show order history with basic details
// DESIGN: Embed line item summaries, reference products
interface OrderDocument {
  _id: ObjectId;
  customerId: ObjectId;
  status: string;
  totalAmount: number;
  createdAt: Date;

  // Embedded snapshot (historical accuracy)
  items: Array<{
    productId: ObjectId;
    productName: string; // Snapshot at order time
    quantity: number;
    unitPrice: number;   // Price at order time
  }>;

  shippingAddress: {
    // Full embedded address snapshot
  };
}
```

With SQL databases, you can often add indexes or rewrite queries to support new access patterns. With NoSQL, changing access patterns may require data migrations, new tables, or complete redesigns. Invest heavily in access pattern analysis before choosing NoSQL.
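To make the payoff of the embedded design concrete, here is a minimal sketch of how the order-history pattern is served with the Node.js MongoDB driver. The connection string, database name ('shop'), collection name ('orders'), and the getOrderHistory helper are placeholders for illustration; OrderDocument is the interface defined above:

```typescript
import { MongoClient, ObjectId } from 'mongodb';

// Hypothetical connection details for this sketch
const client = new MongoClient('mongodb://localhost:27017');
const db = client.db('shop');

// ACCESS PATTERN: customer order history, newest first, one page at a time.
// Because line-item summaries are embedded in each order document,
// a single indexed find() serves the page -- no $lookup / join needed.
async function getOrderHistory(customerId: ObjectId, page = 0, pageSize = 20) {
  return db
    .collection<OrderDocument>('orders')
    .find({ customerId })
    .sort({ createdAt: -1 })
    .skip(page * pageSize)
    .limit(pageSize)
    .project({ status: 1, totalAmount: 1, createdAt: 1, items: 1 })
    .toArray();
}

// The supporting index mirrors the SQL composite index from the
// range-scan pattern (mongosh syntax):
//   db.orders.createIndex({ customerId: 1, createdAt: -1 })
```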
Real systems have multiple access patterns that may conflict. Optimizing for one pattern often suboptimizes another. How do you resolve these tensions?
Strategy 1: Priority-Based Optimization
Rank patterns by importance (frequency × latency sensitivity). Optimize fully for top patterns; accept suboptimal performance for lower-priority patterns.
Strategy 2: Separate Read Models (CQRS)
Command Query Responsibility Segregation maintains separate data stores for reads and writes. The write store is normalized for integrity; read stores are denormalized for specific queries.
```typescript
// CQRS: Separate write and read models

// WRITE MODEL: Normalized, handles all mutations
// Stored in PostgreSQL for ACID guarantees
interface OrderWriteModel {
  orderId: string;
  customerId: string;
  status: string;
  items: Array<{
    productId: string;
    quantity: number;
    unitPrice: number;
  }>;
  createdAt: Date;
}

// READ MODELS: Denormalized for specific use cases

// Read model 1: Customer order list
// Stored in Redis or Elasticsearch
interface CustomerOrdersReadModel {
  customerId: string;
  orders: Array<{
    orderId: string;
    status: string;
    totalAmount: number;
    itemCount: number;
    createdAt: Date;
  }>;
}

// Read model 2: Order detail view
interface OrderDetailReadModel {
  orderId: string;
  customerName: string;
  customerEmail: string;
  status: string;
  items: Array<{
    productName: string;
    productImage: string;
    quantity: number;
    unitPrice: number;
  }>;
  shippingAddress: FullAddress;
  totalAmount: number;
}

// Event handler: Sync read models when orders change
async function onOrderCreated(event: OrderCreatedEvent) {
  // Update write store
  await orderRepository.save(event.order);

  // Project to read model 1
  await customerOrdersCache.addOrder(
    event.order.customerId,
    summarizeOrder(event.order)
  );

  // Project to read model 2
  await orderDetailCache.save(
    event.order.orderId,
    await enrichOrderForDisplay(event.order)
  );
}
```

Strategy 3: Multiple Indexes / GSIs
For SQL databases, multiple indexes can support different access patterns on the same data. For DynamoDB, Global Secondary Indexes (GSIs) allow querying on different key combinations.
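As a sketch of the DynamoDB side, a Global Secondary Index can overlay a second access pattern, such as "recent shipped orders across all customers", onto the single-table design shown earlier. The table name ('app-table'), index name ('GSI1'), the STATUS#-prefixed GSI keys, and the listRecentShippedOrders helper are assumptions for illustration, not part of the earlier design:

```typescript
import { DynamoDBClient } from '@aws-sdk/client-dynamodb';
import { DynamoDBDocumentClient, QueryCommand } from '@aws-sdk/lib-dynamodb';

const ddb = DynamoDBDocumentClient.from(new DynamoDBClient({}));

// Order items also carry GSI1 keys so the same data answers a second question:
//   GSI1PK: 'STATUS#shipped'
//   GSI1SK: '2024-01-15T10:30:00Z#ord-456'   (sortable by date)
// Attribute names follow the DynamoDBRecord interface above.

// NEW ACCESS PATTERN: recent shipped orders across all customers
async function listRecentShippedOrders(limit = 25) {
  const result = await ddb.send(
    new QueryCommand({
      TableName: 'app-table',
      IndexName: 'GSI1',
      KeyConditionExpression: 'GSI1PK = :status',
      ExpressionAttributeValues: { ':status': 'STATUS#shipped' },
      ScanIndexForward: false, // newest first via the sortable GSI1SK
      Limit: limit,
    })
  );
  return result.Items ?? [];
}
```

One caveat: keying a GSI partition on a single status value concentrates traffic on one partition. At high write volume you would shard GSI1PK (for example, by appending a date bucket), but the querying mechanism stays the same.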
Strategy 4: Polyglot Persistence
Use different database technologies for different access patterns: for example, a relational database for transactional writes, Elasticsearch for full-text search, Redis for low-latency key-value lookups, and a columnar data warehouse for analytics.
| Strategy | Best For | Trade-offs |
|---|---|---|
| Priority-based optimization | Clear hierarchy of importance | Low-priority queries may be slow |
| CQRS | Complex domains, event sourcing | Eventual consistency, complexity |
| Multiple indexes | SQL with varied query patterns | Write overhead, storage cost |
| Polyglot persistence | Extreme scale, specialized requirements | Operational complexity, sync challenges |
Don't adopt CQRS or polyglot persistence prematurely. Start with a well-indexed SQL database. Only add complexity when you have evidence that the simpler approach fails. Many successful systems run entirely on PostgreSQL with thoughtful index design.
Access patterns change over time. New features require new queries. User behavior evolves. How do you design schemas that can adapt?
Principle: Anticipate Change, But Don't Over-Engineer
The goal is a schema that:

- Serves today's highest-priority access patterns efficiently.
- Can absorb new patterns through added indexes or projections rather than full rewrites.
- Avoids speculative complexity for patterns that may never materialize.

Techniques for Evolvable Schemas:

- Flexible attribute columns (JSONB) for category-specific or sparse fields.
- A schema_version column so records can be migrated lazily.
- Expression indexes added later as new query patterns appear.
- Views that give the application a stable interface over a changing physical structure.
```sql
-- EVOLVABLE SCHEMA: Products with flexible attributes

CREATE TABLE products (
    product_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    sku VARCHAR(100) NOT NULL UNIQUE,
    name VARCHAR(255) NOT NULL,
    description TEXT,
    price DECIMAL(10, 2) NOT NULL,
    category_id UUID NOT NULL,

    -- JSONB for category-specific attributes
    -- Electronics: { "screenSize": "15.6", "cpu": "Intel i7", "ram": "16GB" }
    -- Clothing:    { "size": "M", "color": "blue", "material": "cotton" }
    attributes JSONB DEFAULT '{}',

    -- Metadata for record evolution
    schema_version INTEGER DEFAULT 1,

    created_at TIMESTAMPTZ DEFAULT NOW(),
    updated_at TIMESTAMPTZ DEFAULT NOW()
);

-- GIN index for JSONB queries
CREATE INDEX idx_products_attributes ON products USING GIN (attributes);

-- Query specific attributes
SELECT * FROM products
WHERE attributes->>'screenSize' = '15.6'
  AND category_id = 'electronics-uuid';

-- Later: Add new access pattern with just an index
-- "Find products by color across all categories"
CREATE INDEX idx_products_color ON products ((attributes->>'color'));

-- VIEW abstraction: Application sees stable interface
CREATE VIEW product_catalog AS
SELECT
    product_id,
    sku,
    name,
    price,
    category_id,
    attributes->>'mainImage' AS main_image,
    COALESCE((attributes->>'reviewCount')::integer, 0) AS review_count
FROM products;

-- Application queries the view
-- Later: change underlying structure without breaking queries
```

Don't build for access patterns you might need. Build for patterns you know you need, with reasonable flexibility for evolution. The cost of over-engineering is real: complexity, maintenance burden, and slower development velocity. When new patterns emerge, adapt; migrations are a normal part of system evolution.
Access pattern-driven design is the discipline of designing data structures around the queries they must serve, not just the entities they represent.
What's Next:
You now understand how to design schemas that efficiently serve their queries. Next, we'll explore Schema Evolution: the practices and patterns for changing your data model safely in production systems with millions of records, as systems grow and requirements change.