Before code can model entities and relationships, someone must decide how data is structured in the database. This fundamental decision—how to organize data across tables—has cascading effects on your entire application.
This page dissects normalization and denormalization, providing you with the analytical framework to make informed structural decisions—and understand the code-level implications of decisions already made.
By the end of this page, you will understand: (1) Normalization forms and their purposes, (2) When and why to denormalize, (3) Common denormalization patterns, (4) The code-level impact of each approach, (5) Strategies for managing denormalized data, and (6) How to make the right choice for your system.
Normalization is the process of organizing database tables to minimize redundancy and dependency. Each normal form addresses specific types of anomalies:
The goal: Each piece of data is stored exactly once. Updates happen in one place. The database structure reflects logical dependencies.
The trade-off: Queries must JOIN multiple tables to assemble complete data. Read patterns become more complex.
| Form | Requirement | Eliminates | Practical Meaning |
|---|---|---|---|
| 1NF | Atomic values, no repeating groups | Multi-valued columns | One value per cell; no arrays in columns |
| 2NF | 1NF + no partial dependencies | Partial key dependency | Non-key columns depend on whole primary key |
| 3NF | 2NF + no transitive dependencies | Transitive dependency | Non-key columns depend only on primary key |
| BCNF | 3NF + every determinant is a key | Remaining anomalies | Stricter 3NF; handles edge cases |
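To make 1NF concrete, here is a tiny illustrative sketch (the customer and phone values are hypothetical, not from any schema on this page): a multi-valued column breaks atomicity, and the fix is one row per value.

```typescript
// Hypothetical example: a "phones" column holding several values violates 1NF.
const nonAtomicRow = { customerId: "c1", phones: "555-1234;555-5678" };

// 1NF decomposition: one atomic value per cell, one row per phone number.
const atomicRows = nonAtomicRow.phones
  .split(";")
  .map((phone) => ({ customerId: nonAtomicRow.customerId, phone }));

console.log(atomicRows.length); // 2 rows, each with a single phone value
console.log(atomicRows[0].phone); // "555-1234"
```

With atomic rows, "find the customer for this phone number" becomes an indexed lookup instead of a string search.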
Practical example:
Consider an order system where we might be tempted to store data like this:
```sql
-- ❌ DENORMALIZED (problematic) structure
CREATE TABLE orders (
    id UUID PRIMARY KEY,
    order_date TIMESTAMP,
    -- Customer data repeated on every order
    customer_id UUID,
    customer_name VARCHAR(255),    -- Duplicated!
    customer_email VARCHAR(255),   -- Duplicated!
    customer_phone VARCHAR(50),    -- Duplicated!
    -- Address repeated on every order
    shipping_street VARCHAR(255),  -- Duplicated!
    shipping_city VARCHAR(100),    -- Duplicated!
    shipping_country VARCHAR(100), -- Duplicated!
    -- Product data repeated per order item (even worse!)
    item1_product_id UUID,
    item1_product_name VARCHAR(255), -- Duplicated!
    item1_quantity INTEGER,
    item2_product_id UUID,
    item2_product_name VARCHAR(255), -- Duplicated!
    item2_quantity INTEGER
    -- etc...
);

-- Problems:
-- 1. Update anomaly: Customer changes email → must update ALL their orders
-- 2. Insert anomaly: Can't record a customer without an order
-- 3. Delete anomaly: Deleting last order loses customer data
-- 4. Storage waste: Same data repeated thousands of times
```
```sql
-- ✅ NORMALIZED structure (3NF)
CREATE TABLE customers (
    id UUID PRIMARY KEY,
    name VARCHAR(255) NOT NULL,
    email VARCHAR(255) UNIQUE NOT NULL,
    phone VARCHAR(50)
);

CREATE TABLE addresses (
    id UUID PRIMARY KEY,
    customer_id UUID REFERENCES customers(id),
    type VARCHAR(20) NOT NULL, -- 'billing' or 'shipping'
    street VARCHAR(255),
    city VARCHAR(100),
    country VARCHAR(100)
);

CREATE TABLE products (
    id UUID PRIMARY KEY,
    name VARCHAR(255) NOT NULL,
    current_price DECIMAL(10, 2) NOT NULL
);

CREATE TABLE orders (
    id UUID PRIMARY KEY,
    customer_id UUID REFERENCES customers(id),
    shipping_address_id UUID REFERENCES addresses(id),
    order_date TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    status VARCHAR(50) NOT NULL
);

CREATE TABLE order_items (
    id UUID PRIMARY KEY,
    order_id UUID REFERENCES orders(id),
    product_id UUID REFERENCES products(id),
    quantity INTEGER NOT NULL,
    unit_price DECIMAL(10, 2) NOT NULL -- Captured at order time
);

-- Benefits:
-- 1. Customer email change: ONE update in customers table
-- 2. Product name change: ONE update in products table
-- 3. No data duplication
-- 4. Logical structure matches domain model
```

Notice that unit_price is stored in order_items even though products has current_price. This is intentional denormalization—we capture the price at order time because it's a business requirement that historical orders show what was actually charged, not what the product costs now.
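A quick sketch of why this matters, using hypothetical literal values rather than the schema above: recomputing an old order's total from the current product price would silently rewrite history.

```typescript
// Hypothetical values: the customer was charged 10.00 per unit at order time,
// and the product's current price was later raised to 15.00.
interface OrderItemRow { quantity: number; unitPrice: number } // snapshot price

function orderTotal(items: OrderItemRow[]): number {
  // Correct: use the price captured at order time
  return items.reduce((sum, i) => sum + i.quantity * i.unitPrice, 0);
}

const items: OrderItemRow[] = [{ quantity: 2, unitPrice: 10.0 }];
const currentPrice = 15.0; // the product's price today

console.log(orderTotal(items)); // 20 — what was actually charged
console.log(items[0].quantity * currentPrice); // 30 — wrong for a historical order
```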
Normalization isn't just academic theory—it solves real, expensive problems in production systems.
```typescript
// Normalized structure maps cleanly to domain model

class Customer {
  private id: string;
  private name: string;
  private email: string;
  private addresses: Address[] = [];

  // Update happens in one place, reflected everywhere
  changeEmail(newEmail: string): void {
    this.email = newEmail;
    // When saved, updates one row in customers table
    // All orders for this customer automatically reflect new email
  }
}

class Order {
  private id: string;
  private customer: Customer; // Reference, not embedded data
  private shippingAddress: Address;
  private items: OrderItem[] = [];

  // Getting customer data involves navigation (and possibly a query)
  getCustomerName(): string {
    return this.customer.getName();
  }
}

class OrderItem {
  private id: string;
  private product: Product;
  private quantity: number;
  private unitPrice: Money; // Captured at order time (intentional denormalization)

  // Product current price may differ from order-time price
  getCurrentProductPrice(): Money {
    return this.product.getCurrentPrice();
  }

  getOrderPrice(): Money {
    return this.unitPrice; // What was actually charged
  }
}
```

Start with normalized design. It's the safer default—you can always denormalize specific areas later when you identify performance bottlenecks. Denormalizing first and trying to renormalize later is extremely difficult once data exists.
Despite its benefits, normalization has costs that become painful at scale. Denormalization intentionally introduces redundancy to optimize for specific access patterns.
The join cost reality:
Every join has cost. At small scale, this cost is negligible. At large scale, it dominates:
```sql
-- Simple query on normalized data
-- "Show order list with customer names"

SELECT
    o.id,
    o.order_date,
    o.status,
    c.name as customer_name
FROM orders o
JOIN customers c ON o.customer_id = c.id
WHERE o.status = 'pending'
ORDER BY o.order_date DESC
LIMIT 100;

-- This requires:
-- 1. Index scan on orders (status, order_date)
-- 2. Index lookup on customers (id) for EACH row
-- 3. Nested loop or hash join

-- At 100M orders, even indexed, this is expensive
-- The join is executed 100 times (once per result row)

-- With denormalized data:
CREATE TABLE orders (
    id UUID PRIMARY KEY,
    order_date TIMESTAMP,
    status VARCHAR(50),
    customer_id UUID,
    customer_name VARCHAR(255) -- Denormalized!
);

SELECT id, order_date, status, customer_name
FROM orders
WHERE status = 'pending'
ORDER BY order_date DESC
LIMIT 100;

-- Single table scan, no joins
-- Can be 10-100x faster at scale
```

Every piece of denormalized data must be kept in sync. `customer_name` in `orders` must be updated when `customers.name` changes. This is where bugs hide. Denormalization trades write complexity and consistency risk for read performance.
Denormalization takes many forms, each with specific use cases and trade-offs.
Pattern 1: Cached Aggregates
Store computed values instead of recalculating on every read:
```sql
-- Instead of: SELECT COUNT(*) FROM orders WHERE customer_id = ?
-- (scans order table every time)

ALTER TABLE customers ADD COLUMN order_count INTEGER DEFAULT 0;
ALTER TABLE customers ADD COLUMN total_spent DECIMAL(10,2) DEFAULT 0;

-- Update on order creation (application code or trigger)
UPDATE customers
SET order_count = order_count + 1,
    total_spent = total_spent + :order_total
WHERE id = :customer_id;

-- Now this is instant:
SELECT order_count, total_spent FROM customers WHERE id = ?;
```

Pattern 2: Embedded/Copied Fields
Copy frequently-accessed fields from related entities:
```sql
-- Order needs customer name for display
-- Instead of joining every time, store it

CREATE TABLE orders (
    id UUID PRIMARY KEY,
    customer_id UUID REFERENCES customers(id),
    customer_name VARCHAR(255),  -- Copied from customer
    customer_email VARCHAR(255)  -- Copied from customer
    -- ... other fields
);

-- Sync requirement: When customer changes name, update their orders
-- Could be done by:
-- 1. Application code on customer update
-- 2. Database trigger
-- 3. Async event handler
```

Pattern 3: Precomputed/Materialized Views
Create read-optimized tables or views for specific query patterns:
```sql
-- Complex dashboard query
SELECT
    DATE_TRUNC('day', o.order_date) as day,
    COUNT(*) as order_count,
    SUM(o.total) as revenue,
    COUNT(DISTINCT o.customer_id) as unique_customers
FROM orders o
GROUP BY DATE_TRUNC('day', o.order_date)
ORDER BY day DESC;

-- At scale, this is slow. Create a summary table:

CREATE TABLE daily_order_summary (
    day DATE PRIMARY KEY,
    order_count INTEGER NOT NULL,
    revenue DECIMAL(15,2) NOT NULL,
    unique_customers INTEGER NOT NULL
);

-- Populate periodically (batch job) or incrementally (on order creation)
-- Dashboard queries against summary table are instant
```

Pattern 4: Historical Snapshots
Preserve state at a point in time rather than referencing current state:
```sql
-- Order items need price AT ORDER TIME, not current price

CREATE TABLE order_items (
    id UUID PRIMARY KEY,
    order_id UUID REFERENCES orders(id),
    product_id UUID REFERENCES products(id),
    quantity INTEGER NOT NULL,
    -- Snapshot fields (denormalized for business reasons)
    product_name_at_order VARCHAR(255) NOT NULL,
    unit_price_at_order DECIMAL(10,2) NOT NULL,
    product_sku_at_order VARCHAR(50) NOT NULL
);

-- Product changes don't affect historical orders
-- This is REQUIRED denormalization for business correctness
```

Pattern 5: JSON/Document Embedding
Embed related data as JSON for flexible, schema-less storage:
```sql
-- Shipping/billing addresses as embedded JSON

CREATE TABLE orders (
    id UUID PRIMARY KEY,
    customer_id UUID REFERENCES customers(id),
    -- Embedded JSON for complex, variable structure
    shipping_address JSONB NOT NULL,
    billing_address JSONB,
    -- Example:
    -- shipping_address = {
    --   "name": "John Doe",
    --   "street": "123 Main St",
    --   "city": "Portland",
    --   "state": "OR",
    --   "zip": "97201",
    --   "country": "USA"
    -- }
    order_metadata JSONB -- Flexible extra fields
);

-- Benefits: Single row read, flexible schema
-- Drawbacks: Can't easily query/join on JSON fields (though PostgreSQL helps)
```

Database denormalization forces code to manage consistency. This has significant implications for your domain model and data access layer.
Managing Cached Aggregates:
```typescript
// Customer has denormalized order_count and total_spent

class Customer {
  private id: string;
  private name: string;
  private orderCount: number;
  private totalSpent: Money;

  // These are denormalized - must be updated on order changes
  _incrementOrderCount(): void {
    this.orderCount++;
  }

  _decrementOrderCount(): void {
    this.orderCount = Math.max(0, this.orderCount - 1);
  }

  _addToTotalSpent(amount: Money): void {
    this.totalSpent = this.totalSpent.add(amount);
  }

  _subtractFromTotalSpent(amount: Money): void {
    this.totalSpent = this.totalSpent.subtract(amount);
  }
}

class Order {
  constructor(
    private customer: Customer,
    private items: OrderItem[]
  ) {
    // Update denormalized aggregates on creation
    this.customer._incrementOrderCount();
    this.customer._addToTotalSpent(this.getTotal());
  }

  cancel(): void {
    if (this.status === OrderStatus.CANCELLED) return;
    this.status = OrderStatus.CANCELLED;

    // Update denormalized aggregates on cancellation
    this.customer._decrementOrderCount();
    this.customer._subtractFromTotalSpent(this.getTotal());
  }

  // Order total changes affect customer aggregate
  addItem(product: Product, quantity: number): void {
    const itemTotal = product.price.multiply(quantity);
    this.items.push(new OrderItem(this, product, quantity));
    this._total = this._total.add(itemTotal);

    // Update customer aggregate
    this.customer._addToTotalSpent(itemTotal);
  }
}
```

Managing Copied Fields:
```typescript
// Order stores customer_name as a denormalized field
// Must propagate changes

class Customer {
  private name: string;
  private orders: Order[] = []; // Or: access via repository

  changeName(newName: string): void {
    const oldName = this.name;
    this.name = newName;

    // Propagate to denormalized copies
    // Option 1: Direct update (if orders loaded)
    for (const order of this.orders) {
      order._updateCustomerName(newName);
    }

    // Option 2: Emit domain event (preferred for decoupling)
    this.addDomainEvent(new CustomerNameChanged(this.id, oldName, newName));
  }
}

// Event handler updates denormalized data
class CustomerNameChangedHandler {
  constructor(private orderRepo: OrderRepository) {}

  async handle(event: CustomerNameChanged): Promise<void> {
    // Bulk update all orders for this customer
    await this.orderRepo.updateCustomerName(
      event.customerId,
      event.newName
    );
    // SQL: UPDATE orders SET customer_name = ? WHERE customer_id = ?
  }
}
```

Managing Historical Snapshots:
```typescript
// OrderItem captures product state at order time

class OrderItem {
  private productId: string;
  private quantity: number;

  // Snapshot fields - set at creation, never updated
  private readonly productNameAtOrder: string;
  private readonly productSkuAtOrder: string;
  private readonly unitPriceAtOrder: Money;

  constructor(order: Order, product: Product, quantity: number) {
    this.productId = product.id;
    this.quantity = quantity;

    // Capture current state as snapshot
    this.productNameAtOrder = product.name;
    this.productSkuAtOrder = product.sku;
    this.unitPriceAtOrder = product.currentPrice;
    // These never change - even if product is updated
  }

  // Use snapshot for display/calculation
  getDisplayName(): string {
    return this.productNameAtOrder;
  }

  getSubtotal(): Money {
    return this.unitPriceAtOrder.multiply(this.quantity);
  }

  // Can still access current product if needed
  // (but this requires loading product, which may have changed)
}
```

The most common bugs in denormalized systems: forgetting to update copied data in some code path, race conditions between updates, and aggregates drifting from true values over time. Always have reconciliation jobs that verify denormalized data matches the source of truth.
When you denormalize, you create a synchronization responsibility. Several patterns help manage this:
```typescript
// Strategy 1: Synchronous in-transaction
class OrderService {
  async createOrder(customerId: string, items: CartItem[]): Promise<Order> {
    return this.unitOfWork.transaction(async (tx) => {
      const customer = await this.customerRepo.findById(customerId, tx);
      const order = new Order(customer, items);
      await this.orderRepo.save(order, tx);

      // Update denormalized aggregate in same transaction
      customer._incrementOrderCount();
      customer._addToTotalSpent(order.getTotal());
      await this.customerRepo.save(customer, tx);

      // Both updates succeed or both fail
      return order;
    });
  }
}

// Strategy 2: Domain events (async)
class Order {
  private domainEvents: DomainEvent[] = [];

  constructor(customer: Customer, items: CartItem[]) {
    // ... create order

    // Emit event instead of direct update
    this.domainEvents.push(new OrderCreated(
      this.id,
      customer.id,
      this.getTotal()
    ));
  }

  getDomainEvents(): DomainEvent[] {
    return this.domainEvents;
  }
}

class OrderCreatedHandler {
  async handle(event: OrderCreated): Promise<void> {
    // Update denormalized data asynchronously
    await this.customerRepo.incrementOrderCount(event.customerId);
    await this.customerRepo.addToTotalSpent(event.customerId, event.total);
  }
}

// Strategy 3: Periodic reconciliation
class CustomerAggregatesReconciliation {
  async reconcile(): Promise<void> {
    // Recalculate from source of truth
    const aggregates = await this.db.query(`
      SELECT customer_id,
             COUNT(*) as true_order_count,
             SUM(total) as true_total_spent
      FROM orders
      WHERE status != 'cancelled'
      GROUP BY customer_id
    `);

    // Compare with stored values and fix discrepancies
    // (parameterized to avoid SQL injection)
    for (const agg of aggregates) {
      await this.db.query(`
        UPDATE customers
        SET order_count = $1, total_spent = $2
        WHERE id = $3
          AND (order_count != $1 OR total_spent != $2)
      `, [agg.true_order_count, agg.true_total_spent, agg.customer_id]);
    }
  }
}
```

| Strategy | Consistency | Performance | Complexity | Best For |
|---|---|---|---|---|
| In-transaction | Strong | Lower write perf | Medium | Critical data, low write volume |
| Database triggers | Strong | Lower write perf | Low (code), high (debug) | Simple cases, DBA-managed |
| Domain events | Eventual | Good | Higher | Microservices, decoupled systems |
| CDC | Eventual | Good | High infrastructure | Cross-system, legacy integration |
| Periodic reconciliation | Periodic | Best write perf | Low | Analytics, non-critical aggregates |
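Whichever strategy you choose, reconciliation boils down to comparing stored aggregates against values recomputed from the source of truth. A minimal sketch with illustrative names (the types and literal values are not from the original):

```typescript
// Illustrative sketch: detect drift between stored (denormalized) aggregates
// and values recomputed from the source-of-truth orders table.
interface CustomerAggregates { orderCount: number; totalSpent: number }

function findDrift(
  stored: CustomerAggregates,
  recomputed: CustomerAggregates
): string[] {
  const drifted: string[] = [];
  if (stored.orderCount !== recomputed.orderCount) drifted.push("orderCount");
  if (stored.totalSpent !== recomputed.totalSpent) drifted.push("totalSpent");
  return drifted; // an empty array means the aggregates are in sync
}

// Stored says 5 orders / 100 spent; recomputing from orders says 6 / 120.
console.log(findDrift({ orderCount: 5, totalSpent: 100 },
                      { orderCount: 6, totalSpent: 120 }));
// both fields have drifted and need repair
```

In a real reconciliation job, the fields returned here would drive targeted UPDATE statements plus an alert, since drift usually means some write path skipped the sync step.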
Use this framework to decide when to normalize and when to denormalize:
```typescript
// Decision example: Product listing page

// Normalized approach:
async function getProductListing(categoryId: string): Promise<ProductDTO[]> {
  const products = await prisma.product.findMany({
    where: { categoryId },
    include: {
      category: true,
      reviews: { select: { rating: true } }, // For avg calculation
      stocks: true, // For availability
      images: { take: 1 } // For thumbnail
    }
  });

  return products.map(p => ({
    id: p.id,
    name: p.name,
    price: p.price,
    categoryName: p.category.name, // Joined
    avgRating: calculateAvg(p.reviews.map(r => r.rating)), // Computed
    inStock: p.stocks.some(s => s.quantity > 0), // Computed
    thumbnailUrl: p.images[0]?.url
  }));
}

// Problem: At 10,000 products, 4+ joins per query, avg calculation
// for each product = slow page loads

// Denormalized approach:
// Product table stores: category_name, avg_rating, in_stock, thumbnail_url

async function getProductListingOptimized(categoryId: string): Promise<ProductDTO[]> {
  return prisma.product.findMany({
    where: { categoryId },
    select: {
      id: true,
      name: true,
      price: true,
      categoryName: true, // Denormalized
      avgRating: true, // Precomputed
      inStock: true, // Precomputed
      thumbnailUrl: true // Denormalized
    }
  });
}

// Trade-off: When category name changes, must update all products in it
// When a review is added, must recalculate avgRating
// When stock changes, must update inStock flag
```

Don't denormalize based on assumption. Measure actual query performance in production-like conditions. Many slow queries can be fixed with proper indexing or query optimization rather than schema changes. Denormalize only when you have data proving it's necessary.
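"Measure first" can be as simple as timing the competing query paths against realistic data before committing to a schema change. A hedged sketch; the listing functions named in the usage comment are placeholders:

```typescript
// Sketch: time an async query path before and after a proposed change.
// The label and the wrapped function are supplied by the caller.
async function timeIt<T>(label: string, fn: () => Promise<T>): Promise<T> {
  const start = Date.now();
  const result = await fn();
  console.log(`${label}: ${Date.now() - start} ms`);
  return result; // pass the query result through unchanged
}

// Hypothetical usage against both query paths:
// await timeIt("normalized listing", () => getProductListing("cat-1"));
// await timeIt("denormalized listing", () => getProductListingOptimized("cat-1"));
```

Run each path several times against production-sized data and compare medians; only a consistent, meaningful gap justifies taking on the synchronization burden of denormalization.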
Real-world systems rarely use pure normalization or pure denormalization. Effective architectures are hybrid—normalized core with selective denormalization for specific needs.
CQRS (Command Query Responsibility Segregation):
Maintain normalized model for writes, denormalized model for reads:
```typescript
// CQRS: Separate write and read models

// WRITE SIDE: Normalized, consistent, domain-rich
class Order {
  // Full domain entity with behavior
  private customer: Customer;
  private items: OrderItem[];

  constructor(customer: Customer, items: OrderItemData[]) {
    this.validateItems(items);
    this.items = items.map(d => new OrderItem(this, d));

    // Publish event for read side to consume
    this.publishEvent(new OrderCreated(this.toSnapshot()));
  }
}

class OrderRepository {
  // Writes to normalized tables
  async save(order: Order): Promise<void> {
    await this.db.orders.insert(order.toRecord());
    await this.db.orderItems.insertMany(order.items.map(i => i.toRecord()));
  }
}

// READ SIDE: Denormalized, fast, query-optimized
interface OrderReadModel {
  id: string;
  orderDate: Date;
  customerName: string; // Denormalized
  customerEmail: string; // Denormalized
  itemCount: number; // Precomputed
  total: number; // Precomputed
  status: string;
}

class OrderReadRepository {
  // Reads from denormalized view/table
  async findForListing(filters: OrderFilters): Promise<OrderReadModel[]> {
    return this.db.orderReadModels.find(filters);
  }
}

// Projection handler updates read model when events occur
class OrderProjection {
  async onOrderCreated(event: OrderCreated): Promise<void> {
    await this.db.orderReadModels.insert({
      id: event.orderId,
      orderDate: event.orderDate,
      customerName: event.customerName,
      customerEmail: event.customerEmail,
      itemCount: event.items.length,
      total: event.items.reduce((sum, i) => sum + i.subtotal, 0),
      status: 'pending'
    });
  }
}
```

Database + Search Index:
Normalized database for transactions, denormalized search index for queries:
```typescript
// Database stores normalized data (source of truth)
// Elasticsearch/Algolia stores denormalized data (search)

class ProductService {
  async createProduct(data: ProductData): Promise<Product> {
    // 1. Save to normalized database
    const product = await this.productRepo.save(new Product(data));

    // 2. Index to search with denormalized data
    await this.searchIndex.index({
      id: product.id,
      name: product.name,
      description: product.description,
      categoryName: product.category.name, // Flattened
      brandName: product.brand.name, // Flattened
      price: product.price,
      avgRating: 0, // Precomputed, starts at 0
      reviewCount: 0,
      tags: product.tags.map(t => t.name), // Flattened array
      inStock: product.inventory > 0
    });

    return product;
  }

  // Search uses denormalized index
  async search(query: string): Promise<ProductSearchResult[]> {
    return this.searchIndex.search(query);
    // Fast, pre-joined, no database calls
  }

  // Detailed view uses normalized database
  async getProductDetail(id: string): Promise<ProductDetail> {
    return this.productRepo.findWithAllRelations(id);
    // Full, up-to-date, complete data
  }
}
```

Hybrid approaches often mean the read model lags behind writes. When a product name changes, search results might show the old name for seconds to minutes. Decide if this is acceptable for each use case. Critical displays should query the normalized source.
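One lightweight way to make that per-use-case decision is to version-stamp the read model and re-query the normalized source when it lags. A sketch with assumed names; the version counter is illustrative, not part of the example above:

```typescript
// Illustrative: each write bumps a version on the source of truth, and the
// projection copies it. A read model behind the source version is stale.
interface ProductReadModel { productId: string; name: string; version: number }

function isStale(readModel: ProductReadModel, sourceVersion: number): boolean {
  return readModel.version < sourceVersion;
}

const cached: ProductReadModel = { productId: "p1", name: "Old Name", version: 3 };
console.log(isStale(cached, 5)); // true — a critical display should re-query the source
console.log(isStale(cached, 3)); // false — the read model is current
```

Non-critical surfaces (search listings, dashboards) can simply serve the possibly stale model; checkout and billing screens would use the staleness check as a trigger to hit the normalized database.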
Module Complete:
With this page, we've completed the Data Modeling for LLD module.
These foundational concepts underpin every persistence layer you'll ever build. Master them, and your data access code will be clean, performant, and maintainable.
You've mastered Data Modeling for LLD—from entity relationships through loading strategies to normalization trade-offs. You now have the conceptual framework to make informed decisions about data structure, relationship representation, and the code patterns that bridge your domain model to database storage.