Before code can model entities and relationships, someone must decide how data is structured in the database. This fundamental decision—how to organize data across tables—has cascading effects on your entire application.
This page dissects normalization and denormalization, providing you with the analytical framework to make informed structural decisions—and understand the code-level implications of decisions already made.
By the end of this page, you will understand: (1) Normalization forms and their purposes, (2) When and why to denormalize, (3) Common denormalization patterns, (4) The code-level impact of each approach, (5) Strategies for managing denormalized data, and (6) How to make the right choice for your system.
Normalization is the process of organizing database tables to minimize redundancy and dependency. Each normal form addresses specific types of anomalies:
The goal: Each piece of data is stored exactly once. Updates happen in one place. The database structure reflects logical dependencies.
The trade-off: Queries must JOIN multiple tables to assemble complete data. Read patterns become more complex.
| Form | Requirement | Eliminates | Practical Meaning |
|---|---|---|---|
| 1NF | Atomic values, no repeating groups | Multi-valued columns | One value per cell; no arrays in columns |
| 2NF | 1NF + no partial dependencies | Partial key dependency | Non-key columns depend on whole primary key |
| 3NF | 2NF + no transitive dependencies | Transitive dependency | Non-key columns depend only on primary key |
| BCNF | 3NF + every determinant is a key | Remaining anomalies | Stricter 3NF; handles edge cases |
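To make 1NF concrete, here is a tiny illustrative sketch (the customer and phone values are hypothetical, not from any schema on this page): a multi-valued column breaks atomicity, and the fix is one row per value.

```typescript
// Hypothetical example: a "phones" column holding several values violates 1NF.
const nonAtomicRow = { customerId: "c1", phones: "555-1234;555-5678" };

// 1NF decomposition: one atomic value per cell, one row per phone number.
const atomicRows = nonAtomicRow.phones
  .split(";")
  .map((phone) => ({ customerId: nonAtomicRow.customerId, phone }));

console.log(atomicRows.length); // 2 rows, each with a single phone value
console.log(atomicRows[0].phone); // "555-1234"
```

With atomic rows, "find the customer for this phone number" becomes an indexed lookup instead of a string search.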
Practical example:
Consider an order system where we might be tempted to store data like this:
```sql
-- ❌ DENORMALIZED (problematic) structure
CREATE TABLE orders (
    id UUID PRIMARY KEY,
    order_date TIMESTAMP,
    -- Customer data repeated on every order
    customer_id UUID,
    customer_name VARCHAR(255),    -- Duplicated!
    customer_email VARCHAR(255),   -- Duplicated!
    customer_phone VARCHAR(50),    -- Duplicated!
    -- Address repeated on every order
    shipping_street VARCHAR(255),  -- Duplicated!
    shipping_city VARCHAR(100),    -- Duplicated!
    shipping_country VARCHAR(100), -- Duplicated!
    -- Product data repeated per order item (even worse!)
    item1_product_id UUID,
    item1_product_name VARCHAR(255), -- Duplicated!
    item1_quantity INTEGER,
    item2_product_id UUID,
    item2_product_name VARCHAR(255), -- Duplicated!
    item2_quantity INTEGER
    -- etc...
);

-- Problems:
-- 1. Update anomaly: Customer changes email → must update ALL their orders
-- 2. Insert anomaly: Can't record a customer without an order
-- 3. Delete anomaly: Deleting last order loses customer data
-- 4. Storage waste: Same data repeated thousands of times
```
```sql
-- ✅ NORMALIZED structure (3NF)
CREATE TABLE customers (
    id UUID PRIMARY KEY,
    name VARCHAR(255) NOT NULL,
    email VARCHAR(255) UNIQUE NOT NULL,
    phone VARCHAR(50)
);

CREATE TABLE addresses (
    id UUID PRIMARY KEY,
    customer_id UUID REFERENCES customers(id),
    type VARCHAR(20) NOT NULL, -- 'billing' or 'shipping'
    street VARCHAR(255),
    city VARCHAR(100),
    country VARCHAR(100)
);

CREATE TABLE products (
    id UUID PRIMARY KEY,
    name VARCHAR(255) NOT NULL,
    current_price DECIMAL(10, 2) NOT NULL
);

CREATE TABLE orders (
    id UUID PRIMARY KEY,
    customer_id UUID REFERENCES customers(id),
    shipping_address_id UUID REFERENCES addresses(id),
    order_date TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    status VARCHAR(50) NOT NULL
);

CREATE TABLE order_items (
    id UUID PRIMARY KEY,
    order_id UUID REFERENCES orders(id),
    product_id UUID REFERENCES products(id),
    quantity INTEGER NOT NULL,
    unit_price DECIMAL(10, 2) NOT NULL -- Captured at order time
);

-- Benefits:
-- 1. Customer email change: ONE update in customers table
-- 2. Product name change: ONE update in products table
-- 3. No data duplication
-- 4. Logical structure matches domain model
```

Notice that unit_price is stored in order_items even though products has current_price. This is intentional denormalization—we capture the price at order time because it's a business requirement that historical orders show what was actually charged, not what the product costs now.
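A quick sketch of why this matters, using hypothetical literal values rather than the schema above: recomputing an old order's total from the current product price would silently rewrite history.

```typescript
// Hypothetical values: the customer was charged 10.00 per unit at order time,
// and the product's current price was later raised to 15.00.
interface OrderItemRow { quantity: number; unitPrice: number } // snapshot price

function orderTotal(items: OrderItemRow[]): number {
  // Correct: use the price captured at order time
  return items.reduce((sum, i) => sum + i.quantity * i.unitPrice, 0);
}

const items: OrderItemRow[] = [{ quantity: 2, unitPrice: 10.0 }];
const currentPrice = 15.0; // the product's price today

console.log(orderTotal(items)); // 20 — what was actually charged
console.log(items[0].quantity * currentPrice); // 30 — wrong for a historical order
```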
Normalization isn't just academic theory—it solves real, expensive problems in production systems.
```typescript
// Normalized structure maps cleanly to domain model

class Customer {
  private id: string;
  private name: string;
  private email: string;
  private addresses: Address[] = [];

  // Update happens in one place, reflected everywhere
  changeEmail(newEmail: string): void {
    this.email = newEmail;
    // When saved, updates one row in customers table
    // All orders for this customer automatically reflect new email
  }
}

class Order {
  private id: string;
  private customer: Customer; // Reference, not embedded data
  private shippingAddress: Address;
  private items: OrderItem[] = [];

  // Getting customer data involves navigation (and possibly a query)
  getCustomerName(): string {
    return this.customer.getName();
  }
}

class OrderItem {
  private id: string;
  private product: Product;
  private quantity: number;
  private unitPrice: Money; // Captured at order time (intentional denormalization)

  // Product current price may differ from order-time price
  getCurrentProductPrice(): Money {
    return this.product.getCurrentPrice();
  }

  getOrderPrice(): Money {
    return this.unitPrice; // What was actually charged
  }
}
```

Start with normalized design. It's the safer default—you can always denormalize specific areas later when you identify performance bottlenecks. Denormalizing first and trying to renormalize later is extremely difficult once data exists.
Despite its benefits, normalization has costs that become painful at scale. Denormalization intentionally introduces redundancy to optimize for specific access patterns.
The join cost reality:
Every join has cost. At small scale, this cost is negligible. At large scale, it dominates:
```sql
-- Simple query on normalized data
-- "Show order list with customer names"

SELECT
    o.id,
    o.order_date,
    o.status,
    c.name as customer_name
FROM orders o
JOIN customers c ON o.customer_id = c.id
WHERE o.status = 'pending'
ORDER BY o.order_date DESC
LIMIT 100;

-- This requires:
-- 1. Index scan on orders (status, order_date)
-- 2. Index lookup on customers (id) for EACH row
-- 3. Nested loop or hash join

-- At 100M orders, even indexed, this is expensive
-- The join is executed 100 times (once per result row)

-- With denormalized data:
CREATE TABLE orders (
    id UUID PRIMARY KEY,
    order_date TIMESTAMP,
    status VARCHAR(50),
    customer_id UUID,
    customer_name VARCHAR(255) -- Denormalized!
);

SELECT id, order_date, status, customer_name
FROM orders
WHERE status = 'pending'
ORDER BY order_date DESC
LIMIT 100;

-- Single table scan, no joins
-- Can be 10-100x faster at scale
```

Every piece of denormalized data must be kept in sync. `customer_name` in `orders` must be updated when `customers.name` changes. This is where bugs hide. Denormalization trades write complexity and consistency risk for read performance.
Denormalization takes many forms, each with specific use cases and trade-offs.
Pattern 1: Cached Aggregates
Store computed values instead of recalculating on every read:
```sql
-- Instead of: SELECT COUNT(*) FROM orders WHERE customer_id = ?
-- (scans order table every time)

ALTER TABLE customers ADD COLUMN order_count INTEGER DEFAULT 0;
ALTER TABLE customers ADD COLUMN total_spent DECIMAL(10,2) DEFAULT 0;

-- Update on order creation (application code or trigger)
UPDATE customers
SET order_count = order_count + 1,
    total_spent = total_spent + :order_total
WHERE id = :customer_id;

-- Now this is instant:
SELECT order_count, total_spent FROM customers WHERE id = ?;
```

Pattern 2: Embedded/Copied Fields
Copy frequently-accessed fields from related entities:
```sql
-- Order needs customer name for display
-- Instead of joining every time, store it

CREATE TABLE orders (
    id UUID PRIMARY KEY,
    customer_id UUID REFERENCES customers(id),
    customer_name VARCHAR(255),  -- Copied from customer
    customer_email VARCHAR(255)  -- Copied from customer
    -- ... other fields
);

-- Sync requirement: When customer changes name, update their orders
-- Could be done by:
-- 1. Application code on customer update
-- 2. Database trigger
-- 3. Async event handler
```

Pattern 3: Precomputed/Materialized Views
Create read-optimized tables or views for specific query patterns:
```sql
-- Complex dashboard query
SELECT
    DATE_TRUNC('day', o.order_date) as day,
    COUNT(*) as order_count,
    SUM(o.total) as revenue,
    COUNT(DISTINCT o.customer_id) as unique_customers
FROM orders o
GROUP BY DATE_TRUNC('day', o.order_date)
ORDER BY day DESC;

-- At scale, this is slow. Create a summary table:

CREATE TABLE daily_order_summary (
    day DATE PRIMARY KEY,
    order_count INTEGER NOT NULL,
    revenue DECIMAL(15,2) NOT NULL,
    unique_customers INTEGER NOT NULL
);

-- Populate periodically (batch job) or incrementally (on order creation)
-- Dashboard queries against summary table are instant
```

Pattern 4: Historical Snapshots
Preserve state at a point in time rather than referencing current state:
```sql
-- Order items need price AT ORDER TIME, not current price

CREATE TABLE order_items (
    id UUID PRIMARY KEY,
    order_id UUID REFERENCES orders(id),
    product_id UUID REFERENCES products(id),
    quantity INTEGER NOT NULL,
    -- Snapshot fields (denormalized for business reasons)
    product_name_at_order VARCHAR(255) NOT NULL,
    unit_price_at_order DECIMAL(10,2) NOT NULL,
    product_sku_at_order VARCHAR(50) NOT NULL
);

-- Product changes don't affect historical orders
-- This is REQUIRED denormalization for business correctness
```

Pattern 5: JSON/Document Embedding
Embed related data as JSON for flexible, schema-less storage:
```sql
-- Shipping/billing addresses as embedded JSON

CREATE TABLE orders (
    id UUID PRIMARY KEY,
    customer_id UUID REFERENCES customers(id),
    -- Embedded JSON for complex, variable structure
    shipping_address JSONB NOT NULL,
    billing_address JSONB,
    -- Example:
    -- shipping_address = {
    --   "name": "John Doe",
    --   "street": "123 Main St",
    --   "city": "Portland",
    --   "state": "OR",
    --   "zip": "97201",
    --   "country": "USA"
    -- }
    order_metadata JSONB -- Flexible extra fields
);

-- Benefits: Single row read, flexible schema
-- Drawbacks: Can't easily query/join on JSON fields (though PostgreSQL helps)
```

Database denormalization forces code to manage consistency. This has significant implications for your domain model and data access layer.
Managing Cached Aggregates:
```typescript
// Customer has denormalized order_count and total_spent

class Customer {
  private id: string;
  private name: string;
  private orderCount: number;
  private totalSpent: Money;

  // These are denormalized - must be updated on order changes
  _incrementOrderCount(): void {
    this.orderCount++;
  }

  _decrementOrderCount(): void {
    this.orderCount = Math.max(0, this.orderCount - 1);
  }

  _addToTotalSpent(amount: Money): void {
    this.totalSpent = this.totalSpent.add(amount);
  }

  _subtractFromTotalSpent(amount: Money): void {
    this.totalSpent = this.totalSpent.subtract(amount);
  }
}

class Order {
  constructor(
    private customer: Customer,
    private items: OrderItem[]
  ) {
    // Update denormalized aggregates on creation
    this.customer._incrementOrderCount();
    this.customer._addToTotalSpent(this.getTotal());
  }

  cancel(): void {
    if (this.status === OrderStatus.CANCELLED) return;
    this.status = OrderStatus.CANCELLED;

    // Update denormalized aggregates on cancellation
    this.customer._decrementOrderCount();
    this.customer._subtractFromTotalSpent(this.getTotal());
  }

  // Order total changes affect customer aggregate
  addItem(product: Product, quantity: number): void {
    const itemTotal = product.price.multiply(quantity);
    this.items.push(new OrderItem(this, product, quantity));
    this._total = this._total.add(itemTotal);

    // Update customer aggregate
    this.customer._addToTotalSpent(itemTotal);
  }
}
```

Managing Copied Fields:
```typescript
// Order stores customer_name as a denormalized field
// Must propagate changes

class Customer {
  private name: string;
  private orders: Order[] = []; // Or: access via repository

  changeName(newName: string): void {
    const oldName = this.name;
    this.name = newName;

    // Propagate to denormalized copies
    // Option 1: Direct update (if orders loaded)
    for (const order of this.orders) {
      order._updateCustomerName(newName);
    }

    // Option 2: Emit domain event (preferred for decoupling)
    this.addDomainEvent(new CustomerNameChanged(this.id, oldName, newName));
  }
}

// Event handler updates denormalized data
class CustomerNameChangedHandler {
  constructor(private orderRepo: OrderRepository) {}

  async handle(event: CustomerNameChanged): Promise<void> {
    // Bulk update all orders for this customer
    await this.orderRepo.updateCustomerName(
      event.customerId,
      event.newName
    );
    // SQL: UPDATE orders SET customer_name = ? WHERE customer_id = ?
  }
}
```

Managing Historical Snapshots:
```typescript
// OrderItem captures product state at order time

class OrderItem {
  private productId: string;
  private quantity: number;

  // Snapshot fields - set at creation, never updated
  private readonly productNameAtOrder: string;
  private readonly productSkuAtOrder: string;
  private readonly unitPriceAtOrder: Money;

  constructor(order: Order, product: Product, quantity: number) {
    this.productId = product.id;
    this.quantity = quantity;

    // Capture current state as snapshot
    this.productNameAtOrder = product.name;
    this.productSkuAtOrder = product.sku;
    this.unitPriceAtOrder = product.currentPrice;
    // These never change - even if product is updated
  }

  // Use snapshot for display/calculation
  getDisplayName(): string {
    return this.productNameAtOrder;
  }

  getSubtotal(): Money {
    return this.unitPriceAtOrder.multiply(this.quantity);
  }

  // Can still access current product if needed
  // (but this requires loading product, which may have changed)
}
```

The most common bugs in denormalized systems: forgetting to update copied data in some code path, race conditions between updates, and aggregates drifting from true values over time. Always have reconciliation jobs that verify denormalized data matches the source of truth.
When you denormalize, you create a synchronization responsibility. Several patterns help manage this:
```typescript
// Strategy 1: Synchronous in-transaction
class OrderService {
  async createOrder(customerId: string, items: CartItem[]): Promise<Order> {
    return this.unitOfWork.transaction(async (tx) => {
      const customer = await this.customerRepo.findById(customerId, tx);
      const order = new Order(customer, items);
      await this.orderRepo.save(order, tx);

      // Update denormalized aggregate in same transaction
      customer._incrementOrderCount();
      customer._addToTotalSpent(order.getTotal());
      await this.customerRepo.save(customer, tx);

      // Both updates succeed or both fail
      return order;
    });
  }
}

// Strategy 2: Domain events (async)
class Order {
  private domainEvents: DomainEvent[] = [];

  constructor(customer: Customer, items: CartItem[]) {
    // ... create order

    // Emit event instead of direct update
    this.domainEvents.push(new OrderCreated(
      this.id,
      customer.id,
      this.getTotal()
    ));
  }

  getDomainEvents(): DomainEvent[] {
    return this.domainEvents;
  }
}

class OrderCreatedHandler {
  async handle(event: OrderCreated): Promise<void> {
    // Update denormalized data asynchronously
    await this.customerRepo.incrementOrderCount(event.customerId);
    await this.customerRepo.addToTotalSpent(event.customerId, event.total);
  }
}

// Strategy 3: Periodic reconciliation
class CustomerAggregatesReconciliation {
  async reconcile(): Promise<void> {
    // Recalculate from source of truth
    const aggregates = await this.db.query(`
      SELECT customer_id,
             COUNT(*) as true_order_count,
             SUM(total) as true_total_spent
      FROM orders
      WHERE status != 'cancelled'
      GROUP BY customer_id
    `);

    // Compare with stored values and fix discrepancies
    // (parameterized to avoid SQL injection)
    for (const agg of aggregates) {
      await this.db.query(`
        UPDATE customers
        SET order_count = $1, total_spent = $2
        WHERE id = $3
          AND (order_count != $1 OR total_spent != $2)
      `, [agg.true_order_count, agg.true_total_spent, agg.customer_id]);
    }
  }
}
```

| Strategy | Consistency | Performance | Complexity | Best For |
|---|---|---|---|---|
| In-transaction | Strong | Lower write perf | Medium | Critical data, low write volume |
| Database triggers | Strong | Lower write perf | Low (code), high (debug) | Simple cases, DBA-managed |
| Domain events | Eventual | Good | Higher | Microservices, decoupled systems |
| CDC | Eventual | Good | High infrastructure | Cross-system, legacy integration |
| Periodic reconciliation | Periodic | Best write perf | Low | Analytics, non-critical aggregates |
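Whichever strategy you choose, reconciliation boils down to comparing stored aggregates against values recomputed from the source of truth. A minimal sketch with illustrative names (the types and literal values are not from the original):

```typescript
// Illustrative sketch: detect drift between stored (denormalized) aggregates
// and values recomputed from the source-of-truth orders table.
interface CustomerAggregates { orderCount: number; totalSpent: number }

function findDrift(
  stored: CustomerAggregates,
  recomputed: CustomerAggregates
): string[] {
  const drifted: string[] = [];
  if (stored.orderCount !== recomputed.orderCount) drifted.push("orderCount");
  if (stored.totalSpent !== recomputed.totalSpent) drifted.push("totalSpent");
  return drifted; // an empty array means the aggregates are in sync
}

// Stored says 5 orders / 100 spent; recomputing from orders says 6 / 120.
console.log(findDrift({ orderCount: 5, totalSpent: 100 },
                      { orderCount: 6, totalSpent: 120 }));
// both fields have drifted and need repair
```

In a real reconciliation job, the fields returned here would drive targeted UPDATE statements plus an alert, since drift usually means some write path skipped the sync step.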
Use this framework to decide when to normalize and when to denormalize:
```typescript
// Decision example: Product listing page

// Normalized approach:
async function getProductListing(categoryId: string): Promise<ProductDTO[]> {
  const products = await prisma.product.findMany({
    where: { categoryId },
    include: {
      category: true,
      reviews: { select: { rating: true } }, // For avg calculation
      stocks: true, // For availability
      images: { take: 1 } // For thumbnail
    }
  });

  return products.map(p => ({
    id: p.id,
    name: p.name,
    price: p.price,
    categoryName: p.category.name, // Joined
    avgRating: calculateAvg(p.reviews.map(r => r.rating)), // Computed
    inStock: p.stocks.some(s => s.quantity > 0), // Computed
    thumbnailUrl: p.images[0]?.url
  }));
}

// Problem: At 10,000 products, 4+ joins per query, avg calculation
// for each product = slow page loads

// Denormalized approach:
// Product table stores: category_name, avg_rating, in_stock, thumbnail_url

async function getProductListingOptimized(categoryId: string): Promise<ProductDTO[]> {
  return prisma.product.findMany({
    where: { categoryId },
    select: {
      id: true,
      name: true,
      price: true,
      categoryName: true, // Denormalized
      avgRating: true, // Precomputed
      inStock: true, // Precomputed
      thumbnailUrl: true // Denormalized
    }
  });
}

// Trade-off: When category name changes, must update all products in it
// When a review is added, must recalculate avgRating
// When stock changes, must update inStock flag
```

Don't denormalize based on assumption. Measure actual query performance in production-like conditions. Many slow queries can be fixed with proper indexing or query optimization rather than schema changes. Denormalize only when you have data proving it's necessary.
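"Measure first" can be as simple as timing the competing query paths against realistic data before committing to a schema change. A hedged sketch; the listing functions named in the usage comment are placeholders:

```typescript
// Sketch: time an async query path before and after a proposed change.
// The label and the wrapped function are supplied by the caller.
async function timeIt<T>(label: string, fn: () => Promise<T>): Promise<T> {
  const start = Date.now();
  const result = await fn();
  console.log(`${label}: ${Date.now() - start} ms`);
  return result; // pass the query result through unchanged
}

// Hypothetical usage against both query paths:
// await timeIt("normalized listing", () => getProductListing("cat-1"));
// await timeIt("denormalized listing", () => getProductListingOptimized("cat-1"));
```

Run each path several times against production-sized data and compare medians; only a consistent, meaningful gap justifies taking on the synchronization burden of denormalization.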
Real-world systems rarely use pure normalization or pure denormalization. Effective architectures are hybrid—normalized core with selective denormalization for specific needs.
CQRS (Command Query Responsibility Segregation):
Maintain normalized model for writes, denormalized model for reads:
```typescript
// CQRS: Separate write and read models

// WRITE SIDE: Normalized, consistent, domain-rich
class Order {
  // Full domain entity with behavior
  private customer: Customer;
  private items: OrderItem[];

  constructor(customer: Customer, items: OrderItemData[]) {
    this.validateItems(items);
    this.items = items.map(d => new OrderItem(this, d));

    // Publish event for read side to consume
    this.publishEvent(new OrderCreated(this.toSnapshot()));
  }
}

class OrderRepository {
  // Writes to normalized tables
  async save(order: Order): Promise<void> {
    await this.db.orders.insert(order.toRecord());
    await this.db.orderItems.insertMany(order.items.map(i => i.toRecord()));
  }
}

// READ SIDE: Denormalized, fast, query-optimized
interface OrderReadModel {
  id: string;
  orderDate: Date;
  customerName: string; // Denormalized
  customerEmail: string; // Denormalized
  itemCount: number; // Precomputed
  total: number; // Precomputed
  status: string;
}

class OrderReadRepository {
  // Reads from denormalized view/table
  async findForListing(filters: OrderFilters): Promise<OrderReadModel[]> {
    return this.db.orderReadModels.find(filters);
  }
}

// Projection handler updates read model when events occur
class OrderProjection {
  async onOrderCreated(event: OrderCreated): Promise<void> {
    await this.db.orderReadModels.insert({
      id: event.orderId,
      orderDate: event.orderDate,
      customerName: event.customerName,
      customerEmail: event.customerEmail,
      itemCount: event.items.length,
      total: event.items.reduce((sum, i) => sum + i.subtotal, 0),
      status: 'pending'
    });
  }
}
```

Database + Search Index:
Normalized database for transactions, denormalized search index for queries:
```typescript
// Database stores normalized data (source of truth)
// Elasticsearch/Algolia stores denormalized data (search)

class ProductService {
  async createProduct(data: ProductData): Promise<Product> {
    // 1. Save to normalized database
    const product = await this.productRepo.save(new Product(data));

    // 2. Index to search with denormalized data
    await this.searchIndex.index({
      id: product.id,
      name: product.name,
      description: product.description,
      categoryName: product.category.name, // Flattened
      brandName: product.brand.name, // Flattened
      price: product.price,
      avgRating: 0, // Precomputed, starts at 0
      reviewCount: 0,
      tags: product.tags.map(t => t.name), // Flattened array
      inStock: product.inventory > 0
    });

    return product;
  }

  // Search uses denormalized index
  async search(query: string): Promise<ProductSearchResult[]> {
    return this.searchIndex.search(query);
    // Fast, pre-joined, no database calls
  }

  // Detailed view uses normalized database
  async getProductDetail(id: string): Promise<ProductDetail> {
    return this.productRepo.findWithAllRelations(id);
    // Full, up-to-date, complete data
  }
}
```

Hybrid approaches often mean the read model lags behind writes. When a product name changes, search results might show the old name for seconds to minutes. Decide if this is acceptable for each use case. Critical displays should query the normalized source.
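One lightweight way to make that per-use-case decision is to version-stamp the read model and re-query the normalized source when it lags. A sketch with assumed names; the version counter is illustrative, not part of the example above:

```typescript
// Illustrative: each write bumps a version on the source of truth, and the
// projection copies it. A read model behind the source version is stale.
interface ProductReadModel { productId: string; name: string; version: number }

function isStale(readModel: ProductReadModel, sourceVersion: number): boolean {
  return readModel.version < sourceVersion;
}

const cached: ProductReadModel = { productId: "p1", name: "Old Name", version: 3 };
console.log(isStale(cached, 5)); // true — a critical display should re-query the source
console.log(isStale(cached, 3)); // false — the read model is current
```

Non-critical surfaces (search listings, dashboards) can simply serve the possibly stale model; checkout and billing screens would use the staleness check as a trigger to hit the normalized database.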
Module Complete:
With this page, we've completed the Data Modeling for LLD module.
These foundational concepts underpin every persistence layer you'll ever build. Master them, and your data access code will be clean, performant, and maintainable.
You've mastered Data Modeling for LLD—from entity relationships through loading strategies to normalization trade-offs. You now have the conceptual framework to make informed decisions about data structure, relationship representation, and the code patterns that bridge your domain model to database storage.