A database schema is never finished until it's been critically reviewed. Design review is the systematic process of evaluating a schema against requirements, identifying potential issues, and validating that the design will perform adequately under real-world conditions.
In production environments, design reviews catch errors before they become expensive migrations. In interviews, the ability to critique both your own designs and those presented to you demonstrates senior-level thinking—the ability to see around corners and anticipate problems before they manifest.
By the end of this page, you will master systematic design review: comprehensive checklists, common anti-patterns to detect, performance validation techniques, and how to present and defend your designs in technical discussions and interviews.
Design reviews aren't bureaucratic overhead—they're a high-leverage investment that prevents costly problems.
| Detection Phase | Relative Fix Cost | Example Issue |
|---|---|---|
| Design Review | 1x | Missing index identified before implementation |
| Development | 5x | Wrong data type causes ORM changes |
| Testing | 15x | Constraint violation requires schema change + test rewrite |
| Staging | 30x | Performance issue requires redesign + migration planning |
| Production | 100x+ | Data integrity issue requires emergency fix, data cleanup, incident response |
Before asking others to review your design, review it yourself using a checklist. This catches obvious issues and shows respect for reviewers' time. The checklist that follows works for both self-review and peer review.
A comprehensive checklist ensures consistent, thorough reviews: walk systematically through naming and structure, data types, keys and constraints, indexing, and integrity rules rather than skimming for whatever happens to catch your eye.
In interviews, mentally run through this checklist as you present your design. Proactively address each area: 'I've ensured foreign keys are indexed...', 'I'm using DECIMAL for monetary values because...' This demonstrates thoroughness.
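Part of the self-review can even be automated. As one sketch of the "foreign keys are indexed" check, the following PostgreSQL catalog query (an illustration assumed for this page, not a standard tool, and order-sensitive in how it matches columns) lists foreign-key constraints whose columns are not covered by the leading columns of any index on the referencing table:

```sql
-- Sketch: find FK constraints with no covering index on the referencing table (PostgreSQL)
SELECT c.conrelid::regclass        AS table_name,
       c.conname                   AS fk_name,
       pg_get_constraintdef(c.oid) AS definition
FROM pg_constraint c
WHERE c.contype = 'f'
  AND NOT EXISTS (
        SELECT 1
        FROM pg_index i
        WHERE i.indrelid = c.conrelid
          -- leading index columns must match the FK columns, in order
          AND (string_to_array(i.indkey::text, ' ')::smallint[])
                [1:array_length(c.conkey, 1)] = c.conkey
      )
ORDER BY 1;
```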
Experienced reviewers recognize these recurring problems. Learning to detect them makes you both a better designer and reviewer.
| Anti-Pattern | Description | Problems Caused | Solution |
|---|---|---|---|
| God Table | Single table with too many columns covering multiple concepts | Update anomalies, poor query performance, unclear ownership | Decompose into properly normalized tables |
| Polymorphic Associations | Foreign key that can reference multiple tables | No referential integrity, complex queries | Use proper junction tables or single-table inheritance |
| Entity-Attribute-Value (EAV) | Key-value pairs instead of proper columns | No type safety, complex queries, poor performance | Use proper columns or document database if truly dynamic |
| Implicit Nulls | NULL used to convey special meaning | Ambiguous semantics, three-valued logic bugs | Use explicit status columns or separate tables |
| Soft Delete Only | is_deleted flag without considering implications | Broken unique constraints, complex queries everywhere | Implement properly with partial indexes and clear policy |
| Missing Audit Trail | No tracking of who/when for changes | Compliance issues, debugging difficulty | Add audit columns or use audit logging table |
| Stringly Typed | Using VARCHAR for structured data (JSON, dates) | No validation, parsing overhead, storage waste | Use appropriate types: JSON, DATE, ENUM |
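The "Soft Delete Only" row is worth a concrete illustration, because the broken-unique-constraint problem is easy to miss. A minimal sketch of the partial-index fix, assuming a PostgreSQL `customer` table with an `email` column (names are illustrative):

```sql
-- Soft delete done deliberately: NULL deleted_at means the row is active
ALTER TABLE customer ADD COLUMN deleted_at TIMESTAMPTZ;

-- Email must be unique only among active rows, so a re-registered email
-- doesn't collide with a soft-deleted account:
CREATE UNIQUE INDEX uq_customer_email_active
    ON customer (email)
    WHERE deleted_at IS NULL;
```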
EAV is particularly common and problematic. It looks flexible but causes severe issues.
```sql
-- ANTI-PATTERN: Entity-Attribute-Value (don't do this)
CREATE TABLE product_attributes (
    product_id INT REFERENCES product(product_id),
    attribute_name VARCHAR(100),
    attribute_value VARCHAR(255),  -- Everything becomes a string!
    PRIMARY KEY (product_id, attribute_name)
);

-- Problems with EAV:
-- 1. No type safety: "2024-01-15" and "invalid_date" both accepted
-- 2. No constraints: Can't enforce "price > 0"
-- 3. Complex queries to get product details:

SELECT
    p.product_id,
    p.name,
    MAX(CASE WHEN a.attribute_name = 'color'  THEN a.attribute_value END) AS color,
    MAX(CASE WHEN a.attribute_name = 'size'   THEN a.attribute_value END) AS size,
    MAX(CASE WHEN a.attribute_name = 'weight' THEN a.attribute_value END) AS weight
FROM product p
LEFT JOIN product_attributes a ON p.product_id = a.product_id
GROUP BY p.product_id, p.name;
-- This becomes a nightmare with 50 attributes!

-- BETTER: Proper columns for known attributes
CREATE TABLE product (
    product_id SERIAL PRIMARY KEY,
    name VARCHAR(200) NOT NULL,
    color VARCHAR(50),
    size VARCHAR(20),
    weight_kg DECIMAL(8, 3) CHECK (weight_kg > 0)
    -- Additional columns as needed
);

-- If truly dynamic attributes are needed, use JSONB (PostgreSQL):
CREATE TABLE product (
    product_id SERIAL PRIMARY KEY,
    name VARCHAR(200) NOT NULL,
    -- Known, frequently-queried attributes as columns
    category_id INT NOT NULL,
    base_price DECIMAL(10, 2) NOT NULL,
    -- Dynamic attributes in validated JSON
    attributes JSONB DEFAULT '{}'::jsonb
    -- Can still index and query JSON:
    -- CREATE INDEX idx_product_color ON product((attributes->>'color'));
);
```

EAV often looks attractive when requirements include "arbitrary user-defined attributes." Before accepting EAV, ask: (1) Are the attributes truly unbounded, or is there a reasonable fixed set? (2) Would a document database (MongoDB) be more appropriate? (3) Can JSONB columns provide flexibility without EAV's downsides?
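Assuming the JSONB variant above is chosen, the dynamic attributes remain indexable and queryable. A short sketch using PostgreSQL's GIN index support:

```sql
-- GIN index over the JSONB column; jsonb_path_ops supports containment queries
CREATE INDEX idx_product_attributes
    ON product USING GIN (attributes jsonb_path_ops);

-- Containment queries like this one can use the GIN index:
SELECT product_id, name
FROM product
WHERE attributes @> '{"color": "red"}';
```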
Before deploying a schema, validate that it will meet performance requirements. This involves estimating data volumes, analyzing query plans, and stress testing.
Estimate storage requirements and record counts to identify potential scale issues.
```
CAPACITY ESTIMATION TEMPLATE

Current Volumes:
├── Customers: 100,000
├── Products: 10,000
├── Orders/day: 5,000
└── Order items/order: 3 avg

Projected 3-Year Growth (assuming 50% YoY):
├── Customers: 337,500
├── Products: 25,000
├── Total Orders: ~8.2 Million (5000 × 365 × 3 with growth)
└── Order Items: ~24.6 Million

Row Size Estimates:
├── Customer: ~500 bytes (with address JSON)
├── Product: ~1 KB (with description)
├── Order: ~800 bytes (with address snapshots)
└── Order Item: ~200 bytes

Storage Projections:
├── Customers: ~160 MB
├── Products: ~25 MB
├── Orders: ~6.5 GB
├── Order Items: ~5 GB
├── Total (data only): ~12 GB
└── With indexes (2x): ~24 GB

Query Volume Projections:
├── Product searches: 500,000/day (read)
├── Order creation: 5,000/day (write)
├── Order lookups: 50,000/day (read)
└── Dashboard queries: 10,000/day (aggregate)

IMPLICATIONS:
├── Order and OrderItem tables need partition strategy after 2 years
├── Product search needs optimized indexing (full-text?)
├── Dashboard queries might benefit from materialized views
└── Consider read replicas for search workload
```
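One way to ground these row-size estimates is to load a few thousand representative rows and measure actual sizes. A sketch for PostgreSQL, assuming the `"order"` table used in the examples below:

```sql
-- Compare estimated row size and storage against measured values on sample data
SELECT pg_size_pretty(pg_total_relation_size('"order"')) AS table_plus_indexes,
       pg_size_pretty(pg_relation_size('"order"'))       AS heap_only,
       (SELECT avg(pg_column_size(o.*))::int FROM "order" o) AS avg_row_bytes;
```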
For critical queries, analyze execution plans before deployment.
```sql
-- Analyze query execution plan (PostgreSQL)
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT
    o.order_id,
    o.order_date,
    c.email,
    SUM(oi.line_total) AS total
FROM "order" o
JOIN customer c ON o.customer_id = c.customer_id
JOIN order_item oi ON o.order_id = oi.order_id
WHERE o.status = 'pending'
  AND o.ordered_at > CURRENT_DATE - INTERVAL '7 days'
GROUP BY o.order_id, o.order_date, c.email;

/* ANALYZE OUTPUT (look for red flags):

GOOD signs:
✓ "Index Scan" or "Index Only Scan" on filtered columns
✓ "Nested Loop" with small inner table
✓ Low "actual rows" relative to table size
✓ "Buffers: shared hit" (reading from cache)

BAD signs (investigate):
✗ "Seq Scan" on large tables with filters
✗ "Sort" without index support
✗ High "Rows Removed by Filter" (rows scanned only to be discarded)
✗ "Buffers: shared read" (disk I/O)
✗ "Hash Join" on very large tables (memory pressure)
*/

-- If a sequential scan is used, verify the index exists:
SELECT indexname, indexdef FROM pg_indexes WHERE tablename = 'order';

-- If the index exists but isn't used, check statistics:
ANALYZE "order";  -- Update table statistics

-- Consider adding an index if missing:
CREATE INDEX idx_order_status_date
    ON "order"(status, ordered_at DESC)
    WHERE status IN ('pending', 'processing');
```

Query plan analysis is only meaningful with production-like data volumes. Indexes that look great with 1,000 rows may behave differently with 10 million. Always test with realistic data quantities and distributions.
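One quick way to get realistic volumes in a test environment is to generate synthetic rows. A sketch using PostgreSQL's `generate_series`, under the assumptions that the listed columns exist on `"order"` and that customer IDs 1 through 100,000 are present:

```sql
-- Generate ~5M synthetic orders so plans are tested at realistic scale
INSERT INTO "order" (customer_id, status, ordered_at)
SELECT (random() * 99999)::int + 1,
       (ARRAY['pending', 'processing', 'shipped', 'delivered'])[1 + (random() * 3)::int],
       now() - (random() * interval '365 days')
FROM generate_series(1, 5000000);

ANALYZE "order";  -- refresh statistics before re-running EXPLAIN
```

Pay attention to distributions as well as counts: if 95% of real orders are 'delivered', a uniform status distribution like the one above will make partial indexes on 'pending' look less useful than they really are.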
Formal design reviews follow a structured process to maximize effectiveness. Whether reviewing your own work or participating in a team review, this structure ensures thorough coverage.
When receiving challenging feedback, use 'Yes, and...' rather than 'But...'. 'Yes, you're right that this needs an index, and I was also thinking we might want a partial index to exclude inactive records.' This acknowledges the feedback while demonstrating further thinking.
Whether in a formal review or an interview, how you present your design is almost as important as the design itself. Structure your presentation to demonstrate systematic thinking.
```
DESIGN PRESENTATION STRUCTURE

1. REQUIREMENTS SUMMARY (Brief)
   "We need to model an e-commerce system with customers, products, orders,
   and reviews. Key requirements include: multi-address customers, product
   variants, order history preservation, and 99.9% read availability.
   Expected scale is 1M customers and 50K orders/day."

2. KEY ENTITIES (High Level)
   "I've identified six core entities: Customer, Address, Product,
   ProductVariant, Order, and OrderItem. Let me walk through each..."

3. RELATIONSHIP MAPPING
   "Customers have a one-to-many relationship with Addresses and Orders.
   Orders have a one-to-many relationship with OrderItems. Products have
   one-to-many with ProductVariants..."

4. CRITICAL DESIGN DECISIONS (Highlight Trade-offs)
   "I made several key decisions:
   - DECISION: Storing address snapshot in Order rather than FK
     RATIONALE: Preserves historical accuracy; address may change
   - DECISION: Denormalizing product name into OrderItem
     RATIONALE: Products may be deleted; we need historical record
   - DECISION: Using JSONB for variant attributes
     RATIONALE: Variant attributes differ by product type"

5. CONSTRAINTS AND INTEGRITY
   "I've added CHECK constraints for positive prices, valid status
   transitions, and total amount calculations. Foreign keys use CASCADE for
   dependent entities like OrderItems, RESTRICT for master data like
   Products."

6. INDEXING STRATEGY
   "Based on the query patterns, I've indexed:
   - customer_id on Orders (order lookup by customer)
   - status + ordered_at on Orders (dashboard queries)
   - product_id on OrderItems (inventory queries)
   - Partial index on pending orders for fulfillment"

7. KNOWN LIMITATIONS/FUTURE CONSIDERATIONS
   "This design assumes single-currency. For multi-currency, we'd need to
   add currency_code fields. I've noted that the Order table may need
   partitioning after 2 years based on volume projections."
```
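The CASCADE/RESTRICT policy in point 5 is easy to state and easy to get backwards in DDL, so it is worth spelling out. A sketch, with table and column names assumed from the earlier examples:

```sql
-- Dependent entity: line items are meaningless without their parent order
ALTER TABLE order_item
    ADD CONSTRAINT fk_order_item_order
    FOREIGN KEY (order_id) REFERENCES "order"(order_id)
    ON DELETE CASCADE;

-- Master data: products can't be deleted while any order item references them
ALTER TABLE order_item
    ADD CONSTRAINT fk_order_item_product
    FOREIGN KEY (product_id) REFERENCES product(product_id)
    ON DELETE RESTRICT;
```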
In reviews and interviews, you'll face challenging questions about your decisions. Effective defense requires preparation and structured responses.

```
CHALLENGE: "Why didn't you use UUID for primary keys?"

WEAK RESPONSE: "I prefer integers" (no reasoning)

STRONG RESPONSE:
"I considered UUIDs and chose BIGINT SERIAL for several reasons:
1. We have a single write master, so auto-increment collision isn't a concern
2. BIGINT is 8 bytes vs UUID's 16 bytes—significant at our projected scale
3. Integer comparison is faster for joins
4. Human-readable order IDs are easier for customer service
For the specific case of distributed writes, UUIDs would be preferable."

---

CHALLENGE: "This seems over-normalized. Won't the JOINs be slow?"

WEAK RESPONSE: "Normalization is best practice" (dogmatic)

STRONG RESPONSE:
"I started normalized because data integrity is critical for financial records.
However, I've identified the specific access patterns:
- Product listings: 50K/day, need < 100ms
- Order creation: 5K/day, can tolerate 500ms

For product listings, I'd add a materialized view refreshed every 5 minutes.
The order creation path uses only 3 JOINs with indexed FKs—EXPLAIN ANALYZE
shows ~10ms. I'm comfortable with this until volume increases 10x."

---

CHALLENGE: "Why store the address in Order instead of just linking to CustomerAddress?"

WEAK RESPONSE: "Because addresses change" (incomplete)

STRONG RESPONSE:
"This was a deliberate trade-off. The alternatives were:
1. FK to CustomerAddress (current address might change)
2. Soft-delete addresses, always add new (complex, storage-heavy)
3. Temporal address versioning (complex, overkill for our needs)
4. Copy address at order time (what I chose)

I chose copying because:
- Legal requirement to show delivery address at time of order
- Customer may delete addresses but we need order history
- Simplifies queries—no JOIN for order details
- Storage cost is acceptable (JSON adds ~300 bytes per order)

The trade-off is that address corrections require manual data fix."
```

Structure your defense using STAR: Situation (context/requirements), Task (the decision to be made), Action (what you decided and alternatives considered), Result (why this choice is superior for the given context). This demonstrates systematic thinking.
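Returning to the second challenge above: the "materialized view refreshed every 5 minutes" is worth being able to sketch on the spot. One possible shape in PostgreSQL, with names assumed rather than taken from the design:

```sql
-- Precomputed product listing, rebuilt on a schedule instead of per-request
CREATE MATERIALIZED VIEW product_listing AS
SELECT p.product_id,
       p.name,
       p.base_price,
       count(oi.product_id) AS times_ordered
FROM product p
LEFT JOIN order_item oi ON oi.product_id = p.product_id
GROUP BY p.product_id, p.name, p.base_price;

-- A unique index lets REFRESH ... CONCURRENTLY rebuild without blocking readers
CREATE UNIQUE INDEX uq_product_listing_id ON product_listing (product_id);

-- Run every 5 minutes via cron, pg_cron, or an application job:
REFRESH MATERIALIZED VIEW CONCURRENTLY product_listing;
```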
A common interview exercise is reviewing someone else's schema. Let's practice identifying issues in a deliberately flawed design.
```sql
-- SCHEMA TO REVIEW: Simple E-Commerce
-- Find the issues!

CREATE TABLE Users (
    id INT PRIMARY KEY AUTO_INCREMENT,
    name VARCHAR(50),
    email VARCHAR(50),
    address VARCHAR(100),
    creditCard VARCHAR(20),
    balance FLOAT
);

CREATE TABLE Products (
    id INT PRIMARY KEY,
    name VARCHAR(50),
    price FLOAT,
    category VARCHAR(50),
    categoryDescription TEXT
);

CREATE TABLE Orders (
    id INT PRIMARY KEY AUTO_INCREMENT,
    userId INT,
    productId INT,
    productName VARCHAR(50),
    productPrice FLOAT,
    quantity INT,
    orderDate VARCHAR(20),
    status VARCHAR(10)
);
```
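To check your own review against the anti-patterns covered above, here is one possible corrected sketch. It is illustrative only: the names, types, and PostgreSQL syntax are assumptions, and other fixes are equally defensible.

```sql
-- One possible corrected design (PostgreSQL syntax; sketch, not the canonical answer)
CREATE TABLE users (
    user_id BIGSERIAL PRIMARY KEY,
    name    VARCHAR(100) NOT NULL,
    email   VARCHAR(255) NOT NULL UNIQUE,          -- was nullable with no unique constraint
    balance DECIMAL(12, 2) NOT NULL DEFAULT 0      -- FLOAT is unsafe for money
    -- creditCard removed: store a payment-provider token, never raw card numbers
    -- address moved to a separate addresses table (one-to-many), omitted here
);

CREATE TABLE categories (
    category_id SERIAL PRIMARY KEY,
    name        VARCHAR(100) NOT NULL,
    description TEXT                               -- was duplicated on every product row
);

CREATE TABLE products (
    product_id  BIGSERIAL PRIMARY KEY,
    category_id INT NOT NULL REFERENCES categories(category_id),
    name        VARCHAR(200) NOT NULL,
    price       DECIMAL(10, 2) NOT NULL CHECK (price >= 0)   -- was FLOAT
);

-- Orders no longer conflate the order with its line items
CREATE TABLE orders (
    order_id   BIGSERIAL PRIMARY KEY,
    user_id    BIGINT NOT NULL REFERENCES users(user_id),    -- was a bare INT, no FK
    status     VARCHAR(20) NOT NULL
               CHECK (status IN ('pending', 'paid', 'shipped', 'cancelled')),
    ordered_at TIMESTAMPTZ NOT NULL DEFAULT now()            -- was VARCHAR(20)
);

CREATE TABLE order_items (
    order_id      BIGINT NOT NULL REFERENCES orders(order_id) ON DELETE CASCADE,
    product_id    BIGINT NOT NULL REFERENCES products(product_id),
    product_name  VARCHAR(200) NOT NULL,                      -- snapshot for history
    product_price DECIMAL(10, 2) NOT NULL,
    quantity      INT NOT NULL CHECK (quantity > 0),
    PRIMARY KEY (order_id, product_id)
);

CREATE INDEX idx_orders_user ON orders(user_id);
CREATE INDEX idx_order_items_product ON order_items(product_id);
```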
You've now completed a comprehensive journey through database schema design for interviews and production systems: gathering requirements, modeling entities and relationships, choosing types and constraints, indexing, and reviewing the result. These skills combine to make you effective at the most common interview design exercises and invaluable in production database work.
Congratulations! You've mastered schema design from requirements through review. You can now approach database design problems systematically, make and defend trade-off decisions, and produce production-quality schemas. The next module covers SQL Query Writing—applying your schema knowledge to write complex, efficient queries.