A database schema is never finished until it's been critically reviewed. Design review is the systematic process of evaluating a schema against requirements, identifying potential issues, and validating that the design will perform adequately under real-world conditions.
In production environments, design reviews catch errors before they become expensive migrations. In interviews, the ability to critique both your own designs and those presented to you demonstrates senior-level thinking—the ability to see around corners and anticipate problems before they manifest.
By the end of this page, you will master systematic design review: comprehensive checklists, common anti-patterns to detect, performance validation techniques, and how to present and defend your designs in technical discussions and interviews.
Design reviews aren't bureaucratic overhead—they're a high-leverage investment that prevents costly problems.
| Detection Phase | Relative Fix Cost | Example Issue |
|---|---|---|
| Design Review | 1x | Missing index identified before implementation |
| Development | 5x | Wrong data type causes ORM changes |
| Testing | 15x | Constraint violation requires schema change + test rewrite |
| Staging | 30x | Performance issue requires redesign + migration planning |
| Production | 100x+ | Data integrity issue requires emergency fix, data cleanup, incident response |
Before asking others to review your design, review it yourself using a checklist. This catches obvious issues and shows respect for reviewers' time. The checklist that follows works for both self-review and peer review.
A comprehensive checklist ensures consistent, thorough reviews: walk systematically through naming and structure, data types, keys and constraints, indexing, and integrity rules rather than skimming for whatever happens to catch your eye.
In interviews, mentally run through this checklist as you present your design. Proactively address each area: 'I've ensured foreign keys are indexed...', 'I'm using DECIMAL for monetary values because...' This demonstrates thoroughness.
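Part of the self-review can even be automated. As one sketch of the "foreign keys are indexed" check, the following PostgreSQL catalog query (an illustration assumed for this page, not a standard tool, and order-sensitive in how it matches columns) lists foreign-key constraints whose columns are not covered by the leading columns of any index on the referencing table:

```sql
-- Sketch: find FK constraints with no covering index on the referencing table (PostgreSQL)
SELECT c.conrelid::regclass        AS table_name,
       c.conname                   AS fk_name,
       pg_get_constraintdef(c.oid) AS definition
FROM pg_constraint c
WHERE c.contype = 'f'
  AND NOT EXISTS (
        SELECT 1
        FROM pg_index i
        WHERE i.indrelid = c.conrelid
          -- leading index columns must match the FK columns, in order
          AND (string_to_array(i.indkey::text, ' ')::smallint[])
                [1:array_length(c.conkey, 1)] = c.conkey
      )
ORDER BY 1;
```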
Experienced reviewers recognize these recurring problems. Learning to detect them makes you both a better designer and reviewer.
| Anti-Pattern | Description | Problems Caused | Solution |
|---|---|---|---|
| God Table | Single table with too many columns covering multiple concepts | Update anomalies, poor query performance, unclear ownership | Decompose into properly normalized tables |
| Polymorphic Associations | Foreign key that can reference multiple tables | No referential integrity, complex queries | Use proper junction tables or single-table inheritance |
| Entity-Attribute-Value (EAV) | Key-value pairs instead of proper columns | No type safety, complex queries, poor performance | Use proper columns or document database if truly dynamic |
| Implicit Nulls | NULL used to convey special meaning | Ambiguous semantics, three-valued logic bugs | Use explicit status columns or separate tables |
| Soft Delete Only | is_deleted flag without considering implications | Broken unique constraints, complex queries everywhere | Implement properly with partial indexes and clear policy |
| Missing Audit Trail | No tracking of who/when for changes | Compliance issues, debugging difficulty | Add audit columns or use audit logging table |
| Stringly Typed | Using VARCHAR for structured data (JSON, dates) | No validation, parsing overhead, storage waste | Use appropriate types: JSON, DATE, ENUM |
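The "Soft Delete Only" row is worth a concrete illustration, because the broken-unique-constraint problem is easy to miss. A minimal sketch of the partial-index fix, assuming a PostgreSQL `customer` table with an `email` column (names are illustrative):

```sql
-- Soft delete done deliberately: NULL deleted_at means the row is active
ALTER TABLE customer ADD COLUMN deleted_at TIMESTAMPTZ;

-- Email must be unique only among active rows, so a re-registered email
-- doesn't collide with a soft-deleted account:
CREATE UNIQUE INDEX uq_customer_email_active
    ON customer (email)
    WHERE deleted_at IS NULL;
```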
EAV is particularly common and problematic. It looks flexible but causes severe issues.
```sql
-- ANTI-PATTERN: Entity-Attribute-Value (don't do this)
CREATE TABLE product_attributes (
    product_id INT REFERENCES product(product_id),
    attribute_name VARCHAR(100),
    attribute_value VARCHAR(255),  -- Everything becomes a string!
    PRIMARY KEY (product_id, attribute_name)
);

-- Problems with EAV:
-- 1. No type safety: "2024-01-15" and "invalid_date" both accepted
-- 2. No constraints: Can't enforce "price > 0"
-- 3. Complex queries to get product details:

SELECT
    p.product_id,
    p.name,
    MAX(CASE WHEN a.attribute_name = 'color'  THEN a.attribute_value END) AS color,
    MAX(CASE WHEN a.attribute_name = 'size'   THEN a.attribute_value END) AS size,
    MAX(CASE WHEN a.attribute_name = 'weight' THEN a.attribute_value END) AS weight
FROM product p
LEFT JOIN product_attributes a ON p.product_id = a.product_id
GROUP BY p.product_id, p.name;
-- This becomes a nightmare with 50 attributes!

-- BETTER: Proper columns for known attributes
CREATE TABLE product (
    product_id SERIAL PRIMARY KEY,
    name VARCHAR(200) NOT NULL,
    color VARCHAR(50),
    size VARCHAR(20),
    weight_kg DECIMAL(8, 3) CHECK (weight_kg > 0)
    -- Additional columns as needed
);

-- If truly dynamic attributes are needed, use JSONB (PostgreSQL):
CREATE TABLE product (
    product_id SERIAL PRIMARY KEY,
    name VARCHAR(200) NOT NULL,
    -- Known, frequently-queried attributes as columns
    category_id INT NOT NULL,
    base_price DECIMAL(10, 2) NOT NULL,
    -- Dynamic attributes in validated JSON
    attributes JSONB DEFAULT '{}'::jsonb
    -- Can still index and query JSON:
    -- CREATE INDEX idx_product_color ON product((attributes->>'color'));
);
```

EAV often looks attractive when requirements include "arbitrary user-defined attributes." Before accepting EAV, ask: (1) Are the attributes truly unbounded, or is there a reasonable fixed set? (2) Would a document database (MongoDB) be more appropriate? (3) Can JSONB columns provide flexibility without EAV's downsides?
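Assuming the JSONB variant above is chosen, the dynamic attributes remain indexable and queryable. A short sketch using PostgreSQL's GIN index support:

```sql
-- GIN index over the JSONB column; jsonb_path_ops supports containment queries
CREATE INDEX idx_product_attributes
    ON product USING GIN (attributes jsonb_path_ops);

-- Containment queries like this one can use the GIN index:
SELECT product_id, name
FROM product
WHERE attributes @> '{"color": "red"}';
```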
Before deploying a schema, validate that it will meet performance requirements. This involves estimating data volumes, analyzing query plans, and stress testing.
Estimate storage requirements and record counts to identify potential scale issues.
```
CAPACITY ESTIMATION TEMPLATE

Current Volumes:
├── Customers: 100,000
├── Products: 10,000
├── Orders/day: 5,000
└── Order items/order: 3 avg

Projected 3-Year Growth (assuming 50% YoY):
├── Customers: 337,500
├── Products: 25,000
├── Total Orders: ~8.2 Million (5000 × 365 × 3 with growth)
└── Order Items: ~24.6 Million

Row Size Estimates:
├── Customer: ~500 bytes (with address JSON)
├── Product: ~1 KB (with description)
├── Order: ~800 bytes (with address snapshots)
└── Order Item: ~200 bytes

Storage Projections:
├── Customers: ~160 MB
├── Products: ~25 MB
├── Orders: ~6.5 GB
├── Order Items: ~5 GB
├── Total (data only): ~12 GB
└── With indexes (2x): ~24 GB

Query Volume Projections:
├── Product searches: 500,000/day (read)
├── Order creation: 5,000/day (write)
├── Order lookups: 50,000/day (read)
└── Dashboard queries: 10,000/day (aggregate)

IMPLICATIONS:
├── Order and OrderItem tables need partition strategy after 2 years
├── Product search needs optimized indexing (full-text?)
├── Dashboard queries might benefit from materialized views
└── Consider read replicas for search workload
```
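One way to ground these row-size estimates is to load a few thousand representative rows and measure actual sizes. A sketch for PostgreSQL, assuming the `"order"` table used in the examples below:

```sql
-- Compare estimated row size and storage against measured values on sample data
SELECT pg_size_pretty(pg_total_relation_size('"order"')) AS table_plus_indexes,
       pg_size_pretty(pg_relation_size('"order"'))       AS heap_only,
       (SELECT avg(pg_column_size(o.*))::int FROM "order" o) AS avg_row_bytes;
```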
For critical queries, analyze execution plans before deployment.
```sql
-- Analyze query execution plan (PostgreSQL)
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT
    o.order_id,
    o.order_date,
    c.email,
    SUM(oi.line_total) AS total
FROM "order" o
JOIN customer c ON o.customer_id = c.customer_id
JOIN order_item oi ON o.order_id = oi.order_id
WHERE o.status = 'pending'
  AND o.ordered_at > CURRENT_DATE - INTERVAL '7 days'
GROUP BY o.order_id, o.order_date, c.email;

/* ANALYZE OUTPUT (look for red flags):

GOOD signs:
✓ "Index Scan" or "Index Only Scan" on filtered columns
✓ "Nested Loop" with small inner table
✓ Low "actual rows" relative to table size
✓ "Buffers: shared hit" (reading from cache)

BAD signs (investigate):
✗ "Seq Scan" on large tables with filters
✗ "Sort" without index support
✗ High "Rows Removed by Filter" (rows scanned only to be discarded)
✗ "Buffers: shared read" (disk I/O)
✗ "Hash Join" on very large tables (memory pressure)
*/

-- If a sequential scan is used, verify the index exists:
SELECT indexname, indexdef FROM pg_indexes WHERE tablename = 'order';

-- If the index exists but isn't used, check statistics:
ANALYZE "order";  -- Update table statistics

-- Consider adding an index if missing:
CREATE INDEX idx_order_status_date
    ON "order"(status, ordered_at DESC)
    WHERE status IN ('pending', 'processing');
```

Query plan analysis is only meaningful with production-like data volumes. Indexes that look great with 1,000 rows may behave differently with 10 million. Always test with realistic data quantities and distributions.
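One quick way to get realistic volumes in a test environment is to generate synthetic rows. A sketch using PostgreSQL's `generate_series`, under the assumptions that the listed columns exist on `"order"` and that customer IDs 1 through 100,000 are present:

```sql
-- Generate ~5M synthetic orders so plans are tested at realistic scale
INSERT INTO "order" (customer_id, status, ordered_at)
SELECT (random() * 99999)::int + 1,
       (ARRAY['pending', 'processing', 'shipped', 'delivered'])[1 + (random() * 3)::int],
       now() - (random() * interval '365 days')
FROM generate_series(1, 5000000);

ANALYZE "order";  -- refresh statistics before re-running EXPLAIN
```

Pay attention to distributions as well as counts: if 95% of real orders are 'delivered', a uniform status distribution like the one above will make partial indexes on 'pending' look less useful than they really are.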
Formal design reviews follow a structured process to maximize effectiveness. Whether reviewing your own work or participating in a team review, this structure ensures thorough coverage.
When receiving challenging feedback, use 'Yes, and...' rather than 'But...'. 'Yes, you're right that this needs an index, and I was also thinking we might want a partial index to exclude inactive records.' This acknowledges the feedback while demonstrating further thinking.
Whether in a formal review or an interview, how you present your design is almost as important as the design itself. Structure your presentation to demonstrate systematic thinking.
```
DESIGN PRESENTATION STRUCTURE

1. REQUIREMENTS SUMMARY (Brief)
   "We need to model an e-commerce system with customers, products, orders,
   and reviews. Key requirements include: multi-address customers, product
   variants, order history preservation, and 99.9% read availability.
   Expected scale is 1M customers and 50K orders/day."

2. KEY ENTITIES (High Level)
   "I've identified six core entities: Customer, Address, Product,
   ProductVariant, Order, and OrderItem. Let me walk through each..."

3. RELATIONSHIP MAPPING
   "Customers have a one-to-many relationship with Addresses and Orders.
   Orders have a one-to-many relationship with OrderItems. Products have
   one-to-many with ProductVariants..."

4. CRITICAL DESIGN DECISIONS (Highlight Trade-offs)
   "I made several key decisions:
   - DECISION: Storing address snapshot in Order rather than FK
     RATIONALE: Preserves historical accuracy; address may change
   - DECISION: Denormalizing product name into OrderItem
     RATIONALE: Products may be deleted; we need historical record
   - DECISION: Using JSONB for variant attributes
     RATIONALE: Variant attributes differ by product type"

5. CONSTRAINTS AND INTEGRITY
   "I've added CHECK constraints for positive prices, valid status
   transitions, and total amount calculations. Foreign keys use CASCADE for
   dependent entities like OrderItems, RESTRICT for master data like
   Products."

6. INDEXING STRATEGY
   "Based on the query patterns, I've indexed:
   - customer_id on Orders (order lookup by customer)
   - status + ordered_at on Orders (dashboard queries)
   - product_id on OrderItems (inventory queries)
   - Partial index on pending orders for fulfillment"

7. KNOWN LIMITATIONS/FUTURE CONSIDERATIONS
   "This design assumes single-currency. For multi-currency, we'd need to
   add currency_code fields. I've noted that the Order table may need
   partitioning after 2 years based on volume projections."
```
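The CASCADE/RESTRICT policy in point 5 is easy to state and easy to get backwards in DDL, so it is worth spelling out. A sketch, with table and column names assumed from the earlier examples:

```sql
-- Dependent entity: line items are meaningless without their parent order
ALTER TABLE order_item
    ADD CONSTRAINT fk_order_item_order
    FOREIGN KEY (order_id) REFERENCES "order"(order_id)
    ON DELETE CASCADE;

-- Master data: products can't be deleted while any order item references them
ALTER TABLE order_item
    ADD CONSTRAINT fk_order_item_product
    FOREIGN KEY (product_id) REFERENCES product(product_id)
    ON DELETE RESTRICT;
```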
In reviews and interviews, you'll face challenging questions about your decisions. Effective defense requires preparation and structured responses.

```
CHALLENGE: "Why didn't you use UUID for primary keys?"

WEAK RESPONSE: "I prefer integers" (no reasoning)

STRONG RESPONSE:
"I considered UUIDs and chose BIGINT SERIAL for several reasons:
1. We have a single write master, so auto-increment collision isn't a concern
2. BIGINT is 8 bytes vs UUID's 16 bytes—significant at our projected scale
3. Integer comparison is faster for joins
4. Human-readable order IDs are easier for customer service
For the specific case of distributed writes, UUIDs would be preferable."

---

CHALLENGE: "This seems over-normalized. Won't the JOINs be slow?"

WEAK RESPONSE: "Normalization is best practice" (dogmatic)

STRONG RESPONSE:
"I started normalized because data integrity is critical for financial records.
However, I've identified the specific access patterns:
- Product listings: 50K/day, need < 100ms
- Order creation: 5K/day, can tolerate 500ms

For product listings, I'd add a materialized view refreshed every 5 minutes.
The order creation path uses only 3 JOINs with indexed FKs—EXPLAIN ANALYZE
shows ~10ms. I'm comfortable with this until volume increases 10x."

---

CHALLENGE: "Why store the address in Order instead of just linking to CustomerAddress?"

WEAK RESPONSE: "Because addresses change" (incomplete)

STRONG RESPONSE:
"This was a deliberate trade-off. The alternatives were:
1. FK to CustomerAddress (current address might change)
2. Soft-delete addresses, always add new (complex, storage-heavy)
3. Temporal address versioning (complex, overkill for our needs)
4. Copy address at order time (what I chose)

I chose copying because:
- Legal requirement to show delivery address at time of order
- Customer may delete addresses but we need order history
- Simplifies queries—no JOIN for order details
- Storage cost is acceptable (JSON adds ~300 bytes per order)

The trade-off is that address corrections require manual data fix."
```

Structure your defense using STAR: Situation (context/requirements), Task (the decision to be made), Action (what you decided and alternatives considered), Result (why this choice is superior for the given context). This demonstrates systematic thinking.
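Returning to the second challenge above: the "materialized view refreshed every 5 minutes" is worth being able to sketch on the spot. One possible shape in PostgreSQL, with names assumed rather than taken from the design:

```sql
-- Precomputed product listing, rebuilt on a schedule instead of per-request
CREATE MATERIALIZED VIEW product_listing AS
SELECT p.product_id,
       p.name,
       p.base_price,
       count(oi.product_id) AS times_ordered
FROM product p
LEFT JOIN order_item oi ON oi.product_id = p.product_id
GROUP BY p.product_id, p.name, p.base_price;

-- A unique index lets REFRESH ... CONCURRENTLY rebuild without blocking readers
CREATE UNIQUE INDEX uq_product_listing_id ON product_listing (product_id);

-- Run every 5 minutes via cron, pg_cron, or an application job:
REFRESH MATERIALIZED VIEW CONCURRENTLY product_listing;
```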
A common interview exercise is reviewing someone else's schema. Let's practice identifying issues in a deliberately flawed design.
```sql
-- SCHEMA TO REVIEW: Simple E-Commerce
-- Find the issues!

CREATE TABLE Users (
    id INT PRIMARY KEY AUTO_INCREMENT,
    name VARCHAR(50),
    email VARCHAR(50),
    address VARCHAR(100),
    creditCard VARCHAR(20),
    balance FLOAT
);

CREATE TABLE Products (
    id INT PRIMARY KEY,
    name VARCHAR(50),
    price FLOAT,
    category VARCHAR(50),
    categoryDescription TEXT
);

CREATE TABLE Orders (
    id INT PRIMARY KEY AUTO_INCREMENT,
    userId INT,
    productId INT,
    productName VARCHAR(50),
    productPrice FLOAT,
    quantity INT,
    orderDate VARCHAR(20),
    status VARCHAR(10)
);
```
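To check your own review against the anti-patterns covered above, here is one possible corrected sketch. It is illustrative only: the names, types, and PostgreSQL syntax are assumptions, and other fixes are equally defensible.

```sql
-- One possible corrected design (PostgreSQL syntax; sketch, not the canonical answer)
CREATE TABLE users (
    user_id BIGSERIAL PRIMARY KEY,
    name    VARCHAR(100) NOT NULL,
    email   VARCHAR(255) NOT NULL UNIQUE,          -- was nullable with no unique constraint
    balance DECIMAL(12, 2) NOT NULL DEFAULT 0      -- FLOAT is unsafe for money
    -- creditCard removed: store a payment-provider token, never raw card numbers
    -- address moved to a separate addresses table (one-to-many), omitted here
);

CREATE TABLE categories (
    category_id SERIAL PRIMARY KEY,
    name        VARCHAR(100) NOT NULL,
    description TEXT                               -- was duplicated on every product row
);

CREATE TABLE products (
    product_id  BIGSERIAL PRIMARY KEY,
    category_id INT NOT NULL REFERENCES categories(category_id),
    name        VARCHAR(200) NOT NULL,
    price       DECIMAL(10, 2) NOT NULL CHECK (price >= 0)   -- was FLOAT
);

-- Orders no longer conflate the order with its line items
CREATE TABLE orders (
    order_id   BIGSERIAL PRIMARY KEY,
    user_id    BIGINT NOT NULL REFERENCES users(user_id),    -- was a bare INT, no FK
    status     VARCHAR(20) NOT NULL
               CHECK (status IN ('pending', 'paid', 'shipped', 'cancelled')),
    ordered_at TIMESTAMPTZ NOT NULL DEFAULT now()            -- was VARCHAR(20)
);

CREATE TABLE order_items (
    order_id      BIGINT NOT NULL REFERENCES orders(order_id) ON DELETE CASCADE,
    product_id    BIGINT NOT NULL REFERENCES products(product_id),
    product_name  VARCHAR(200) NOT NULL,                      -- snapshot for history
    product_price DECIMAL(10, 2) NOT NULL,
    quantity      INT NOT NULL CHECK (quantity > 0),
    PRIMARY KEY (order_id, product_id)
);

CREATE INDEX idx_orders_user ON orders(user_id);
CREATE INDEX idx_order_items_product ON order_items(product_id);
```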
You've now completed a comprehensive journey through database schema design for interviews and production systems: gathering requirements, modeling entities and relationships, choosing types and constraints, indexing, and reviewing the result. These skills combine to make you effective at the most common interview design exercises and invaluable in production database work.
Congratulations! You've mastered schema design from requirements through review. You can now approach database design problems systematically, make and defend trade-off decisions, and produce production-quality schemas. The next module covers SQL Query Writing—applying your schema knowledge to write complex, efficient queries.