Schema refinement is the iterative process that bridges initial logical design and production implementation. While mapping, normalization, and constraint specification provide structured methodologies, refinement is where design judgment, domain expertise, and practical experience converge.
A first-pass logical schema, even when technically correct, is rarely optimal.
Refinement is not an afterthought—it's a deliberate phase.
Professional database designers schedule explicit refinement cycles: technical reviews, stakeholder walkthroughs, performance modeling, and documentation passes. Skipping refinement leads to schemas that are technically valid but practically problematic—causing friction throughout the system's lifetime.
This page covers the complete refinement lifecycle: schema review techniques, validation methodologies, naming conventions, documentation standards, stakeholder alignment processes, and the criteria for determining when logical design is complete and ready for physical design.
Effective refinement follows a structured lifecycle with distinct phases. Each phase has specific objectives, participants, and outputs.
Phase 1: Self-Review
The designer's first pass through the completed schema:
Phase 2: Technical Review
Peer review by other database professionals:
Phase 3: Domain Expert Review
Review with business stakeholders:
Phase 4: Integration Review
Review with application developers and architects:
Phase 5: Documentation and Finalization
Final documentation and handoff:
Refinement cycles are normal, not failures. A schema that passes first review typically indicates either excellent initial design or insufficient review rigor. Budget time for 2-3 refinement cycles on major projects.
Systematic review techniques catch errors that informal scanning misses. Apply these techniques methodically during self-review and recommend them for peer reviewers.
Technique 1: Entity Completeness Check
For each entity in the conceptual model:
Technique 2: Relationship Tracing
For each relationship in the ER model:
Technique 3: Constraint Coverage Matrix
Create a matrix mapping business rules to constraints:
| Business Rule | DB Constraint | Type | Gap? |
|---|---|---|---|
| Every employee has unique email | UNIQUE (Email) | Declarative | No |
| Salary must be positive | CHECK (Salary > 0) | Declarative | No |
| Manager must be in same dept | — | Trigger needed | Yes - add trigger |
| Orders over $10K need approval | — | Application logic | Document decision |
| End date >= start date | CHECK (EndDate >= StartDate) | Declarative | No |
| One primary contact per customer | Partial unique index | Declarative | No |
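The "partial unique index" entry in the matrix above can be realized declaratively in PostgreSQL. A minimal sketch, assuming a hypothetical CustomerContact table with an IsPrimary flag:

```sql
-- Enforce at most one primary contact per customer.
-- The uniqueness applies only to rows WHERE IsPrimary = TRUE,
-- so each customer may still have any number of non-primary contacts.
CREATE UNIQUE INDEX uk_contact_one_primary
    ON CustomerContact (CustomerID)
    WHERE IsPrimary = TRUE;
```

This keeps a rule that would otherwise need a trigger inside a single declarative constraint, which is cheaper to maintain and impossible to bypass.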
Technique 4: Sample Data Walkthrough
Create sample data and walk through scenarios:
This is essentially manual testing of the schema before implementation.
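A walkthrough might look like the following sketch against the e-commerce example schema used later on this page (table and column names assumed; the deliberate failures confirm that constraints fire as intended):

```sql
-- Happy path: create a customer, then a valid order.
INSERT INTO customers (name, email) VALUES ('Acme Corp', 'ops@acme.test');
INSERT INTO orders (customer_id, total_amount) VALUES (1, 250.00);      -- OK

-- Expected failures: each should be rejected by a constraint.
INSERT INTO orders (customer_id, total_amount) VALUES (999, 50.00);     -- FK violation: no customer 999
INSERT INTO orders (customer_id, total_amount) VALUES (1, -10.00);      -- CHECK violation: negative total
INSERT INTO customers (name, email) VALUES ('Other', 'ops@acme.test');  -- UNIQUE violation: duplicate email
```

If an "expected failure" succeeds, you have found a missing constraint before it reached production.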
Technique 5: Edge Case Analysis
Explicitly consider boundary conditions:
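Boundary conditions can be probed with targeted boundary-value statements. A sketch against the example orders table defined later on this page, with values chosen to sit exactly on each constraint's edge:

```sql
-- Exactly on the boundary should succeed:
INSERT INTO orders (customer_id, total_amount) VALUES (1, 0.00);   -- total_amount >= 0 at the boundary
UPDATE orders SET shipped_at = order_date WHERE id = 1;            -- shipped_at >= order_date at the boundary

-- One step past the boundary should fail:
INSERT INTO orders (customer_id, total_amount) VALUES (1, -0.01);  -- violates chk_orders_total_positive
UPDATE orders SET shipped_at = order_date - INTERVAL '1 second'
WHERE id = 1;                                                      -- violates chk_orders_ship_date
```

Off-by-one boundary errors (using > where >= was intended, and vice versa) are among the most common constraint bugs this analysis catches.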
Create standardized review checklists for your organization. Pilots use checklists despite years of experience; database designers should too. Checklists prevent 'I forgot to check that' errors and ensure consistent review quality.
Consistent naming is not cosmetic—it directly impacts maintainability, reduces errors, and eases onboarding. Establish naming conventions before any development begins.
Table Naming Conventions:
Common approaches:
- PascalCase, singular: Customer, OrderLine, ProductCategory
- snake_case: customer, order_line, product_category

Recommendation: Match the casing of your target database's default behavior. PostgreSQL lowercases unquoted identifiers; Oracle uppercases them; SQL Server preserves case.
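The case-folding behavior has practical consequences. A quick sketch of PostgreSQL's handling of unquoted identifiers (Oracle and SQL Server behave differently):

```sql
-- PostgreSQL folds unquoted identifiers to lowercase:
CREATE TABLE CustomerOrder (OrderID INT);

SELECT orderid FROM customerorder;   -- works: names were folded at creation
SELECT OrderID FROM CustomerOrder;   -- also works: folded the same way here

-- Quoted identifiers are case-sensitive, so this fails:
-- no mixed-case "CustomerOrder" was ever stored.
-- SELECT "OrderID" FROM "CustomerOrder";
```

This is why mixing quoted and unquoted identifiers, or fighting the engine's default casing, creates long-term friction.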
Column Naming Conventions:
- Primary keys: TableNameID or table_name_id, or simply ID/id
- Foreign keys: ReferencedTableID, matching the referenced primary key's name
- Boolean flags: prefix with Is, Has, Can (IsActive, HasExpired)
- Timestamps: suffix with _at or At (CreatedAt, UpdatedAt)
```sql
-- ==================================================
-- NAMING CONVENTION EXAMPLES
-- ==================================================

-- CONVENTION 1: PascalCase with explicit prefixes
-- Pros: Readable, explicit, no ambiguity
-- Cons: Verbose, more typing

CREATE TABLE Customer (
    CustomerID    SERIAL PRIMARY KEY,
    CustomerName  VARCHAR(100) NOT NULL,
    CustomerEmail VARCHAR(254) NOT NULL UNIQUE,
    CustomerPhone VARCHAR(20),
    IsActive      BOOLEAN NOT NULL DEFAULT TRUE,
    CreatedAt     TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    UpdatedAt     TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE CustomerAddress (
    CustomerAddressID SERIAL PRIMARY KEY,
    CustomerID        INT NOT NULL REFERENCES Customer(CustomerID),
    AddressType       VARCHAR(20) NOT NULL,
    StreetAddress     VARCHAR(200) NOT NULL,
    City              VARCHAR(100) NOT NULL,
    StateProvince     VARCHAR(100),
    PostalCode        VARCHAR(20),
    CountryCode       CHAR(2) NOT NULL,
    IsPrimary         BOOLEAN NOT NULL DEFAULT FALSE
);

-- CONVENTION 2: snake_case with shorter names
-- Pros: Compact, Unix/Linux convention, Python-friendly
-- Cons: Less explicit, some find harder to read

CREATE TABLE customers (
    id         SERIAL PRIMARY KEY,
    name       VARCHAR(100) NOT NULL,
    email      VARCHAR(254) NOT NULL UNIQUE,
    phone      VARCHAR(20),
    is_active  BOOLEAN NOT NULL DEFAULT TRUE,
    created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE customer_addresses (
    id           SERIAL PRIMARY KEY,
    customer_id  INT NOT NULL REFERENCES customers(id),
    address_type VARCHAR(20) NOT NULL,
    street       VARCHAR(200) NOT NULL,
    city         VARCHAR(100) NOT NULL,
    state        VARCHAR(100),
    postal_code  VARCHAR(20),
    country_code CHAR(2) NOT NULL,
    is_primary   BOOLEAN NOT NULL DEFAULT FALSE
);

-- ==================================================
-- CONSTRAINT NAMING CONVENTIONS
-- ==================================================

-- Pattern: {type}_{table}_{columns/description}
-- pk = primary key, fk = foreign key, uk = unique key
-- chk = check, idx = index

CREATE TABLE orders (
    id           SERIAL,
    customer_id  INT NOT NULL,
    order_date   DATE NOT NULL DEFAULT CURRENT_DATE,
    status       VARCHAR(20) NOT NULL DEFAULT 'pending',
    total_amount DECIMAL(12,2) NOT NULL,
    shipped_at   TIMESTAMP,
    -- Named constraints
    CONSTRAINT pk_orders PRIMARY KEY (id),
    CONSTRAINT fk_orders_customer FOREIGN KEY (customer_id)
        REFERENCES customers(id),
    CONSTRAINT chk_orders_status CHECK (status IN
        ('pending', 'confirmed', 'shipped', 'delivered', 'cancelled')),
    CONSTRAINT chk_orders_total_positive CHECK (total_amount >= 0),
    CONSTRAINT chk_orders_ship_date CHECK
        (shipped_at IS NULL OR shipped_at >= order_date)
);

-- Named indexes
CREATE INDEX idx_orders_customer ON orders(customer_id);
CREATE INDEX idx_orders_date ON orders(order_date);
CREATE INDEX idx_orders_status ON orders(status);

-- ==================================================
-- JUNCTION TABLE NAMING
-- ==================================================

-- Option 1: Combined entity names (alphabetical)
CREATE TABLE customer_product (
    customer_id INT NOT NULL,
    product_id  INT NOT NULL,
    PRIMARY KEY (customer_id, product_id)
);

-- Option 2: Describe the relationship
CREATE TABLE product_wishlist (
    customer_id INT NOT NULL,
    product_id  INT NOT NULL,
    added_at    TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (customer_id, product_id)
);

-- Option 3: Verb-based naming
CREATE TABLE customer_follows_vendor (
    customer_id INT NOT NULL,
    vendor_id   INT NOT NULL,
    followed_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (customer_id, vendor_id)
);
```

Avoid cryptic abbreviations: CustomerID, not CustID, unless the abbreviation is universally understood; ShipmentDate, not SD; AccountBalance, not Bal.

Logical design must anticipate how the schema will be used. Query pattern analysis ensures the schema supports required operations efficiently.
Identifying Query Patterns:
Gather query requirements from:
Query Pattern Categories:
```sql
-- ==================================================
-- QUERY PATTERN ANALYSIS EXAMPLES
-- ==================================================

-- Scenario: E-commerce order management
-- Analyze the logical schema for common query patterns

-- PATTERN 1: Customer Order History (High frequency)
-- "Show me all orders for customer X with details"

-- Required join path: Customer -> Orders -> OrderLines -> Products
-- Analysis:
-- - CustomerID in Orders allows direct lookup
-- - OrderID in OrderLines allows efficient join
-- - ProductID in OrderLines references Products

SELECT c.CustomerName, o.OrderID, o.OrderDate, o.Status,
       ol.Quantity, p.ProductName, ol.UnitPrice,
       ol.Quantity * ol.UnitPrice AS LineTotal
FROM Customer c
JOIN Orders o ON c.CustomerID = o.CustomerID
JOIN OrderLine ol ON o.OrderID = ol.OrderID
JOIN Product p ON ol.ProductID = p.ProductID
WHERE c.CustomerID = :customerId
ORDER BY o.OrderDate DESC;

-- Schema supports this: ✓
-- Potential optimization: Index on Orders(CustomerID, OrderDate DESC)

-- PATTERN 2: Product Sales Summary (Reporting)
-- "Total sales by product category this month"

-- Required: Products with categories, line items with amounts
SELECT pc.CategoryName,
       SUM(ol.Quantity * ol.UnitPrice) AS TotalSales,
       COUNT(DISTINCT o.OrderID) AS OrderCount,
       SUM(ol.Quantity) AS UnitsSold
FROM ProductCategory pc
JOIN Product p ON pc.CategoryID = p.CategoryID
JOIN OrderLine ol ON p.ProductID = ol.ProductID
JOIN Orders o ON ol.OrderID = o.OrderID
WHERE o.OrderDate >= DATE_TRUNC('month', CURRENT_DATE)
  AND o.Status IN ('confirmed', 'shipped', 'delivered')
GROUP BY pc.CategoryID, pc.CategoryName
ORDER BY TotalSales DESC;

-- Schema supports this: ✓
-- Requires: Product.CategoryID (FK to ProductCategory)
-- Potential optimization: Index on Orders(OrderDate, Status)

-- PATTERN 3: Inventory Alert (Low stock items)
-- "Show products with stock below reorder threshold"

-- Analysis: Requires inventory tracking fields
SELECT p.ProductID, p.ProductName, p.StockLevel,
       p.ReorderThreshold, p.ReorderQuantity
FROM Product p
WHERE p.StockLevel < p.ReorderThreshold
  AND p.IsActive = TRUE
ORDER BY p.StockLevel ASC;

-- Schema check: Does Product table have inventory fields?
-- If not: Need to add StockLevel, ReorderThreshold
-- Or: Separate Inventory table for larger systems

-- PATTERN 4: Customer Search (Text search)
-- "Find customers by partial name or email"

SELECT CustomerID, CustomerName, Email, Phone
FROM Customer
WHERE CustomerName ILIKE '%' || :searchTerm || '%'
   OR Email ILIKE '%' || :searchTerm || '%'
ORDER BY CustomerName
LIMIT 50;

-- Schema supports this: ✓
-- Considerations:
-- - ILIKE with leading wildcard prevents index use
-- - For high-volume search: Consider full-text search or trigram indexes

-- PostgreSQL trigram index for efficient partial matching:
-- CREATE EXTENSION pg_trgm;
-- CREATE INDEX idx_customer_name_trgm ON Customer
--     USING gin (CustomerName gin_trgm_ops);

-- PATTERN 5: Statistics Query (Complex aggregation)
-- "Average order value by customer segment over time"

-- Requires: Customer.Segment or derived from behavior

SELECT DATE_TRUNC('month', o.OrderDate) AS Month,
       c.CustomerSegment,
       COUNT(o.OrderID) AS OrderCount,
       AVG(o.TotalAmount) AS AvgOrderValue,
       SUM(o.TotalAmount) AS TotalRevenue
FROM Orders o
JOIN Customer c ON o.CustomerID = c.CustomerID
WHERE o.OrderDate >= CURRENT_DATE - INTERVAL '12 months'
  AND o.Status != 'cancelled'
GROUP BY DATE_TRUNC('month', o.OrderDate), c.CustomerSegment
ORDER BY Month DESC, c.CustomerSegment;

-- Schema check: Does Customer have Segment field?
-- If segmentation is complex: Consider derived/computed field or separate table

-- ==================================================
-- QUERY PATTERN → SCHEMA REFINEMENT ACTIONS
-- ==================================================

/*
After query pattern analysis, common refinements:

1. ADD COLUMNS
   - Customer.Segment for segmented reporting
   - Product.StockLevel, ReorderThreshold for inventory
   - Orders.TotalAmount (denormalized for fast aggregation)

2. ADD INDEXES (Physical design, but identify now)
   - Orders(CustomerID, OrderDate DESC)
   - Orders(Status, OrderDate)
   - Product(CategoryID)
   - Customer(Email) - for login lookup

3. ADD TABLES
   - AuditLog for compliance requirements
   - CachedReport for frequently-accessed summaries
   - SearchIndex for full-text search

4. MODIFY STRUCTURE
   - Denormalize frequently-joined data
   - Add redundant columns for query performance
   - Create materialized views for complex reports
*/
```

Query pattern analysis informs design but shouldn't drive premature optimization. Capture patterns, note potential indexes, but defer most physical optimizations until you have real data and measured performance. The logical schema should be clean first.
Documentation is not an afterthought—it's a deliverable. Undocumented schemas become legacy liabilities, understandable only to their original creator (who eventually leaves).
Essential Documentation Artifacts:
1. Data Dictionary
A complete catalog of all schema elements:
2. Entity-Relationship Diagram
Visual representation as authoritative reference:
3. Design Decision Log
Record why choices were made:
```markdown
# ==================================================
# DATA DICTIONARY TEMPLATE
# ==================================================

## Table: Customer

**Purpose**: Stores core customer information for CRM and billing.
**Business Owner**: Sales Operations Team
**Privacy Classification**: PII - Restricted
**Retention Policy**: 7 years after account closure

### Columns

| Column | Type | Nullable | Default | Description |
|--------|------|----------|---------|-------------|
| CustomerID | INT | No | AUTO | Unique customer identifier (surrogate key) |
| CustomerCode | VARCHAR(20) | No | - | Business-assigned customer code (natural key) |
| CustomerName | VARCHAR(100) | No | - | Legal name (individual or company) |
| Email | VARCHAR(254) | No | - | Primary contact email, used for login |
| Phone | VARCHAR(20) | Yes | NULL | Primary phone, E.164 format preferred |
| CustomerType | ENUM | No | 'individual' | Values: 'individual', 'business', 'enterprise' |
| CreditLimit | DECIMAL(12,2) | No | 10000.00 | Maximum outstanding order value |
| IsActive | BOOLEAN | No | TRUE | FALSE = soft-deleted account |
| CreatedAt | TIMESTAMP | No | NOW() | Record creation timestamp (UTC) |
| UpdatedAt | TIMESTAMP | No | NOW() | Last modification timestamp (UTC) |

### Keys and Constraints

| Constraint | Type | Columns | Description |
|------------|------|---------|-------------|
| pk_customer | PRIMARY KEY | CustomerID | Surrogate primary key |
| uk_customer_code | UNIQUE | CustomerCode | Business identifier uniqueness |
| uk_customer_email | UNIQUE | Email | Login credential uniqueness |
| chk_customer_credit | CHECK | CreditLimit | CreditLimit >= 0 |

### Relationships

| Relationship | References | Cardinality | On Delete | Description |
|--------------|------------|-------------|-----------|-------------|
| fk_customer_address | CustomerAddress | 1:N | CASCADE | Customer addresses |
| fk_orders_customer | Orders | 1:N | RESTRICT | Customer orders |

### Notes

- CustomerCode is assigned by Sales and follows pattern: CC-NNNNNN
- Email is case-insensitive; store lowercase, compare case-insensitively
- Soft delete via IsActive flag; hard delete only via DBA with approval

# ==================================================
# DESIGN DECISION LOG TEMPLATE
# ==================================================

## Decision: Surrogate vs Natural Key for Customer

**Date**: 2024-03-15
**Decision**: Use surrogate key (CustomerID) with natural key (CustomerCode) as unique constraint.
**Context**: Customer identification could use business-assigned CustomerCode (natural) or system-generated CustomerID (surrogate).

**Options Considered**:

1. **Natural key (CustomerCode)**
   - Pros: Meaningful, no lookup needed for understanding
   - Cons: Assigned by humans (errors), may need to change (mergers), variable length
2. **Surrogate key (CustomerID)** ← Selected
   - Pros: Stable, compact, fast joins, never changes
   - Cons: Meaningless, requires lookup
3. **UUID**
   - Pros: Globally unique, no coordination
   - Cons: Large, not human-readable, index fragmentation

**Rationale**: CustomerCode has changed twice in past 5 years due to rebranding. Foreign keys in Orders, Contacts, ActivityLog would all require updates. Surrogate key provides stability while CustomerCode remains as UNIQUE business identifier.

**Implications**:
- All foreign key references use CustomerID
- CustomerCode must be displayed in UIs for user recognition
- CustomerCode can be changed without cascading updates

## Decision: Trigger vs Application Logic for Credit Limit Check

**Date**: 2024-03-18
**Decision**: Use database trigger for credit limit enforcement.
**Context**: Business rule: Customer's total outstanding orders cannot exceed CreditLimit.

**Options Considered**:

1. **Application logic only**
   - Pros: Full language power, easy testing
   - Cons: Bypassable, must implement in every access point
2. **Database trigger** ← Selected
   - Pros: Always enforced, single implementation point
   - Cons: Hidden logic, debugging complexity
3. **Stored procedure API**
   - Pros: Controlled access, documented
   - Cons: Requires discipline, can be bypassed

**Rationale**: Credit limit is a financial control—bypassing it is unacceptable. Multiple applications (web, mobile, batch import) access Orders table. Trigger ensures enforcement regardless of access path.

**Implications**:
- Application should still check (for UX) but database is authoritative
- Trigger must be documented in data dictionary
- Batch imports must handle constraint violations gracefully
```

Store documentation alongside code in version control. Use SQL COMMENT statements for in-database documentation. Treat documentation updates as part of schema changes—if the schema changes, documentation must change too.
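In-database comments keep a slice of the data dictionary queryable alongside the schema itself. A minimal sketch using COMMENT ON statements (PostgreSQL syntax), drawing on the Customer data dictionary above:

```sql
COMMENT ON TABLE Customer IS
    'Core customer information for CRM and billing. Owner: Sales Operations. PII - Restricted.';

COMMENT ON COLUMN Customer.CustomerCode IS
    'Business-assigned customer code (natural key), pattern CC-NNNNNN.';

COMMENT ON COLUMN Customer.IsActive IS
    'FALSE = soft-deleted account; hard delete only via DBA with approval.';
```

Tools such as psql's `\d+` surface these comments, so anyone inspecting the schema sees the documentation without hunting for an external file.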
Database design affects many stakeholders. Alignment ensures the schema serves everyone's needs and prevents costly late-stage changes.
Key Stakeholders:
Business Stakeholders
Technical Stakeholders
Operational Stakeholders
Alignment Techniques:
| Stakeholder | Primary Concerns | Review Questions |
|---|---|---|
| Product Owner | Feature support, flexibility | Can we add X feature? How hard to extend for Y? |
| Domain Expert | Accuracy, completeness | Does this capture all variations of Z? Is terminology correct? |
| Developer | Usability, query efficiency | How do I join these? What indexes exist? |
| DBA | Operations, maintenance | Backup size? Migration path? Monitoring approach? |
| Security | Access control, audit | Row-level security possible? Is PII identified? |
| Compliance | Regulatory requirements | Retention enforced? Audit trail complete? GDPR deletion possible? |
| Analytics | Reporting capability | Can we aggregate by region? Time-series possible? |
After alignment sessions, document agreements and get explicit sign-off. 'We discussed and agreed' is not the same as 'Alice approved via email on March 15.' Written sign-off protects everyone when requirements allegedly 'change.'
How do you know when logical design refinement is complete and it's time to proceed to physical design? Use explicit completion criteria.
Functional Completeness:
☑ All entities from conceptual model have corresponding tables
☑ All relationships are properly represented (FK, junction tables)
☑ All stated business rules have enforcement mechanisms
☑ All required queries are expressible (walk through each)
☑ Sample data can be inserted without constraint issues
☑ Edge cases have been analyzed and addressed
Quality Criteria:
☑ Schema is in target normal form (typically 3NF, BCNF where needed)
☑ Naming conventions are consistent throughout
☑ No unnecessary redundancy exists
☑ Constraints are comprehensive but not over-constraining
☑ Schema documentation is complete
Stakeholder Criteria:
☑ Technical review passed (peer sign-off)
☑ Domain expert validation received
☑ Application developer alignment confirmed
☑ DBA review for operability completed
☑ Security/compliance requirements verified
The temptation to 'get to the database' is strong. Resist it. Errors caught in logical design cost 10x less to fix than errors found in production. A week of refinement can save months of migration pain later.
Refinement transforms technically correct schemas into production-ready designs. It's the quality gate before physical implementation.
Module Complete:
With refinement, we've completed the logical design module. You now understand the complete journey from conceptual models to production-ready logical schemas:
The next phase—Physical Design—translates this logical schema into actual database implementation: storage structures, indexing strategies, partitioning, and performance optimization.
You've mastered logical database design: the bridge between conceptual modeling and physical implementation. These skills enable you to create schemas that are not just technically correct but practically excellent—supporting application needs, enforcing business rules, and enabling long-term maintenance.