Design Phases - Learning Module

Loading content...

0/241

Logical Design

From Abstract Models to Implementable Schemas

The conceptual model—elegant, abstract, technology-independent—now requires transformation into something a database management system can actually implement. This is the domain of logical design: the systematic conversion of conceptual schemas into formal structures that define exactly what tables exist, what columns they contain, what constraints govern them, and how they relate through keys.

Logical design is where the theoretical becomes practical, where ER diagrams become CREATE TABLE statements, where cardinality constraints become foreign key relationships. It is a phase that requires both systematic precision (following well-defined mapping algorithms) and engineering judgment (making trade-offs for performance, maintainability, and clarity).

The logical schema you create here will persist for years or decades. Every application built on this database will be shaped by the decisions you make. A well-designed logical schema makes applications easier to build, queries faster to execute, and systems more maintainable. A poorly designed schema creates ongoing technical debt that compounds with every new feature.

What You Will Learn

By the end of this page, you will master the logical design process: mapping entities to tables, attributes to columns, relationships to keys, and constraints to database-level enforcement. You will understand normalization theory and when to apply it, learn systematic algorithms for ER-to-relational transformation, and develop the judgment to make sound trade-off decisions.

Logical Design Fundamentals

Logical design produces a logical schema—a formal specification of the database structure that is specific to a particular data model (usually relational) but still independent of any particular DBMS product or physical storage considerations.

Formal Definition:

A logical schema specifies the structure of data at the logical level of abstraction: tables (relations), columns (attributes), data types, primary keys, foreign keys, and integrity constraints. It describes what is stored without specifying how it is physically organized.

The Relational Model as Target:

The vast majority of logical design targets the relational model, characterized by:

Relations (Tables): Named collections of tuples with a fixed schema
Attributes (Columns): Named, typed properties of each tuple
Tuples (Rows): Individual data instances conforming to the schema
Keys: Attributes that uniquely identify tuples
Constraints: Rules that valid database states must satisfy

Logical Design Objectives:

Quality Criteria for Logical Schemas

•Semantic Preservation — The logical schema accurately represents all information from the conceptual model; no semantic content is lost or distorted
•Minimality — No redundant information is stored; each fact is recorded exactly once (subject to deliberate denormalization decisions)
•Constraint Enforcement — Business rules from requirements are enforced by the schema itself, not just by application code
•Naming Clarity — Table and column names are clear, consistent, and follow organizational conventions
•Query Support — The schema structure supports the queries and transactions identified in requirements efficiently
•Evolution Readiness — The schema can accommodate foreseeable changes without structural redesign

The ER-to-Relational Mapping Process:

Logical design follows a systematic transformation:

Conceptual Element	Logical Element	Mapping Rule
Entity Type	Table	One table per strong entity
Attribute	Column	One column per simple attribute
Composite Attribute	Multiple Columns	Flatten or create structured type
Multi-valued Attribute	Separate Table	New table with foreign key to parent
Relationship	Foreign Key or Table	Depends on cardinality
Weak Entity	Table + Partial Key	Includes owner's primary key
Generalization	Various approaches	Single table, multiple tables, or hybrid

These rules provide a systematic algorithm for transformation, though they often require judgment in application.

Tool Support

Modern database design tools (ERwin, Enterprise Architect, DbSchema, MySQL Workbench) automate ER-to-relational mapping. However, understanding the underlying algorithms is essential—automated tools make decisions that may not suit your specific requirements, and you must be able to evaluate and adjust their output.

Mapping Entities to Tables

The most straightforward mapping: each strong entity type becomes a table. The entity's simple attributes become columns, and the entity's key attribute(s) become the primary key.

Basic Entity Mapping Algorithm:

For each strong entity E:
  1. Create table T with name derived from E
  2. For each simple attribute A of E:
     - Add column A to T with appropriate data type
  3. For each composite attribute C of E:
     - Add columns for each component of C
     - OR create structured type (if DBMS supports)
  4. For key attribute(s) K of E:
     - Designate K column(s) as PRIMARY KEY

Example Transformation:

entity-mapping-example.sql
SQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
-- Conceptual: CUSTOMER entity
-- Attributes: customer_id (key), name (composite: first, last), 
--             email, registration_date
 
-- Logical Mapping:
CREATE TABLE customers (
    customer_id     INTEGER         PRIMARY KEY,
    first_name      VARCHAR(50)     NOT NULL,
    last_name       VARCHAR(50)     NOT NULL,
    email           VARCHAR(255)    NOT NULL UNIQUE,
    registration_date DATE          NOT NULL DEFAULT CURRENT_DATE
);
 
-- Note the decisions made:
-- 1. Table name: lowercase plural (convention)
-- 2. Composite name → two columns (first_name, last_name)
-- 3. Email marked UNIQUE (alternate key from conceptual model)
-- 4. registration_date has default (implementation enhancement)

Handling Composite Attributes:

Composite attributes present a design choice:

Option A: Flatten to multiple columns

-- Address as composite attribute
CREATE TABLE customers (
    ...
    address_street    VARCHAR(100),
    address_city      VARCHAR(50),
    address_state     CHAR(2),
    address_zip       VARCHAR(10),
    address_country   VARCHAR(50)
);

Advantages: Simple queries, wide tool support Disadvantages: Repetitive if same structure appears multiple times

Option B: Create structured type (if DBMS supports)

-- PostgreSQL composite type
CREATE TYPE address_type AS (
    street    VARCHAR(100),
    city      VARCHAR(50),
    state     CHAR(2),
    zip       VARCHAR(10),
    country   VARCHAR(50)
);

CREATE TABLE customers (
    ...
    address address_type
);

Advantages: Reusable, cleaner schema Disadvantages: Limited DBMS support, complex querying

Handling Multi-valued Attributes:

Multi-valued attributes cannot be stored directly in a relational column (violates 1NF). They require a separate table:

multivalued-attribute-mapping.sql
SQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
-- Conceptual: CUSTOMER has multi-valued attribute phone_numbers
 
-- WRONG: Storing multiple phones in one column
-- CREATE TABLE customers (
--     phone_numbers VARCHAR(500) -- "555-1234, 555-5678"  WRONG!
-- );
 
-- CORRECT: Separate table for multi-valued attribute
CREATE TABLE customer_phones (
    customer_id     INTEGER         NOT NULL,
    phone_number    VARCHAR(20)     NOT NULL,
    phone_type      VARCHAR(10)     DEFAULT 'mobile',
    is_primary      BOOLEAN         DEFAULT FALSE,
    
    PRIMARY KEY (customer_id, phone_number),
    FOREIGN KEY (customer_id) REFERENCES customers(customer_id)
        ON DELETE CASCADE
);
 
-- Each row represents ONE phone for ONE customer
-- Multiple rows for customers with multiple phones

Derived Attributes

Derived attributes (age from birth_date, order_total from line items) are typically NOT stored as columns. They're computed at query time using expressions or views. Exceptions exist for performance-critical derived values that are expensive to compute—these may be materialized with triggers to maintain consistency.

Mapping Relationships to Keys and Tables

Relationship mapping is where logical design becomes nuanced. The mapping strategy depends on the relationship's cardinality, participation constraints, and whether the relationship has its own attributes.

Mapping Strategy by Cardinality:

One-to-Many (1:N) Mapping:

The most common relationship type. The foreign key is placed in the table on the "many" (N) side.

Algorithm:

Create tables for both entities
Add a foreign key column to the N-side table
The FK references the PK of the 1-side table
If relationship has attributes, add them to the N-side table

Example: Department (1) — employs → (N) Employee

CREATE TABLE departments (
    department_id   INTEGER PRIMARY KEY,
    name            VARCHAR(100) NOT NULL,
    budget          DECIMAL(12,2)
);

CREATE TABLE employees (
    employee_id     INTEGER PRIMARY KEY,
    name            VARCHAR(100) NOT NULL,
    hire_date       DATE NOT NULL,
    -- Foreign key implements 1:N relationship
    department_id   INTEGER NOT NULL,
    FOREIGN KEY (department_id) 
        REFERENCES departments(department_id)
);

Participation Constraints:

NOT NULL on FK = Total participation (employee MUST have department)
NULL allowed on FK = Partial participation (employee MAY have no department)

Relationship Attributes: If the 'employs' relationship had attributes (e.g., position_title, start_date in department), they go in the employees table alongside the FK.

Weak Entity Mapping:

Weak entities depend on an owner entity for identification. Their table includes the owner's primary key as part of their own composite key:

weak-entity-mapping.sql
SQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
-- Conceptual: ORDER (strong) --contains--> ORDER_LINE (weak)
-- ORDER_LINE is identified by order_id + line_number
 
CREATE TABLE orders (
    order_id        INTEGER PRIMARY KEY,
    customer_id     INTEGER NOT NULL,
    order_date      DATE NOT NULL,
    status          VARCHAR(20) NOT NULL,
    FOREIGN KEY (customer_id) REFERENCES customers(customer_id)
);
 
CREATE TABLE order_lines (
    order_id        INTEGER NOT NULL,
    line_number     INTEGER NOT NULL,  -- Partial key (discriminator)
    product_id      INTEGER NOT NULL,
    quantity        INTEGER NOT NULL CHECK (quantity > 0),
    unit_price      DECIMAL(10,2) NOT NULL,
    
    -- Composite primary key includes owner's PK
    PRIMARY KEY (order_id, line_number),
    
    -- Identifying relationship
    FOREIGN KEY (order_id) REFERENCES orders(order_id)
        ON DELETE CASCADE,
    FOREIGN KEY (product_id) REFERENCES products(product_id)
);

Foreign Key Actions

Specify ON DELETE and ON UPDATE actions for foreign keys: CASCADE (propagate changes), SET NULL (nullify reference), SET DEFAULT, RESTRICT (prevent operation), NO ACTION (defer check). These enforce referential integrity automatically. Weak entity FKs typically use CASCADE; other FKs often use RESTRICT or SET NULL depending on business rules.

Normalization: Theory and Application

Normalization is the systematic process of organizing tables to minimize redundancy and dependency anomalies. It provides a theoretical foundation for evaluating schema quality and guides decomposition decisions.

Why Normalize?

Unnormalized schemas suffer from anomalies:

Insert Anomaly: Cannot insert data without unrelated data (can't record a department without an employee)
Update Anomaly: Updating one fact requires updating multiple rows (change department name in every employee row)
Delete Anomaly: Deleting data inadvertently removes unrelated data (deleting last employee loses department info)

Normal Forms Overview:

Normal Forms and Their Requirements
Normal Form	Requirement	Eliminates	Practical Importance
1NF	Atomic values, no repeating groups	Multi-valued attributes in cells	Essential — fundamental to relational model
2NF	1NF + No partial dependencies on composite key	Redundancy from partial key dependencies	Important for composite key tables
3NF	2NF + No transitive dependencies	Redundancy from non-key dependencies	Standard target for OLTP systems
BCNF	Every determinant is a candidate key	Remaining partial and transitive issues	Stronger version of 3NF
4NF	No multi-valued dependencies	Independent multi-valued facts	Specialized situations
5NF	No join dependencies	Lossy decomposition risks	Rarely needed in practice

First Normal Form (1NF):

A table is in 1NF if:

All attributes contain atomic (indivisible) values
No repeating groups or arrays
Each row is unique (has a primary key)

Violation Example:

STUDENT(id, name, courses)  -- courses = "CS101, CS102, CS201"

1NF Solution:

STUDENT(id, name)
STUDENT_COURSES(student_id, course)  -- One row per course

Second Normal Form (2NF):

A table is in 2NF if:

It is in 1NF
Every non-key attribute depends on the ENTIRE primary key (no partial dependencies)

Only relevant for tables with composite primary keys.

Violation Example:

ORDER_ITEMS(order_id, product_id, quantity, product_name, product_price)
                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        product_name and product_price depend only on product_id, not on (order_id, product_id)

2NF Solution:

ORDER_ITEMS(order_id, product_id, quantity)
PRODUCTS(product_id, product_name, product_price)

Third Normal Form (3NF):

A table is in 3NF if:

It is in 2NF
No non-key attribute depends on another non-key attribute (no transitive dependencies)

Memorize: "Every non-key attribute must depend on the key, the whole key, and nothing but the key."

Violation Example:

EMPLOYEES(emp_id, emp_name, dept_id, dept_name, dept_location)
                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
          dept_name and dept_location depend on dept_id, not emp_id

3NF Solution:

EMPLOYEES(emp_id, emp_name, dept_id)
DEPARTMENTS(dept_id, dept_name, dept_location)

Boyce-Codd Normal Form (BCNF):

A table is in BCNF if:

For every functional dependency X → Y, X is a superkey

BCNF is stronger than 3NF. Tables in BCNF are automatically in 3NF, but not vice versa. The difference matters in rare cases with overlapping candidate keys.

Practical Normalization Target

For most OLTP (transaction processing) systems, normalizing to 3NF or BCNF is the standard. Higher normal forms (4NF, 5NF) address specialized scenarios rarely encountered in practice. OLAP (analytical) systems often deliberately denormalize for query performance. The key is understanding the trade-offs and making conscious decisions.

Constraint Specification

Constraints transform business rules into database-enforced guarantees. Properly specified constraints prevent invalid data states without relying on application code—providing defense in depth when applications have bugs or when data is modified through direct database access.

Constraint Categories:

Database Constraint Types
Constraint Type	Purpose	Scope	Example
NOT NULL	Attribute must have a value	Single column	`email VARCHAR(255) NOT NULL`
UNIQUE	No duplicate values allowed	Column(s)	`UNIQUE (email)`
PRIMARY KEY	Unique row identifier, implies NOT NULL	Column(s)	`PRIMARY KEY (order_id)`
FOREIGN KEY	Referential integrity between tables	Column(s)	`FOREIGN KEY (dept_id) REFERENCES departments`
CHECK	Custom boolean condition must be true	Row or column	`CHECK (quantity > 0)`
DEFAULT	Value when none specified	Single column	`status VARCHAR(20) DEFAULT 'pending'`

CHECK Constraints for Business Rules:

CHECK constraints encode business rules directly in the schema:

check-constraints-examples.sql
SQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
-- Range constraints
CREATE TABLE products (
    product_id      INTEGER PRIMARY KEY,
    name            VARCHAR(100) NOT NULL,
    price           DECIMAL(10,2) NOT NULL CHECK (price >= 0),
    weight_kg       DECIMAL(8,3) CHECK (weight_kg > 0),
    discount_pct    DECIMAL(5,2) CHECK (discount_pct BETWEEN 0 AND 100)
);
 
-- Enumerated values
CREATE TABLE orders (
    order_id        INTEGER PRIMARY KEY,
    status          VARCHAR(20) NOT NULL 
                    CHECK (status IN ('pending', 'processing', 
                           'shipped', 'delivered', 'cancelled'))
);
 
-- Cross-column constraints
CREATE TABLE date_ranges (
    range_id        INTEGER PRIMARY KEY,
    start_date      DATE NOT NULL,
    end_date        DATE,
    CHECK (end_date IS NULL OR end_date >= start_date)
);
 
-- Pattern matching (where supported)
CREATE TABLE contacts (
    contact_id      INTEGER PRIMARY KEY,
    email           VARCHAR(255) 
                    CHECK (email ~ '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$'),
    phone           VARCHAR(20)
                    CHECK (phone ~ '^\+?[0-9\-\s]+$')
);

Advanced Constraint Patterns:

Conditional NOT NULL (nullable based on other column):

-- If status is 'shipped', ship_date must exist
CHECK (status != 'shipped' OR ship_date IS NOT NULL)

Mutually Exclusive Relationships:

-- A payment references either card OR bank account, not both
CHECK ((card_id IS NULL) != (bank_account_id IS NULL))

Exactly One Required:

-- Must have email OR phone OR both
CHECK (email IS NOT NULL OR phone IS NOT NULL)

Cross-Table Constraints (Triggers): Some constraints cannot be expressed with CHECK (they span multiple tables). These require triggers or application logic:

"An employee's salary cannot exceed their manager's salary"
"A department must have at least one employee"
"Order total must equal sum of line items"

Defense in Depth

Implement constraints at multiple levels: database constraints as the ultimate safety net, application validation for user-friendly error messages, and frontend validation for immediate feedback. Database constraints catch issues that slip through application bugs or direct database access.

Mapping Generalization Hierarchies

Generalization (supertype-subtype) hierarchies from the conceptual model can be mapped to relational tables in several ways. Each approach has trade-offs affecting query complexity, storage efficiency, and constraint enforcement.

Three Primary Mapping Strategies:

Strategy A: Single Table (Table-per-Hierarchy)

All entity types in the hierarchy are merged into one table with a type discriminator column.

CREATE TABLE persons (
    person_id   INTEGER PRIMARY KEY,
    person_type VARCHAR(20) NOT NULL,
    -- Common attributes
    name        VARCHAR(100) NOT NULL,
    email       VARCHAR(255),
    -- Employee attributes (nullable)
    hire_date   DATE,
    salary      DECIMAL(10,2),
    -- Customer attributes (nullable)
    credit_limit DECIMAL(10,2),
    loyalty_tier VARCHAR(20),
    
    CHECK (person_type IN ('employee', 'customer'))
);

Pros:

Simple queries (no joins for complete entity)
Easy polymorphic queries
Single table to maintain

Cons:

Many NULL columns
Wasted storage
Subtype constraints are harder to enforce

Strategy B: Multiple Tables (Table-per-Type)

Each entity type gets its own table. Subtypes reference supertype via foreign key.

CREATE TABLE persons (
    person_id   INTEGER PRIMARY KEY,
    name        VARCHAR(100) NOT NULL,
    email       VARCHAR(255)
);

CREATE TABLE employees (
    person_id   INTEGER PRIMARY KEY,
    hire_date   DATE NOT NULL,
    salary      DECIMAL(10,2) NOT NULL,
    FOREIGN KEY (person_id) 
        REFERENCES persons(person_id)
);

CREATE TABLE customers (
    person_id   INTEGER PRIMARY KEY,
    credit_limit DECIMAL(10,2),
    loyalty_tier VARCHAR(20),
    FOREIGN KEY (person_id) 
        REFERENCES persons(person_id)
);

Pros:

No NULL columns
Subtype attributes have proper NOT NULL constraints
Storage efficient

Cons:

Joins required for complete entity
More tables to manage

Strategy C: Subtype Tables Only (Table-per-Concrete-Class)

No supertype table exists. Each subtype contains all attributes, including inherited ones.

CREATE TABLE employees (
    employee_id INTEGER PRIMARY KEY,
    -- Repeated from supertype
    name        VARCHAR(100) NOT NULL,
    email       VARCHAR(255),
    -- Employee-specific
    hire_date   DATE NOT NULL,
    salary      DECIMAL(10,2) NOT NULL
);

CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,
    -- Repeated from supertype
    name        VARCHAR(100) NOT NULL,
    email       VARCHAR(255),
    -- Customer-specific
    credit_limit DECIMAL(10,2),
    loyalty_tier VARCHAR(20)
);

Pros:

Fast queries for single subtype (no joins)
Each table is self-contained

Cons:

Redundant supertype columns
Polymorphic queries require UNION
Harder to maintain consistent supertype definition

Strategy Selection Guidelines:

Generalization Mapping Strategy Selection
Consideration	Single Table	Multiple Tables	Subtype Only
Few subtypes, many shared attributes	✓ Best	Good	Acceptable
Many subtypes, distinct attributes	Avoid	✓ Best	Good
Frequent polymorphic queries	✓ Best	Acceptable	Avoid
Strict subtype constraints needed	Harder	✓ Best	Good
Overlapping subtypes allowed	Complex	✓ Best	Avoid
Performance-critical single-type queries	Good	Slower	✓ Best

Disjoint vs. Overlapping Enforcement

Disjoint generalization constraints (entity can be in only one subtype) are difficult to enforce with table constraints alone. For Strategy B, preventing a person from being both employee AND customer requires either application logic, triggers, or a type discriminator in the supertype table with FK constraints.

Schema Documentation and Standards

A logical schema is not complete without documentation. The DDL alone doesn't capture the why behind design decisions, the business meaning of tables and columns, or the conventions followed. Comprehensive documentation ensures the schema remains maintainable as teams change.

Essential Documentation Elements:

Schema Documentation Checklist

•Schema Overview — Purpose, scope, major functional areas, design philosophy
•Table Descriptions — Business purpose of each table, relationship to real-world entities or concepts
•Column Definitions — Business meaning (not just data type), valid values, business rules enforced
•Relationship Descriptions — Business meaning of each FK relationship, cardinality rationale
•Design Decisions — Why specific choices were made (normalization level, generalization strategy, etc.)
•Naming Conventions — Standards for table names, column names, constraint names, etc.
•Diagrams — ER diagrams, table relationship diagrams at various levels of detail

Naming Conventions:

Consistent naming dramatically improves schema usability:

Element	Convention (Example)	Notes
Tables	lowercase_plural	`customers`, `order_lines`
Columns	lowercase_snake_case	`first_name`, `created_at`
Primary Keys	table_singular_id	`customer_id`, `order_id`
Foreign Keys	referenced_table_id	`customer_id` in orders table
Junction Tables	entity1_entity2	`student_courses`, `user_roles`
Indexes	idx_table_columns	`idx_orders_customer_date`
Constraints	type_table_column	`chk_orders_status`, `fk_orders_customer`

In-Database Documentation:

schema-documentation-example.sql
SQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
-- PostgreSQL COMMENT syntax (most DBMS have equivalent)
 
COMMENT ON TABLE customers IS 
'Core customer entity representing individuals or businesses 
who purchase products. Source of truth for customer identity 
and contact information.';
 
COMMENT ON COLUMN customers.customer_id IS 
'System-generated unique identifier. Never exposed externally; 
use customer_code for external references.';
 
COMMENT ON COLUMN customers.customer_code IS 
'Human-readable customer identifier (CUS-XXXXXX format). 
Displayed on invoices, used in customer communications.';
 
COMMENT ON COLUMN customers.credit_limit IS 
'Maximum outstanding balance allowed before requiring payment. 
Set by finance department based on credit assessment. 
NULL indicates no credit extended (cash-only customer).';
 
COMMENT ON COLUMN customers.created_at IS 
'Timestamp when customer record was created. 
Immutable after insertion. Used for customer age calculations 
and anniversary notifications.';
 
-- Document constraints
COMMENT ON CONSTRAINT chk_customers_email ON customers IS 
'Ensures email follows valid format per RFC 5322. 
Applied after UI validation as defense-in-depth.';

Living Documentation

Treat schema documentation like code: version control it, review changes, and keep it synchronized with the actual schema. Tools like SchemaSpy, dbdocs.io, or built-in DBMS documentation features can generate reference documentation from comments. Out-of-date documentation is worse than no documentation—it misleads.

Summary: From Concepts to Tables

Logical design transforms abstract conceptual models into concrete, implementable database schemas. This phase requires both systematic application of mapping algorithms and engineering judgment for trade-off decisions.

Key Takeaways

•Entity mapping is systematic — Strong entities become tables; attributes become columns; keys are designated. Composite and multi-valued attributes require special handling.
•Relationship mapping depends on cardinality — 1:N uses foreign keys on the N-side; M:N requires junction tables; 1:1 offers multiple valid approaches based on participation.
•Normalization eliminates anomalies — Understanding 1NF through BCNF enables evaluation of schema quality. 3NF is the standard target for OLTP systems.
•Constraints enforce business rules — NOT NULL, UNIQUE, CHECK, and FOREIGN KEY constraints provide database-level enforcement independent of application code.
•Generalization mapping involves trade-offs — Single table, multiple table, and subtype-only strategies each have performance and maintenance implications.
•Documentation is essential — In-database comments, external documentation, and consistent naming conventions ensure long-term maintainability.

What's Next:

With the logical schema defined—tables, columns, keys, and constraints—we enter the physical design phase. Here, we make decisions about how the logical schema will be physically implemented: storage structures, indexing strategies, partitioning schemes, and performance optimizations that transform a correct schema into a fast schema.

The next page explores physical design: index selection, partitioning strategies, storage parameters, and the performance considerations that bridge the gap between logical correctness and production readiness.

Page Complete

You now understand logical design as the systematic transformation of conceptual models into relational schemas. You can map entities to tables, apply normalization principles, specify constraints, choose generalization strategies, and document your decisions. Next, we'll optimize this schema for real-world performance through physical design.

Logical Design

From Abstract Models to Implementable Schemas

What You Will Learn

Logical Design Fundamentals

Formal Definition:

The Relational Model as Target:

The vast majority of logical design targets the relational model, characterized by:

Relations (Tables): Named collections of tuples with a fixed schema
Attributes (Columns): Named, typed properties of each tuple
Tuples (Rows): Individual data instances conforming to the schema
Keys: Attributes that uniquely identify tuples
Constraints: Rules that valid database states must satisfy

Logical Design Objectives:

Quality Criteria for Logical Schemas

•Semantic Preservation — The logical schema accurately represents all information from the conceptual model; no semantic content is lost or distorted
•Minimality — No redundant information is stored; each fact is recorded exactly once (subject to deliberate denormalization decisions)
•Constraint Enforcement — Business rules from requirements are enforced by the schema itself, not just by application code
•Naming Clarity — Table and column names are clear, consistent, and follow organizational conventions
•Query Support — The schema structure supports the queries and transactions identified in requirements efficiently
•Evolution Readiness — The schema can accommodate foreseeable changes without structural redesign

The ER-to-Relational Mapping Process:

Logical design follows a systematic transformation:

Conceptual Element	Logical Element	Mapping Rule
Entity Type	Table	One table per strong entity
Attribute	Column	One column per simple attribute
Composite Attribute	Multiple Columns	Flatten or create structured type
Multi-valued Attribute	Separate Table	New table with foreign key to parent
Relationship	Foreign Key or Table	Depends on cardinality
Weak Entity	Table + Partial Key	Includes owner's primary key
Generalization	Various approaches	Single table, multiple tables, or hybrid

These rules provide a systematic algorithm for transformation, though they often require judgment in application.

Tool Support

Mapping Entities to Tables

The most straightforward mapping: each strong entity type becomes a table. The entity's simple attributes become columns, and the entity's key attribute(s) become the primary key.

Basic Entity Mapping Algorithm:

For each strong entity E:
  1. Create table T with name derived from E
  2. For each simple attribute A of E:
     - Add column A to T with appropriate data type
  3. For each composite attribute C of E:
     - Add columns for each component of C
     - OR create structured type (if DBMS supports)
  4. For key attribute(s) K of E:
     - Designate K column(s) as PRIMARY KEY

Example Transformation:

entity-mapping-example.sql
SQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
-- Conceptual: CUSTOMER entity
-- Attributes: customer_id (key), name (composite: first, last), 
--             email, registration_date
 
-- Logical Mapping:
CREATE TABLE customers (
    customer_id     INTEGER         PRIMARY KEY,
    first_name      VARCHAR(50)     NOT NULL,
    last_name       VARCHAR(50)     NOT NULL,
    email           VARCHAR(255)    NOT NULL UNIQUE,
    registration_date DATE          NOT NULL DEFAULT CURRENT_DATE
);
 
-- Note the decisions made:
-- 1. Table name: lowercase plural (convention)
-- 2. Composite name → two columns (first_name, last_name)
-- 3. Email marked UNIQUE (alternate key from conceptual model)
-- 4. registration_date has default (implementation enhancement)

Handling Composite Attributes:

Composite attributes present a design choice:

Option A: Flatten to multiple columns

-- Address as composite attribute
CREATE TABLE customers (
    ...
    address_street    VARCHAR(100),
    address_city      VARCHAR(50),
    address_state     CHAR(2),
    address_zip       VARCHAR(10),
    address_country   VARCHAR(50)
);

Advantages: Simple queries, wide tool support Disadvantages: Repetitive if same structure appears multiple times

Option B: Create structured type (if DBMS supports)

-- PostgreSQL composite type
CREATE TYPE address_type AS (
    street    VARCHAR(100),
    city      VARCHAR(50),
    state     CHAR(2),
    zip       VARCHAR(10),
    country   VARCHAR(50)
);

CREATE TABLE customers (
    ...
    address address_type
);

Advantages: Reusable, cleaner schema Disadvantages: Limited DBMS support, complex querying

Handling Multi-valued Attributes:

Multi-valued attributes cannot be stored directly in a relational column (violates 1NF). They require a separate table:

multivalued-attribute-mapping.sql
SQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
-- Conceptual: CUSTOMER has multi-valued attribute phone_numbers
 
-- WRONG: Storing multiple phones in one column
-- CREATE TABLE customers (
--     phone_numbers VARCHAR(500) -- "555-1234, 555-5678"  WRONG!
-- );
 
-- CORRECT: Separate table for multi-valued attribute
CREATE TABLE customer_phones (
    customer_id     INTEGER         NOT NULL,
    phone_number    VARCHAR(20)     NOT NULL,
    phone_type      VARCHAR(10)     DEFAULT 'mobile',
    is_primary      BOOLEAN         DEFAULT FALSE,
    
    PRIMARY KEY (customer_id, phone_number),
    FOREIGN KEY (customer_id) REFERENCES customers(customer_id)
        ON DELETE CASCADE
);
 
-- Each row represents ONE phone for ONE customer
-- Multiple rows for customers with multiple phones

Derived Attributes

Mapping Relationships to Keys and Tables

Mapping Strategy by Cardinality:

One-to-Many (1:N) Mapping:

The most common relationship type. The foreign key is placed in the table on the "many" (N) side.

Algorithm:

Create tables for both entities
Add a foreign key column to the N-side table
The FK references the PK of the 1-side table
If relationship has attributes, add them to the N-side table

Example: Department (1) — employs → (N) Employee

CREATE TABLE departments (
    department_id   INTEGER PRIMARY KEY,
    name            VARCHAR(100) NOT NULL,
    budget          DECIMAL(12,2)
);

CREATE TABLE employees (
    employee_id     INTEGER PRIMARY KEY,
    name            VARCHAR(100) NOT NULL,
    hire_date       DATE NOT NULL,
    -- Foreign key implements 1:N relationship
    department_id   INTEGER NOT NULL,
    FOREIGN KEY (department_id) 
        REFERENCES departments(department_id)
);

Participation Constraints:

NOT NULL on FK = Total participation (employee MUST have department)
NULL allowed on FK = Partial participation (employee MAY have no department)

Relationship Attributes: If the 'employs' relationship had attributes (e.g., position_title, start_date in department), they go in the employees table alongside the FK.

Weak Entity Mapping:

Weak entities depend on an owner entity for identification. Their table includes the owner's primary key as part of their own composite key:

weak-entity-mapping.sql
SQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
-- Conceptual: ORDER (strong) --contains--> ORDER_LINE (weak)
-- ORDER_LINE is identified by order_id + line_number
 
CREATE TABLE orders (
    order_id        INTEGER PRIMARY KEY,
    customer_id     INTEGER NOT NULL,
    order_date      DATE NOT NULL,
    status          VARCHAR(20) NOT NULL,
    FOREIGN KEY (customer_id) REFERENCES customers(customer_id)
);
 
CREATE TABLE order_lines (
    order_id        INTEGER NOT NULL,
    line_number     INTEGER NOT NULL,  -- Partial key (discriminator)
    product_id      INTEGER NOT NULL,
    quantity        INTEGER NOT NULL CHECK (quantity > 0),
    unit_price      DECIMAL(10,2) NOT NULL,
    
    -- Composite primary key includes owner's PK
    PRIMARY KEY (order_id, line_number),
    
    -- Identifying relationship
    FOREIGN KEY (order_id) REFERENCES orders(order_id)
        ON DELETE CASCADE,
    FOREIGN KEY (product_id) REFERENCES products(product_id)
);

Foreign Key Actions

Normalization: Theory and Application

Why Normalize?

Unnormalized schemas suffer from anomalies:

Insert Anomaly: Cannot insert data without unrelated data (can't record a department without an employee)
Update Anomaly: Updating one fact requires updating multiple rows (change department name in every employee row)
Delete Anomaly: Deleting data inadvertently removes unrelated data (deleting last employee loses department info)

Normal Forms Overview:

Normal Forms and Their Requirements
Normal Form	Requirement	Eliminates	Practical Importance
1NF	Atomic values, no repeating groups	Multi-valued attributes in cells	Essential — fundamental to relational model
2NF	1NF + No partial dependencies on composite key	Redundancy from partial key dependencies	Important for composite key tables
3NF	2NF + No transitive dependencies	Redundancy from non-key dependencies	Standard target for OLTP systems
BCNF	Every determinant is a candidate key	Remaining partial and transitive issues	Stronger version of 3NF
4NF	No multi-valued dependencies	Independent multi-valued facts	Specialized situations
5NF	No join dependencies	Lossy decomposition risks	Rarely needed in practice

First Normal Form (1NF):

A table is in 1NF if:

All attributes contain atomic (indivisible) values
No repeating groups or arrays
Each row is unique (has a primary key)

Violation Example:

STUDENT(id, name, courses)  -- courses = "CS101, CS102, CS201"

1NF Solution:

STUDENT(id, name)
STUDENT_COURSES(student_id, course)  -- One row per course

Second Normal Form (2NF):

A table is in 2NF if:

It is in 1NF
Every non-key attribute depends on the ENTIRE primary key (no partial dependencies)

Only relevant for tables with composite primary keys.

Violation Example:

ORDER_ITEMS(order_id, product_id, quantity, product_name, product_price)
                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        product_name and product_price depend only on product_id, not on (order_id, product_id)

2NF Solution:

ORDER_ITEMS(order_id, product_id, quantity)
PRODUCTS(product_id, product_name, product_price)

Third Normal Form (3NF):

A table is in 3NF if:

It is in 2NF
No non-key attribute depends on another non-key attribute (no transitive dependencies)

Memorize: "Every non-key attribute must depend on the key, the whole key, and nothing but the key."

Violation Example:

EMPLOYEES(emp_id, emp_name, dept_id, dept_name, dept_location)
                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
          dept_name and dept_location depend on dept_id, not emp_id

3NF Solution:

EMPLOYEES(emp_id, emp_name, dept_id)
DEPARTMENTS(dept_id, dept_name, dept_location)

Boyce-Codd Normal Form (BCNF):

A table is in BCNF if:

For every functional dependency X → Y, X is a superkey

BCNF is stronger than 3NF. Tables in BCNF are automatically in 3NF, but not vice versa. The difference matters in rare cases with overlapping candidate keys.

Practical Normalization Target

Constraint Specification

Constraint Categories:

Database Constraint Types
Constraint Type	Purpose	Scope	Example
NOT NULL	Attribute must have a value	Single column	`email VARCHAR(255) NOT NULL`
UNIQUE	No duplicate values allowed	Column(s)	`UNIQUE (email)`
PRIMARY KEY	Unique row identifier, implies NOT NULL	Column(s)	`PRIMARY KEY (order_id)`
FOREIGN KEY	Referential integrity between tables	Column(s)	`FOREIGN KEY (dept_id) REFERENCES departments`
CHECK	Custom boolean condition must be true	Row or column	`CHECK (quantity > 0)`
DEFAULT	Value when none specified	Single column	`status VARCHAR(20) DEFAULT 'pending'`

CHECK Constraints for Business Rules:

CHECK constraints encode business rules directly in the schema:

check-constraints-examples.sql
SQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
-- Range constraints
CREATE TABLE products (
    product_id      INTEGER PRIMARY KEY,
    name            VARCHAR(100) NOT NULL,
    price           DECIMAL(10,2) NOT NULL CHECK (price >= 0),
    weight_kg       DECIMAL(8,3) CHECK (weight_kg > 0),
    discount_pct    DECIMAL(5,2) CHECK (discount_pct BETWEEN 0 AND 100)
);
 
-- Enumerated values
CREATE TABLE orders (
    order_id        INTEGER PRIMARY KEY,
    status          VARCHAR(20) NOT NULL 
                    CHECK (status IN ('pending', 'processing', 
                           'shipped', 'delivered', 'cancelled'))
);
 
-- Cross-column constraints
CREATE TABLE date_ranges (
    range_id        INTEGER PRIMARY KEY,
    start_date      DATE NOT NULL,
    end_date        DATE,
    CHECK (end_date IS NULL OR end_date >= start_date)
);
 
-- Pattern matching (where supported)
CREATE TABLE contacts (
    contact_id      INTEGER PRIMARY KEY,
    email           VARCHAR(255) 
                    CHECK (email ~ '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$'),
    phone           VARCHAR(20)
                    CHECK (phone ~ '^\+?[0-9\-\s]+$')
);

Advanced Constraint Patterns:

Conditional NOT NULL (nullable based on other column):

-- If status is 'shipped', ship_date must exist
CHECK (status != 'shipped' OR ship_date IS NOT NULL)

Mutually Exclusive Relationships:

-- A payment references either card OR bank account, not both
CHECK ((card_id IS NULL) != (bank_account_id IS NULL))

Exactly One Required:

-- Must have email OR phone OR both
CHECK (email IS NOT NULL OR phone IS NOT NULL)

Cross-Table Constraints (Triggers): Some constraints cannot be expressed with CHECK (they span multiple tables). These require triggers or application logic:

"An employee's salary cannot exceed their manager's salary"
"A department must have at least one employee"
"Order total must equal sum of line items"

Defense in Depth

Mapping Generalization Hierarchies

Three Primary Mapping Strategies:

Strategy A: Single Table (Table-per-Hierarchy)

All entity types in the hierarchy are merged into one table with a type discriminator column.

CREATE TABLE persons (
    person_id   INTEGER PRIMARY KEY,
    person_type VARCHAR(20) NOT NULL,
    -- Common attributes
    name        VARCHAR(100) NOT NULL,
    email       VARCHAR(255),
    -- Employee attributes (nullable)
    hire_date   DATE,
    salary      DECIMAL(10,2),
    -- Customer attributes (nullable)
    credit_limit DECIMAL(10,2),
    loyalty_tier VARCHAR(20),
    
    CHECK (person_type IN ('employee', 'customer'))
);

Pros:

Simple queries (no joins for complete entity)
Easy polymorphic queries
Single table to maintain

Cons:

Many NULL columns
Wasted storage
Subtype constraints are harder to enforce

Strategy B: Multiple Tables (Table-per-Type)

Each entity type gets its own table. Subtypes reference supertype via foreign key.

CREATE TABLE persons (
    person_id   INTEGER PRIMARY KEY,
    name        VARCHAR(100) NOT NULL,
    email       VARCHAR(255)
);

CREATE TABLE employees (
    person_id   INTEGER PRIMARY KEY,
    hire_date   DATE NOT NULL,
    salary      DECIMAL(10,2) NOT NULL,
    FOREIGN KEY (person_id) 
        REFERENCES persons(person_id)
);

CREATE TABLE customers (
    person_id   INTEGER PRIMARY KEY,
    credit_limit DECIMAL(10,2),
    loyalty_tier VARCHAR(20),
    FOREIGN KEY (person_id) 
        REFERENCES persons(person_id)
);

Pros:

No NULL columns
Subtype attributes have proper NOT NULL constraints
Storage efficient

Cons:

Joins required for complete entity
More tables to manage

Strategy C: Subtype Tables Only (Table-per-Concrete-Class)

No supertype table exists. Each subtype contains all attributes, including inherited ones.

CREATE TABLE employees (
    employee_id INTEGER PRIMARY KEY,
    -- Repeated from supertype
    name        VARCHAR(100) NOT NULL,
    email       VARCHAR(255),
    -- Employee-specific
    hire_date   DATE NOT NULL,
    salary      DECIMAL(10,2) NOT NULL
);

CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,
    -- Repeated from supertype
    name        VARCHAR(100) NOT NULL,
    email       VARCHAR(255),
    -- Customer-specific
    credit_limit DECIMAL(10,2),
    loyalty_tier VARCHAR(20)
);

Pros:

Fast queries for single subtype (no joins)
Each table is self-contained

Cons:

Redundant supertype columns
Polymorphic queries require UNION
Harder to maintain consistent supertype definition

Strategy Selection Guidelines:

Generalization Mapping Strategy Selection
Consideration	Single Table	Multiple Tables	Subtype Only
Few subtypes, many shared attributes	✓ Best	Good	Acceptable
Many subtypes, distinct attributes	Avoid	✓ Best	Good
Frequent polymorphic queries	✓ Best	Acceptable	Avoid
Strict subtype constraints needed	Harder	✓ Best	Good
Overlapping subtypes allowed	Complex	✓ Best	Avoid
Performance-critical single-type queries	Good	Slower	✓ Best

Disjoint vs. Overlapping Enforcement

Schema Documentation and Standards

Essential Documentation Elements:

Schema Documentation Checklist

•Schema Overview — Purpose, scope, major functional areas, design philosophy
•Table Descriptions — Business purpose of each table, relationship to real-world entities or concepts
•Column Definitions — Business meaning (not just data type), valid values, business rules enforced
•Relationship Descriptions — Business meaning of each FK relationship, cardinality rationale
•Design Decisions — Why specific choices were made (normalization level, generalization strategy, etc.)
•Naming Conventions — Standards for table names, column names, constraint names, etc.
•Diagrams — ER diagrams, table relationship diagrams at various levels of detail

Naming Conventions:

Consistent naming dramatically improves schema usability:

Element	Convention (Example)	Notes
Tables	lowercase_plural	`customers`, `order_lines`
Columns	lowercase_snake_case	`first_name`, `created_at`
Primary Keys	table_singular_id	`customer_id`, `order_id`
Foreign Keys	referenced_table_id	`customer_id` in orders table
Junction Tables	entity1_entity2	`student_courses`, `user_roles`
Indexes	idx_table_columns	`idx_orders_customer_date`
Constraints	type_table_column	`chk_orders_status`, `fk_orders_customer`

In-Database Documentation:

schema-documentation-example.sql
SQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
-- PostgreSQL COMMENT syntax (most DBMS have equivalent)
 
COMMENT ON TABLE customers IS 
'Core customer entity representing individuals or businesses 
who purchase products. Source of truth for customer identity 
and contact information.';
 
COMMENT ON COLUMN customers.customer_id IS 
'System-generated unique identifier. Never exposed externally; 
use customer_code for external references.';
 
COMMENT ON COLUMN customers.customer_code IS 
'Human-readable customer identifier (CUS-XXXXXX format). 
Displayed on invoices, used in customer communications.';
 
COMMENT ON COLUMN customers.credit_limit IS 
'Maximum outstanding balance allowed before requiring payment. 
Set by finance department based on credit assessment. 
NULL indicates no credit extended (cash-only customer).';
 
COMMENT ON COLUMN customers.created_at IS 
'Timestamp when customer record was created. 
Immutable after insertion. Used for customer age calculations 
and anniversary notifications.';
 
-- Document constraints
COMMENT ON CONSTRAINT chk_customers_email ON customers IS 
'Ensures email follows valid format per RFC 5322. 
Applied after UI validation as defense-in-depth.';

Living Documentation

Summary: From Concepts to Tables

Key Takeaways

•Entity mapping is systematic — Strong entities become tables; attributes become columns; keys are designated. Composite and multi-valued attributes require special handling.
•Relationship mapping depends on cardinality — 1:N uses foreign keys on the N-side; M:N requires junction tables; 1:1 offers multiple valid approaches based on participation.
•Normalization eliminates anomalies — Understanding 1NF through BCNF enables evaluation of schema quality. 3NF is the standard target for OLTP systems.
•Constraints enforce business rules — NOT NULL, UNIQUE, CHECK, and FOREIGN KEY constraints provide database-level enforcement independent of application code.
•Generalization mapping involves trade-offs — Single table, multiple table, and subtype-only strategies each have performance and maintenance implications.
•Documentation is essential — In-database comments, external documentation, and consistent naming conventions ensure long-term maintainability.

What's Next:

Page Complete