All the careful work of requirements analysis, conceptual modeling, logical design, and physical optimization now converges on a single goal: creating a working database system. Implementation is where diagrams become tables, where specifications become constraints, where capacity projections meet real hardware.
But implementation is far more than simply running CREATE TABLE statements. It encompasses deployment strategies that minimize downtime, data migration that preserves integrity across systems, testing regimens that validate correctness under real conditions, and operational handoffs that ensure the database can be maintained long-term.
This phase separates database architects who deliver on paper from those who deliver systems that run reliably in production for years. The best design in the world is worthless if implementation fails—if data is corrupted during migration, if deployments cause outages, or if operational teams cannot maintain what was built.
By the end of this page, you will master the implementation phase: DDL script development and organization, deployment strategies for zero-downtime changes, data migration techniques, comprehensive testing approaches, and operational documentation. You will understand how to transition from design completion to production operation.
The Data Definition Language (DDL) scripts are the source code of your database. Like application code, they should be organized, versioned, documented, and reviewed. Unlike application code that can be easily rebuilt, DDL scripts must handle the complexity of modifying systems that contain irreplaceable data.
DDL Script Organization:
```text
database/
├── migrations/                     # Versioned schema changes
│   ├── V001__initial_schema.sql
│   ├── V002__add_customer_phone.sql
│   ├── V003__create_orders_table.sql
│   └── ...
├── baseline/                       # Clean schema (for new environments)
│   ├── 01_types.sql                # Custom types, enums
│   ├── 02_tables.sql               # All table definitions
│   ├── 03_constraints.sql          # FK, CHECK constraints
│   ├── 04_indexes.sql              # All indexes
│   ├── 05_views.sql                # View definitions
│   ├── 06_functions.sql            # Stored functions
│   └── 07_triggers.sql             # Trigger definitions
├── seed/                           # Initial/reference data
│   ├── countries.sql
│   ├── currencies.sql
│   └── system_config.sql
├── rollback/                       # Rollback scripts (if needed)
│   ├── R001__rollback_initial.sql
│   └── ...
└── docs/
    ├── schema_diagram.png
    └── data_dictionary.md
```
Migration Script Best Practices:
```sql
-- Migration: V003__create_orders_table.sql
-- Author: J. Smith
-- Date: 2024-03-15
-- Ticket: PROJ-1234
-- Description: Creates orders and order_lines tables for order management

-- =====================================================
-- ORDERS TABLE
-- Stores customer orders with status tracking
-- =====================================================
CREATE TABLE IF NOT EXISTS orders (
    order_id            BIGSERIAL PRIMARY KEY,
    customer_id         INTEGER NOT NULL,
    order_date          TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    status              VARCHAR(20) NOT NULL DEFAULT 'pending',
    shipping_address_id INTEGER,
    billing_address_id  INTEGER,
    subtotal            DECIMAL(10,2) NOT NULL DEFAULT 0,
    tax_amount          DECIMAL(10,2) NOT NULL DEFAULT 0,
    shipping_cost       DECIMAL(10,2) NOT NULL DEFAULT 0,
    total               DECIMAL(10,2) NOT NULL GENERATED ALWAYS AS
                        (subtotal + tax_amount + shipping_cost) STORED,
    notes               TEXT,
    created_at          TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    updated_at          TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,

    -- Constraints
    CONSTRAINT fk_orders_customer
        FOREIGN KEY (customer_id) REFERENCES customers(customer_id),
    CONSTRAINT fk_orders_shipping_address
        FOREIGN KEY (shipping_address_id) REFERENCES addresses(address_id),
    CONSTRAINT fk_orders_billing_address
        FOREIGN KEY (billing_address_id) REFERENCES addresses(address_id),
    CONSTRAINT chk_orders_status
        CHECK (status IN ('pending', 'confirmed', 'processing',
                          'shipped', 'delivered', 'cancelled')),
    CONSTRAINT chk_orders_amounts_positive
        CHECK (subtotal >= 0 AND tax_amount >= 0 AND shipping_cost >= 0)
);

-- Comments for documentation
COMMENT ON TABLE orders IS 'Customer orders with status tracking and totals';
COMMENT ON COLUMN orders.status IS 'Order lifecycle status: pending→confirmed→processing→shipped→delivered';
COMMENT ON COLUMN orders.total IS 'Computed total = subtotal + tax + shipping; auto-maintained';

-- =====================================================
-- ORDER_LINES TABLE
-- Line items for each order
-- =====================================================
CREATE TABLE IF NOT EXISTS order_lines (
    order_id     BIGINT NOT NULL,
    line_number  INTEGER NOT NULL,
    product_id   INTEGER NOT NULL,
    quantity     INTEGER NOT NULL,
    unit_price   DECIMAL(10,2) NOT NULL,
    discount_pct DECIMAL(5,2) NOT NULL DEFAULT 0,
    line_total   DECIMAL(10,2) NOT NULL GENERATED ALWAYS AS
                 (quantity * unit_price * (1 - discount_pct/100)) STORED,

    PRIMARY KEY (order_id, line_number),
    CONSTRAINT fk_order_lines_order
        FOREIGN KEY (order_id) REFERENCES orders(order_id) ON DELETE CASCADE,
    CONSTRAINT fk_order_lines_product
        FOREIGN KEY (product_id) REFERENCES products(product_id),
    CONSTRAINT chk_order_lines_quantity CHECK (quantity > 0),
    CONSTRAINT chk_order_lines_price CHECK (unit_price >= 0),
    CONSTRAINT chk_order_lines_discount CHECK (discount_pct BETWEEN 0 AND 100)
);

-- =====================================================
-- INDEXES
-- Based on expected query patterns
-- =====================================================
CREATE INDEX IF NOT EXISTS idx_orders_customer
    ON orders(customer_id);

CREATE INDEX IF NOT EXISTS idx_orders_status_date
    ON orders(status, order_date DESC);

CREATE INDEX IF NOT EXISTS idx_orders_date
    ON orders(order_date DESC);

CREATE INDEX IF NOT EXISTS idx_order_lines_product
    ON order_lines(product_id);

-- =====================================================
-- TRIGGERS
-- Auto-update timestamp on modification
-- =====================================================
CREATE OR REPLACE FUNCTION update_orders_timestamp()
RETURNS TRIGGER AS $$
BEGIN
    NEW.updated_at = CURRENT_TIMESTAMP;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

DROP TRIGGER IF EXISTS trg_orders_updated ON orders;
CREATE TRIGGER trg_orders_updated
    BEFORE UPDATE ON orders
    FOR EACH ROW
    EXECUTE FUNCTION update_orders_timestamp();
```
Use migration tools like Flyway, Liquibase, Alembic (Python), or Knex (Node.js) to manage DDL versioning. These tools track which migrations have been applied, ensure consistent execution order, and prevent accidental re-execution of completed migrations.
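To make that bookkeeping concrete, here is a minimal, hypothetical sketch of the kind of history table such a tool maintains (Flyway, for example, keeps its own flyway_schema_history table); the schema_migrations name and columns below are illustrative, not any tool's actual layout.

```sql
-- Hypothetical illustration of migration bookkeeping; real tools use their own layouts.
CREATE TABLE IF NOT EXISTS schema_migrations (
    version     VARCHAR(50)  PRIMARY KEY,                        -- e.g., 'V003'
    description TEXT         NOT NULL,                           -- e.g., 'create orders table'
    checksum    VARCHAR(64),                                     -- detects edits to already-applied scripts
    applied_by  VARCHAR(100) NOT NULL DEFAULT CURRENT_USER,
    applied_at  TIMESTAMP    NOT NULL DEFAULT CURRENT_TIMESTAMP,
    success     BOOLEAN      NOT NULL
);

-- A runner applies only versions not yet recorded, in ascending order,
-- and records each migration it executes.
SELECT version FROM schema_migrations ORDER BY version;
```

The essential idea is that the applied-migration history lives in the database itself, so every environment can report exactly which schema version it is running.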
Database deployment differs fundamentally from application deployment. Applications can be replaced atomically; databases must be modified in place while preserving data. A failed application deployment can be rolled back by reverting to the previous version; a failed database migration may have already modified millions of rows.
Deployment Approaches:
| Strategy | Description | Use When | Risks |
|---|---|---|---|
| Maintenance Window | Take system offline, apply changes, bring back | Infrequent, complex changes; acceptable downtime | Downtime impacts users; rollback pressure |
| Rolling Deployment | Apply to replicas one at a time | Read replicas, eventual consistency acceptable | Version skew between nodes during rollout |
| Blue-Green | Maintain two environments, switch traffic | Full cutover with instant rollback | Double infrastructure cost; data sync complexity |
| Online Migration | Change schema while system runs | Zero-downtime requirements | Complex tooling; long migration windows for large tables |
Online Schema Change Techniques:
For systems requiring zero downtime, schema changes must happen while queries continue to execute:
Expand-Contract Pattern:
Phase 1: EXPAND — Add new structure, don't remove old
Phase 2: MIGRATE — Copy/transform data, dual-write
Phase 3: CONTRACT — Remove old structure after verification
Example: Renaming a Column
-- WRONG: Instant breakage
-- ALTER TABLE customers RENAME COLUMN phone TO phone_number;
-- CORRECT: Expand-Contract
-- Phase 1: Add new column (applications still use 'phone')
ALTER TABLE customers ADD COLUMN phone_number VARCHAR(20);
-- Phase 2: Backfill existing data
UPDATE customers SET phone_number = phone WHERE phone_number IS NULL;
-- Phase 2b: Deploy application that writes to BOTH columns
-- Phase 2c: Deploy application that reads from new column
-- Phase 3: After all applications updated, drop old column
ALTER TABLE customers DROP COLUMN phone;
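On a large customers table, the single Phase 2 UPDATE above can hold locks and generate heavy WAL for a long time. A hedged sketch of a batched backfill follows, assuming customer_id is a numeric surrogate key; the batch size is illustrative.

```sql
-- Hedged sketch: backfill in batches so locks and WAL volume stay small per step.
DO $$
DECLARE
    batch_size CONSTANT INTEGER := 10000;   -- tune to your table and write load
    max_id     BIGINT;
    current_id BIGINT := 0;
BEGIN
    SELECT COALESCE(MAX(customer_id), 0) INTO max_id FROM customers;

    WHILE current_id < max_id LOOP
        UPDATE customers
           SET phone_number = phone
         WHERE customer_id >  current_id
           AND customer_id <= current_id + batch_size
           AND phone_number IS NULL;

        current_id := current_id + batch_size;
        COMMIT;  -- transaction control in DO blocks requires PostgreSQL 11+ and no outer transaction
    END LOOP;
END $$;
```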
Large Table Alterations:
ALTER TABLE operations on large tables can lock the table for extended periods. Specialized tools such as pt-online-schema-change (Percona Toolkit) and gh-ost for MySQL, or pg_repack for PostgreSQL, handle this by building a shadow copy of the table and swapping it in once the copy has caught up.
Know which DDL operations acquire exclusive locks in your DBMS. Adding a column with a default value locks the entire table in older PostgreSQL versions. Adding a NOT NULL constraint requires a full table scan. Always test DDL operations on a copy of production data to understand lock duration.
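As a hedged, PostgreSQL-specific sketch, two common lock-friendly alternatives are shown below; the index and constraint names are illustrative.

```sql
-- Build an index without blocking concurrent writes
-- (CREATE INDEX CONCURRENTLY cannot run inside a transaction block).
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_customers_phone_number
    ON customers (phone_number);

-- Enforce NOT NULL on a large table without one long blocking scan (PostgreSQL 12+):
-- NOT VALID checks only new rows; VALIDATE scans existing rows under a weaker lock;
-- SET NOT NULL can then reuse the validated constraint instead of rescanning the table.
ALTER TABLE customers
    ADD CONSTRAINT chk_phone_number_not_null
    CHECK (phone_number IS NOT NULL) NOT VALID;

ALTER TABLE customers VALIDATE CONSTRAINT chk_phone_number_not_null;
ALTER TABLE customers ALTER COLUMN phone_number SET NOT NULL;
ALTER TABLE customers DROP CONSTRAINT chk_phone_number_not_null;
```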
Data migration—moving data from legacy systems, flat files, or other databases into the new schema—is often the most complex and error-prone part of implementation. Data quality issues lurking in source systems surface during migration; transformations may have unexpected edge cases; volumes may overwhelm initial capacity estimates.
Migration Types: common approaches include a big-bang migration (all data moved during a single cutover window), a phased or trickle migration (data moved incrementally while old and new systems run side by side), and a parallel run (both systems operated together until the new one is verified).
ETL Pipeline Design:
```sql
-- Example: Migrating customer data from legacy system

-- Step 1: Create staging table matching source structure
CREATE TABLE staging_customers (
    legacy_id    VARCHAR(20),
    full_name    VARCHAR(200),
    email_addr   VARCHAR(255),
    phone1       VARCHAR(30),
    phone2       VARCHAR(30),
    addr_line1   VARCHAR(100),
    addr_line2   VARCHAR(100),
    addr_city    VARCHAR(50),
    addr_state   VARCHAR(50),
    addr_zip     VARCHAR(20),
    created_date VARCHAR(30),   -- Text, needs parsing
    status_code  VARCHAR(5),
    raw_data     JSONB          -- Store original for debugging
);

-- Step 2: Bulk load raw data
COPY staging_customers
FROM '/data/exports/legacy_customers.csv'
WITH (FORMAT csv, HEADER true);

-- Step 3: Data quality checks BEFORE transformation
-- Check for duplicates
SELECT email_addr, COUNT(*)
FROM staging_customers
GROUP BY email_addr
HAVING COUNT(*) > 1;

-- Check for invalid emails
SELECT legacy_id, email_addr
FROM staging_customers
WHERE email_addr !~* '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$';

-- Check for missing required fields
SELECT COUNT(*) FROM staging_customers WHERE full_name IS NULL;

-- Step 4: Transformation and loading
INSERT INTO customers (
    legacy_customer_id, first_name, last_name, email, created_at, status
)
SELECT
    legacy_id,
    SPLIT_PART(full_name, ' ', 1) AS first_name,
    SPLIT_PART(full_name, ' ', 2) AS last_name,
    LOWER(TRIM(email_addr)),
    TO_TIMESTAMP(created_date, 'MM/DD/YYYY'),
    CASE status_code
        WHEN 'A' THEN 'active'
        WHEN 'I' THEN 'inactive'
        WHEN 'S' THEN 'suspended'
        ELSE 'unknown'
    END
FROM staging_customers
WHERE email_addr IS NOT NULL   -- Skip invalid records
  AND full_name IS NOT NULL;

-- Step 5: Load phones to separate table (denormalized in source)
INSERT INTO customer_phones (customer_id, phone_number, phone_type)
SELECT c.customer_id, s.phone1, 'primary'
FROM staging_customers s
JOIN customers c ON c.legacy_customer_id = s.legacy_id
WHERE s.phone1 IS NOT NULL
UNION ALL
SELECT c.customer_id, s.phone2, 'secondary'
FROM staging_customers s
JOIN customers c ON c.legacy_customer_id = s.legacy_id
WHERE s.phone2 IS NOT NULL;

-- Step 6: Verification counts
SELECT 'staging' AS source, COUNT(*) FROM staging_customers
UNION ALL
SELECT 'loaded', COUNT(*) FROM customers WHERE legacy_customer_id IS NOT NULL;

-- Step 7: Log exceptions for manual review
INSERT INTO migration_exceptions (source_table, source_id, issue, raw_data)
SELECT 'staging_customers', legacy_id, 'missing_email', to_jsonb(s.*)
FROM staging_customers s
WHERE email_addr IS NULL;
```
Legacy data is never as clean as expected. Budget significant time for data cleansing: duplicates, missing values, invalid formats, encoding issues, orphaned references. Create exception handling that logs problematic records for manual review rather than failing the entire migration.
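Row counts alone can hide value-level corruption. One hedged way to catch it is to compare field-level checksums between staging and target; this sketch assumes the Step 4 column mapping above and that legacy IDs sort identically in both tables.

```sql
-- Hedged sketch: field-level reconciliation between staging and target.
-- Both sides aggregate the same rows in the same order, so matching
-- checksums indicate the email values survived the transformation intact.
SELECT
    'email checksum' AS check_name,
    (SELECT md5(string_agg(LOWER(TRIM(email_addr)), ',' ORDER BY legacy_id))
       FROM staging_customers
      WHERE email_addr IS NOT NULL AND full_name IS NOT NULL) AS staging_value,
    (SELECT md5(string_agg(email, ',' ORDER BY legacy_customer_id))
       FROM customers
      WHERE legacy_customer_id IS NOT NULL)                   AS target_value;
```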
Database testing validates that the implementation correctly realizes the design. It encompasses schema correctness, constraint enforcement, data integrity, performance characteristics, and integration behavior.
Testing Categories:
| Test Type | What It Validates | Example Tests |
|---|---|---|
| Schema Tests | Structure matches specification | Table exists, columns have correct types, constraints present |
| Constraint Tests | Business rules enforced | FK violations rejected, CHECK constraints fire, UNIQUE enforced |
| Data Integrity | Data remains consistent | No orphans, aggregates match, temporal consistency |
| Performance Tests | Queries meet SLAs | Query response times, throughput under load, index effectiveness |
| Migration Tests | Data transformed correctly | Row counts match, values translated properly, no data loss |
| Integration Tests | Application works with database | CRUD operations succeed, transactions behave correctly |
Schema Validation Tests:
```sql
-- Schema Test Suite (PostgreSQL example)
-- Can be run via pgTAP or similar testing framework

-- Test 1: Required tables exist
SELECT has_table('customers', 'Table customers should exist');
SELECT has_table('orders', 'Table orders should exist');
SELECT has_table('order_lines', 'Table order_lines should exist');

-- Test 2: Required columns exist with correct types
SELECT has_column('customers', 'customer_id', 'customers.customer_id exists');
SELECT col_type_is('customers', 'customer_id', 'integer', 'customer_id is integer');
SELECT col_type_is('customers', 'email', 'character varying(255)', 'email is varchar(255)');

-- Test 3: Primary keys defined
SELECT has_pk('customers', 'customers has primary key');
SELECT has_pk('orders', 'orders has primary key');

-- Test 4: Foreign keys defined
SELECT has_fk('orders', 'orders has foreign key');
SELECT fk_ok('orders', 'customer_id', 'customers', 'customer_id',
             'orders.customer_id references customers.customer_id');

-- Test 5: Indexes exist
SELECT has_index('orders', 'idx_orders_customer', 'Index on orders.customer_id exists');
SELECT has_index('orders', 'idx_orders_status_date', 'Index on orders(status, order_date) exists');

-- Test 6: Check constraints exist
SELECT has_check('orders', 'orders has check constraint');
```
Constraint Enforcement Tests:
```sql
-- Constraint Test Suite
-- Each INSERT should FAIL: the test passes when the constraint rejects the invalid data

-- Test: FK prevents orphan orders
BEGIN;
DO $$
BEGIN
    INSERT INTO orders (customer_id, order_date) VALUES (999999, CURRENT_DATE);
    -- Should not reach here
    RAISE EXCEPTION 'FK constraint failed to prevent orphan order';
EXCEPTION
    WHEN foreign_key_violation THEN
        RAISE NOTICE 'PASS: FK correctly prevents orphan order';
END $$;
ROLLBACK;

-- Test: CHECK prevents negative quantities
BEGIN;
DO $$
BEGIN
    INSERT INTO order_lines (order_id, line_number, product_id, quantity, unit_price)
    VALUES (1, 1, 1, -5, 10.00);
    RAISE EXCEPTION 'CHECK constraint failed to prevent negative quantity';
EXCEPTION
    WHEN check_violation THEN
        RAISE NOTICE 'PASS: CHECK correctly prevents negative quantity';
END $$;
ROLLBACK;

-- Test: UNIQUE prevents duplicate emails
BEGIN;
DO $$
BEGIN
    INSERT INTO customers (first_name, last_name, email)
    VALUES ('Test', 'User', 'existing@example.com');
    INSERT INTO customers (first_name, last_name, email)
    VALUES ('Test', 'Duplicate', 'existing@example.com');
    RAISE EXCEPTION 'UNIQUE constraint failed to prevent duplicate email';
EXCEPTION
    WHEN unique_violation THEN
        RAISE NOTICE 'PASS: UNIQUE correctly prevents duplicate email';
END $$;
ROLLBACK;
```
Integrate database tests into CI/CD pipelines. On every commit: spin up a test database, apply all migrations, run schema/constraint tests, execute performance benchmarks on sample data, and fail the build if any test fails. This prevents broken migrations from reaching production.
Performance testing validates that the physical design meets the requirements defined during the design phase. This must be done with realistic data volumes and query patterns—tests on small datasets are not meaningful indicators of production behavior.
Benchmark Design:
```sql
-- Performance Benchmark Script

-- Generate test data (10 million orders)
-- Note: 'total' is a generated column, so we populate 'subtotal' instead;
-- assumes customers with IDs 1..100000 already exist.
INSERT INTO orders (customer_id, order_date, status, subtotal)
SELECT
    (random() * 99999)::int + 1,                                    -- 100K customers
    timestamp '2020-01-01' + (random() * 1460) * interval '1 day',
    (ARRAY['pending','confirmed','shipped','delivered'])[ceil(random()*4)],
    (random() * 500 + 10)::decimal(10,2)
FROM generate_series(1, 10000000);

-- Analyze tables after loading
ANALYZE customers;
ANALYZE orders;
ANALYZE order_lines;

-- Benchmark Query 1: Recent orders for customer
\timing on

EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT o.*, array_agg(ol.product_id) AS products
FROM orders o
LEFT JOIN order_lines ol ON o.order_id = ol.order_id
WHERE o.customer_id = 12345
  AND o.order_date > CURRENT_DATE - interval '1 year'
GROUP BY o.order_id
ORDER BY o.order_date DESC
LIMIT 20;
-- Target: < 50ms

-- Benchmark Query 2: Daily sales aggregate
EXPLAIN (ANALYZE, BUFFERS)
SELECT DATE(order_date), COUNT(*), SUM(total)
FROM orders
WHERE order_date BETWEEN '2024-01-01' AND '2024-03-31'
GROUP BY DATE(order_date)
ORDER BY 1;
-- Target: < 500ms

-- Benchmark Query 3: Product sales ranking
EXPLAIN (ANALYZE, BUFFERS)
SELECT p.product_id, p.name,
       SUM(ol.quantity)   AS units_sold,
       SUM(ol.line_total) AS revenue
FROM products p
JOIN order_lines ol ON p.product_id = ol.product_id
JOIN orders o ON ol.order_id = o.order_id
WHERE o.order_date > CURRENT_DATE - interval '30 days'
GROUP BY p.product_id, p.name
ORDER BY revenue DESC
LIMIT 100;
-- Target: < 1000ms

-- Record results
CREATE TABLE IF NOT EXISTS benchmark_results (
    run_date          TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    query_name        VARCHAR(50),
    execution_time_ms DECIMAL(10,2),
    rows_returned     INTEGER,
    notes             TEXT
);
```
Interpreting EXPLAIN Output:
| Warning Sign | Indicates | Action |
|---|---|---|
| Seq Scan on large table | Missing or unused index | Add appropriate index |
| High Rows Removed by Filter | Inefficient index | Consider different index columns |
| Sort using disk | Insufficient work_mem | Increase memory or pre-sort via index |
| Nested Loop with high iterations | Cartesian product risk | Verify join conditions |
| Bitmap Heap Scan | Index used but not covering | Consider covering index |
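For the "not covering" case, a hedged PostgreSQL 11+ sketch of a covering index is shown below; the index name is illustrative, and the payoff appears as an Index Only Scan with few heap fetches in the new plan.

```sql
-- Hedged sketch: covering index so customer lookups can be answered from the index alone.
CREATE INDEX IF NOT EXISTS idx_orders_customer_date_covering
    ON orders (customer_id, order_date DESC)
    INCLUDE (order_id, status, total);

-- Re-check the plan: 'Index Only Scan' with low 'Heap Fetches'
-- (after a recent VACUUM refreshes the visibility map) indicates the index covers the query.
EXPLAIN (ANALYZE, BUFFERS)
SELECT order_id, order_date, status, total
FROM orders
WHERE customer_id = 12345
ORDER BY order_date DESC
LIMIT 20;
```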
Load Testing Tools: pgbench (bundled with PostgreSQL), sysbench, HammerDB, and Apache JMeter are common choices for driving concurrent, scripted workloads against the database; a pgbench example follows below.
Performance tests must run on hardware similar to production. Testing on a developer laptop with SSD and 32GB RAM tells you nothing about performance on production servers with different characteristics. Cloud environments should use the same instance types and storage configurations as production.
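As one concrete, hedged example, pgbench can replay a custom transaction script against the schema above; the file name, client counts, and duration are illustrative.

```sql
-- File: bench_recent_orders.sql (a pgbench custom script; name is illustrative)
-- Run with something like:
--   pgbench -n -c 50 -j 4 -T 300 -f bench_recent_orders.sql orders_db
-- which simulates 50 concurrent clients for 5 minutes and reports throughput and latency.
\set cust random(1, 100000)
SELECT o.order_id, o.order_date, o.status, o.total
FROM orders o
WHERE o.customer_id = :cust
  AND o.order_date > CURRENT_DATE - interval '1 year'
ORDER BY o.order_date DESC
LIMIT 20;
```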
The database will outlive the project team. Comprehensive operational documentation ensures that future administrators, developers, and support personnel can maintain and evolve the system effectively.
Essential Documentation:
| Document | Audience | Content |
|---|---|---|
| Schema Documentation | Developers, DBAs | ERD, table/column descriptions, relationships, business rules |
| Runbook | Operations, DBA | Common tasks, maintenance procedures, troubleshooting guides |
| Backup/Recovery Plan | Operations, DR team | Backup schedule, retention, recovery procedures, RTO/RPO |
| Monitoring Guide | Operations, SRE | Key metrics, alert thresholds, escalation procedures |
| Security Documentation | Security, Audit | Access controls, encryption, audit logging, compliance |
| Capacity Plan | Infrastructure, Management | Current utilization, growth projections, scaling triggers |
Runbook Template:
````markdown
# Database Operations Runbook
## orders_db Production Database

### Quick Reference
| Item | Value |
|------|-------|
| DBMS | PostgreSQL 15.2 |
| Host | prod-db-01.example.com |
| Port | 5432 |
| Primary DB | orders_production |
| Replica DBs | prod-db-02, prod-db-03 |
| DBA Contact | dba-team@example.com |
| Escalation | PagerDuty: OPS-DB-001 |

### Common Operations

#### 1. Checking Database Status
```bash
psql -h prod-db-01 -U ops_readonly -c "SELECT pg_is_in_recovery();"
# Returns 'f' for primary, 't' for replica
```

#### 2. Monitoring Connections
```sql
SELECT state, count(*) FROM pg_stat_activity GROUP BY state;
-- Alert if 'active' > 100 or total > 200
```

#### 3. Identifying Long-Running Queries
```sql
SELECT pid, now() - pg_stat_activity.query_start AS duration, query
FROM pg_stat_activity
WHERE (now() - pg_stat_activity.query_start) > interval '5 minutes'
  AND state = 'active';
```

#### 4. Killing a Runaway Query
```sql
-- Graceful cancellation
SELECT pg_cancel_backend(PID);
-- Forceful termination (if cancel fails)
SELECT pg_terminate_backend(PID);
```

### Troubleshooting

#### Symptom: Slow Queries
1. Check for missing indexes: EXPLAIN ANALYZE the slow query
2. Check for table bloat: pg_stat_user_tables.n_dead_tup
3. Check for lock contention: pg_locks joined with pg_stat_activity
4. Verify statistics are current: check pg_stat_user_tables.last_analyze

#### Symptom: Disk Space Alert
1. Identify largest tables: pg_total_relation_size()
2. Check for WAL bloat: pg_wal_lsn_diff()
3. Identify unused indexes: pg_stat_user_indexes.idx_scan = 0
4. Run VACUUM if dead tuples > 10% of table size

### Maintenance Schedule
| Task | Frequency | Window |
|------|-----------|--------|
| VACUUM ANALYZE | Daily | 02:00-04:00 UTC |
| REINDEX (large tables) | Monthly | First Sunday 01:00 UTC |
| Full backup | Daily | 00:00 UTC |
| Transaction log backup | Every 15 min | Continuous |
| Statistics refresh | Weekly | Sunday 03:00 UTC |
````
Documentation alone is insufficient. Conduct formal knowledge transfer sessions with the operations team: walk through the architecture, demonstrate common procedures, explain design decisions that affect operations. Record these sessions for future team members.
The go-live cutover is the culmination of the entire database design and implementation effort. It's a high-stakes operation that requires meticulous planning, clear communication, and well-rehearsed procedures.
Cutover Planning Elements: a step-by-step timeline with named owners, go/no-go decision criteria, a communication plan for stakeholders, verification checklists, and pre-agreed rollback triggers.
Cutover Sequence Example:
T-24h: Final backup of source system
T-12h: Team briefing, confirm all prerequisites
T-2h: Freeze source system (read-only or maintenance mode)
T-1h: Final incremental data sync
T-0: Point of no return decision
- Execute final migration scripts
- Update connection strings
- Enable production traffic
T+15m: Initial verification checks
T+1h: Extended verification, performance spot checks
T+24h: Exit hypercare, return to normal operations
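The T+15m verification step can be scripted ahead of time. A hedged sketch of typical smoke checks follows; the tables queried and thresholds implied are illustrative, and the sequence name assumes PostgreSQL's default naming for the BIGSERIAL key.

```sql
-- Hedged sketch: post-cutover smoke checks.

-- 1. Row counts in critical tables are in the expected range
SELECT 'orders' AS tbl, COUNT(*) AS row_count FROM orders
UNION ALL
SELECT 'customers', COUNT(*) FROM customers;

-- 2. The surrogate-key sequence is ahead of the maximum migrated key
--    (prevents duplicate-key errors on the first new orders)
SELECT last_value, (SELECT MAX(order_id) FROM orders) AS max_order_id
FROM orders_order_id_seq;

-- 3. Fresh writes are arriving now that production traffic is enabled
SELECT COUNT(*) AS orders_last_15_min
FROM orders
WHERE created_at > now() - interval '15 minutes';

-- 4. No unexpected lock pile-ups or stuck sessions
SELECT state, wait_event_type, COUNT(*)
FROM pg_stat_activity
GROUP BY state, wait_event_type
ORDER BY COUNT(*) DESC;
```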
Rollback Considerations:
Not all cutovers can be easily rolled back:
| Scenario | Rollback Complexity |
|---|---|
| New database, no legacy | Low — just revert connection strings |
| Schema migration, same DBMS | Medium — run reverse migration scripts |
| DBMS platform change | High — need parallel systems, data sync |
| Data transformation with loss | Very High — may need source data reload |
Always have a tested rollback plan, even if it's expensive.
Never execute a production cutover without rehearsing it in a staging environment. Time each step, identify surprises, refine procedures. The rehearsal should be as close to production conditions as possible—same data volumes, same team, same time pressure.
Implementation transforms design artifacts into running database systems. It requires disciplined processes, comprehensive testing, and careful attention to operational concerns that extend far beyond the initial deployment.
The Complete Design Lifecycle:
We have now traversed the complete database design process: requirements analysis, conceptual modeling, logical design, physical design and optimization, and finally implementation and deployment.
Each phase builds on the previous; errors in early phases propagate through all subsequent phases. The discipline to invest appropriately in each phase—particularly the often-rushed requirements and conceptual phases—distinguishes successful database projects from troubled ones.
As you apply these principles, remember: a database is not a static artifact but a living system that evolves with the business it serves. The skills you've learned enable not just initial design but ongoing evolution, maintenance, and optimization throughout the database lifecycle.
Congratulations! You have completed the Design Phases module. You now understand the complete database design lifecycle from requirements through implementation. You can systematically approach database projects with the structured methodology used by experienced database architects, ensuring your designs are not just technically correct but practically successful in production environments.