Specialization Mapping - Learning Module

Loading content...

0/252

Trade-offs Analysis: Comprehensive Comparison and Selection Framework

The Art of Trade-off Analysis

Throughout this module, we've explored Single Table Inheritance (STI), Table-Per-Type (TPT), Table-Per-Concrete-Class (TPC), and Hybrid approaches. Each has strengths and weaknesses. The key to expert database design is understanding trade-offs quantitatively and selecting based on specific requirements.

This page consolidates everything into a comprehensive decision framework that you can apply to any specialization mapping challenge.

We'll analyze trade-offs across eight critical dimensions:

Query Performance
Storage Efficiency
Data Integrity
Schema Evolution
Query Complexity
Null Handling
Relationship Support
ORM Compatibility

What You Will Learn

By the end of this page, you will have a quantitative understanding of all trade-offs, a systematic decision framework for real-world selection, and the confidence to justify your mapping choices to stakeholders with concrete reasoning.

Query Performance Analysis

Performance is often the primary concern. Let's analyze each approach's performance characteristics for different query types.

Query Type Matrix:

Query Performance Comparison (relative to optimal)
Query Type	STI	TPT	TPC	Winner
Hierarchy-wide scan	★★★★★ Single table scan	★★★☆☆ N-way JOIN/UNION	★★☆☆☆ Requires UNION ALL	STI
Single subtype query	★★★★☆ With discriminator filter	★★★★★ Direct table access	★★★★★ Direct table access	TPT/TPC
Insert single entity	★★★★★ One INSERT	★★★☆☆ 2+ INSERTs	★★★★★ One INSERT	STI/TPC
Update shared attrs	★★★★★ Single UPDATE	★★★★★ Single UPDATE	★★★☆☆ Update per concrete table	STI/TPT
Update subtype attrs	★★★★★ Single UPDATE	★★★★★ Single UPDATE	★★★★★ Single UPDATE	Tie
Polymorphic join	★★★★★ No join needed	★★★☆☆ Requires JOIN	★★☆☆☆ Requires UNION+JOIN	STI
Type-specific join	★★★☆☆ Full scan with filter	★★★★★ Direct FK join	★★★★★ Direct FK join	TPT/TPC

Quantitative Performance Model:

Let's define variables:

n = total number of entities across all subtypes
k = number of subtypes
s = average subtype-specific attributes
p = proportion of entities per subtype (1/k if uniform)

Read Performance (single entity by ID):

performance_model.txt
READ PERFORMANCE BY ID:
 
STI:  O(1) - Direct index lookup on single table
      Cost: 1 index seek + 1 row fetch
 
TPT:  O(k) - Join k tables by shared key (but often just 2: super + sub)
      Cost: 2 index seeks + 2 row fetches (with known type)
      Cost: k+1 index seeks if type unknown (LEFT JOIN all)
 
TPC:  O(k) - Unknown type requires checking each concrete table
      Cost: 1 index seek if type known
      Cost: k index seeks if type unknown (UNION approach)
 
 
READ PERFORMANCE (type-specific batch):
 
STI:  O(n) scan with selectivity 1/k
      Processes n rows, filters to n/k
 
TPT:  O(n/k) - Direct scan of subtype table
      Only processes n/k rows
 
TPC:  O(n/k) - Direct scan of concrete table
      Only processes n/k rows
 
 
READ PERFORMANCE (hierarchy-wide batch):
 
STI:  O(n) - Single table scan
      Best case - no joins
 
TPT:  O(n * k) worst case, O(n) with good optimizer
      UNION approach: k queries of n/k each = O(n)
      LEFT JOIN: potentially O(n * k) intermediate result
 
TPC:  O(n) - UNION ALL k tables
      k separate scans, but well parallelizable

Write Performance:

write_performance.txt
INSERT PERFORMANCE:
 
STI:  1 INSERT statement
      - Acquires 1 lock
      - Updates 1 set of indexes
      - Simple and fast
 
TPT:  2+ INSERT statements (1 per table in hierarchy)
      - Requires transaction
      - Acquires multiple locks
      - Updates multiple index sets
      - 2-3x slower than STI typically
 
TPC:  1 INSERT statement (into concrete table)
      - Similar to STI
      - Single lock, single index update
 
 
UPDATE PERFORMANCE (shared attribute):
 
STI:  UPDATE 1 row in 1 table
      Single index update
 
TPT:  UPDATE 1 row in supertype table
      Single index update (same as STI)
 
TPC:  UPDATE 1 row (but attribute replicated if multiple concrete)
      Worse if updating across types (change in all tables)
 
 
DELETE PERFORMANCE:
 
STI:  DELETE 1 row
      Simple
 
TPT:  DELETE cascades through FK relationships
      Multiple row deletes, but handled by DB
 
TPC:  DELETE 1 row from concrete table
      Simple (like STI)

Performance Rule of Thumb

If >70% of queries are hierarchy-wide, STI typically wins on performance. If >70% of queries are subtype-specific, TPT or TPC typically wins. Mixed workloads require hybrid approaches or careful benchmarking with realistic data.

Storage Efficiency Analysis

Storage efficiency affects not just disk costs but also memory usage, cache efficiency, and I/O performance.

Storage Components:

storage_model.txt
STORAGE MODEL VARIABLES:
- a_shared = bytes for shared attributes
- a_sub[i] = bytes for subtype i specific attributes
- n_sub[i] = count of entities of subtype i
- n_total = sum of all n_sub[i]
- k = number of subtypes
- overhead_pk = bytes for primary key
- overhead_fk = bytes for foreign key
- overhead_null = bytes per nullable column (varies by DB)
 
STI STORAGE:
  Per row: a_shared + sum(a_sub[i]) + overhead_null * (k-1 subtypes worth)
  Total: n_total * (a_shared + sum(a_sub) + null_overhead)
  
  Example: 10 shared + (5+5+5) subtype = 25 data bytes
           + NULL bitmap for ~10 nullable cols ≈ 2 bytes
           Total: ~27 bytes/row × n_total
 
 
TPT STORAGE:
  Supertype row: a_shared + overhead_pk
  Subtype row: a_sub[i] + overhead_pk (which is also FK)
  
  Per entity: a_shared + a_sub[i] + 2 * overhead_pk
  Total: n_total * (a_shared + avg(a_sub) + 2*pk_size)
  
  Example: 10 shared + 5 subtype avg = 15 data bytes
           + 2 × 4 byte integer PKs = 8 bytes
           Total: ~23 bytes/entity × n_total
 
 
TPC STORAGE:
  Per row: a_shared + a_sub[i] + overhead_pk
  Total: sum(n_sub[i] * (a_shared + a_sub[i] + pk_size))
  
  Example: (10 shared + 5 subtype + 4 pk) × entities per type
           Total: ~19 bytes/row × n_total
           BUT: shared attribute changes require updates to all tables

Storage Comparison (1 Million Entities, 3 Subtypes)
Approach	Data Storage	Index Storage	Total	Relative
STI (wide table)	~27 MB	~15 MB (1 table)	~42 MB	1.0x
TPT (normalized)	~23 MB	~25 MB (4 tables)	~48 MB	1.14x
TPC (concrete)	~19 MB	~20 MB (3 tables)	~39 MB	0.93x

Cache Efficiency:

Beyond raw storage, consider cache/buffer pool efficiency:

•STI: Wider rows mean fewer rows per page; subtype-specific scans read unnecessary columns
•TPT: Narrower tables pack more efficiently; but joining requires pages from multiple tables
•TPC: Optimal per-subtype density; but shared-attribute queries need all tables in cache

Storage Verdict

Storage differences are often modest (within 20%). TPC is most compact. STI wastes space on NULLs but has predictable row sizes. TPT adds key overhead but eliminates NULL waste. Choose based on query patterns rather than storage alone.

Data Integrity Analysis

Data integrity—the database's ability to enforce correctness constraints—varies dramatically between approaches.

Integrity Constraint Support
Constraint Type	STI	TPT	TPC
NOT NULL on subtype attr	❌ Cannot enforce (other types need NULL)	✅ Native NOT NULL	✅ Native NOT NULL
CHECK on subtype attr	⚠️ Complex (discriminator + CHECK)	✅ Simple CHECK	✅ Simple CHECK
UNIQUE within subtype	✅ Partial UNIQUE index	✅ Native UNIQUE	✅ Native UNIQUE
UNIQUE across hierarchy	✅ Native UNIQUE	⚠️ Requires trigger/logic	❌ Cannot enforce natively
FK TO specific subtype	❌ Impossible	✅ Native FK	✅ Native FK
FK FROM any subtype	✅ Native FK	✅ Native FK	⚠️ Complex polymorphic ref
Disjoint constraint	✅ Via discriminator	⚠️ Requires trigger	✅ Implicit (separate tables)
Total participation	✅ CHECK NOT NULL on discriminator	⚠️ Requires deferred trigger	⚠️ Requires registry table

Integrity Deep Dive:

1. Subtype-Specific NOT NULL:

integrity_examples.sql
PostgreSQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
-- SCENARIO: Student.gpa must be NOT NULL for students
 
-- STI: CANNOT use NOT NULL directly (other types need NULL there)
-- Must use CHECK constraint:
ALTER TABLE person ADD CONSTRAINT chk_student_gpa_required
CHECK (person_type != 'STUDENT' OR gpa IS NOT NULL);
-- Problem: More complex, easier to miss in code review
 
-- TPT: Simple NOT NULL works
CREATE TABLE student (
    person_id INTEGER PRIMARY KEY,
    gpa DECIMAL(3,2) NOT NULL  -- Simple, clear, DB-enforced
);
 
-- TPC: Simple NOT NULL works
CREATE TABLE student (
    person_id INTEGER PRIMARY KEY,
    gpa DECIMAL(3,2) NOT NULL  -- Same as TPT
);

2. Foreign Keys TO Subtypes:

fk_to_subtype.sql
PostgreSQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
-- SCENARIO: Only students can enroll in courses
 
-- STI: IMPOSSIBLE with native FK
-- Must use trigger:
CREATE TABLE enrollment (
    course_id INTEGER REFERENCES course(course_id),
    person_id INTEGER REFERENCES person(person_id) -- Can't restrict to students!
);
-- Then add trigger to validate person_type = 'STUDENT'
-- Risk: Trigger can be disabled, bypassed, or have bugs
 
-- TPT: Native FK works perfectly
CREATE TABLE enrollment (
    course_id INTEGER REFERENCES course(course_id),
    student_id INTEGER REFERENCES student(person_id) -- Only valid students!
);
-- Database enforces this automatically - bulletproof
 
-- TPC: Same as TPT - native FK
CREATE TABLE enrollment (
    course_id INTEGER REFERENCES course(course_id),
    student_id INTEGER REFERENCES student(person_id)
);

Integrity Critical Path

If your domain requires FKs to specific subtypes, TPT or TPC is mandatory. STI cannot provide this guarantee at the database level. For financial, healthcare, or compliance-heavy systems, this is often the deciding factor.

Schema Evolution Analysis

Schemas change over time. The ability to evolve without major migrations is crucial for long-lived systems.

Schema Evolution Scenarios
Change Type	STI Impact	TPT Impact	TPC Impact
Add shared attribute	✅ ADD COLUMN (simple)	✅ ADD COLUMN to supertype	❌ ADD COLUMN to each concrete table
Add subtype attribute	✅ ADD COLUMN (NULLable)	✅ ADD COLUMN to subtype table	✅ ADD COLUMN to specific table
Add new subtype	⚠️ Many ADD COLUMNs + CHECK update	✅ CREATE TABLE + FK	✅ CREATE TABLE
Remove subtype	⚠️ DROP COLUMN × subtype attrs	✅ DROP TABLE	✅ DROP TABLE
Move attr to shared	⚠️ Complex data migration	✅ ALTER + migrate + DROP	❌ Complex: multiple tables
Move attr to subtype	⚠️ Complex (NULLs)	✅ ALTER + migrate	✅ ALTER (if in correct table)
Change subtype relationship	Same column	⚠️ Requires FK updates	⚠️ Requires FK updates

Evolution Scenario: Adding a New Subtype

add_subtype.sql
PostgreSQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
-- SCENARIO: Add "Alumni" as new Person subtype with 5 attributes
 
-- ═══════════════════════════════════════════════════════════════
-- STI: Modify existing table
-- ═══════════════════════════════════════════════════════════════
-- Step 1: Add new columns (will be NULL for existing rows)
ALTER TABLE person ADD COLUMN alumni_id VARCHAR(20);
ALTER TABLE person ADD COLUMN graduation_date DATE;
ALTER TABLE person ADD COLUMN degree_earned VARCHAR(100);
ALTER TABLE person ADD COLUMN employer VARCHAR(100);
ALTER TABLE person ADD COLUMN donation_history JSONB;
 
-- Step 2: Update discriminator CHECK
ALTER TABLE person DROP CONSTRAINT chk_valid_person_type;
ALTER TABLE person ADD CONSTRAINT chk_valid_person_type
    CHECK (person_type IN ('STUDENT', 'FACULTY', 'STAFF', 'ALUMNI'));
 
-- Step 3: Add CHECK for new subtype's required fields
ALTER TABLE person ADD CONSTRAINT chk_alumni_attrs
    CHECK (person_type != 'ALUMNI' OR 
           (alumni_id IS NOT NULL AND graduation_date IS NOT NULL));
 
-- Impact: Table now has 5 more columns affecting all rows
 
 
-- ═══════════════════════════════════════════════════════════════
-- TPT: Create new table
-- ═══════════════════════════════════════════════════════════════
CREATE TABLE alumni (
    person_id       INTEGER PRIMARY KEY REFERENCES person(person_id),
    alumni_id       VARCHAR(20) NOT NULL UNIQUE,
    graduation_date DATE NOT NULL,
    degree_earned   VARCHAR(100) NOT NULL,
    employer        VARCHAR(100),
    donation_history JSONB
);
 
-- Optional: Update discriminator in supertype
ALTER TABLE person DROP CONSTRAINT chk_person_type;
ALTER TABLE person ADD CONSTRAINT chk_person_type
    CHECK (person_type IN ('STUDENT', 'FACULTY', 'STAFF', 'ALUMNI')
           OR person_type IS NULL);
 
-- Impact: New table only; existing tables unchanged
 
 
-- ═══════════════════════════════════════════════════════════════
-- TPC: Create new table
-- ═══════════════════════════════════════════════════════════════
CREATE TABLE alumni (
    person_id       SERIAL PRIMARY KEY,  -- Or managed externally
    -- All shared attributes replicated
    name            VARCHAR(100) NOT NULL,
    email           VARCHAR(100) NOT NULL UNIQUE,
    date_of_birth   DATE NOT NULL,
    address         TEXT,
    phone           VARCHAR(20),
    -- Alumni-specific
    alumni_id       VARCHAR(20) NOT NULL UNIQUE,
    graduation_date DATE NOT NULL,
    degree_earned   VARCHAR(100) NOT NULL,
    employer        VARCHAR(100),
    donation_history JSONB
);
 
-- Impact: New table; but UNION queries must be updated!

Evolution Recommendation

TPT is most evolution-friendly. Adding subtypes creates new tables without touching existing ones. STI requires adding nullable columns that affect all rows. TPC requires updating all UNION queries. If your hierarchy will evolve significantly, favor TPT.

Comprehensive Decision Framework

Based on our analysis, here's a systematic decision framework for selecting the appropriate mapping approach.

Step 1: Score Your Requirements

Rate each dimension from 1-5 based on importance:

requirements_scorecard.txt
REQUIREMENTS SCORECARD:
 
┌────────────────────────────────────────────────────────────────┐
│ Dimension                        │ Your Score │ Weight Factor │
│────────────────────────────────────────────────────────────────│
│ Hierarchy-wide query frequency   │    ___     │    × 1.5     │
│ Subtype-specific query frequency │    ___     │    × 1.5     │
│ Insert/Update frequency          │    ___     │    × 1.0     │
│ Data integrity importance        │    ___     │    × 2.0     │
│ FK to subtypes needed            │    ___     │    × 2.5     │
│ Storage efficiency importance    │    ___     │    × 0.5     │
│ Schema evolution likelihood      │    ___     │    × 1.5     │
│ Query simplicity importance      │    ___     │    × 1.0     │
│ Overlapping subtypes needed      │    ___     │    × 2.0     │
└────────────────────────────────────────────────────────────────┘
 
SCORING GUIDE:
1 = Not important / Never occurs
3 = Moderately important / Sometimes occurs
5 = Critical / Always required

Step 2: Apply Decision Rules

decision_rules.txt
DECISION RULES (in priority order):
 
RULE 1: FK to Subtypes
IF "FK to subtypes needed" score ≥ 4
THEN eliminate STI (must use TPT or TPC)
 
RULE 2: Overlapping Subtypes  
IF "Overlapping subtypes needed" score ≥ 4
THEN eliminate TPC, strongly prefer TPT
 
RULE 3: Query Pattern Dominance
IF "Hierarchy-wide queries" score - "Subtype queries" score ≥ 2
THEN favor STI
IF "Subtype queries" score - "Hierarchy-wide queries" score ≥ 2
THEN favor TPT or TPC
 
RULE 4: Integrity Requirements
IF "Data integrity importance" score ≥ 4
THEN favor TPT over STI
 
RULE 5: Evolution Requirements  
IF "Schema evolution likelihood" score ≥ 4
THEN favor TPT over STI and TPC
 
RULE 6: Simplicity Requirements
IF "Query simplicity importance" score ≥ 4 AND no blocking rules above
THEN favor STI

Step 3: Calculate Weighted Scores

Approach Scoring Matrix
Dimension (Weight)	STI Score	TPT Score	TPC Score
Hierarchy queries (1.5)	+5	+2	+1
Subtype queries (1.5)	+2	+5	+5
Insert/Update (1.0)	+5	+3	+4
Data integrity (2.0)	+2	+5	+4
FK to subtypes (2.5)	+0	+5	+5
Storage (0.5)	+3	+4	+5
Evolution (1.5)	+3	+5	+2
Query simplicity (1.0)	+5	+3	+4
Overlapping (2.0)	+2	+5	+1

Multiply your importance score by the approach score, sum each column, and the highest total indicates the best fit.

Step 4: Consider Hybrid

If:

No single approach scores ≥70% of maximum possible
Different hierarchy sections favor different approaches
You answered 4-5 on both hierarchy-wide AND subtype-specific queries

→ Evaluate a Hybrid approach using the techniques from Page 4.

Common Scenarios and Recommendations

Here are pre-analyzed recommendations for common database scenarios:

Product Catalog:

Hierarchy: Product → PhysicalProduct, DigitalProduct, SubscriptionProduct
Query pattern: Heavy catalog search (hierarchy-wide), distinct checkout flows (subtype-specific)
Relationships: Inventory FK to PhysicalProduct, Downloads FK to DigitalProduct

Recommendation: Hybrid (TPT top, STI within similar subtypes)

TPT separates fundamentally different product categories
STI within PhysicalProducts if Books, Electronics, Clothing are similar
Enables proper FK constraints for inventory, downloads, subscriptions

Anti-Patterns to Avoid

Learn from common mistakes in specialization mapping:

Mapping Anti-Patterns

•Anti-Pattern 1: STI with 50+ Columns — If your single table has more than ~20 subtype-specific columns total, STI is probably wrong. You're sacrificing too much integrity and storage for query simplicity.
•Anti-Pattern 2: TPC Without Registry — Using TPC but needing to query 'all entities' frequently without a registry table means slow UNION ALL queries. Add a lightweight supertype table for lookups.
•Anti-Pattern 3: TPT for 2 Similar Subtypes — Creating separate tables for two subtypes that share 90% attributes adds unnecessary join complexity. Use STI or a shared table with flag.
•Anti-Pattern 4: Ignoring FK Requirements — Choosing STI and then using triggers to enforce subtype FKs. Triggers are slower, bypassable, and complex. If you need subtype FKs, use TPT.
•Anti-Pattern 5: Deep TPT Without Views — Using 4+ level TPT inheritance without creating view abstractions. Queries become unwieldy 4-way joins. Always create views for common access patterns.
•Anti-Pattern 6: ORM Default Without Analysis — Accepting your ORM's default (often STI for Rails, TPT for .NET) without analyzing your specific requirements. ORMs have defaults, not universal solutions.

The Worst Anti-Pattern

The single worst mistake: choosing an approach based on what you're familiar with rather than what your data requires. Every hierarchy is different. Analyze each one fresh using the decision framework.

Summary: Mastering Specialization Mapping

You've now completed a comprehensive study of specialization mapping—from fundamental concepts to advanced trade-off analysis.

Module Key Takeaways

•Three Core Approaches: STI (single table), TPT (table-per-type), TPC (table-per-concrete) represent fundamentally different trade-offs between query simplicity and data integrity.
•STI Excels At: Hierarchy-wide queries, simple writes, query simplicity—but sacrifices integrity and wastes storage on NULLs.
•TPT Excels At: Data integrity, subtype-specific FKs, overlapping hierarchies, evolution—but requires JOINs and more complex writes.
•TPC Excels At: Subtype independence, storage efficiency, no JOINs—but struggles with hierarchy-wide queries and shared attribute changes.
•Hybrid Approaches: Real-world hierarchies often require combining approaches strategically based on subsection characteristics.
•Decision Framework: Systematically evaluate query patterns, integrity requirements, evolution likelihood, and relationship needs.
•No Universal Best: The optimal choice depends entirely on your specific requirements. Analyze each hierarchy independently.

Quick Reference: When to Use Each Approach
Approach	Best For	Avoid When
STI	Small similar subtypes, hierarchy-wide queries, ORM simplicity	Many unique attrs, need subtype FKs, overlapping types
TPT	Distinct subtypes, integrity critical, need subtype FKs, evolution expected	Performance-critical hierarchy-wide queries
TPC	Independent subtypes, rarely queried together, storage efficiency	Need shared relationships, overlapping types, frequent schema changes
Hybrid	Complex real-world hierarchies with mixed characteristics	Simple hierarchies where pure approach works well

Module Complete

Congratulations! You've mastered specialization mapping—one of the most complex decisions in database design. You can now analyze any hierarchy systematically, select the optimal approach with confidence, and justify your decisions with concrete reasoning. This knowledge will serve you throughout your career as you design schemas for complex real-world domains.