Loading learning content...
After analyzing entity types, identifying common attributes, and validating semantic relationships, the generalization process reaches its creative zenith: creating the supertype. This is the moment when abstract insight becomes concrete schema—when the observation that 'these things share a common identity' transforms into a new entity type that formally captures that identity.
Creating a supertype is more than just drawing a new rectangle in an ER diagram. It requires careful decisions about naming, identity, primary keys, inherited versus local attributes, relationship migration, and integration with the existing schema. A well-designed supertype feels natural and inevitable; a poorly designed one creates confusion and maintenance burden.
This page provides comprehensive guidance on supertype creation—from initial conception through complete schema integration. By the end, you will be able to create supertypes that are semantically meaningful, technically sound, and practically useful.
By the end of this page, you will master the complete process of supertype creation: naming conventions, primary key design, attribute assignment, relationship restructuring, constraint definition, documentation, and integration with the broader database schema.
The supertype's name is its first and most visible design decision. A well-chosen name communicates the generalization's purpose instantly; a poorly chosen name confuses users and developers for years to come.
Naming Principles:
Principle 1: Use a Natural Domain Concept
The supertype name should be a term that domain experts naturally use when speaking generally about all subtypes:
✅ Good: EMPLOYEE (for hourly, salaried, contract workers) ✅ Good: ACCOUNT (for checking, savings, money market) ✅ Good: VEHICLE (for car, truck, motorcycle)
❌ Bad: WORKER_BASE_CLASS ❌ Bad: ACCOUNT_SUPERTYPE ❌ Bad: THING_WITH_WHEELS
Principle 2: Prefer Singular Nouns
Entity names in database design conventionally use singular form:
✅ Good: PERSON, ACCOUNT, PRODUCT ❌ Bad: PERSONS, ACCOUNTS, PRODUCTS
Principle 3: Avoid Technical Jargon
The name should make sense to business users, not just database designers:
✅ Good: PARTY (standard term for customer/supplier/partner) ❌ Bad: BUSINESS_ENTITY_ABSTRACTION
Principle 4: Consider Scope and Future Growth
Choose a name broad enough to accommodate future subtypes without renaming:
✅ Good: PAYMENT_METHOD (cards, bank transfers, digital wallets, future methods) ❌ Bad: CARD_OR_BANK (too specific, excludes future payment types)
| Subtypes | Weak Names | Strong Name | Rationale |
|---|---|---|---|
| CHECKING, SAVINGS, CD | ACCOUNT_TYPES, BANK_STUFF | ACCOUNT | Natural banking term |
| PHYSICIAN, NURSE, TECHNICIAN | MEDICAL_WORKERS, CLINICAL_STAFF | HEALTHCARE_PROVIDER | Industry-standard term |
| CUSTOMER, SUPPLIER, PARTNER | BUSINESS_ENTITIES, EXTERNALS | PARTY | Standard business modeling term |
| CAR, TRUCK, MOTORCYCLE, BUS | WHEELED_THINGS, AUTOS | VEHICLE | Universal transportation term |
| LAPTOP, DESKTOP, SERVER | MACHINES, COMPUTING | COMPUTER | Clear, natural term |
A good supertype name appears in domain dictionaries, industry glossaries, or common business vocabulary. If you can't find your proposed name in relevant reference materials, and domain experts don't naturally use it, reconsider the name—or reconsider whether the generalization is valid.
The supertype's primary key is the foundation of entity identity within the hierarchy. This key must uniquely identify every instance across all subtypes and serve as the basis for referential integrity throughout the schema.
Key Design Options:
Option 1: Unified Surrogate Key
Create a new synthetic key for the supertype that replaces all subtype keys:
CREATE TABLE party (
party_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
party_type VARCHAR(20) NOT NULL, -- discriminator
name VARCHAR(200),
email VARCHAR(255),
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
Advantages:
Disadvantages:
Option 2: Inherited Business Key
Use an existing business key that's consistent across subtypes:
-- All subtypes already used consistent ID format
CREATE TABLE person (
person_id VARCHAR(10) PRIMARY KEY, -- format: 'P-1234567'
person_type VARCHAR(20) NOT NULL,
...
);
Advantages:
Disadvantages:
Option 3: Composite Key with Type Discriminator
Combine original keys with a type indicator to ensure uniqueness:
CREATE TABLE account (
account_type CHAR(1) NOT NULL, -- 'C'=checking, 'S'=savings
account_number VARCHAR(20) NOT NULL,
...
PRIMARY KEY (account_type, account_number)
);
Advantages:
Disadvantages:
Option 4: Prefixed/Transformed Key
Transform original keys to include type prefix:
-- Original: CUSTOMER.id=12345, SUPPLIER.id=12345 (collision!)
-- Transformed:
INSERT INTO party (party_id, ...) VALUES ('CUS-12345', ...); -- customer
INSERT INTO party (party_id, ...) VALUES ('SUP-12345', ...); -- supplier
Advantages:
Disadvantages:
If subtypes had independent ID sequences, the same ID might exist in multiple subtypes. CUSTOMER #1001 and SUPPLIER #1001 are different entities. The supertype key design MUST handle this collision—typically via type prefix, composite key, or new surrogate key.
A type discriminator (also called type indicator, subtype flag, or discriminator column) is an attribute in the supertype that identifies which subtype each instance belongs to. This is essential for determining how to interpret subtype-specific data and which business rules apply.
Discriminator Design Options:
Option 1: Single Character Code
account_type CHAR(1) CHECK (account_type IN ('C', 'S', 'M', 'D'))
-- C=Checking, S=Savings, M=Money Market, D=CD
Pros: Minimal storage, fast comparison Cons: Cryptic, requires documentation, limited to 26/36 values
Option 2: Short Descriptive Code
account_type VARCHAR(10) CHECK (account_type IN ('CHECKING', 'SAVINGS', 'MMARKET', 'CD'))
Pros: Readable, self-documenting Cons: More storage, potential for typos
Option 3: Numeric Code
account_type_id SMALLINT REFERENCES account_type_lookup(type_id)
Pros: Efficient storage, enforced via FK Cons: Requires lookup for meaning, join for labels
Option 4: Enum Type (if supported)
CREATE TYPE account_type AS ENUM ('CHECKING', 'SAVINGS', 'MONEY_MARKET', 'CD');
account_type account_type NOT NULL
Pros: Type-safe, readable, validated Cons: Database-specific, harder to modify
| Strategy | Storage | Readability | Extensibility | Query Pattern |
|---|---|---|---|---|
| CHAR(1) code | 1 byte | Poor | Limited (26/36) | WHERE type='C' |
| VARCHAR code | Variable | Good | Flexible | WHERE type='CHECKING' |
| Numeric FK | 2-4 bytes | Via join | Easy (insert row) | WHERE type_id=1 |
| ENUM | 1-2 bytes | Good | Schema change | WHERE type='CHECKING' |
| Boolean flags | N booleans | Good | Schema change | WHERE is_checking=TRUE |
Multiple Discriminators (for Overlapping Hierarchies):
When subtypes can overlap (an entity can belong to multiple subtypes simultaneously), multiple discriminators or flag columns are needed:
CREATE TABLE university_person (
person_id UUID PRIMARY KEY,
-- Multiple roles possible
is_student BOOLEAN DEFAULT FALSE,
is_faculty BOOLEAN DEFAULT FALSE,
is_staff BOOLEAN DEFAULT FALSE,
is_researcher BOOLEAN DEFAULT FALSE,
-- At least one must be true
CHECK (is_student OR is_faculty OR is_staff OR is_researcher)
);
This accommodates: a faculty member who is also a researcher, a student who is also staff (work-study), etc.
Discriminator Constraints:
The discriminator should have appropriate constraints:
-- For disjoint subtypes (exclusive)
account_type NOT NULL CHECK (account_type IN ('CHECKING', 'SAVINGS', 'CD'))
-- For overlapping subtypes
CHECK (is_student OR is_faculty OR is_staff) -- total participation
-- no check needed for which combinations allowed (overlapping)
The discriminator always belongs in the supertype table, not subtype tables. It identifies which subtype logic applies to each supertype instance, enabling polymorphic queries and appropriate join decisions.
With the supertype named, keyed, and discriminated, we now formally assign attributes to the supertype and subtypes. This step crystallizes the decisions made during common attribute analysis.
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354
-- Before Generalization: Separate entity definitions -- FULL_TIME_EMPLOYEE-- emp_id, first_name, last_name, email, hire_date, department,-- annual_salary, bonus_percent, vacation_days, stock_options -- PART_TIME_EMPLOYEE -- emp_id, first_name, last_name, email, hire_date, department,-- hourly_rate, max_hours_per_week -- CONTRACTOR-- emp_id, first_name, last_name, email, start_date, department,-- contract_rate, contract_end_date, agency_name -- After Generalization: Attribute assignment -- EMPLOYEE (supertype)CREATE TABLE employee ( emp_id UUID PRIMARY KEY DEFAULT gen_random_uuid(), employee_type VARCHAR(20) NOT NULL CHECK (employee_type IN ('FULL_TIME', 'PART_TIME', 'CONTRACTOR')), -- Common attributes (inherited by all) first_name VARCHAR(50) NOT NULL, last_name VARCHAR(50) NOT NULL, email VARCHAR(255) NOT NULL UNIQUE, start_date DATE NOT NULL, -- unified from hire_date/start_date department VARCHAR(100), created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP); -- FULL_TIME_EMPLOYEE (subtype) - specific attributes onlyCREATE TABLE full_time_employee ( emp_id UUID PRIMARY KEY REFERENCES employee(emp_id) ON DELETE CASCADE, annual_salary DECIMAL(12,2) NOT NULL, bonus_percent DECIMAL(5,2) DEFAULT 0, vacation_days INT DEFAULT 15, stock_options INT DEFAULT 0); -- PART_TIME_EMPLOYEE (subtype) - specific attributes onlyCREATE TABLE part_time_employee ( emp_id UUID PRIMARY KEY REFERENCES employee(emp_id) ON DELETE CASCADE, hourly_rate DECIMAL(8,2) NOT NULL, max_hours_per_week INT DEFAULT 20); -- CONTRACTOR (subtype) - specific attributes only CREATE TABLE contractor ( emp_id UUID PRIMARY KEY REFERENCES employee(emp_id) ON DELETE CASCADE, contract_rate DECIMAL(10,2) NOT NULL, contract_end_date DATE, agency_name VARCHAR(200));Notice how 'hire_date' (employees) and 'start_date' (contractors) were unified to 'start_date' in the supertype. This is a common attribute with different names that was reconciled during common attribute analysis.
Generalization affects not only attributes but also relationships. Common relationships must be migrated to the supertype, while subtype-specific relationships remain with their subtypes.
Relationship Migration Principles:
Principle 1: Move Common Relationships to Supertype
If all subtypes participate in the same relationship to another entity, that relationship should connect to the supertype:
Before:
FULL_TIME ──(works_in)──→ DEPARTMENT
PART_TIME ──(works_in)──→ DEPARTMENT
CONTRACTOR ──(works_in)──→ DEPARTMENT
After:
EMPLOYEE ──(works_in)──→ DEPARTMENT
Principle 2: Keep Subtype-Specific Relationships Local
Relationships that only apply to certain subtypes remain with those subtypes:
CONTRACTOR ──(contracted_through)──→ AGENCY
-- Only contractors have agencies; keep at subtype level
Principle 3: Foreign Key References Should Target Supertype
Other entities that previously referenced multiple subtype tables now reference the single supertype:
-- Before: Three FKs (or UNION queries)
CREATE TABLE project_assignment (
assignment_id UUID PRIMARY KEY,
worker_type VARCHAR(20),
full_time_emp_id UUID REFERENCES full_time_employee,
part_time_emp_id UUID REFERENCES part_time_employee,
contractor_id UUID REFERENCES contractor,
...
);
-- After: Single FK to supertype
CREATE TABLE project_assignment (
assignment_id UUID PRIMARY KEY,
employee_id UUID NOT NULL REFERENCES employee(emp_id), -- any employee type
...
);
Verify that 'same-named' relationships truly mean the same thing. FULL_TIME 'reports_to' MANAGER and CONTRACTOR 'reports_to' MANAGER might have different implications: employee has ongoing reporting; contractor may have project-based oversight. If semantics differ significantly, keep as separate relationships.
Constraints on the supertype establish baseline rules that apply to all subtypes. Additional constraints can be added at the subtype level for more specific requirements.
CHECK (employee_type IN ('FULL_TIME', 'PART_TIME', 'CONTRACTOR'))name VARCHAR(100) NOT NULLemail VARCHAR(255) UNIQUE (no duplicate emails regardless of employee type)CHECK (start_date <= CURRENT_DATE) (can't start in future)department_id REFERENCES department(id) (all employees belong to departments)1234567891011121314151617181920212223242526272829303132333435363738394041424344454647484950515253545556575859606162636465
-- Comprehensive supertype constraint definition CREATE TABLE employee ( emp_id UUID PRIMARY KEY DEFAULT gen_random_uuid(), -- Discriminator with valid values employee_type VARCHAR(20) NOT NULL CHECK (employee_type IN ('FULL_TIME', 'PART_TIME', 'CONTRACTOR')), -- Universal requirements (all employees have these) first_name VARCHAR(50) NOT NULL, last_name VARCHAR(50) NOT NULL, email VARCHAR(255) NOT NULL, start_date DATE NOT NULL, -- Optional for some subtypes (nullable in supertype) department_id UUID, manager_id UUID, -- Global uniqueness CONSTRAINT unique_employee_email UNIQUE (email), -- Domain constraints applying to all CONSTRAINT valid_start_date CHECK (start_date <= CURRENT_DATE), CONSTRAINT valid_email_format CHECK (email ~* '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$'), -- Referential integrity (self-referential) CONSTRAINT fk_manager FOREIGN KEY (manager_id) REFERENCES employee(emp_id) ON DELETE SET NULL, -- Referential integrity (to other entities) CONSTRAINT fk_department FOREIGN KEY (department_id) REFERENCES department(dept_id) ON DELETE SET NULL, -- Audit columns created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP, updated_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP); -- Subtype-specific constraint (only full-time must have department)CREATE TABLE full_time_employee ( emp_id UUID PRIMARY KEY REFERENCES employee(emp_id) ON DELETE CASCADE, annual_salary DECIMAL(12,2) NOT NULL, -- Subtype-specific: full-time must have department -- This is enforced via trigger or application since department is in supertype CONSTRAINT valid_salary CHECK (annual_salary > 0)); -- Create trigger to enforce full-time department requirementCREATE OR REPLACE FUNCTION enforce_fulltime_department()RETURNS TRIGGER AS $$BEGIN IF NOT EXISTS ( SELECT 1 FROM employee WHERE emp_id = NEW.emp_id AND department_id IS NOT NULL ) THEN RAISE EXCEPTION 'Full-time employees must have a department'; END IF; RETURN NEW;END;$$ LANGUAGE plpgsql; CREATE TRIGGER trg_fulltime_departmentBEFORE INSERT OR UPDATE ON full_time_employeeFOR EACH ROW EXECUTE FUNCTION enforce_fulltime_department();When subtype constraints reference supertype attributes (like 'full-time must have department'), they often require triggers or application-level enforcement. Pure DDL constraints can only reference columns within the same table.
Creating the supertype in isolation is insufficient; it must be properly integrated into the existing database schema. This involves updating references, creating appropriate views, and establishing migration paths.
12345678910111213141516171819202122232425262728293031323334353637383940414243444546474849505152535455565758596061626364
-- Schema Integration Migration Script -- Step 1: Create the new supertype structure (shown previously)-- ... employee, full_time_employee, part_time_employee, contractor ... -- Step 2: Migrate data from legacy tablesINSERT INTO employee (emp_id, employee_type, first_name, last_name, email, start_date, department_id)SELECT gen_random_uuid(), 'FULL_TIME', first_name, last_name, email, hire_date, department_idFROM legacy_full_time_employee; -- Similarly for other subtypes... -- Step 3: Populate subtype tables with specific attributesINSERT INTO full_time_employee (emp_id, annual_salary, bonus_percent, vacation_days)SELECT e.emp_id, l.annual_salary, l.bonus_percent, l.vacation_daysFROM employee eJOIN legacy_full_time_employee l ON e.email = l.emailWHERE e.employee_type = 'FULL_TIME'; -- Step 4: Create backward-compatible viewsCREATE VIEW v_full_time_employee ASSELECT e.emp_id, e.first_name, e.last_name, e.email, e.start_date AS hire_date, -- original name e.department_id, ft.annual_salary, ft.bonus_percent, ft.vacation_days, ft.stock_optionsFROM employee eJOIN full_time_employee ft ON e.emp_id = ft.emp_idWHERE e.employee_type = 'FULL_TIME'; -- Step 5: Update foreign keys in dependent tables-- Example: project_assignment previously had worker_type + multiple FKsALTER TABLE project_assignment DROP COLUMN worker_type, DROP COLUMN full_time_emp_id, DROP COLUMN part_time_emp_id, DROP COLUMN contractor_id; ALTER TABLE project_assignment ADD COLUMN employee_id UUID REFERENCES employee(emp_id); -- Migrate assignment data... -- Step 6: Create useful indexesCREATE INDEX idx_employee_type ON employee(employee_type);CREATE INDEX idx_employee_department ON employee(department_id); CREATE INDEX idx_employee_email_lower ON employee(LOWER(email));We've covered the complete process of supertype creation—from naming through schema integration. Let's consolidate the essential practices:
What's Next:
Having mastered supertype creation, our final topic examines the comparison with specialization—understanding how generalization and specialization relate as inverse operations, when to use each approach, and how they converge to produce the same hierarchical structures from different starting points.
You now have comprehensive knowledge of supertype creation. You can name supertypes appropriately, design identity mechanisms, structure attributes and relationships correctly, define appropriate constraints, and integrate the supertype into an existing schema. Next, we'll compare generalization with its inverse operation: specialization.