Generalization - Learning Module

Common Attributes

The Building Blocks of Generalization

If generalization is the process of recognizing that diverse entities share a common identity, then common attributes are the evidence that proves it. Every valid generalization rests on a foundation of shared characteristics—attributes that appear in all (or most) subtypes with the same essential meaning.

But identifying common attributes is not as simple as matching column names. Real-world entities evolved independently, often in different departments, different systems, or different eras. What is 'customerName' in one system is 'clientFullName' in another and 'account_holder' in a third. Data types vary. Constraints differ. Optional in one place, required in another.

The skilled database designer must navigate this complexity, transforming a fragmented landscape of inconsistent attributes into a coherent, unified set of inherited properties. This is the art and science of common attribute analysis—the critical step that determines whether a generalization is meaningful and maintainable.

On this page, we'll work through the techniques for identifying, reconciling, and properly placing common attributes in a generalization hierarchy.

What You Will Learn

By the end of this page, you will be able to systematically identify common attributes across entity types, resolve naming conflicts and synonyms, harmonize data types and constraints, decide which attributes belong in the supertype versus subtypes, and document your attribute unification decisions for long-term maintainability.

What Makes an Attribute 'Common'

Before we can identify common attributes, we must precisely define what 'common' means in the context of generalization. Superficial similarity is not enough—true commonality requires semantic equivalence.

Definition:

An attribute is common across a set of entity types E₁, E₂, ..., Eₙ if and only if:

•It appears in all (or a significant majority of) the entity types
•It serves the same semantic purpose in each entity type
•It describes a property of the shared supertype concept, not a subtype-specific characteristic

The Three Dimensions of Commonality:

Structural Commonality

•Name Match — The attribute has identical or clearly synonymous names across entities (e.g., 'email' in CUSTOMER and EMPLOYEE)
•Type Compatibility — The data types are identical or can be unified without loss of information (e.g., VARCHAR(50) and VARCHAR(100) can unify to VARCHAR(100))
•Domain Overlap — The valid values share significant overlap (e.g., both store email addresses following the same format)

Semantic Commonality

•Same Concept — The attribute describes the same real-world property in each entity (e.g., 'birthDate' means date of birth, not date of creation)
•Same Granularity — The level of detail is equivalent (e.g., both store full address vs. just city)
•Same Purpose — The attribute serves the same business function (e.g., both 'phone' attributes are for contact, not audit)

Behavioral Commonality

•Same Constraints — Similar validation rules apply (or can be harmonized) across entities
•Same Update Patterns — The attribute is updated under similar circumstances and by similar processes
•Same Query Usage — The attribute is queried and reported on in similar contexts

Beware False Positives

Same-named attributes are not automatically common. 'status' in ORDER (values: pending, shipped, delivered) and 'status' in EMPLOYEE (values: active, on-leave, terminated) are completely different concepts that happen to share a name. Analyzing only structure without semantics leads to invalid generalizations.
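
One quick way to test a same-named pair for a false positive is to compare the values the two columns actually hold. The sketch below assumes PostgreSQL syntax; the tables `orders` and `employee` and their sample rows are hypothetical stand-ins for the ORDER and EMPLOYEE example above (`orders` is used because ORDER is a reserved word).

```sql
-- Hypothetical tables mirroring the ORDER and EMPLOYEE example.
CREATE TABLE orders   (order_id INT PRIMARY KEY, status VARCHAR(20));
CREATE TABLE employee (emp_id   INT PRIMARY KEY, status VARCHAR(20));

INSERT INTO orders   VALUES (1, 'pending'), (2, 'shipped'), (3, 'delivered');
INSERT INTO employee VALUES (1, 'active'), (2, 'on-leave'), (3, 'terminated');

-- Values that occur in both columns. An empty result means the two
-- value domains are disjoint, which is strong evidence of a homonym.
SELECT status FROM orders
INTERSECT
SELECT status FROM employee;
```

For this sample data the query returns no rows. Overlapping values would not prove the attributes are common, but disjoint domains are a cheap early warning that the semantics should be checked with a domain expert.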

Systematic Attribute Analysis Process

A rigorous attribute analysis follows a structured process. This ensures that all potential common attributes are identified and properly evaluated.

Step-by-Step Attribute Analysis

•Create Attribute Inventory — List all attributes from all candidate subtype entities. Include name, data type, nullability, constraints, and business description for each.
•Perform Name Clustering — Group attributes by identical names. These are the obvious candidates for commonality (though not guaranteed).
•Identify Synonyms — For unmatched attributes, look for semantic equivalents. Use domain knowledge and business glossaries to identify synonymous terms.
•Validate Semantics — For each cluster of same-named or synonymous attributes, verify they represent the same concept. Consult domain experts if uncertain.
•Assess Presence Patterns — Determine whether each candidate common attribute is present in all subtypes, most subtypes, or only some subtypes.
•Evaluate Type Compatibility — For validated common attributes, compare data types and determine if unification is possible and appropriate.
•Harmonize Constraints — Review constraints (NOT NULL, CHECK, UNIQUE) and determine appropriate supertype constraints.
•Document Decisions — Record which attributes will move to the supertype, which remain in subtypes, and the rationale for borderline cases.
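
The first two steps can often be started from the database catalog rather than by hand. The following is a minimal sketch, assuming PostgreSQL and its `information_schema.columns` view; the three tables are hypothetical and only loosely follow the CUSTOMER, SUPPLIER, and EMPLOYEE example used on this page.

```sql
-- Hypothetical candidate subtype tables; replace with your own.
CREATE TABLE customer (name VARCHAR(100), email VARCHAR(255), phone VARCHAR(20), credit_limit DECIMAL(12,2));
CREATE TABLE supplier (company_name VARCHAR(150), email VARCHAR(255), phone VARCHAR(20));
CREATE TABLE employee (employee_name VARCHAR(80), work_email VARCHAR(255), phone VARCHAR(20));

-- Steps 1-2: inventory every column, then cluster by identical name.
SELECT
    column_name,
    COUNT(*) AS table_count,
    STRING_AGG(
        table_name::text || ': ' || data_type::text
            || COALESCE('(' || character_maximum_length::text || ')', '')
            || CASE WHEN is_nullable = 'YES' THEN ' NULL' ELSE ' NOT NULL' END,
        '; ' ORDER BY table_name
    ) AS definitions
FROM information_schema.columns
WHERE table_schema = current_schema()
  AND table_name IN ('customer', 'supplier', 'employee')
GROUP BY column_name
ORDER BY table_count DESC, column_name;
```

Columns with a `table_count` above 1 are the name clusters from step 2. Columns that appear only once, such as `work_email` or `company_name` here, are the ones to examine for synonyms in step 3. The catalog cannot supply the business description, so that part of the inventory still has to come from people.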

Attribute Analysis Template Example
Attribute Name	CUSTOMER	SUPPLIER	EMPLOYEE	Semantic	Decision
name/companyName	name VARCHAR(100)	companyName VARCHAR(150)	employeeName VARCHAR(80)	All represent entity name	→ Supertype: name VARCHAR(150)
email	email VARCHAR(255)	email VARCHAR(255)	workEmail VARCHAR(255)	All for primary contact	→ Supertype: email VARCHAR(255)
phone	phone VARCHAR(20)	phone VARCHAR(20)	phone VARCHAR(20)	Primary contact phone	→ Supertype: phone VARCHAR(20)
taxId	taxId VARCHAR(15)	taxId VARCHAR(15)	— (absent)	Tax identification	→ Supertype: taxId VARCHAR(15) NULL
creditLimit	creditLimit DECIMAL	— (absent)	— (absent)	Customer-specific only	→ Stays in CUSTOMER
supplyCategories	— (absent)	categories TEXT[]	— (absent)	Supplier-specific	→ Stays in SUPPLIER

The 'All-Most-Some' Framework

Categorize each attribute: ALL (present in every subtype) → definitely supertype; MOST (present in majority) → likely supertype with nullable; SOME (present in minority) → likely stays in specific subtypes. This framework provides clear decision criteria.
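
The categorization can be sketched as a single query over the presence data. This example assumes PostgreSQL syntax, uses a hypothetical inline inventory based on the template table above, and reads MOST as "more than half of the subtypes", which is an assumption rather than a fixed rule.

```sql
-- Hypothetical inventory: one row per (attribute, subtype) where the
-- attribute is present, using the unified names from the template.
WITH inventory (attribute_name, subtype) AS (
    VALUES
        ('name',        'CUSTOMER'), ('name',  'SUPPLIER'), ('name', 'EMPLOYEE'),
        ('taxId',       'CUSTOMER'), ('taxId', 'SUPPLIER'),
        ('creditLimit', 'CUSTOMER')
),
subtype_total AS (
    SELECT COUNT(DISTINCT subtype) AS n FROM inventory
)
SELECT
    i.attribute_name,
    COUNT(*) AS present_in,
    t.n      AS subtypes,
    CASE
        WHEN COUNT(*) = t.n     THEN 'ALL'   -- present in every subtype
        WHEN COUNT(*) * 2 > t.n THEN 'MOST'  -- present in more than half
        ELSE 'SOME'
    END AS category
FROM inventory i
CROSS JOIN subtype_total t
GROUP BY i.attribute_name, t.n
ORDER BY present_in DESC, i.attribute_name;
```

With this data, `name` is classified ALL, `taxId` MOST, and `creditLimit` SOME, which matches the decisions in the template. The category is only a starting point; the semantic check still decides placement.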

Resolving Naming Conflicts and Synonyms

Real-world systems rarely use consistent naming. When generalizing entities from different sources, naming conflicts are inevitable. The database designer must systematically identify and resolve these conflicts.

Types of Naming Conflicts:

Type 1: Synonyms (Different Names, Same Concept)

Different terms refer to the same underlying attribute:

'clientId' / 'customerId' / 'accountHolderId' → all mean customer identifier
'effectiveDate' / 'startDate' / 'validFrom' → all mean when something becomes active
'amt' / 'amount' / 'value' → all mean monetary value

Resolution Strategy: Choose the most descriptive, standards-compliant name. Prefer full words over abbreviations. Document the renamed attributes and original names.

Type 2: Homonyms (Same Name, Different Concepts)

Identical terms refer to different underlying concepts:

'status' in ORDER (fulfillment state) vs. 'status' in USER (active/inactive)
'date' in INVOICE (invoice date) vs. 'date' in EMPLOYEE (hire date)
'type' in ACCOUNT (checking/savings) vs. 'type' in CUSTOMER (individual/corporate)

Resolution Strategy: These are not common attributes despite sharing names. Rename them with specific prefixes (orderStatus, userStatus) to clarify distinction.
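
In SQL, distinguishing homonyms is a plain column rename. The sketch below assumes PostgreSQL syntax; the table names are hypothetical (`orders` and `user_account` are used because ORDER and USER are reserved words), and snake_case is used for the new names.

```sql
-- Hypothetical tables carrying the homonymous 'status' columns.
CREATE TABLE orders       (order_id INT PRIMARY KEY, status VARCHAR(20));
CREATE TABLE user_account (user_id  INT PRIMARY KEY, status VARCHAR(20));

-- Give each column a name that states which concept it holds.
ALTER TABLE orders       RENAME COLUMN status TO order_status;
ALTER TABLE user_account RENAME COLUMN status TO user_status;
```

Views, application queries, and reports that reference the old names have to be updated in the same change, so a rename like this is usually scheduled together with the generalization itself.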

Type 3: Format Variations (Same Concept, Different Formatting)

Same concept with different naming conventions:

'first_name' / 'firstName' / 'FirstName' / 'FIRST_NAME'
'date_of_birth' / 'dateOfBirth' / 'birthDate' / 'DOB'

Resolution Strategy: Standardize to your project's naming convention. Typically, choose snake_case or camelCase consistently and rename all instances.

Name Unification Best Practices

•Prefer Domain Terms — Choose names that domain experts use. 'accountHolder' may be technically precise, but if everyone says 'customer', use 'customer'.
•Favor Clarity Over Brevity — 'emailAddress' is better than 'email' if it prevents confusion with 'emailContent'. But don't over-qualify: 'personEmailAddress' is excessive if 'email' is unambiguous.
•Apply Consistent Conventions — Use the same naming pattern throughout: snake_case, camelCase, or PascalCase. Never mix within a schema.
•Document Origins — Maintain a mapping table that records original attribute names and their unified name. This is invaluable for data migration and historical queries.
•Consider Future Subtypes — Choose names that will remain appropriate if new subtypes are added. 'email' is better than 'customerEmail' if employees and suppliers will also use it.
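
The mapping table from "Document Origins" can live in the schema itself. This is one possible layout, assuming PostgreSQL syntax; the table name `attribute_name_map`, its columns, and the sample rows (taken from the analysis template earlier on this page) are all illustrative.

```sql
-- Hypothetical mapping table: one row per renamed source column.
CREATE TABLE attribute_name_map (
    source_table   VARCHAR(63) NOT NULL,
    source_column  VARCHAR(63) NOT NULL,
    unified_table  VARCHAR(63) NOT NULL,
    unified_column VARCHAR(63) NOT NULL,
    rationale      TEXT,
    PRIMARY KEY (source_table, source_column)
);

INSERT INTO attribute_name_map VALUES
    ('customer', 'name',         'party', 'name',  'already the domain term'),
    ('supplier', 'companyName',  'party', 'name',  'synonym: entity name'),
    ('employee', 'employeeName', 'party', 'name',  'synonym: entity name'),
    ('employee', 'workEmail',    'party', 'email', 'synonym: primary contact email');

-- Which legacy columns feed party.name?
SELECT source_table, source_column
FROM attribute_name_map
WHERE unified_table = 'party' AND unified_column = 'name'
ORDER BY source_table;
```

Keeping the rationale next to the mapping means the reasoning behind a borderline rename survives after the people who made the decision have moved on.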

Data Type Harmonization

When common attributes have different data types across subtypes, the database designer must harmonize them into a single type for the supertype. This requires careful analysis to avoid data loss or constraint violations.

Principles of Type Harmonization:

Principle 1: No Data Loss

The unified type must be able to store all valid values from all subtypes without truncation or loss of precision.

CUSTOMER.balance   DECIMAL(10,2)   -- up to 99,999,999.99
SUPPLIER.balance   DECIMAL(8,2)    -- up to 999,999.99
↓
PARTY.balance      DECIMAL(10,2)   -- uses larger precision

Principle 2: Semantic Preservation

The unified type must preserve the meaning of the data. Widening a type is usually safe; narrowing is dangerous.

-- Safe: widening
VARCHAR(50) + VARCHAR(100) → VARCHAR(100)

-- Dangerous: narrowing
VARCHAR(100) → VARCHAR(50)  -- potential truncation!
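
If narrowing is ever proposed, the stored data can show whether it would truncate anything. The sketch below assumes PostgreSQL syntax and a hypothetical `supplier` table with two sample rows.

```sql
-- Hypothetical source table with a VARCHAR(100) name column.
CREATE TABLE supplier (supplier_id INT PRIMARY KEY, name VARCHAR(100));

INSERT INTO supplier VALUES
    (1, 'Acme'),
    (2, REPEAT('x', 60));  -- a 60-character name

-- Rows that would not fit into VARCHAR(50).
SELECT supplier_id, CHAR_LENGTH(name) AS name_length
FROM supplier
WHERE CHAR_LENGTH(name) > 50;
```

Here the query returns supplier 2 with a length of 60, so narrowing to VARCHAR(50) would lose data. An empty result only describes the data stored today; future values may still exceed the narrower limit.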

Principle 3: Type Compatibility

Some type combinations cannot be meaningfully unified:

-- Incompatible: semantic mismatch
INTEGER (customer count) + DECIMAL(10,2) (account balance)
-- These are different concepts, not type variations!

Common Type Harmonization Patterns
Source Types	Unified Type	Rationale	Considerations
VARCHAR(50), VARCHAR(100)	VARCHAR(100)	Take maximum length	Check if max is sufficient for all subtypes
CHAR(10), VARCHAR(20)	VARCHAR(20)	VARCHAR is more flexible	Fixed-length data may pad inconsistently
INT, BIGINT	BIGINT	Take larger range	Consider storage implications
DECIMAL(8,2), DECIMAL(10,4)	DECIMAL(10,4)	Cover the largest integer part and the largest scale	Both sources here have 6 integer digits; DECIMAL(10,2) with DECIMAL(8,4) would need DECIMAL(12,4)
DATE, TIMESTAMP	TIMESTAMP	TIMESTAMP includes DATE	May need to handle time zones
BOOLEAN, CHAR(1) 'Y'/'N'	BOOLEAN	Normalize to true boolean	Requires data migration for char column
TEXT, VARCHAR(MAX)	TEXT	Equivalent in most DBMS	Check specific DBMS behavior
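
For DECIMAL columns, the unified type has to cover the largest integer part and the largest scale among the sources, which is not always the larger of the two declared types. The sketch below derives both numbers from the catalog, assuming PostgreSQL's `information_schema.columns`; the tables and their `balance` columns are hypothetical.

```sql
-- Hypothetical sources: 8 integer digits with scale 2, and
-- 4 integer digits with scale 4.
CREATE TABLE customer (customer_id INT PRIMARY KEY, balance DECIMAL(10,2));
CREATE TABLE supplier (supplier_id INT PRIMARY KEY, balance DECIMAL(8,4));

-- unified precision = max(integer digits) + max(scale)
SELECT
    MAX(numeric_precision - numeric_scale) + MAX(numeric_scale) AS unified_precision,
    MAX(numeric_scale)                                          AS unified_scale
FROM information_schema.columns
WHERE table_schema = current_schema()
  AND table_name IN ('customer', 'supplier')
  AND column_name = 'balance';
```

For these two columns the result is precision 12 and scale 4, so the supertype column would be DECIMAL(12,4). Neither source type on its own could hold every value of the other.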

Watch for Semantic Incompatibility

If you find yourself trying to unify INTEGER and VARCHAR, or DATE and DECIMAL, stop. These aren't the same attribute—they're different concepts with coincidentally similar names. Semantic incompatibility indicates the attributes shouldn't be unified.

type-harmonization-example.sql

-- Example: Harmonizing identifier types before generalization
-- (PostgreSQL syntax; assumes party already holds the migrated rows,
--  including the source and legacy_* columns used below)

-- Before: Three entities with different ID types
-- CUSTOMER.customer_id    INT AUTO_INCREMENT
-- SUPPLIER.supplier_id    VARCHAR(20) -- legacy codes like 'SUP-001'
-- EMPLOYEE.emp_id         BIGINT

-- Decision: Use VARCHAR(36) to accommodate all patterns
-- Plus UUID for new records going forward

-- Migration approach:
-- Add the column as nullable first. Existing rows have no value yet,
-- so declaring it PRIMARY KEY at this point would fail.
ALTER TABLE party ADD COLUMN party_id VARCHAR(36);

-- Migrate existing IDs with type prefix for uniqueness
UPDATE party p
SET party_id = CASE
    WHEN p.source = 'customer' THEN CONCAT('CUS-', p.legacy_customer_id)
    WHEN p.source = 'supplier' THEN p.legacy_supplier_id  -- already string
    WHEN p.source = 'employee' THEN CONCAT('EMP-', p.legacy_emp_id)
END;

-- Every row now has a value, so the key can be enforced
ALTER TABLE party ADD PRIMARY KEY (party_id);

-- New records use UUID (cast to text to match the column type)
ALTER TABLE party
ALTER COLUMN party_id SET DEFAULT gen_random_uuid()::text;

Constraint Reconciliation

Attributes in different subtypes often have different constraints. When moving attributes to a supertype, these constraints must be reconciled—typically by taking the least restrictive constraint for the supertype, while preserving stricter constraints at the subtype level.

Nullability Constraints:

The most common constraint conflict is nullability:

CUSTOMER.tax_id    NOT NULL   (all customers must have tax ID)
SUPPLIER.tax_id    NOT NULL   (all suppliers must have tax ID)
EMPLOYEE.tax_id    NULL       (contractors may not have tax ID initially)

Resolution: The supertype must use NULL (the least restrictive). Stricter constraints are enforced at the subtype level:

-- Supertype definition
CREATE TABLE party (
    party_id VARCHAR(36) PRIMARY KEY,
    party_type VARCHAR(20) NOT NULL,  -- discriminator (assumed column): 'customer', 'supplier', 'employee'
    tax_id VARCHAR(20) NULL,  -- nullable in supertype
    ...
);

-- Subtype-level rule for customers and suppliers.
-- tax_id now lives in party, so a CHECK on the customer or supplier
-- table could not reference it. The rule is placed on party instead
-- and keyed on the discriminator.
ALTER TABLE party
    ADD CONSTRAINT party_tax_id_required
    CHECK (party_type NOT IN ('customer', 'supplier') OR tax_id IS NOT NULL);

-- Employees are not covered by the rule (nullable OK)

Constraint Reconciliation Rules
Constraint Type	Reconciliation Strategy	Example
NOT NULL	Use NULL if any subtype allows NULL	Supertype NULL, strict subtypes add CHECK
UNIQUE	Unique only if unique across ALL subtypes combined	Email unique across all parties, not just within each
CHECK range	Use union of all ranges	If any subtype allows 0, the supertype uses CHECK (amount >= 0); subtypes that require amount > 0 add their own check
DEFAULT	Omit default in supertype if it varies	No default; each subtype/app sets appropriate default
FOREIGN KEY	Reference same table only if all do	If all FK to country, move FK to supertype
Length limits	Use maximum length	VARCHAR(100) encompasses VARCHAR(50) and VARCHAR(100)

The Widening Principle

Supertype constraints should always be 'wider' (less restrictive) than or equal to any subtype constraint. Think of it as: the supertype defines what's possible, subtypes define what's required for their specific cases.

Unique Constraint Considerations:

Uniqueness across subtypes requires special attention:

Scenario 1: Attribute is unique within each subtype

CUSTOMER.email is unique among customers
SUPPLIER.email is unique among suppliers

Question: Should email be unique in the supertype PARTY?

Answer: It depends on business rules. If the same email can belong to a customer AND a supplier (same person in both roles), then email is NOT unique in PARTY. If email must be globally unique (one person = one party record), then email IS unique in PARTY.
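
Before deciding, it helps to see whether the existing data already contains the same email in more than one subtype. The following sketch assumes PostgreSQL syntax; the tables and sample rows are hypothetical.

```sql
-- Hypothetical subtype tables, each unique on email within itself.
CREATE TABLE customer (customer_id INT PRIMARY KEY, email VARCHAR(255) UNIQUE);
CREATE TABLE supplier (supplier_id INT PRIMARY KEY, email VARCHAR(255) UNIQUE);

INSERT INTO customer VALUES (1, 'ann@example.com'), (2, 'bob@example.com');
INSERT INTO supplier VALUES (1, 'bob@example.com'), (2, 'cy@example.com');

-- Emails that would collide if PARTY.email were declared UNIQUE.
SELECT email, COUNT(*) AS occurrences
FROM (
    SELECT email FROM customer
    UNION ALL
    SELECT email FROM supplier
) AS all_emails
GROUP BY email
HAVING COUNT(*) > 1;
```

With this sample data the query returns `bob@example.com` with 2 occurrences. Each such row is a case to take to the business: either the two records are the same party and should be merged, or the email cannot be globally unique.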

Scenario 2: Attribute is unique in some subtypes only

CUSTOMER.taxId is unique (each customer has unique tax ID)
SUPPLIER.taxId is unique
EMPLOYEE.taxId might have NULLs (not all have it)

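Resolution: Enforce uniqueness over the non-null values only. A partial unique index does this inside the database; application-level enforcement is a weaker fallback. Behavior under a plain UNIQUE constraint varies by DBMS: PostgreSQL and MySQL treat NULLs as distinct and allow many of them, while SQL Server allows only one.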
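
A partial unique index for this case might look like the following sketch. It assumes PostgreSQL syntax; the reduced `party` table and the sample values are hypothetical.

```sql
-- Hypothetical supertype table, reduced to the relevant columns.
CREATE TABLE party (
    party_id VARCHAR(36) PRIMARY KEY,
    tax_id   VARCHAR(20) NULL
);

-- Uniqueness is enforced only for rows that actually have a tax ID.
CREATE UNIQUE INDEX party_tax_id_unique
    ON party (tax_id)
    WHERE tax_id IS NOT NULL;

-- Several parties without a tax ID are allowed.
INSERT INTO party VALUES ('P1', NULL), ('P2', NULL), ('P3', 'TX-100');

-- A second party with the same tax ID would be rejected:
-- INSERT INTO party VALUES ('P4', 'TX-100');
```

The `WHERE` clause states the intent explicitly, which keeps the rule readable even on systems where a plain unique constraint would behave the same way.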

Handling Partial Commonality

Not all candidate common attributes appear in every subtype. Partial commonality—where an attribute appears in most but not all subtypes—requires nuanced handling.

Decision Framework for Partial Commonality:

Case 1: Present in All-Minus-One

Attribute appears in all subtypes except one:

CUSTOMER.phone: present
SUPPLIER.phone: present  
EMPLOYEE.phone: present
SYSTEM_USER.phone: absent (system accounts have no phone)

Decision: Move to supertype as nullable. The missing subtype simply has NULL values.

Case 2: Present in Majority

Attribute appears in most subtypes:

FULL_TIME.startDate: present
PART_TIME.startDate: present
CONTRACTOR.startDate: present
VOLUNTEER.startDate: absent (volunteers may come and go)
INTERN.startDate: absent (treated differently)

Decision: Consider whether the attribute describes the supertype concept (WORKER) or only certain subtypes. If it's essential to being a worker, move to supertype as nullable. If it's specific to formal employment, keep in applicable subtypes or create an intermediate type.

Case 3: Present in Minority

Attribute appears in few subtypes:

MANAGER.directReports: present (count of direct reports)
TECH_LEAD.directReports: present
ENGINEER.directReports: absent
DESIGNER.directReports: absent
ANALYST.directReports: absent

Decision: This attribute describes 'people who manage others', not all employees. Keep in subtypes, or introduce a PEOPLE_MANAGER intermediate supertype.
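
An intermediate supertype can be sketched with one table per level of the hierarchy. The example assumes PostgreSQL syntax; the table layout, the `budget_authority` column, and the snake_case names are illustrative choices, not the only way to implement it.

```sql
-- Top-level supertype: attributes every employee has.
CREATE TABLE employee (
    emp_id INT PRIMARY KEY,
    name   VARCHAR(100) NOT NULL
);

-- Intermediate supertype: holds what only people managers share.
CREATE TABLE people_manager (
    emp_id         INT PRIMARY KEY REFERENCES employee (emp_id),
    direct_reports INT NOT NULL CHECK (direct_reports >= 0)
);

-- Subtypes of PEOPLE_MANAGER reference the intermediate table.
CREATE TABLE manager (
    emp_id           INT PRIMARY KEY REFERENCES people_manager (emp_id),
    budget_authority DECIMAL(12,2)
);
CREATE TABLE tech_lead (
    emp_id INT PRIMARY KEY REFERENCES people_manager (emp_id)
);

-- Subtypes that do not manage people reference EMPLOYEE directly.
CREATE TABLE engineer (
    emp_id INT PRIMARY KEY REFERENCES employee (emp_id)
);
```

With this layout `direct_reports` is stored once, is never NULL, and cannot be attached to an engineer by accident. The cost is one more join when reading a manager's full record.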

The 80% Threshold Heuristic

A common heuristic: If an attribute appears in 80%+ of subtypes, move it to the supertype (nullable if necessary). If it appears in 20-80%, carefully evaluate whether it describes the supertype concept. If it appears in <20%, it's almost certainly subtype-specific.

Attribute Inheritance Semantics

Once common attributes are placed in the supertype, they are inherited by all subtypes. Understanding inheritance semantics is crucial for correct schema implementation and query design.

Inheritance Principles:

Principle 1: Automatic Inclusion

Every instance of a subtype automatically has all supertype attributes. A MANAGER has all EMPLOYEE attributes plus manager-specific attributes.

EMPLOYEE: {empId, name, email, department, hireDate}
MANAGER: {empId, name, email, department, hireDate} + {budgetAuthority, teamSize}

Principle 2: Single Source of Truth

Inherited attributes exist in one place—the supertype. Subtypes do not duplicate these columns (unless using specific physical implementation strategies).

Principle 3: Uniform Access

Queries can access inherited attributes through the supertype:

-- This works for ALL employee types
SELECT empId, name, email FROM employee;

-- This returns managers with inherited + specific attributes
SELECT e.empId, e.name, m.budgetAuthority 
FROM employee e JOIN manager m ON e.empId = m.empId;

Principle 4: Constraint Inheritance

Subtypes inherit supertype constraints. A CHECK constraint on EMPLOYEE.hireDate applies to managers, engineers, and all other employee types.
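
In a class-table layout, constraint inheritance follows from the fact that every subtype row needs a supertype row. The sketch below assumes PostgreSQL syntax; the tables, the sample rows, and the 1990 cutoff in the CHECK are hypothetical.

```sql
-- Supertype with a CHECK on the inherited attribute.
CREATE TABLE employee (
    emp_id    INT PRIMARY KEY,
    name      VARCHAR(100) NOT NULL,
    hire_date DATE NOT NULL CHECK (hire_date >= DATE '1990-01-01')
);

-- Subtype: holds only manager-specific attributes.
CREATE TABLE manager (
    emp_id           INT PRIMARY KEY REFERENCES employee (emp_id),
    budget_authority DECIMAL(12,2)
);

-- A manager is created supertype row first, so the CHECK is evaluated.
INSERT INTO employee VALUES (1, 'Ada', DATE '2020-03-01');
INSERT INTO manager  VALUES (1, 50000.00);

-- This row would fail the CHECK, so no manager could ever be
-- attached to it:
-- INSERT INTO employee VALUES (2, 'Bob', DATE '1980-01-01');
```

The foreign key is what ties the two together: a manager cannot exist without an employee row, and an employee row cannot exist without passing the supertype's constraints.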

Inheritance Implementation Strategies

•Single Table Inheritance — All attributes (supertype + all subtypes) in one table. Subtype-specific columns are NULL for non-applicable rows. Simple but can waste storage.
•Class Table Inheritance — Separate table for each type in hierarchy. Supertype table holds common attrs; subtype tables hold specific attrs + FK to supertype. Normalized but requires joins.
•Concrete Table Inheritance — Each concrete subtype has its own complete table with all inherited attributes denormalized. No joins but duplication.
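
The three strategies can be compared side by side for a small EMPLOYEE and MANAGER hierarchy. This is a sketch assuming PostgreSQL syntax; the table name suffixes exist only so that all three variants can be created in one schema, and the column choices are illustrative.

```sql
-- 1. Single table inheritance: one table, discriminator column,
--    subtype-specific columns left NULL where they do not apply.
CREATE TABLE employee_single (
    emp_id           INT PRIMARY KEY,
    employee_type    VARCHAR(20) NOT NULL,   -- e.g. 'manager', 'engineer'
    name             VARCHAR(100) NOT NULL,
    email            VARCHAR(255),
    budget_authority DECIMAL(12,2)           -- NULL unless a manager
);

-- 2. Class table inheritance: common attributes in the supertype
--    table, specific attributes in a subtype table sharing the key.
CREATE TABLE employee_base (
    emp_id INT PRIMARY KEY,
    name   VARCHAR(100) NOT NULL,
    email  VARCHAR(255)
);
CREATE TABLE manager_detail (
    emp_id           INT PRIMARY KEY REFERENCES employee_base (emp_id),
    budget_authority DECIMAL(12,2)
);

-- 3. Concrete table inheritance: each subtype table repeats the
--    inherited columns; there is no supertype table.
CREATE TABLE manager_concrete (
    emp_id           INT PRIMARY KEY,
    name             VARCHAR(100) NOT NULL,
    email            VARCHAR(255),
    budget_authority DECIMAL(12,2)
);
CREATE TABLE engineer_concrete (
    emp_id INT PRIMARY KEY,
    name   VARCHAR(100) NOT NULL,
    email  VARCHAR(255)
);
```

Note that in the concrete variant nothing in the schema keeps `emp_id` unique across `manager_concrete` and `engineer_concrete`; that guarantee has to come from a shared sequence or from the application.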

Implementation Strategy Trade-offs
Strategy	Query Simplicity	Storage Efficiency	Modification Impact	Best For
Single Table	Excellent (one table)	Poor (many NULLs)	Easy (one place)	Few subtypes, many common attrs
Class Table	Moderate (joins needed)	Good (no waste)	Moderate	Deep hierarchies, normalized needs
Concrete Table	Good per subtype (no joins); cross-type queries need UNION	Poor (duplication)	Hard (many places)	Read-heavy, rare schema changes

Logical vs Physical

Remember: Inheritance is a logical concept. The EER diagram shows inheritance relationships regardless of how they're physically implemented. The choice between single table, class table, or concrete table is a separate physical design decision made later.

Summary: Common Attributes

This page covered the analysis, reconciliation, and placement of common attributes in generalization. The essential practices are:

Key Takeaways

•Commonality requires semantic equivalence — Same names alone don't make attributes common; they must represent the same concept, serve the same purpose, and have compatible domains.
•Follow a systematic process — Inventory attributes, cluster by name, identify synonyms, validate semantics, assess presence patterns, harmonize types, and reconcile constraints.
•Resolve naming conflicts carefully — Handle synonyms by choosing the best term, homonyms by distinguishing with prefixes, and format variations by standardizing conventions.
•Harmonize types without data loss — The unified type must accommodate all valid values from all subtypes. When in doubt, use the wider/larger type.
•Apply the widening principle for constraints — Supertype constraints should be the least restrictive; stricter constraints are enforced at subtype level.
•Handle partial commonality judiciously — Use the 80% heuristic as a guide, but always consider whether the attribute truly describes the supertype concept.

What's Next:

Now that we understand how to identify and place common attributes, we'll examine supertype creation in detail—how to define the supertype entity itself, establish its identity, structure its relationships, and integrate it into the broader schema.

Page Complete

You now have comprehensive knowledge of common attribute analysis. You can identify true commonality beyond superficial name matches, resolve naming and type conflicts, reconcile constraints appropriately, and make informed decisions about partial commonality. Next, we'll focus on creating the supertype itself.

Common Attributes

The Building Blocks of Generalization

In this page, we'll master the techniques for identifying, reconciling, and properly placing common attributes in a generalization hierarchy.

What You Will Learn

What Makes an Attribute 'Common'

Definition:

An attribute is common across a set of entity types E₁, E₂, ..., Eₙ if and only if:

It appears in all (or a significant majority of) the entity types

It serves the same semantic purpose in each entity type

It describes a property of the shared supertype concept, not a subtype-specific characteristic

The Three Dimensions of Commonality:

Structural Commonality

•Name Match — The attribute has identical or clearly synonymous names across entities (e.g., 'email' in CUSTOMER and EMPLOYEE)
•Type Compatibility — The data types are identical or can be unified without loss of information (e.g., VARCHAR(50) and VARCHAR(100) can unify to VARCHAR(100))
•Domain Overlap — The valid values share significant overlap (e.g., both store email addresses following the same format)

Semantic Commonality

•Same Concept — The attribute describes the same real-world property in each entity (e.g., 'birthDate' means date of birth, not date of creation)
•Same Granularity — The level of detail is equivalent (e.g., both store full address vs. just city)
•Same Purpose — The attribute serves the same business function (e.g., both 'phone' attributes are for contact, not audit)

Behavioral Commonality

•Same Constraints — Similar validation rules apply (or can be harmonized) across entities
•Same Update Patterns — The attribute is updated under similar circumstances and by similar processes
•Same Query Usage — The attribute is queried and reported on in similar contexts

Beware False Positives

Systematic Attribute Analysis Process

A rigorous attribute analysis follows a structured process. This ensures that all potential common attributes are identified and properly evaluated.

Step-by-Step Attribute Analysis

•Create Attribute Inventory — List all attributes from all candidate subtype entities. Include name, data type, nullability, constraints, and business description for each.
•Perform Name Clustering — Group attributes by identical names. These are the obvious candidates for commonality (though not guaranteed).
•Identify Synonyms — For unmatched attributes, look for semantic equivalents. Use domain knowledge and business glossaries to identify synonymous terms.
•Validate Semantics — For each cluster of same-named or synonymous attributes, verify they represent the same concept. Consult domain experts if uncertain.
•Assess Presence Patterns — Determine whether each candidate common attribute is present in all subtypes, most subtypes, or only some subtypes.
•Evaluate Type Compatibility — For validated common attributes, compare data types and determine if unification is possible and appropriate.
•Harmonize Constraints — Review constraints (NOT NULL, CHECK, UNIQUE) and determine appropriate supertype constraints.
•Document Decisions — Record which attributes will move to the supertype, which remain in subtypes, and the rationale for borderline cases.

Attribute Analysis Template Example
Attribute Name	CUSTOMER	SUPPLIER	EMPLOYEE	Semantic	Decision
name/companyName	name VARCHAR(100)	companyName VARCHAR(150)	employeeName VARCHAR(80)	All represent entity name	→ Supertype: name VARCHAR(150)
email	email VARCHAR(255)	email VARCHAR(255)	workEmail VARCHAR(255)	All for primary contact	→ Supertype: email VARCHAR(255)
phone	phone VARCHAR(20)	phone VARCHAR(20)	phone VARCHAR(20)	Primary contact phone	→ Supertype: phone VARCHAR(20)
taxId	taxId VARCHAR(15)	taxId VARCHAR(15)	— (absent)	Tax identification	→ Supertype: taxId VARCHAR(15) NULL
creditLimit	creditLimit DECIMAL	— (absent)	— (absent)	Customer-specific only	→ Stays in CUSTOMER
supplyCategories	— (absent)	categories TEXT[]	— (absent)	Supplier-specific	→ Stays in SUPPLIER

The 'All-Most-Some' Framework

Resolving Naming Conflicts and Synonyms

Types of Naming Conflicts:

Type 1: Synonyms (Different Names, Same Concept)

Different terms refer to the same underlying attribute:

'clientId' / 'customerId' / 'accountHolderId' → all mean customer identifier
'effectiveDate' / 'startDate' / 'validFrom' → all mean when something becomes active
'amt' / 'amount' / 'value' → all mean monetary value

Resolution Strategy: Choose the most descriptive, standards-compliant name. Prefer full words over abbreviations. Document the renamed attributes and original names.

Type 2: Homonyms (Same Name, Different Concepts)

Identical terms refer to different underlying concepts:

'status' in ORDER (fulfillment state) vs. 'status' in USER (active/inactive)
'date' in INVOICE (invoice date) vs. 'date' in EMPLOYEE (hire date)
'type' in ACCOUNT (checking/savings) vs. 'type' in CUSTOMER (individual/corporate)

Resolution Strategy: These are not common attributes despite sharing names. Rename them with specific prefixes (orderStatus, userStatus) to clarify distinction.

Type 3: Format Variations (Same Concept, Different Formatting)

Same concept with different naming conventions:

'first_name' / 'firstName' / 'FirstName' / 'FIRST_NAME'
'date_of_birth' / 'dateOfBirth' / 'birthDate' / 'DOB'

Resolution Strategy: Standardize to your project's naming convention. Typically, choose snake_case or camelCase consistently and rename all instances.

Converting Mermaid diagram...

Name Unification Best Practices

•Prefer Domain Terms — Choose names that domain experts use. 'accountHolder' may be technically precise, but if everyone says 'customer', use 'customer'.
•Favor Clarity Over Brevity — 'emailAddress' is better than 'email' if it prevents confusion with 'emailContent'. But don't over-qualify: 'personEmailAddress' is excessive if 'email' is unambiguous.
•Apply Consistent Conventions — Use the same naming pattern throughout: snake_case, camelCase, or PascalCase. Never mix within a schema.
•Document Origins — Maintain a mapping table that records original attribute names and their unified name. This is invaluable for data migration and historical queries.
•Consider Future Subtypes — Choose names that will remain appropriate if new subtypes are added. 'email' is better than 'customerEmail' if employees and suppliers will also use it.

Data Type Harmonization

Principles of Type Harmonization:

Principle 1: No Data Loss

The unified type must be able to store all valid values from all subtypes without truncation or loss of precision.

CUSTOMER.balance   DECIMAL(10,2)   // up to 99,999,999.99
SUPPLIER.balance   DECIMAL(8,2)    // up to 999,999.99
↓
PARTY.balance      DECIMAL(10,2)   // uses larger precision

Principle 2: Semantic Preservation

The unified type must preserve the meaning of the data. Widening a type is usually safe; narrowing is dangerous.

// Safe: widening
VARCHAR(50) + VARCHAR(100) → VARCHAR(100)

// Dangerous: narrowing
VARCHAR(100) → VARCHAR(50)  // potential truncation!

Principle 3: Type Compatibility

Some type combinations cannot be meaningfully unified:

// Incompatible: semantic mismatch
INTEGER (customer count) + DECIMAL(10,2) (account balance)
// These are different concepts, not type variations!

Common Type Harmonization Patterns
Source Types	Unified Type	Rationale	Considerations
VARCHAR(50), VARCHAR(100)	VARCHAR(100)	Take maximum length	Check if max is sufficient for all subtypes
CHAR(10), VARCHAR(20)	VARCHAR(20)	VARCHAR is more flexible	Fixed-length data may pad inconsistently
INT, BIGINT	BIGINT	Take larger range	Consider storage implications
DECIMAL(8,2), DECIMAL(10,4)	DECIMAL(10,4)	Take larger precision+scale	Monetary calculations need consistent scale
DATE, TIMESTAMP	TIMESTAMP	TIMESTAMP includes DATE	May need to handle time zones
BOOLEAN, CHAR(1) 'Y'/'N'	BOOLEAN	Normalize to true boolean	Requires data migration for char column
TEXT, VARCHAR(MAX)	TEXT	Equivalent in most DBMS	Check specific DBMS behavior

Watch for Semantic Incompatibility

type-harmonization-example.sql
SQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
-- Example: Harmonizing identifier types before generalization
 
-- Before: Three entities with different ID types
-- CUSTOMER.customer_id    INT AUTO_INCREMENT
-- SUPPLIER.supplier_id    VARCHAR(20) -- legacy codes like 'SUP-001'
-- EMPLOYEE.emp_id        BIGINT
 
-- Decision: Use VARCHAR(36) to accommodate all patterns
-- Plus UUID for new records going forward
 
-- Migration approach:
ALTER TABLE party ADD COLUMN party_id VARCHAR(36) PRIMARY KEY;
 
-- Migrate existing IDs with type prefix for uniqueness
UPDATE party p
SET party_id = CASE
    WHEN p.source = 'customer' THEN CONCAT('CUS-', p.legacy_customer_id)
    WHEN p.source = 'supplier' THEN p.legacy_supplier_id  -- already string
    WHEN p.source = 'employee' THEN CONCAT('EMP-', p.legacy_emp_id)
END;
 
-- New records use UUID
ALTER TABLE party 
ALTER COLUMN party_id SET DEFAULT gen_random_uuid();

Constraint Reconciliation

Nullability Constraints:

The most common constraint conflict is nullability:

CUSTOMER.tax_id    NOT NULL   (all customers must have tax ID)
SUPPLIER.tax_id    NOT NULL   (all suppliers must have tax ID)
EMPLOYEE.tax_id    NULL       (contractors may not have tax ID initially)

Resolution: The supertype must use NULL (the least restrictive). Stricter constraints are enforced at the subtype level:

-- Supertype definition
CREATE TABLE party (
    party_id VARCHAR(36) PRIMARY KEY,
    tax_id VARCHAR(20) NULL,  -- nullable in supertype
    ...
);

-- Subtype-level constraint for customers
ALTER TABLE customer 
    ADD CONSTRAINT customer_tax_id_required 
    CHECK (tax_id IS NOT NULL);

-- Subtype-level constraint for suppliers  
ALTER TABLE supplier
    ADD CONSTRAINT supplier_tax_id_required
    CHECK (tax_id IS NOT NULL);

-- Employee has no such constraint (nullable OK)

Constraint Reconciliation Rules
Constraint Type	Reconciliation Strategy	Example
NOT NULL	Use NULL if any subtype allows NULL	Supertype NULL, strict subtypes add CHECK
UNIQUE	Unique only if unique across ALL subtypes combined	Email unique across all parties, not just within each
CHECK range	Use union of all ranges	CHECK amount > 0 → valid if any subtype allows 0
DEFAULT	Omit default in supertype if it varies	No default; each subtype/app sets appropriate default
FOREIGN KEY	Reference same table only if all do	If all FK to country, move FK to supertype
Length limits	Use maximum length	VARCHAR(100) encompasses VARCHAR(50) and VARCHAR(100)

The Widening Principle

Unique Constraint Considerations:

Uniqueness across subtypes requires special attention:

Scenario 1: Attribute is unique within each subtype

CUSTOMER.email is unique among customers
SUPPLIER.email is unique among suppliers

Question: Should email be unique in the supertype PARTY?

Scenario 2: Attribute is unique in some subtypes only

CUSTOMER.taxId is unique (each customer has unique tax ID)
SUPPLIER.taxId is unique
EMPLOYEE.taxId might have NULLs (not all have it)

Resolution: Use a partial unique index or application-level enforcement for non-null values.

Handling Partial Commonality

Not all candidate common attributes appear in every subtype. Partial commonality—where an attribute appears in most but not all subtypes—requires nuanced handling.

Decision Framework for Partial Commonality:

Case 1: Present in All-Minus-One

Attribute appears in all subtypes except one:

CUSTOMER.phone: present
SUPPLIER.phone: present  
EMPLOYEE.phone: present
SYSTEM_USER.phone: absent (system accounts have no phone)

Decision: Move to supertype as nullable. The missing subtype simply has NULL values.

Case 2: Present in Majority

Attribute appears in most subtypes:

FULL_TIME.startDate: present
PART_TIME.startDate: present
CONTRACTOR.startDate: present
VOLUNTEER.startDate: absent (volunteers may come and go)
INTERN.startDate: absent (treated differently)

Case 3: Present in Minority

Attribute appears in few subtypes:

MANAGER.directReports: present (count of direct reports)
TECH_LEAD.directReports: present
ENGINEER.directReports: absent
DESIGNER.directReports: absent
ANALYST.directReports: absent

Decision: This attribute describes 'people who manage others', not all employees. Keep in subtypes, or introduce a PEOPLE_MANAGER intermediate supertype.
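The three cases can be told apart mechanically by counting how many subtypes carry the attribute. The sketch below is a hypothetical helper, not a prescribed algorithm: the function name, the labels, and the rule that "majority" means more than half are assumptions for illustration.

```python
def classify_presence(attribute, subtype_attributes):
    """Classify how widely an attribute is shared across subtypes.

    subtype_attributes maps each subtype name to the set of its attribute names.
    """
    total = len(subtype_attributes)
    present = sum(1 for attrs in subtype_attributes.values() if attribute in attrs)

    if present == total:
        return "all"
    if total > 2 and present == total - 1:
        return "all-minus-one"   # Case 1: supertype, nullable
    if present * 2 > total:
        return "majority"        # Case 2: needs a judgment call
    return "minority"            # Case 3: keep in subtypes


parties = {
    "CUSTOMER": {"phone"},
    "SUPPLIER": {"phone"},
    "EMPLOYEE": {"phone"},
    "SYSTEM_USER": set(),
}
workers = {
    "FULL_TIME": {"startDate"},
    "PART_TIME": {"startDate"},
    "CONTRACTOR": {"startDate"},
    "VOLUNTEER": set(),
    "INTERN": set(),
}
staff = {
    "MANAGER": {"directReports"},
    "TECH_LEAD": {"directReports"},
    "ENGINEER": set(),
    "DESIGNER": set(),
    "ANALYST": set(),
}

phone_case = classify_presence("phone", parties)
start_date_case = classify_presence("startDate", workers)
direct_reports_case = classify_presence("directReports", staff)
print(phone_case, start_date_case, direct_reports_case)
```

The classification only narrows the options. Whether the attribute truly describes the supertype concept remains a design decision.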


The 80% Threshold Heuristic

Attribute Inheritance Semantics

Once common attributes are placed in the supertype, they are inherited by all subtypes. Understanding inheritance semantics is crucial for correct schema implementation and query design.

Inheritance Principles:

Principle 1: Automatic Inclusion

Every instance of a subtype automatically has all supertype attributes. A MANAGER has all EMPLOYEE attributes plus manager-specific attributes.

EMPLOYEE: {empId, name, email, department, hireDate}
MANAGER: {empId, name, email, department, hireDate} + {budgetAuthority, teamSize}

Principle 2: Single Source of Truth

Inherited attributes exist in one place—the supertype. Subtypes do not duplicate these columns (unless using specific physical implementation strategies).

Principle 3: Uniform Access

Queries can access inherited attributes through the supertype:

-- This works for ALL employee types
SELECT empId, name, email FROM employee;

-- This returns managers with inherited + specific attributes
SELECT e.empId, e.name, m.budgetAuthority
FROM employee e
JOIN manager m ON e.empId = m.empId;

Principle 4: Constraint Inheritance

Subtypes inherit supertype constraints. A CHECK constraint on EMPLOYEE.hireDate applies to managers, engineers, and all other employee types.
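Principles 3 and 4 can be seen together in a small class-table layout, sketched here with SQLite through Python's sqlite3 module. The specific hireDate rule, the sample rows, and the helper function are assumptions for illustration. Because every manager row must reference an employee row, any CHECK on the employee table applies to managers as well.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
PRAGMA foreign_keys = ON;  -- SQLite enforces foreign keys only when enabled

CREATE TABLE employee (
    empId    INTEGER PRIMARY KEY,
    name     TEXT NOT NULL,
    email    TEXT NOT NULL,
    hireDate TEXT NOT NULL CHECK (hireDate >= '2000-01-01')  -- illustrative rule
);

CREATE TABLE manager (
    empId           INTEGER PRIMARY KEY REFERENCES employee (empId),
    budgetAuthority INTEGER NOT NULL
);

INSERT INTO employee VALUES (1, 'Ada', 'ada@example.com', '2015-03-01');
INSERT INTO manager  VALUES (1, 50000);
""")


def rejected(sql, params):
    """Return True if the statement violates a constraint."""
    try:
        with conn:
            conn.execute(sql, params)
        return False
    except sqlite3.IntegrityError:
        return True


# Uniform access: inherited and manager-specific attributes in one query
manager_rows = conn.execute(
    "SELECT e.empId, e.name, m.budgetAuthority "
    "FROM employee e JOIN manager m ON e.empId = m.empId"
).fetchall()

# Constraint inheritance: a would-be manager must first pass the employee CHECK
early_hire_rejected = rejected(
    "INSERT INTO employee VALUES (?, ?, ?, ?)",
    (2, "Bob", "bob@example.com", "1990-01-01"),
)

# A manager row without a matching employee row is refused by the foreign key
orphan_manager_rejected = rejected("INSERT INTO manager VALUES (?, ?)", (99, 10000))

print(manager_rows, early_hire_rejected, orphan_manager_rejected)
```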

Inheritance Implementation Strategies

•Single Table Inheritance — All attributes (supertype + all subtypes) in one table. Subtype-specific columns are NULL for non-applicable rows. Simple but can waste storage.
•Class Table Inheritance — Separate table for each type in hierarchy. Supertype table holds common attrs; subtype tables hold specific attrs + FK to supertype. Normalized but requires joins.
•Concrete Table Inheritance — Each concrete subtype has its own complete table with all inherited attributes denormalized. No joins but duplication.

Implementation Strategy Trade-offs
Strategy	Query Simplicity	Storage Efficiency	Modification Impact	Best For
Single Table	Excellent (one table)	Poor (many NULLs)	Easy (one place)	Few subtypes, many common attrs
Class Table	Moderate (joins needed)	Good (no waste)	Moderate	Deep hierarchies, normalized needs
Concrete Table	Good (no joins)	Poor (duplication)	Hard (many places)	Read-heavy, rare schema changes
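To make the trade-offs concrete, the sketch below builds the same two employees under Single Table and Concrete Table inheritance using SQLite through Python's sqlite3 module. The table names, the employeeType discriminator, and the primaryLanguage column are hypothetical. Note how the single table answers "all employees" with one SELECT but stores NULLs, while the concrete tables avoid NULLs but need a UNION ALL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Single Table Inheritance: one table, subtype columns NULL where not applicable
CREATE TABLE employee_single (
    empId           INTEGER PRIMARY KEY,
    name            TEXT NOT NULL,
    employeeType    TEXT NOT NULL,
    budgetAuthority INTEGER NULL,  -- managers only
    primaryLanguage TEXT NULL      -- engineers only
);
INSERT INTO employee_single VALUES
    (1, 'Ada', 'MANAGER', 50000, NULL),
    (2, 'Lin', 'ENGINEER', NULL, 'SQL');

-- Concrete Table Inheritance: inherited columns repeated in every subtype table
CREATE TABLE manager_concrete (
    empId           INTEGER PRIMARY KEY,
    name            TEXT NOT NULL,
    budgetAuthority INTEGER NOT NULL
);
CREATE TABLE engineer_concrete (
    empId           INTEGER PRIMARY KEY,
    name            TEXT NOT NULL,
    primaryLanguage TEXT NOT NULL
);
INSERT INTO manager_concrete  VALUES (1, 'Ada', 50000);
INSERT INTO engineer_concrete VALUES (2, 'Lin', 'SQL');
""")

# All employees from the single table: one SELECT
single_rows = conn.execute(
    "SELECT empId, name FROM employee_single ORDER BY empId"
).fetchall()

# All employees from the concrete tables: one branch per subtype
concrete_rows = conn.execute("""
    SELECT empId, name FROM manager_concrete
    UNION ALL
    SELECT empId, name FROM engineer_concrete
    ORDER BY empId
""").fetchall()

# Storage cost of the single table: subtype-specific cells left NULL
null_cells = conn.execute("""
    SELECT SUM((budgetAuthority IS NULL) + (primaryLanguage IS NULL))
    FROM employee_single
""").fetchone()[0]

print(single_rows, concrete_rows, null_cells)
```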

Logical vs Physical

Summary: Common Attributes

We've comprehensively covered the analysis, reconciliation, and placement of common attributes in generalization. Let's consolidate the essential practices:

Key Takeaways

•Commonality requires semantic equivalence — Same names alone don't make attributes common; they must represent the same concept, serve the same purpose, and have compatible domains.
•Follow a systematic process — Inventory attributes, cluster by name, identify synonyms, validate semantics, assess presence patterns, harmonize types, and reconcile constraints.
•Resolve naming conflicts carefully — Handle synonyms by choosing the best term, homonyms by distinguishing with prefixes, and format variations by standardizing conventions.
•Harmonize types without data loss — The unified type must accommodate all valid values from all subtypes. When in doubt, use the wider/larger type.
•Apply the widening principle for constraints — Supertype constraints should be the least restrictive; stricter constraints are enforced at subtype level.
•Handle partial commonality judiciously — Use the 80% heuristic as a guide, but always consider whether the attribute truly describes the supertype concept.

