Loading content...
If generalization is the process of recognizing that diverse entities share a common identity, then common attributes are the evidence that proves it. Every valid generalization rests on a foundation of shared characteristics—attributes that appear in all (or most) subtypes with the same essential meaning.
But identifying common attributes is not as simple as matching column names. Real-world entities evolved independently, often in different departments, different systems, or different eras. What is 'customerName' in one system is 'clientFullName' in another and 'account_holder' in a third. Data types vary. Constraints differ. Optional in one place, required in another.
The skilled database designer must navigate this complexity, transforming a fragmented landscape of inconsistent attributes into a coherent, unified set of inherited properties. This is the art and science of common attribute analysis—the critical step that determines whether a generalization is meaningful and maintainable.
In this page, we'll master the techniques for identifying, reconciling, and properly placing common attributes in a generalization hierarchy.
By the end of this page, you will be able to systematically identify common attributes across entity types, resolve naming conflicts and synonyms, harmonize data types and constraints, decide which attributes belong in the supertype versus subtypes, and document your attribute unification decisions for long-term maintainability.
Before we can identify common attributes, we must precisely define what 'common' means in the context of generalization. Superficial similarity is not enough—true commonality requires semantic equivalence.
Definition:
An attribute is common across a set of entity types E₁, E₂, ..., Eₙ if and only if:
- It appears in all (or a significant majority of) the entity types
- It serves the same semantic purpose in each entity type
- It describes a property of the shared supertype concept, not a subtype-specific characteristic
The Three Dimensions of Commonality:
Same-named attributes are not automatically common. 'status' in ORDER (values: pending, shipped, delivered) and 'status' in EMPLOYEE (values: active, on-leave, terminated) are completely different concepts that happen to share a name. Analyzing only structure without semantics leads to invalid generalizations.
A rigorous attribute analysis follows a structured process. This ensures that all potential common attributes are identified and properly evaluated.
| Attribute Name | CUSTOMER | SUPPLIER | EMPLOYEE | Semantic | Decision |
|---|---|---|---|---|---|
| name/companyName | name VARCHAR(100) | companyName VARCHAR(150) | employeeName VARCHAR(80) | All represent entity name | → Supertype: name VARCHAR(150) |
| email VARCHAR(255) | email VARCHAR(255) | workEmail VARCHAR(255) | All for primary contact | → Supertype: email VARCHAR(255) | |
| phone | phone VARCHAR(20) | phone VARCHAR(20) | phone VARCHAR(20) | Primary contact phone | → Supertype: phone VARCHAR(20) |
| taxId | taxId VARCHAR(15) | taxId VARCHAR(15) | — (absent) | Tax identification | → Supertype: taxId VARCHAR(15) NULL |
| creditLimit | creditLimit DECIMAL | — (absent) | — (absent) | Customer-specific only | → Stays in CUSTOMER |
| supplyCategories | — (absent) | categories TEXT[] | — (absent) | Supplier-specific | → Stays in SUPPLIER |
Categorize each attribute: ALL (present in every subtype) → definitely supertype; MOST (present in majority) → likely supertype with nullable; SOME (present in minority) → likely stays in specific subtypes. This framework provides clear decision criteria.
Real-world systems rarely use consistent naming. When generalizing entities from different sources, naming conflicts are inevitable. The database designer must systematically identify and resolve these conflicts.
Types of Naming Conflicts:
Type 1: Synonyms (Different Names, Same Concept)
Different terms refer to the same underlying attribute:
Resolution Strategy: Choose the most descriptive, standards-compliant name. Prefer full words over abbreviations. Document the renamed attributes and original names.
Type 2: Homonyms (Same Name, Different Concepts)
Identical terms refer to different underlying concepts:
Resolution Strategy: These are not common attributes despite sharing names. Rename them with specific prefixes (orderStatus, userStatus) to clarify distinction.
Type 3: Format Variations (Same Concept, Different Formatting)
Same concept with different naming conventions:
Resolution Strategy: Standardize to your project's naming convention. Typically, choose snake_case or camelCase consistently and rename all instances.
When common attributes have different data types across subtypes, the database designer must harmonize them into a single type for the supertype. This requires careful analysis to avoid data loss or constraint violations.
Principles of Type Harmonization:
Principle 1: No Data Loss
The unified type must be able to store all valid values from all subtypes without truncation or loss of precision.
CUSTOMER.balance DECIMAL(10,2) // up to 99,999,999.99
SUPPLIER.balance DECIMAL(8,2) // up to 999,999.99
↓
PARTY.balance DECIMAL(10,2) // uses larger precision
Principle 2: Semantic Preservation
The unified type must preserve the meaning of the data. Widening a type is usually safe; narrowing is dangerous.
// Safe: widening
VARCHAR(50) + VARCHAR(100) → VARCHAR(100)
// Dangerous: narrowing
VARCHAR(100) → VARCHAR(50) // potential truncation!
Principle 3: Type Compatibility
Some type combinations cannot be meaningfully unified:
// Incompatible: semantic mismatch
INTEGER (customer count) + DECIMAL(10,2) (account balance)
// These are different concepts, not type variations!
| Source Types | Unified Type | Rationale | Considerations |
|---|---|---|---|
| VARCHAR(50), VARCHAR(100) | VARCHAR(100) | Take maximum length | Check if max is sufficient for all subtypes |
| CHAR(10), VARCHAR(20) | VARCHAR(20) | VARCHAR is more flexible | Fixed-length data may pad inconsistently |
| INT, BIGINT | BIGINT | Take larger range | Consider storage implications |
| DECIMAL(8,2), DECIMAL(10,4) | DECIMAL(10,4) | Take larger precision+scale | Monetary calculations need consistent scale |
| DATE, TIMESTAMP | TIMESTAMP | TIMESTAMP includes DATE | May need to handle time zones |
| BOOLEAN, CHAR(1) 'Y'/'N' | BOOLEAN | Normalize to true boolean | Requires data migration for char column |
| TEXT, VARCHAR(MAX) | TEXT | Equivalent in most DBMS | Check specific DBMS behavior |
If you find yourself trying to unify INTEGER and VARCHAR, or DATE and DECIMAL, stop. These aren't the same attribute—they're different concepts with coincidentally similar names. Semantic incompatibility indicates the attributes shouldn't be unified.
123456789101112131415161718192021222324
-- Example: Harmonizing identifier types before generalization -- Before: Three entities with different ID types-- CUSTOMER.customer_id INT AUTO_INCREMENT-- SUPPLIER.supplier_id VARCHAR(20) -- legacy codes like 'SUP-001'-- EMPLOYEE.emp_id BIGINT -- Decision: Use VARCHAR(36) to accommodate all patterns-- Plus UUID for new records going forward -- Migration approach:ALTER TABLE party ADD COLUMN party_id VARCHAR(36) PRIMARY KEY; -- Migrate existing IDs with type prefix for uniquenessUPDATE party pSET party_id = CASE WHEN p.source = 'customer' THEN CONCAT('CUS-', p.legacy_customer_id) WHEN p.source = 'supplier' THEN p.legacy_supplier_id -- already string WHEN p.source = 'employee' THEN CONCAT('EMP-', p.legacy_emp_id)END; -- New records use UUIDALTER TABLE party ALTER COLUMN party_id SET DEFAULT gen_random_uuid();Attributes in different subtypes often have different constraints. When moving attributes to a supertype, these constraints must be reconciled—typically by taking the least restrictive constraint for the supertype, while preserving stricter constraints at the subtype level.
Nullability Constraints:
The most common constraint conflict is nullability:
CUSTOMER.tax_id NOT NULL (all customers must have tax ID)
SUPPLIER.tax_id NOT NULL (all suppliers must have tax ID)
EMPLOYEE.tax_id NULL (contractors may not have tax ID initially)
Resolution: The supertype must use NULL (the least restrictive). Stricter constraints are enforced at the subtype level:
-- Supertype definition
CREATE TABLE party (
party_id VARCHAR(36) PRIMARY KEY,
tax_id VARCHAR(20) NULL, -- nullable in supertype
...
);
-- Subtype-level constraint for customers
ALTER TABLE customer
ADD CONSTRAINT customer_tax_id_required
CHECK (tax_id IS NOT NULL);
-- Subtype-level constraint for suppliers
ALTER TABLE supplier
ADD CONSTRAINT supplier_tax_id_required
CHECK (tax_id IS NOT NULL);
-- Employee has no such constraint (nullable OK)
| Constraint Type | Reconciliation Strategy | Example |
|---|---|---|
| NOT NULL | Use NULL if any subtype allows NULL | Supertype NULL, strict subtypes add CHECK |
| UNIQUE | Unique only if unique across ALL subtypes combined | Email unique across all parties, not just within each |
| CHECK range | Use union of all ranges | CHECK amount > 0 → valid if any subtype allows 0 |
| DEFAULT | Omit default in supertype if it varies | No default; each subtype/app sets appropriate default |
| FOREIGN KEY | Reference same table only if all do | If all FK to country, move FK to supertype |
| Length limits | Use maximum length | VARCHAR(100) encompasses VARCHAR(50) and VARCHAR(100) |
Supertype constraints should always be 'wider' (less restrictive) than or equal to any subtype constraint. Think of it as: the supertype defines what's possible, subtypes define what's required for their specific cases.
Unique Constraint Considerations:
Uniqueness across subtypes requires special attention:
Scenario 1: Attribute is unique within each subtype
Question: Should email be unique in the supertype PARTY?
Answer: It depends on business rules. If the same email can belong to a customer AND a supplier (same person in both roles), then email is NOT unique in PARTY. If email must be globally unique (one person = one party record), then email IS unique in PARTY.
Scenario 2: Attribute is unique in some subtypes only
Resolution: Use a partial unique index or application-level enforcement for non-null values.
Not all candidate common attributes appear in every subtype. Partial commonality—where an attribute appears in most but not all subtypes—requires nuanced handling.
Decision Framework for Partial Commonality:
Case 1: Present in All-Minus-One
Attribute appears in all subtypes except one:
CUSTOMER.phone: present
SUPPLIER.phone: present
EMPLOYEE.phone: present
SYSTEM_USER.phone: absent (system accounts have no phone)
Decision: Move to supertype as nullable. The missing subtype simply has NULL values.
Case 2: Present in Majority
Attribute appears in most subtypes:
FULL_TIME.startDate: present
PART_TIME.startDate: present
CONTRACTOR.startDate: present
VOLUNTEER.startDate: absent (volunteers may come and go)
INTERN.startDate: absent (treated differently)
Decision: Consider whether the attribute describes the supertype concept (WORKER) or only certain subtypes. If it's essential to being a worker, move to supertype as nullable. If it's specific to formal employment, keep in applicable subtypes or create an intermediate type.
Case 3: Present in Minority
Attribute appears in few subtypes:
MANAGER.directReports: present (count of direct reports)
TECH_LEAD.directReports: present
ENGINEER.directReports: absent
DESIGNER.directReports: absent
ANALYST.directReports: absent
Decision: This attribute describes 'people who manage others', not all employees. Keep in subtypes, or introduce a PEOPLE_MANAGER intermediate supertype.
A common heuristic: If an attribute appears in 80%+ of subtypes, move it to the supertype (nullable if necessary). If it appears in 20-80%, carefully evaluate whether it describes the supertype concept. If it appears in <20%, it's almost certainly subtype-specific.
Once common attributes are placed in the supertype, they are inherited by all subtypes. Understanding inheritance semantics is crucial for correct schema implementation and query design.
Inheritance Principles:
Principle 1: Automatic Inclusion
Every instance of a subtype automatically has all supertype attributes. A MANAGER has all EMPLOYEE attributes plus manager-specific attributes.
EMPLOYEE: {empId, name, email, department, hireDate}
MANAGER: {empId, name, email, department, hireDate} + {budgetAuthority, teamSize}
Principle 2: Single Source of Truth
Inherited attributes exist in one place—the supertype. Subtypes do not duplicate these columns (unless using specific physical implementation strategies).
Principle 3: Uniform Access
Queries can access inherited attributes through the supertype:
-- This works for ALL employee types
SELECT empId, name, email FROM employee
-- This returns managers with inherited + specific attributes
SELECT e.empId, e.name, m.budgetAuthority
FROM employee e JOIN manager m ON e.empId = m.empId
Principle 4: Constraint Inheritance
Subtypes inherit supertype constraints. A CHECK constraint on EMPLOYEE.hireDate applies to managers, engineers, and all other employee types.
| Strategy | Query Simplicity | Storage Efficiency | Modification Impact | Best For |
|---|---|---|---|---|
| Single Table | Excellent (one table) | Poor (many NULLs) | Easy (one place) | Few subtypes, many common attrs |
| Class Table | Moderate (joins needed) | Good (no waste) | Moderate | Deep hierarchies, normalized needs |
| Concrete Table | Good (no joins) | Poor (duplication) | Hard (many places) | Read-heavy, rare schema changes |
Remember: Inheritance is a logical concept. The EER diagram shows inheritance relationships regardless of how they're physically implemented. The choice between single table, class table, or concrete table is a separate physical design decision made later.
We've comprehensively covered the analysis, reconciliation, and placement of common attributes in generalization. Let's consolidate the essential practices:
What's Next:
Now that we understand how to identify and place common attributes, we'll examine supertype creation in detail—how to define the supertype entity itself, establish its identity, structure its relationships, and integrate it into the broader schema.
You now have comprehensive knowledge of common attribute analysis. You can identify true commonality beyond superficial name matches, resolve naming and type conflicts, reconcile constraints appropriately, and make informed decisions about partial commonality. Next, we'll focus on creating the supertype itself.