Loading content...
Imagine you're an archaeologist who has discovered artifacts from an ancient civilization: clay pots, stone tools, metal jewelry, and wooden utensils. Initially, each artifact seems unique, but as you study them, patterns emerge. You begin grouping items by material, by purpose, by era. Eventually, you realize that certain artifacts—though superficially different—share deep structural properties. They're all tools, or all ceremonial objects, or all household items.
This is precisely what database designers do when applying the bottom-up approach to generalization. You start with concrete, specific entity types—perhaps inherited from a legacy system, discovered through requirements gathering, or created during iterative design. Then, through careful analysis, you discover hidden commonalities that suggest a more elegant, unified model.
The bottom-up approach is not merely a technique; it is a disciplined methodology for discovering abstraction in data. It transforms a collection of seemingly disparate entities into a coherent type hierarchy that reflects the true conceptual structure of the domain.
By the end of this page, you will master the systematic bottom-up methodology for generalization: how to identify candidate entities, analyze their attributes and relationships, discover meaningful commonalities, create well-formed supertypes, and validate your generalizations against domain requirements.
Before diving into the methodology, it's essential to understand the distinction between bottom-up and top-down approaches to type hierarchies, as they represent fundamentally different cognitive and design processes.
Bottom-Up (Generalization):
Top-Down (Specialization):
While these approaches are conceptually inverse, they often converge to the same hierarchical structure. The choice of approach depends on how the design problem presents itself.
| Aspect | Bottom-Up (Generalization) | Top-Down (Specialization) |
|---|---|---|
| Initial model state | Multiple specific entity types exist | Single general entity exists |
| Design question | "What do these have in common?" | "What are the different kinds of this?" |
| Cognitive process | Synthesis (combining into whole) | Analysis (breaking into parts) |
| When typically used | Legacy system integration, data consolidation | New system design, domain decomposition |
| Risk | Forcing artificial commonalities | Missing important variations |
| Challenge | Naming the discovered supertype | Completely enumerating subtypes |
In practice, database design often involves both approaches iteratively. You might start bottom-up by generalizing CHECKING and SAVINGS into ACCOUNT, then apply top-down thinking to realize you're missing other account types like CD and MONEY_MARKET. The approaches complement each other.
The bottom-up approach to generalization follows a disciplined, repeatable methodology. Each step builds on the previous, leading from initial observation to validated generalization.
Let's walk through the complete bottom-up generalization process using a realistic healthcare system scenario. This extended example demonstrates each phase of the methodology in action.
Initial Situation:
A hospital information system has evolved over years, with different departments creating their own entity models. You've been tasked with consolidating the data model. The current state includes several person-related entities that appear redundant:
Step 1.1: Entity Inventory
We identify four person-related entities in the current model:
PHYSICIAN
NURSE
PATIENT
ADMINISTRATIVE_STAFF
Step 1.2: Relationship Documentation
| Entity | Relationships |
|---|---|
| PHYSICIAN | TREATS→Patient, WORKS_IN→Department, HAS_PRIVILEGES→Hospital, ORDERS→Test/Medication |
| NURSE | ASSIGNED_TO→Patient, WORKS_IN→Department, REPORTS_TO→Physician, ADMINISTERS→Medication |
| PATIENT | TREATED_BY→Physician, HAS→Appointment, RECEIVES→Test/Medication, BILLED_TO→Insurance |
| ADMIN_STAFF | WORKS_IN→Department, SCHEDULES→Appointment, PROCESSES→Billing |
The heart of bottom-up generalization is attribute analysis. Several systematic techniques help identify common attributes that should move to a supertype.
Handling Attribute Variations:
Real-world entities often have 'almost identical' attributes that require careful handling:
Case 1: Naming Differences
Case 2: Type Differences
Case 3: Constraint Differences
Case 4: Presence Differences
A common heuristic: if an attribute appears in 80% or more of the candidate subtypes, consider placing it in the supertype (possibly as nullable for the exceptions). Below 50%, it's likely a subtype-specific attribute. Between 50-80%, careful domain analysis is needed.
Beyond attributes, relationship patterns provide powerful signals for generalization. When multiple entities participate in the same relationships, they likely share a common abstract identity.
Pattern 1: Common Participation
Multiple entities participate in relationships with the same target entity:
CUSTOMER ──(has)──→ ADDRESS
EMPLOYEE ──(has)──→ ADDRESS
SUPPLIER ──(has)──→ ADDRESS
This pattern suggests all three should generalize to a PARTY or ADDRESSABLE_ENTITY supertype that has the ADDRESS relationship.
Pattern 2: Role Substitutability
Multiple entities can play the same role in a relationship:
CUSTOMER ──(places)──→ ORDER
DISTRIBUTOR ──(places)──→ ORDER
INTERNAL_DEPT ──(places)──→ ORDER
Any of these can place orders. Generalization to ORDER_PARTY or ACCOUNT simplifies the ORDER entity, which now relates to one supertype.
Pattern 3: Audit/Tracking Relationships
Entities with common tracking relationships:
CONTRACT ──(created_by)──→ USER
CONTRACT ──(modified_by)──→ USER
INVOICE ──(created_by)──→ USER
INVOICE ──(modified_by)──→ USER
PROPOSAL ──(created_by)──→ USER
This suggests an AUDITABLE_DOCUMENT supertype with common audit relationships.
| Pattern Type | Detection Signal | Generalization Action | Example |
|---|---|---|---|
| Common Target | Same entity appears as target of multiple relationships | Create supertype for sources | PERSON for all entities linked to ADDRESS |
| Role Substitution | Multiple entities fill same role in different instances | Create supertype to fill role | ACCOUNT for all entities that can place orders |
| Parallel Relationships | Multiple entities have identical relationship sets | Create supertype, move relationships up | DOCUMENT for all entities with audit trails |
| Hierarchical Pattern | Entity relates to entities of the same type | Consider self-referential on supertype | EMPLOYEE.manages→EMPLOYEE generalizes to PERSON |
Sometimes, examining relationships reveals generalization opportunities that attribute analysis misses. If five entities all participate in relationships with DEPARTMENT, BUILDING, and ACCESS_CONTROL, they likely share an ORGANIZATIONAL_ENTITY identity—even if their attributes seem quite different.
Several practical tools and techniques support the bottom-up generalization process in real-world database design projects.
123456789101112131415161718192021222324252627
-- SQL Query to identify common columns across tables-- (for analyzing existing database for generalization opportunities) WITH column_counts AS ( SELECT column_name, data_type, COUNT(DISTINCT table_name) as table_count, STRING_AGG(table_name, ', ') as appears_in_tables FROM information_schema.columns WHERE table_schema = 'public' AND table_name IN ('customer', 'supplier', 'employee', 'contractor') GROUP BY column_name, data_type)SELECT column_name, data_type, table_count, appears_in_tables, CASE WHEN table_count = 4 THEN 'SUPERTYPE CANDIDATE' WHEN table_count >= 3 THEN 'PROBABLE SUPERTYPE' WHEN table_count = 2 THEN 'CONSIDER' ELSE 'SUBTYPE SPECIFIC' END as recommendationFROM column_countsORDER BY table_count DESC, column_name;Automated Discovery Tools:
Several database design and data modeling tools offer features that assist with generalization discovery:
While these tools can assist, human judgment remains essential. The tools identify potential generalizations; domain expertise determines which are meaningful.
Even experienced database designers can fall into traps when applying bottom-up generalization. Recognizing these pitfalls helps avoid them.
If you can't explain the supertype to a domain expert in one natural sentence—'X is a general category that includes [subtypes] because they all [common property]'—the generalization may be artificial. The supertype should have a natural name that domain experts would use.
We've explored the complete bottom-up methodology for discovering and implementing generalization. Let's consolidate the essential practices:
What's Next:
Now that we understand the bottom-up process for discovering generalizations, we'll examine common attributes in detail—how to identify, analyze, and properly move shared attributes to the supertype while handling edge cases like nullable values, type mismatches, and constraint differences.
You now understand the systematic bottom-up methodology for generalization. You can analyze existing entities, identify commonalities, construct meaningful supertypes, and validate your generalizations against domain requirements. Next, we'll focus specifically on handling common attributes—the building blocks of generalization.