We've established that lossless join decomposition prevents spurious tuples and enables exact reconstruction of the original relation. But what does it truly mean to preserve information in a database?
Information preservation goes beyond just avoiding spurious tuples. It encompasses several layers: the stored data itself, the schema structure, the constraints on the data, the relationships between entities, and the semantic meaning of the domain being modeled.
This page explores these deeper aspects of information preservation, connecting the mathematical formalism of lossless joins to the practical reality of database design.
By the end of this page, you will understand what information preservation truly means beyond spurious tuple prevention, how semantic meaning is maintained during decomposition, the relationship between losslessness and query power, and the theoretical foundations connecting normalization to information theory.
To understand preservation, we must first identify what types of information a database contains. A database isn't just a collection of values—it encodes multiple layers of information that must all be preserved during decomposition.
True information preservation requires maintaining all five layers:
| Layer | Preservation Criterion | Verification Method |
|---|---|---|
| Extension | Lossless join | Chase algorithm |
| Intension | Attribute coverage | Union equals original |
| Constraints | Dependency preservation | Closure equivalence |
| Relationships | Key preservation | Candidate key in result |
| Semantics | Domain modeling | Design review |
Many practitioners focus only on lossless join (extension preservation) while neglecting dependency preservation (constraint preservation) or semantic coherence. A decomposition can be lossless yet still damage the database's ability to enforce business rules or represent domain concepts clearly.
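The extension row of the table above can be checked mechanically. Below is a minimal Python sketch of the classic binary lossless-join test: a decomposition of R into R₁ and R₂ is lossless if the shared attributes functionally determine all of R₁ or all of R₂. The function names and the Employee/Department FDs are illustrative, not from any library.

```python
def attribute_closure(attrs, fds):
    """Closure of a set of attributes under FDs given as (lhs, rhs) frozenset pairs."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= closure and not rhs <= closure:
                closure |= rhs
                changed = True
    return closure

def is_lossless_binary(r1, r2, fds):
    """Binary lossless-join test: the common attributes must
    functionally determine all of r1 or all of r2."""
    common = set(r1) & set(r2)
    closure = attribute_closure(common, fds)
    return set(r1) <= closure or set(r2) <= closure

# Employee example: EmpID -> Name, DeptID; DeptID -> DeptName, DeptLocation
fds = [
    (frozenset({"EmpID"}), frozenset({"Name", "DeptID"})),
    (frozenset({"DeptID"}), frozenset({"DeptName", "DeptLocation"})),
]
print(is_lossless_binary(
    {"EmpID", "Name", "DeptID"},
    {"DeptID", "DeptName", "DeptLocation"},
    fds))  # True: DeptID determines all of the Department relation
```

Splitting instead into `(EmpID, Name)` and `(DeptID, DeptName)` would share no attributes at all, so the test correctly rejects it as lossy.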
Semantic integrity refers to maintaining the meaningful interpretation of data. The decomposed schema should represent the same real-world entities and relationships as the original, just organized differently.
Example of Semantic Preservation:
Original relation:

```sql
Employee(EmpID, Name, DeptID, DeptName, DeptLocation)
```

Semantics: an employee works in a department; departments have names and locations.

Good decomposition:

```sql
Employee(EmpID, Name, DeptID)               -- Employee entity
Department(DeptID, DeptName, DeptLocation)  -- Department entity
```
Semantic meaning is preserved: each relation represents exactly one entity type, and DeptID serves as the foreign key linking them, just as in the original.
Example of Semantic Corruption:
Bad decomposition (even if technically lossless):

```sql
R1(EmpID, Name, DeptName)           -- Mixed employee + partial dept
R2(DeptName, DeptID, DeptLocation)  -- Dept keyed by name??
```
Problems: R1 ties employees to departments by name rather than by identifier, and R2 treats DeptName as the department's key. This decomposition might be lossless (if DeptName happens to be unique), but it corrupts the semantic model by conflating naming with identity.
Each decomposed relation should correspond to a single real-world entity type or relationship type. Relations that mix concepts from different entities create semantic confusion, even if mathematically they preserve information.
A fundamental measure of information preservation is query equivalence: Can every query answerable on the original schema be answered on the decomposed schema?
Theorem:
If a decomposition is lossless, then for any query Q on the original relation R, there exists an equivalent query Q' on the decomposed relations {R₁, R₂, ..., Rₙ} that produces the same result.
Proof sketch: construct Q' from Q by substituting the join R₁ ⋈ R₂ ⋈ ... ⋈ Rₙ for every reference to R. By the lossless property, this join equals R on every valid instance, so Q' computes exactly the same result as Q.
```sql
-- Original Query on R(A, B, C, D, E):
SELECT A, B, D
FROM R
WHERE C > 100 AND E = 'Active';

-- Decomposition: R1(A,B,C), R2(C,D), R3(A,E)
-- Equivalent Query on Decomposed Relations:
SELECT R1.A, R1.B, R2.D
FROM R1
JOIN R2 ON R1.C = R2.C
JOIN R3 ON R1.A = R3.A
WHERE R1.C > 100 AND R3.E = 'Active';

-- The queries produce IDENTICAL results because:
-- 1. The join perfectly reconstructs R (lossless property)
-- 2. WHERE clause filters apply to the same underlying data
-- 3. SELECT projects the same attributes

-- NOTE: Query on decomposed schema may be less efficient
-- (requires joins), but it's SEMANTICALLY equivalent
```

Lossless decomposition preserves the information capacity (what can be computed) but may affect access efficiency (how quickly it can be computed). A query that was a simple table scan might become a multi-table join. This is a trade-off, not information loss.
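The equivalence can also be checked concretely by simulating relations as Python sets of tuples. The sample rows below are invented for illustration; set comprehensions stand in for SELECT, WHERE, and JOIN.

```python
# Sample instance of R(A, B, C, D, E)
R = [
    (1, "x", 150, "d1", "Active"),
    (2, "y", 50,  "d2", "Active"),
    (3, "z", 200, "d3", "Inactive"),
]

# Project onto the decomposition R1(A,B,C), R2(C,D), R3(A,E)
R1 = {(a, b, c) for (a, b, c, d, e) in R}
R2 = {(c, d) for (a, b, c, d, e) in R}
R3 = {(a, e) for (a, b, c, d, e) in R}

# Original query: SELECT A, B, D FROM R WHERE C > 100 AND E = 'Active'
q_original = {(a, b, d) for (a, b, c, d, e) in R if c > 100 and e == "Active"}

# Equivalent query on the decomposition: join on C and A, then filter
q_decomposed = {
    (a1, b1, d2)
    for (a1, b1, c1) in R1
    for (c2, d2) in R2
    for (a3, e3) in R3
    if c1 == c2 and a1 == a3 and c1 > 100 and e3 == "Active"
}

print(q_original == q_decomposed)  # True
```

On this instance the joins reconstruct R exactly, so both formulations return the single qualifying row `(1, "x", "d1")`.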
Query Equivalence Implications: what is preserved is the set of answerable queries; every question the original schema could answer, the decomposed schema can answer too.
What changes: how queries are expressed (joins become necessary) and how efficiently they execute.
We touched on dependency preservation earlier, but constraint preservation deserves deeper examination. It's not just about functional dependencies—databases enforce many types of constraints.
| Constraint Type | What It Enforces | Preservation in Decomposition |
|---|---|---|
| Primary Key | Uniqueness + NOT NULL | Must designate key in each relation |
| Foreign Key | Referential integrity | May need new FK constraints on joins |
| Functional Dependency | Determinism | Dependency preservation test |
| CHECK constraint | Value conditions | Usually preserved if attributes stay together |
| UNIQUE | Uniqueness (nullable) | Preserved if attributes in same relation |
| NOT NULL | Required values | Preserved per-attribute |
| Business rules | Complex invariants | May require application logic |
The Dependency Preservation Test (Revisited):
A decomposition preserves dependencies if every FD in F can be enforced by checking constraints on individual decomposed relations, without needing to join them.
Formally: (F₁ ∪ F₂ ∪ ... ∪ Fₘ)⁺ = F⁺
Where Fᵢ is the projection of F onto Rᵢ.
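This condition can be tested by brute force on small schemas: project F onto each Rᵢ (keeping every FD in F⁺ whose attributes fall inside Rᵢ), then check that the union of the projections still implies each original FD. A Python sketch, using a generic schema R(A, B, C) with A → B and B → C purely for illustration:

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure under FDs given as (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def project_fds(fds, rel):
    """All FDs X -> Y with X, Y inside rel that hold in F+ (brute force,
    exponential in |rel| -- fine for small schemas only)."""
    rel = set(rel)
    projected = []
    for k in range(1, len(rel) + 1):
        for xs in combinations(sorted(rel), k):
            x = frozenset(xs)
            y = frozenset(closure(x, fds) & rel)
            if not y <= x:
                projected.append((x, y))
    return projected

def preserves_dependencies(fds, relations):
    """True if every original FD is implied by the union of projections."""
    union = [fd for rel in relations for fd in project_fds(fds, rel)]
    return all(rhs <= closure(lhs, union) for lhs, rhs in fds)

fds = [(frozenset("A"), frozenset("B")), (frozenset("B"), frozenset("C"))]
good = [{"A", "B"}, {"B", "C"}]  # preserves both FDs
bad  = [{"A", "B"}, {"A", "C"}]  # B -> C spans no single relation
print(preserves_dependencies(fds, good))  # True
print(preserves_dependencies(fds, bad))   # False
```

The `bad` case is lossless (A is a key of both fragments) yet loses B → C, the exact situation the next example shows in SQL terms.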
Why This Matters:
If an FD X → Y spans multiple relations, enforcing it requires joining those relations on every insert or update, typically through triggers or application code.
This is expensive and error-prone. Single-table constraints, by contrast, are declarative, enforced automatically by the DBMS, and checked efficiently without joins.
```sql
-- Original: StudentCourse(StudentID, CourseID, ProfID, ProfOffice)
-- FDs: ProfID → ProfOffice (each professor has one office)

-- Decomposition that PRESERVES the dependency:
-- R1(StudentID, CourseID, ProfID)
-- R2(ProfID, ProfOffice)

-- Enforcing ProfID → ProfOffice is EASY:
ALTER TABLE R2 ADD PRIMARY KEY (ProfID);
-- Now ProfID → ProfOffice is automatically enforced!

-- Decomposition that LOSES the dependency:
-- R1(StudentID, ProfID, ProfOffice)
-- R2(StudentID, CourseID)

-- Enforcing ProfID → ProfOffice is HARD:
-- No single table contains ProfID → ProfOffice as a key!

-- Option 1: Create a trigger (complex, error-prone)
CREATE TRIGGER check_prof_office
BEFORE INSERT OR UPDATE ON R1
FOR EACH ROW
BEGIN
  IF EXISTS (
    SELECT 1 FROM R1
    WHERE ProfID = NEW.ProfID
      AND ProfOffice != NEW.ProfOffice
  ) THEN
    SIGNAL SQLSTATE '45000'
      SET MESSAGE_TEXT = 'Professor office inconsistency';
  END IF;
END;

-- Option 2: Rely on application code (dangerous)
-- Option 3: Accept the redundancy risk (unacceptable)
```

This is why 3NF (which guarantees dependency preservation) is often preferred over BCNF (which only guarantees losslessness). In BCNF, some constraints become unenforceable without complex triggers or application logic. Choose based on your integrity requirements.
An often-overlooked aspect of information preservation is temporal consistency—how the schema handles changes over time.
The Challenge:
Databases are dynamic. Data changes through inserts of new facts, updates to existing values, and deletions of obsolete records.
A well-preserved decomposition should maintain losslessness not just for the current state, but for all valid state transitions.
Update Anomaly Prevention:
Lossless decomposition inherently helps with temporal consistency by eliminating redundancy anomalies:
Before decomposition (redundant):
| EmpID | Name | DeptID | DeptName |
|-------|------|--------|----------|
| E1 | Alice| D1 | Sales |
| E2 | Bob | D1 | Sales |
| E3 | Carol| D1 | Sales |
To change Sales → Marketing, update 3 rows. If any update fails, data becomes inconsistent.
After decomposition:
Employee:

| EmpID | Name  | DeptID |
|-------|-------|--------|
| E1    | Alice | D1     |
| E2    | Bob   | D1     |
| E3    | Carol | D1     |

Department:

| DeptID | DeptName |
|--------|----------|
| D1     | Sales    |
To change Sales → Marketing, update 1 row. Atomic, consistent, no anomaly possible.
Lossless decomposition combined with redundancy elimination ensures that every valid update on the decomposed schema produces a state that could have existed on the original schema—and vice versa. Data integrity is preserved across time.
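The before/after behavior is easy to reproduce with in-memory data. A Python sketch (rows invented for illustration, dicts standing in for tables):

```python
# Redundant schema: DeptName repeated once per employee
redundant = [
    {"EmpID": "E1", "Name": "Alice", "DeptID": "D1", "DeptName": "Sales"},
    {"EmpID": "E2", "Name": "Bob",   "DeptID": "D1", "DeptName": "Sales"},
    {"EmpID": "E3", "Name": "Carol", "DeptID": "D1", "DeptName": "Sales"},
]

# A partial update (e.g. one write succeeds, the rest fail) leaves
# the facts inconsistent: D1 now "has" two names.
redundant[0]["DeptName"] = "Marketing"
names_for_d1 = {row["DeptName"] for row in redundant if row["DeptID"] == "D1"}
print(names_for_d1)  # {'Marketing', 'Sales'} -- inconsistent state

# Decomposed schema: the department name is stored exactly once
employee = [
    {"EmpID": "E1", "Name": "Alice", "DeptID": "D1"},
    {"EmpID": "E2", "Name": "Bob",   "DeptID": "D1"},
    {"EmpID": "E3", "Name": "Carol", "DeptID": "D1"},
]
department = {"D1": "Sales"}

department["D1"] = "Marketing"  # one atomic write, no partial state possible
print({department[e["DeptID"]] for e in employee})  # {'Marketing'}
```

In the decomposed version the inconsistent state is not merely avoided, it is unrepresentable: there is only one cell in which a department's name can live.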
The concept of lossless decomposition has deep connections to information theory, providing an elegant theoretical foundation.
Information-Theoretic View:
In information theory, lossless compression means representing data in a smaller form from which the original can be perfectly recovered. Database normalization is analogous: the decomposed schema stores each fact once, and the natural join perfectly recovers the original relation.
Entropy and Redundancy:
A relation with redundancy has low information density: the same facts are repeated. Normalization increases information density by storing each fact exactly once and reconstructing repetitions through joins.
The trade-off: normalized storage is smaller and anomaly-free, but reading the full picture requires join computation.
This is exactly analogous to compression vs. raw data storage.
Think of functional dependencies as patterns that enable compression. A→B means 'storing B separately is redundant—it can be derived from A.' Normalization exploits these patterns to reduce redundancy while guaranteeing perfect reconstruction (losslessness).
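The compression analogy can be made concrete by counting stored values. A back-of-the-envelope Python sketch (the figure of 100 employees is invented for illustration):

```python
# 100 employees in one department: the flat relation repeats
# DeptName and DeptLocation once per employee.
n_employees = 100

# Flat: Employee(EmpID, Name, DeptID, DeptName, DeptLocation)
flat_cells = n_employees * 5

# Decomposed: Employee(EmpID, Name, DeptID)
#           + Department(DeptID, DeptName, DeptLocation)
decomposed_cells = n_employees * 3 + 1 * 3

print(flat_cells, decomposed_cells)  # 500 303
# The FD DeptID -> DeptName, DeptLocation is the "pattern" that made
# roughly 40% of the flat storage redundant; the join reconstructs
# the repeated values exactly, like decompression.
```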
Let's synthesize everything into a comprehensive framework for evaluating information preservation in decompositions.
```
COMPLETE INFORMATION PRESERVATION FRAMEWORK
===========================================

A decomposition D of relation R fully preserves information if and
only if the following six conditions are satisfied:

1. STRUCTURAL PRESERVATION (Attributes)
---------------------------------------
   ∪{Rᵢ ∈ D} = R
   Every attribute appears in at least one decomposed relation.
   No structural information is lost.

2. EXTENSIONAL PRESERVATION (Losslessness)
------------------------------------------
   For all valid instances r of R:
   π_R₁(r) ⋈ π_R₂(r) ⋈ ... ⋈ π_Rₙ(r) = r
   The original data can be exactly reconstructed.
   Verified via Chase algorithm.

3. INTENSIONAL PRESERVATION (Dependencies)
------------------------------------------
   (F₁ ∪ F₂ ∪ ... ∪ Fₙ)⁺ = F⁺
   All functional dependencies remain enforceable.
   [Note: BCNF may sacrifice this; 3NF guarantees it]

4. RELATIONAL PRESERVATION (Keys)
---------------------------------
   At least one Rᵢ contains a complete candidate key of R.
   The ability to identify tuples uniquely is maintained.

5. SEMANTIC PRESERVATION (Meaning)
----------------------------------
   Each Rᵢ represents a coherent real-world concept.
   Verified through domain modeling review, not algorithm.

6. TEMPORAL PRESERVATION (Updates)
----------------------------------
   No update anomalies exist in the decomposed schema.
   Each fact stored in exactly one place.

PRESERVATION GRADES:
--------------------
★★★★★ Full:      All 6 conditions satisfied (3NF with good design)
★★★★☆ Near-full: 5 of 6 (BCNF may lose #3)
★★★☆☆ Partial:   4 of 6 (lossless but design issues)
★★☆☆☆ Minimal:   Only losslessness (#2)
★☆☆☆☆ Broken:    Missing losslessness (unusable)
```

A five-star decomposition satisfies all six preservation conditions. Achieving this requires careful design, not just algorithm application. Always verify semantic coherence and update behavior, not just mathematical properties.
Information preservation is richer and deeper than just lossless joins. True preservation maintains the full informational integrity of your database.
You now understand the full depth of information preservation in database decomposition. The final page explores join reconstruction—the practical process of recombining decomposed relations to answer queries.