Among all the limitations of file-based systems, data redundancy and data inconsistency stand out as the most pervasive and damaging. These twin problems are so tightly intertwined that understanding one requires understanding both.
Data redundancy creates the conditions for data inconsistency. And data inconsistency erodes the very foundation of information systems: the ability to trust that data represents reality.
This page examines these problems in depth—not because they're merely historical curiosities, but because their ghosts still haunt modern systems wherever data architecture principles are violated.
By the end of this page, you will understand the formal definitions and mathematical implications of redundancy, the multiple dimensions of inconsistency, how these problems compound over time, their real-world business consequences, and the fundamental insight that led to relational database theory as a solution.
Data redundancy occurs when the same piece of information is stored in more than one place within a data management system. More formally:
Definition: A data element d exhibits redundancy in system S if d is derivable from other data elements in S, or if d is stored in multiple independent locations within S.
This definition captures two distinct but related forms of redundancy: derived redundancy, where a stored value could be computed from other data already in the system, and duplicate redundancy, where the same value is stored independently in multiple locations.
Measuring Redundancy:
We can quantify redundancy in a system using the Redundancy Factor (RF):
RF = (Total Storage Used) / (Minimum Necessary Storage)
Studies of enterprise file systems in the 1970s and 1980s found typical RF values of 3.0 to 6.0, meaning organizations stored the same data 3-6 times on average.
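As a concrete illustration of the formula (the figures below are invented for the sketch, not drawn from the studies just cited), the Redundancy Factor can be computed directly from storage measurements:

```python
def redundancy_factor(total_storage_used: float, minimum_necessary_storage: float) -> float:
    """RF = total storage used / the minimum storage needed to record each fact once."""
    return total_storage_used / minimum_necessary_storage

# Hypothetical scenario: six departmental files each hold their own copy of the
# same 40 MB of customer data, plus 60 MB of data that is genuinely unique.
total_used = 6 * 40 + 60        # 300 MB actually stored
minimum_needed = 40 + 60        # 100 MB if each fact were stored exactly once

print(redundancy_factor(total_used, minimum_needed))  # 3.0, at the low end of the 3-6 range above
```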
In modern systems, controlled redundancy is sometimes intentional—for performance (caching, denormalization), reliability (replication), or availability (backup). The problem with file-based systems wasn't redundancy per se, but uncontrolled, uncoordinated redundancy where copies could diverge without detection.
Understanding why redundancy occurs in file-based systems reveals that it's not a failure of discipline but a structural inevitability. Several forces combine to make redundancy unavoidable:
Departmental Autonomy:
Organizations are structured into departments with distinct responsibilities, budgets, and priorities. Each department builds and maintains its own files, on its own schedule and in its own formats, to serve its own applications.
Historical Accumulation:
Systems are built incrementally over years or decades, each one written for a single application and each one independently storing its own copy of supplier information. By the time anyone notices the redundancy, three departments depend on three different files, and unification seems impossible.
'Why should I depend on Purchasing's data? They don't maintain it properly, and when their system is down, my work stops.' This attitude was common and rational given the lack of shared infrastructure.
Data inconsistency occurs when multiple representations of the same real-world entity contain conflicting information. Given redundancy, inconsistency is not a question of 'if' but 'when' and 'how severe'.
Formal Definition:
Definition: Data elements d₁ and d₂ are inconsistent if they purport to represent the same real-world fact but contain different values, where the difference cannot be explained by valid temporal variation.
Note the nuance: if d₁ is 'balance as of yesterday' and d₂ is 'balance as of today', different values don't indicate inconsistency. But if both claim to be 'current balance' and differ, that's inconsistency.
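A small sketch may make the definition and its temporal nuance concrete. The Observation class and field names below are illustrative assumptions, not part of any standard:

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Observation:
    fact: str      # the real-world fact this value claims to describe
    as_of: date    # the point in time the value refers to
    value: object

def inconsistent(d1: Observation, d2: Observation) -> bool:
    """d1 and d2 conflict only if they describe the same fact at the same
    point in time and still disagree on the value."""
    return d1.fact == d2.fact and d1.as_of == d2.as_of and d1.value != d2.value

yesterday  = Observation("balance of account 42", date(2024, 3, 14), 100.0)
today      = Observation("balance of account 42", date(2024, 3, 15), 250.0)
also_today = Observation("balance of account 42", date(2024, 3, 15), 180.0)

print(inconsistent(yesterday, today))     # False: valid temporal variation
print(inconsistent(today, also_today))    # True: same fact, same time, different values
```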
Dimensions of Inconsistency:
Inconsistency manifests across multiple dimensions:
| Dimension | Description | Example |
|---|---|---|
| Value Inconsistency | Same attribute, different values | Address in System A: '123 Main St'; System B: '123 Main Street' |
| Format Inconsistency | Same value, different representations | Date: '2024-03-15' vs '03/15/2024' vs '15-MAR-24' |
| Semantic Inconsistency | Same term, different meanings | 'Revenue' includes vs excludes returns |
| Temporal Inconsistency | Different points in time treated as same | Last year's revenue in one report, YTD in another |
| Structural Inconsistency | Same entity, different decomposition | Full name as one field vs separate first/middle/last |
| Identity Inconsistency | Same entity, different identifiers | Customer #12345 in Sales = Customer #C-98765 in Support? |
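To make the first two rows of the table concrete, here is a minimal sketch of why naive comparison overstates disagreement and why records must be normalized before they can be matched. The field names and normalization rules are simplified assumptions:

```python
from datetime import datetime

record_a = {"address": "123 Main St",     "signup_date": "2024-03-15"}
record_b = {"address": "123 Main Street", "signup_date": "03/15/2024"}

# Naive comparison flags both fields, even though they describe the same facts.
naive_diffs = {k for k in record_a if record_a[k] != record_b[k]}
print(naive_diffs)  # both 'address' and 'signup_date' look inconsistent

def normalize_address(addr: str) -> str:
    # Collapse one common abbreviation; real matching rules are far more extensive.
    return addr.lower().replace("street", "st").rstrip(".")

def normalize_date(value: str) -> str:
    # Try a few known formats and convert to a single canonical representation.
    for fmt in ("%Y-%m-%d", "%m/%d/%Y", "%d-%b-%y"):
        try:
            return datetime.strptime(value, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {value!r}")

a = {"address": normalize_address(record_a["address"]), "signup_date": normalize_date(record_a["signup_date"])}
b = {"address": normalize_address(record_b["address"]), "signup_date": normalize_date(record_b["signup_date"])}

# After normalization, only genuine value inconsistencies remain (none here).
print({k for k in a if a[k] != b[k]})  # set()
```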
The Inconsistency Lifecycle:
Inconsistency develops through predictable stages rather than appearing all at once: a real-world change propagates to some copies but not others, the divergence goes undetected, and over time no one can say which copy is correct.
Update anomalies are specific patterns of problems that arise from redundant data storage. Understanding these patterns provides insight into why normalization theory became central to database design.
The Three Canonical Anomalies:
Insertion Anomaly occurs when you cannot add a record about one entity without also adding data about a different entity.
Example: Consider a file that combines employee and department information:
```
EMPLOYEE-DEPARTMENT FILE:

EmpID | EmpName     | DeptNo | DeptName    | DeptHead
------+-------------+--------+-------------+----------
E101  | Alice Smith | D10    | Engineering | Bob Jones
E102  | Carol White | D10    | Engineering | Bob Jones
E103  | David Brown | D20    | Marketing   | Eve Adams

PROBLEM: We want to add a new department "Research (D30)" headed by "Frank Miller".

IMPOSSIBLE! We cannot insert a department without at least one employee
assigned to it. The file structure forces us to either:
  1. Create a fake employee for D30
  2. Wait until we hire someone for Research
  3. Leave Research unrecorded
All options are problematic.
```

Insertion anomalies prevent organizations from recording legitimate business information. A university couldn't record a new course until a student enrolled. A hospital couldn't record a new department until a patient was admitted. This forced workarounds that introduced their own data quality problems.

Deletion Anomaly occurs when removing a record about one entity unintentionally destroys facts about another. In the file above, if David Brown (E103) leaves the company and his record is deleted, the only record of the Marketing department and its head, Eve Adams, disappears with it.

Modification Anomaly occurs when a single real-world change requires updating many records. If Engineering appoints a new department head, every Engineering employee's record must be changed; miss one, and the file contradicts itself.
The Root Cause:
All three anomalies share a common cause: mixing independent facts in a single record structure. A department's head is a fact about the department, not about each employee in that department. Storing department facts alongside employee facts forces artificial dependencies between facts that should be able to exist, change, and be deleted independently.
This insight—that different facts should be stored separately and linked by references—became the foundation of relational database design.
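A minimal sketch of that idea, using SQLite through Python's standard library (the table and column names are illustrative, not taken from any particular system): department facts live in one table, employee facts in another, and a reference links them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Department facts are stored exactly once...
cur.execute("""
    CREATE TABLE department (
        dept_no   TEXT PRIMARY KEY,
        dept_name TEXT NOT NULL,
        dept_head TEXT NOT NULL
    )
""")

# ...and employee facts are stored separately, linked by a reference.
cur.execute("""
    CREATE TABLE employee (
        emp_id   TEXT PRIMARY KEY,
        emp_name TEXT NOT NULL,
        dept_no  TEXT NOT NULL REFERENCES department(dept_no)
    )
""")

cur.executemany(
    "INSERT INTO department VALUES (?, ?, ?)",
    [("D10", "Engineering", "Bob Jones"), ("D20", "Marketing", "Eve Adams")],
)
cur.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [("E101", "Alice Smith", "D10"),
     ("E102", "Carol White", "D10"),
     ("E103", "David Brown", "D20")],
)

# No insertion anomaly: a new department needs no fake employee.
cur.execute("INSERT INTO department VALUES ('D30', 'Research', 'Frank Miller')")

# No modification anomaly: a new department head is one update, not one per employee.
cur.execute("UPDATE department SET dept_head = 'Grace Lee' WHERE dept_no = 'D10'")

# The combined view is still available when needed, via a join.
for row in cur.execute("""
    SELECT e.emp_id, e.emp_name, d.dept_name, d.dept_head
    FROM employee e JOIN department d ON e.dept_no = d.dept_no
"""):
    print(row)
```

With the facts separated, the insertion and modification anomalies above simply cannot occur, and deleting the last employee of a department no longer erases the department itself.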
To understand the real-world impact of data redundancy and inconsistency, let's examine a detailed case study from a 1970s hospital that operated on file-based systems.
Background:
Metropolitan General Hospital had 500 beds, 2,000 employees, and saw 50,000 patients annually. Like most hospitals of the era, it operated with separate file-based systems for different departments.
| System | Department | Patient Data Stored | Records |
|---|---|---|---|
| Admissions System | Admissions | Demographics, insurance, guarantor | 150,000 |
| Medical Records | HIM | Demographics, diagnoses, procedures | 150,000 |
| Billing System | Finance | Demographics, insurance, charges | 200,000 |
| Pharmacy System | Pharmacy | Demographics, allergies, medications | 100,000 |
| Lab System | Laboratory | Demographics, test orders, results | 500,000 |
| Radiology System | Radiology | Demographics, orders, reports | 75,000 |
The Redundancy Situation:
Patient demographic information—name, date of birth, address, phone, insurance—was stored in all six systems. For active patients, this meant six copies of the same information, each entered and updated separately by a different department.
An audit found that for patients with records in all six systems, 42% had at least one inconsistency in their demographic data across systems. Most commonly: different addresses (patient moved, not all systems updated), different spellings of names, and different insurance information.
Clinical Impact:
Inconsistency in hospital data created real patient safety risks, not just administrative costs.
Financial Impact:
The hospital estimated annual costs of redundancy and inconsistency:
| Cost Category | Annual Estimate | Notes |
|---|---|---|
| Duplicate data entry labor | $180,000 | Staff time entering same data multiple times |
| Error correction labor | $120,000 | Investigating and fixing inconsistencies |
| Denied claim rework | $240,000 | Resubmitting claims with corrected information |
| Lost revenue (unfiled claims) | $150,000 | Claims never filed due to data confusion |
| Duplicate mailings | $35,000 | Multiple addresses for same patient |
| Patient matching projects | $100,000 | Periodic reconciliation initiatives |
| Total Annual Cost | $825,000 | In 1975 dollars (~$4.8M today) |
We can use probability theory to demonstrate that inconsistency is mathematically inevitable in redundant systems, and that its likelihood increases with redundancy.
Model Setup:
Consider a piece of data stored in n independent locations. Let p be the probability that a change to the real-world value successfully propagates to a single copy.
Probability of Full Consistency:
For all n copies to remain consistent after a change, all must be updated:
P(all copies consistent) = p^n
Probability of Inconsistency:
P(at least one inconsistency) = 1 - p^n
| Update Success Rate (p) | n=2 copies | n=3 copies | n=5 copies | n=10 copies |
|---|---|---|---|---|
| 99% | 2.0% | 3.0% | 4.9% | 9.6% |
| 95% | 9.8% | 14.3% | 22.6% | 40.1% |
| 90% | 19.0% | 27.1% | 41.0% | 65.1% |
| 80% | 36.0% | 48.8% | 67.2% | 89.3% |
| 70% | 51.0% | 65.7% | 83.2% | 97.2% |
Even with a 95% success rate on each update, storing data in 5 locations gives a 22.6% chance of inconsistency per change. Over thousands of changes per month, inconsistency isn't just possible—it's guaranteed.
Compounding Over Time:
The situation worsens as we consider multiple changes over time. If data changes k times per year, the probability of maintaining consistency after one year is:
P(consistent after k changes) = (p^n)^k = p^(n×k)
Example: A customer address stored in 3 systems, with a 90% update success rate, changing once per year: the probability that all three copies remain consistent is 0.9^3 ≈ 72.9% after one year, 0.9^15 ≈ 20.6% after five years, and 0.9^30 ≈ 4.2% after ten years. Without intervention, most records eventually become inconsistent.
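These figures follow mechanically from the two formulas above; a short sketch (the function names are ours) reproduces them:

```python
def p_inconsistent(p: float, n: int) -> float:
    """Probability that at least one of n copies misses an update,
    given per-copy update success probability p."""
    return 1 - p ** n

def p_consistent_after(p: float, n: int, k: int) -> float:
    """Probability that all n copies are still consistent after k independent changes."""
    return (p ** n) ** k

# Reproduce one cell of the table above: p = 0.95, n = 5 copies.
print(f"{p_inconsistent(0.95, 5):.1%}")            # ~22.6%

# The compounding example: 3 copies, 90% success rate, one change per year.
for years in (1, 5, 10):
    print(years, f"{p_consistent_after(0.90, 3, years):.1%}")
# 1 -> ~72.9%, 5 -> ~20.6%, 10 -> ~4.2%
```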
The mathematical analysis reveals a fundamental tradeoff inherent in file-based systems:
You cannot simultaneously have:
- Redundant copies of data (the same facts stored in multiple places)
- Consistency among those copies
- Low coordination overhead (no machinery to synchronize updates)

Pick any two.
File-Based Systems:
File-based systems chose high redundancy and low coordination, sacrificing consistency. Each application had its own copies, with no coordination overhead. The result: inevitable inconsistency.
Possible Alternatives:
| Approach | Trade | Challenge |
|---|---|---|
| Single Copy (No Redundancy) | Give up redundancy for consistency | Performance bottleneck; single point of failure |
| Synchronized Copies | Accept coordination overhead | Complex; slow; what file systems couldn't provide |
| Accept Inconsistency | Live with errors | Unreliable data; business risk |
Database Management Systems solved this tradeoff by internalizing coordination. The DBMS manages all copies, automatically propagates updates, and guarantees consistency. Applications get the benefits of the DBMS's internal redundancy (for performance and recovery) without managing it themselves. This insight—that coordination must be centralized to be effective—defined the transition from file-based to database-based systems.
Given that inconsistency was inevitable in file-based systems, organizations developed approaches for detecting and fixing problems. These approaches were expensive, imperfect, and never-ending.
Detection Methods:
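One common detection approach was periodic reconciliation of the kind behind the hospital's audit and 'patient matching projects' above: export the same fields from each system and flag records that disagree. A minimal sketch follows; the file names, column names, and shared patient identifier are assumptions made for illustration.

```python
import csv
from collections import defaultdict

def load(path: str, id_field: str, fields: tuple) -> dict:
    """Read one system's export, keeping only the fields being reconciled."""
    with open(path, newline="") as f:
        return {row[id_field]: {k: row[k].strip() for k in fields}
                for row in csv.DictReader(f)}

# Hypothetical exports from two of the six systems.
FIELDS = ("name", "address", "phone")
admissions = load("admissions_export.csv", "patient_id", FIELDS)
billing    = load("billing_export.csv", "patient_id", FIELDS)

# For every patient present in both files, record field-level disagreements.
discrepancies = defaultdict(list)
for patient_id in admissions.keys() & billing.keys():
    for field in FIELDS:
        if admissions[patient_id][field] != billing[patient_id][field]:
            discrepancies[patient_id].append(
                (field, admissions[patient_id][field], billing[patient_id][field]))

print(f"{len(discrepancies)} patients with at least one mismatch")
for patient_id, diffs in sorted(discrepancies.items())[:10]:
    print(patient_id, diffs)
```

Even this simple comparison finds only value inconsistencies; the format, semantic, and identity inconsistencies described earlier require normalization and matching rules before records can be compared at all.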
The Remediation Challenge:
Once inconsistency was detected, fixing it was often harder than finding it: with no authoritative copy, someone had to determine which of the conflicting values was correct and then apply the correction in every system that held the data.
Organizations launched 'data quality initiatives' regularly—one-time projects to clean up inconsistencies. But without addressing the architectural cause (redundancy without coordination), the same problems returned. These projects were costly, disruptive, and provided only temporary relief.
We've now examined data redundancy and inconsistency in comprehensive detail. These twin problems defined the crisis that drove the development of Database Management Systems. To consolidate: redundancy in file-based systems is a structural inevitability rather than a failure of discipline; uncoordinated redundancy makes inconsistency mathematically near-certain over time; the resulting clinical, financial, and operational costs are large and recurring; and the root cause is storing independent facts together and copying them without centralized coordination.
What's Next:
With our understanding of redundancy and inconsistency complete, we'll examine the third major limitation of file-based systems: data isolation. This problem—data trapped in application silos, unable to be combined or queried across systems—prevented organizations from gaining the cross-functional insights that modern business requires.
You now have a deep understanding of data redundancy and inconsistency—their definitions, causes, manifestations, and mathematical inevitability. This understanding is essential for appreciating the data integrity features of a DBMS and the design principles behind normalization theory.