In the early 1970s, databases were proliferating but chaos reigned. Each vendor had its own terminology, its own architecture, its own approach to data management. Programs written for one database system couldn't work with another. There was no common vocabulary, no shared understanding of how database systems should be structured.
In 1972, the American National Standards Institute (ANSI) formed the SPARC (Standards Planning and Requirements Committee) Study Group on Database Management Systems. Their mission: bring order to chaos by defining a standard architecture that all database systems could follow.
The result, published in 1975 and refined through 1978, was the ANSI-SPARC three-level architecture—the formal framework we've been exploring. This wasn't just academic theory; it became the foundation upon which virtually all modern relational databases are built.
By the end of this page, you will understand the historical context of the ANSI-SPARC architecture, the specific goals it addressed, the formal structure of the three-schema framework, how mapping between levels works, and why this 50-year-old architecture remains relevant for modern database design.
To appreciate the ANSI-SPARC architecture, we must understand the problems it was designed to solve.
By 1970, organizations were increasingly dependent on computerized data processing, but the landscape was fragmented: each vendor shipped its own data model, query interface, and storage format, and applications written for one system could not be moved to another.
The CODASYL (Conference on Data Systems Languages) committee, best known for COBOL, had also developed the network data model and produced database specifications, but these were closely tied to specific implementation approaches.
What was missing was a high-level, implementation-independent architecture that could separate what data means from how it is stored, insulate applications from physical reorganization, and give vendors and users a common vocabulary for describing database systems.
This is precisely what the ANSI-SPARC committee set out to create.
1971: ANSI establishes SPARC • 1972: Study Group on DBMS formed • 1975: Interim report published • 1978: Final report establishing the three-schema architecture • 1980s onwards: Architecture becomes industry standard
The ANSI-SPARC architecture defines three levels of abstraction for describing a database, each represented by a schema:
This separation achieves data independence: changes at one level don't necessarily require changes at other levels.
| Schema Level | Defined By | Contains | Purpose |
|---|---|---|---|
| External Schema | Application Developer, DBA | Views, derived data, access permissions | Tailor database for specific users |
| Conceptual Schema | Enterprise Data Architect, DBA | All entities, relationships, constraints | Single unified truth of all data |
| Internal Schema | DBA, System Administrator | File structures, indexes, storage allocation | Optimize physical data access |
A key ANSI-SPARC principle: there is exactly ONE conceptual schema (the single source of truth), exactly ONE internal schema (the physical implementation), but MANY external schemas (one or more per user group). This ensures all views derive from the same logical model.
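The one-conceptual, many-external principle can be made concrete with a minimal sketch using Python's built-in sqlite3 module. The table and view names here are illustrative, not from any particular system: one base table plays the role of the conceptual schema, and two views act as external schemas for different user groups (payroll sees salaries, the public directory does not).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# ONE conceptual schema: the single base table holding all employee facts.
cur.execute("""
    CREATE TABLE employee (
        employee_id INTEGER PRIMARY KEY,
        name        TEXT,
        department  TEXT,
        salary      NUMERIC
    )""")
cur.execute("INSERT INTO employee VALUES (1, 'Ada', 'Engineering', 95000)")

# MANY external schemas: each view tailors the same data to a user group.
# Payroll needs salaries ...
cur.execute("""
    CREATE VIEW payroll_view AS
    SELECT employee_id, name, salary FROM employee""")

# ... while the public directory must not see them.
cur.execute("""
    CREATE VIEW directory_view AS
    SELECT name, department FROM employee""")

print(cur.execute("SELECT * FROM payroll_view").fetchall())
print(cur.execute("SELECT * FROM directory_view").fetchall())
```

Both views derive from the same base table, so an update to the employee row is immediately visible through every external schema; there is no second copy of the data to drift out of sync.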
The ANSI-SPARC architecture describes not just three levels but also the mappings between them. These mappings are crucial—they translate requests and data between levels.
This mapping defines how each external schema relates to the conceptual schema:
```sql
-- Conceptual Schema (simplified)
CREATE TABLE Employee (
    employee_id   INTEGER PRIMARY KEY,
    first_name    VARCHAR(50),
    last_name     VARCHAR(50),
    department_id INTEGER,
    salary        DECIMAL(10,2),
    hire_date     DATE
);

CREATE TABLE Department (
    department_id   INTEGER PRIMARY KEY,
    department_name VARCHAR(100)
);

-- External/Conceptual Mapping: The view definition IS the mapping
CREATE VIEW HR_Employee_View AS
SELECT
    e.employee_id,                                         -- Direct mapping
    e.first_name || ' ' || e.last_name AS full_name,       -- Computed mapping
    d.department_name,                                     -- Join mapping
    e.salary,                                              -- Direct mapping
    EXTRACT(YEAR FROM AGE(e.hire_date)) AS years_employed  -- Derived
FROM Employee e
JOIN Department d ON e.department_id = d.department_id;

-- The mapping tells the DBMS:
-- - "full_name" = concatenation of two conceptual columns
-- - "department_name" comes from Department table via join
-- - "years_employed" is computed from hire_date
-- - Salary IS directly visible (no hiding in this view)
```

This mapping defines how the logical structures of the conceptual schema are stored physically:
```sql
-- Conceptual Schema: What exists logically
-- ("Orders" rather than "Order", which is a reserved word in SQL)
CREATE TABLE Orders (
    order_id     BIGINT PRIMARY KEY,
    customer_id  INTEGER NOT NULL,
    order_date   DATE NOT NULL,
    total_amount DECIMAL(12,2)
);

-- Conceptual/Internal Mapping: Physical implementation decisions

-- 1. Storage Location
ALTER TABLE Orders SET TABLESPACE fast_ssd_ts;

-- 2. Partitioning Scheme (in PostgreSQL this is declared at creation:
--      CREATE TABLE Orders (...) PARTITION BY RANGE (order_date);)
--    Creates separate physical storage for each time range

-- 3. Index Structures
CREATE INDEX idx_order_customer ON Orders(customer_id);
-- B-tree index on customer_id for fast lookups

CREATE INDEX idx_order_date ON Orders(order_date);
-- B-tree index on order_date for range queries

-- 4. Clustering
CLUSTER Orders USING idx_order_date;
-- Physically order rows by order_date

-- 5. Storage Parameters
ALTER TABLE Orders SET (
    fillfactor = 80,  -- Leave 20% free space in each page for updates
    autovacuum_enabled = true,
    toast_tuple_target = 128
);

-- Note: The conceptual schema (TABLE definition) is unchanged!
-- All these are internal-level decisions that affect performance,
-- not the logical structure.
```

In formal ANSI-SPARC terminology, these mappings would be stored in 'mapping tables' maintained by the DBMS. In practice, modern databases store this information in the system catalog (metadata tables) that track views, indexes, tablespaces, and other schema objects.
The primary goal of the ANSI-SPARC architecture is data independence—the ability to change one level without affecting others. There are two types:
Logical data independence is the ability to change the conceptual schema without requiring changes to external schemas or application programs.
When the conceptual schema changes, the external/conceptual mapping is updated to maintain the same external view interface. Applications continue to work because their view of data hasn't changed.
```sql
-- BEFORE: Single Customer table
CREATE TABLE Customer (
    customer_id  INTEGER PRIMARY KEY,
    name         VARCHAR(100),
    street       VARCHAR(100),
    city         VARCHAR(50),
    country      VARCHAR(50),
    credit_limit DECIMAL(10,2)
);

-- External view that applications use
CREATE VIEW Customer_View AS
SELECT
    customer_id,
    name,
    street || ', ' || city || ', ' || country AS full_address,
    credit_limit
FROM Customer;

-- AFTER: Conceptual schema changes - split into two tables
-- (perhaps for normalization or to handle multiple addresses)
CREATE TABLE Customer_v2 (
    customer_id  INTEGER PRIMARY KEY,
    name         VARCHAR(100),
    credit_limit DECIMAL(10,2)
);

CREATE TABLE Customer_Address (
    address_id  SERIAL PRIMARY KEY,
    customer_id INTEGER REFERENCES Customer_v2,
    street      VARCHAR(100),
    city        VARCHAR(50),
    country     VARCHAR(50),
    is_primary  BOOLEAN DEFAULT TRUE
);

-- Update the external view to maintain the SAME interface
CREATE OR REPLACE VIEW Customer_View AS
SELECT
    c.customer_id,
    c.name,
    a.street || ', ' || a.city || ', ' || a.country AS full_address,
    c.credit_limit
FROM Customer_v2 c
LEFT JOIN Customer_Address a
       ON c.customer_id = a.customer_id AND a.is_primary = TRUE;

-- Applications see: customer_id, name, full_address, credit_limit
-- UNCHANGED - even though the conceptual schema is completely different!
```

| Type | What Changes | What Stays Stable | Absorbs Change |
|---|---|---|---|
| Logical Independence | Conceptual Schema | External Schemas & Applications | External/Conceptual Mapping |
| Physical Independence | Internal Schema | Conceptual Schema & Everything Above | Conceptual/Internal Mapping |
Data independence is why you can add an index without recompiling applications, why you can upgrade from HDD to SSD without rewriting queries, and why a DBA can reorganize storage while the system is running. It's the foundation of maintainable database systems.
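Physical data independence is easy to demonstrate end to end. The following sketch uses Python's built-in sqlite3 module with illustrative table and index names: an application query written against the logical schema returns identical results before and after a DBA-style internal-level change (adding an index), with no change to the application code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("""
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER,
        total       NUMERIC
    )""")
cur.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                [(1, 7, 19.99), (2, 7, 5.00), (3, 9, 42.50)])

# The application's query, written purely against the logical schema.
query = "SELECT order_id, total FROM orders WHERE customer_id = 7 ORDER BY order_id"
before = cur.execute(query).fetchall()

# Internal-level change: the DBA adds an index to speed up lookups ...
cur.execute("CREATE INDEX idx_orders_customer ON orders(customer_id)")

# ... and the unmodified query still returns exactly the same rows.
after = cur.execute(query).fetchall()
assert before == after  # physical change, zero logical change
```

The index may change *how* the rows are found (an index scan instead of a full table scan), but never *which* rows come back; that is the conceptual/internal mapping absorbing the change.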
The ANSI-SPARC specification described not just the three schemas but also the components of a database system that interact with them. These components remain relevant in modern DBMS architecture.
When a user submits a query, it flows down through the architecture:

1. The query processor parses the query and validates it against the user's external schema.
2. The external/conceptual mapping translates view references into conceptual-schema operations.
3. The optimizer chooses an execution plan using conceptual-level constraints and statistics from the data dictionary.
4. The conceptual/internal mapping translates the plan into physical access paths (files, indexes, pages).
5. The data manager executes the plan, fetching pages through the buffer manager, and results flow back up through the same mappings.
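The mapping traversal at the heart of this query flow can be sketched as a toy resolver. Everything here is illustrative (no real DBMS stores its catalog this way): one dictionary stands in for the external/conceptual mapping, another for the conceptual/internal mapping, and the resolver follows a user's column reference down through both.

```python
# External/conceptual mapping: view column -> conceptual-level expression.
external_to_conceptual = {
    "full_name": "first_name || ' ' || last_name",
    "dept":      "department_name",
}

# Conceptual/internal mapping: conceptual object -> physical storage details.
conceptual_to_internal = {
    "Employee":   {"file": "emp.dat",  "index": "idx_emp_pk"},
    "Department": {"file": "dept.dat", "index": None},
}

def resolve(view_column: str, table: str) -> dict:
    """Follow a user's column reference down through both mapping layers."""
    # Columns not present in the external mapping pass through unchanged.
    expr = external_to_conceptual.get(view_column, view_column)
    storage = conceptual_to_internal[table]
    return {"conceptual_expr": expr, "storage": storage}

plan = resolve("full_name", "Employee")
print(plan["conceptual_expr"])   # the expression the optimizer works with
print(plan["storage"]["file"])   # the physical file the data manager opens
```

The point of the sketch is the indirection: the user only ever names `full_name`; the expression and the file it ultimately reads are looked up, never hard-coded into the application.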
Modern DBMS components directly descend from ANSI-SPARC: PostgreSQL's 'pg_catalog' schema = Data Dictionary, 'query planner' = Query Processor, 'storage manager' = Data Manager, 'shared buffers' = Buffer Manager, etc. The names change but the architecture remains.
The ANSI-SPARC architecture, while foundational, is not without limitations. Understanding both helps apply it effectively.
With every layer of abstraction comes overhead. The query processor must translate through mappings, apply transformations, and make optimization decisions that might be unnecessary if applications knew the physical structure directly.
However, this 'tax' is almost always worth paying: without the abstraction, every physical change would ripple into application code, schema evolution would require coordinated rewrites across every program, and portability between storage configurations would disappear.
In extreme performance scenarios, architects sometimes 'punch through' abstraction layers—bypassing views to query base tables directly, using database-specific physical hints, or embedding storage awareness in applications. This sacrifices maintainability for performance and should be done sparingly and deliberately.
How do modern database systems implement the ANSI-SPARC architecture? Let's examine several major systems:
| Component | PostgreSQL | MySQL/InnoDB | Oracle |
|---|---|---|---|
| External Schemas | VIEWs, schemas, roles | VIEWs, databases, users | VIEWs, schemas, users |
| Conceptual Schema | pg_catalog, system tables | information_schema | Data Dictionary (DBA_%) |
| Internal Schema | pg_class, pg_index, etc. | INNODB_SYS_* tables | DBA_SEGMENTS, DBA_EXTENTS |
| Ext/Con Mapping | View definitions in pg_views | View defs in VIEWS table | USER_VIEWS, dependency tracking |
| Con/Int Mapping | pg_class.relfilenode | tablespace_id, index_id | FILE_ID, BLOCK_ID mapping |
| Query Processor | Planner/Optimizer | Query Optimizer | Cost-Based Optimizer |
| Buffer Manager | shared_buffers | Buffer Pool | Database Buffer Cache |
```sql
-- PostgreSQL: Exploring the three levels

-- EXTERNAL LEVEL: What views exist?
SELECT schemaname, viewname, viewowner, definition
FROM pg_views
WHERE schemaname NOT IN ('pg_catalog', 'information_schema');

-- CONCEPTUAL LEVEL: Table structure
SELECT table_name, column_name, data_type, is_nullable
FROM information_schema.columns
WHERE table_schema = 'public'
ORDER BY table_name, ordinal_position;

-- INTERNAL LEVEL: Physical storage details
SELECT
    c.relname     AS table_name,
    c.relfilenode AS file_node,
    c.relpages    AS pages,
    c.reltuples   AS estimated_rows,
    pg_size_pretty(pg_relation_size(c.oid)) AS size,
    t.spcname     AS tablespace
FROM pg_class c
LEFT JOIN pg_tablespace t ON c.reltablespace = t.oid
WHERE c.relkind = 'r'
  AND c.relnamespace = 'public'::regnamespace;

-- INTERNAL LEVEL: Index structures
SELECT tablename, indexname, indexdef
FROM pg_indexes
WHERE schemaname = 'public';

-- See the MAPPING: How a view translates to base tables
SELECT pg_get_viewdef('my_view_name'::regclass);
```

NoSQL databases often implement ANSI-SPARC only partially: many have no fixed conceptual schema at all (schema-on-read), offer limited or no view mechanisms, and leak internal details such as partition keys into application code.
The architecture remains a useful lens even when not fully implemented.
Cloud databases (Aurora, Spanner, CockroachDB) add new internal-level complexity: distributed storage, automatic sharding, cross-region replication. The ANSI-SPARC principle—that users shouldn't need to know these details—remains the guiding philosophy.
We've explored the ANSI-SPARC architecture—the formal framework that has guided database design for 50 years. The essential takeaways: there is exactly one conceptual schema and one internal schema but many external schemas; the mappings between levels are what make the levels useful; logical and physical data independence are the architecture's central payoff; and the framework still describes modern relational, NoSQL, and cloud systems, even where it is implemented only partially.
What's Next:
We've now covered the complete three-level architecture and its formal ANSI-SPARC definition. The final piece is understanding how these levels interact—the specific transformations, query processing steps, and data flow that make the architecture work in practice.
You now understand the ANSI-SPARC architecture—its historical origins, formal structure, the crucial mappings between levels, and how it achieves data independence. This knowledge provides the theoretical foundation for understanding how any database system is organized.