Database Management SystemsData Independence

Data Independence: The Foundation of Database Flexibility

LevelBeginner

Duration60 mins

TopicData Independence

5 / 5

Mapping Between Levels

The Translation Layers That Make Independence Possible

Throughout this module, we've discussed data independence as a powerful abstraction. But abstraction requires machinery—specific mechanisms that translate between levels, maintain consistency, and enable the separation we've described.

These mechanisms are the mappings between the three levels of the ANSI-SPARC architecture:

External/Conceptual Mapping — Translates between application views and the logical schema
Conceptual/Internal Mapping — Translates between the logical schema and physical storage

Understanding these mappings completes your knowledge of how data independence actually works at a technical level. You'll see how SQL queries transform as they descend through the levels, how results propagate back up, and how the DBMS coordinates this translation machinery.

What You Will Learn

By the end of this page, you will understand how external/conceptual mappings work through view resolution and query rewriting, how conceptual/internal mappings operate through the storage manager and query optimizer, how queries flow through the complete mapping stack, and how modern DBMS implementations handle these mappings in practice.

The Architecture of Mappings

The three-level architecture isn't just a conceptual model—it's implemented through concrete DBMS components. Each mapping corresponds to specific subsystems that perform translation.

System Catalog: The Foundation of All Mappings

At the heart of all mappings is the system catalog (also called the data dictionary or metadata repository). The catalog stores:

External schema definitions: View definitions, column aliases, computed expressions
Conceptual schema definitions: Table structures, relationships, constraints, data types
Internal schema definitions: Storage structures, indexes, partitions, tablespaces
Mapping relationships: How each level relates to the others

Every query processing operation consults the catalog to understand how to translate between levels.

mapping_architecture.txt

Text

┌─────────────────────────────────────────────────────────────────────────────┐
│                      DBMS ARCHITECTURE: MAPPING COMPONENTS                  │
└─────────────────────────────────────────────────────────────────────────────┘
 
APPLICATION QUERY
    │
    │  "SELECT name, email FROM CustomerView WHERE region = 'West'"
    ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                         QUERY PROCESSOR                                     │
│  ┌──────────────────────────────────────────────────────────────────────┐  │
│  │                       PARSER                                          │  │
│  │  - Syntax validation                                                  │  │
│  │  - Lexical analysis                                                   │  │
│  │  - AST (Abstract Syntax Tree) generation                              │  │
│  └────────────────────────────────┬─────────────────────────────────────┘  │
│                                   ▼                                         │
│  ┌──────────────────────────────────────────────────────────────────────┐  │
│  │              VIEW RESOLVER (External/Conceptual Mapping)              │  │
│  │  - Looks up view definition in catalog                                │  │
│  │  - Expands view into base table references                            │  │
│  │  - Resolves column aliases to actual columns                          │  │
│  │  - Merges external predicates with view predicates                    │  │
│  └────────────────────────────────┬─────────────────────────────────────┘  │
│                                   ▼                                         │
│  ┌──────────────────────────────────────────────────────────────────────┐  │
│  │              QUERY REWRITER                                           │  │
│  │  - Applies transformation rules                                       │  │
│  │  - Simplifies expressions                                             │  │
│  │  - Flattens nested views                                              │  │
│  └────────────────────────────────┬─────────────────────────────────────┘  │
│                                   ▼                                         │
│  ┌──────────────────────────────────────────────────────────────────────┐  │
│  │               QUERY OPTIMIZER (Conceptual/Internal Mapping)           │  │
│  │  - Reads available physical structures from catalog                   │  │
│  │  - Generates execution plan candidates                                │  │
│  │  - Costs each plan based on statistics                                │  │
│  │  - Selects optimal plan                                               │  │
│  └────────────────────────────────┬─────────────────────────────────────┘  │
└───────────────────────────────────┼─────────────────────────────────────────┘
                                    ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                         EXECUTION ENGINE                                    │
│  ┌──────────────────────────────────────────────────────────────────────┐  │
│  │               PLAN EXECUTOR                                           │  │
│  │  - Executes operators (scan, join, filter, project)                   │  │
│  │  - Coordinates with storage manager                                   │  │
│  │  - Manages intermediate results                                       │  │
│  └────────────────────────────────┬─────────────────────────────────────┘  │
│                                   ▼                                         │
│  ┌──────────────────────────────────────────────────────────────────────┐  │
│  │               STORAGE MANAGER (Conceptual/Internal Mapping)           │  │
│  │  - Translates logical table access to physical I/O                    │  │
│  │  - Manages buffer pool                                                │  │
│  │  - Executes index operations                                          │  │
│  │  - Handles physical record format                                     │  │
│  └────────────────────────────────┬─────────────────────────────────────┘  │
└───────────────────────────────────┼─────────────────────────────────────────┘
                                    ▼
                              DISK STORAGE

Catalog as Truth Source

The system catalog is the authoritative source for all mappings. When you CREATE VIEW, the view definition is stored in the catalog. When you CREATE INDEX, the index metadata is stored in the catalog. All translation operations consult the catalog to know how to proceed.

External/Conceptual Mapping in Detail

The external/conceptual mapping translates between application-visible views and the underlying logical schema. This mapping is implemented through view resolution and query rewriting.

View Resolution Process:

When a query references a view, the DBMS must:

Locate the view definition in the catalog
Substitute the view reference with the view's query
Merge any predicates, projections, and joins
Resolve column name mappings

view_resolution_example.sql
SQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
-- ==============================================================
-- VIEW DEFINITION (in catalog)
-- ==============================================================
 
CREATE VIEW WestCoastCustomers AS
SELECT 
    c.customer_id,
    c.first_name || ' ' || c.last_name AS full_name,
    c.email,
    a.city,
    a.state
FROM Customer c
JOIN Address a ON c.customer_id = a.customer_id AND a.address_type = 'primary'
WHERE a.state IN ('CA', 'OR', 'WA', 'NV', 'AZ');
 
 
-- ==============================================================
-- APPLICATION QUERY (what application submits)
-- ==============================================================
 
SELECT full_name, email 
FROM WestCoastCustomers 
WHERE city = 'Seattle' 
ORDER BY full_name;
 
 
-- ==============================================================
-- VIEW RESOLUTION (what DBMS transforms it to)
-- ==============================================================
 
-- Step 1: Substitute view reference with view definition
SELECT full_name, email
FROM (
    SELECT 
        c.customer_id,
        c.first_name || ' ' || c.last_name AS full_name,
        c.email,
        a.city,
        a.state
    FROM Customer c
    JOIN Address a ON c.customer_id = a.customer_id AND a.address_type = 'primary'
    WHERE a.state IN ('CA', 'OR', 'WA', 'NV', 'AZ')
) AS WestCoastCustomers
WHERE city = 'Seattle'
ORDER BY full_name;
 
-- Step 2: Merge predicates and flatten (query rewriting)
SELECT 
    c.first_name || ' ' || c.last_name AS full_name,
    c.email
FROM Customer c
JOIN Address a ON c.customer_id = a.customer_id AND a.address_type = 'primary'
WHERE a.state IN ('CA', 'OR', 'WA', 'NV', 'AZ')  -- From view
  AND a.city = 'Seattle'                          -- From query (merged!)
ORDER BY c.first_name || ' ' || c.last_name;
 
-- This final query references only base tables!
-- The external/conceptual mapping is complete.

Column Mapping and Aliases:

The external/conceptual mapping handles more than just view expansion—it maintains the correspondence between external names (what applications see) and internal names (what actually exists):

Column Mapping Example
External Name (View)	Expression/Mapping	Conceptual Source
`full_name`	`first_name \|\| ' ' \|\| last_name`	`Customer.first_name`, `Customer.last_name`
`email`	Direct pass-through	`Customer.email`
`city`	Direct pass-through	`Address.city`
`state`	Direct pass-through	`Address.state`
`customer_id`	Direct (but hidden by final SELECT)	`Customer.customer_id`

Predicate Pushdown Optimization

Notice how the query rewriter merged city = 'Seattle' into the base query, pushing it down to be applied early. This is predicate pushdown—a key optimization that prevents the DBMS from fetching all California, Oregon, Washington, Nevada, and Arizona customers just to filter for Seattle. The external/conceptual mapping enables this optimization.

Nested View Resolution:

Views can reference other views, creating hierarchies. The view resolver handles this recursively:

nested_view_resolution.sql
SQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
-- Base view
CREATE VIEW ActiveCustomers AS
SELECT * FROM Customer WHERE status = 'active';
 
-- View on view
CREATE VIEW ActivePremiumCustomers AS
SELECT * FROM ActiveCustomers WHERE loyalty_tier = 'premium';
 
-- View on view on view
CREATE VIEW ActivePremiumWestCoast AS
SELECT ac.customer_id, ac.first_name, ac.last_name, ac.email, a.state
FROM ActivePremiumCustomers ac
JOIN Address a ON ac.customer_id = a.customer_id
WHERE a.state IN ('CA', 'OR', 'WA');
 
-- Query:
SELECT first_name, email FROM ActivePremiumWestCoast WHERE state = 'CA';
 
-- After recursive view resolution:
SELECT c.first_name, c.email
FROM Customer c
JOIN Address a ON c.customer_id = a.customer_id
WHERE c.status = 'active'           -- From ActiveCustomers
  AND c.loyalty_tier = 'premium'    -- From ActivePremiumCustomers
  AND a.state IN ('CA', 'OR', 'WA') -- From ActivePremiumWestCoast
  AND a.state = 'CA';               -- From query
 
-- Further optimized (redundant predicate elimination):
SELECT c.first_name, c.email
FROM Customer c
JOIN Address a ON c.customer_id = a.customer_id
WHERE c.status = 'active'
  AND c.loyalty_tier = 'premium'
  AND a.state = 'CA';  -- 'CA' satisfies IN ('CA','OR','WA') so simplify

Conceptual/Internal Mapping in Detail

The conceptual/internal mapping translates logical operations (on tables, rows, columns) into physical operations (on files, pages, bytes). This is implemented through the query optimizer and storage manager working together.

Key Translation Decisions:

The conceptual/internal mapping must answer questions like:

Which physical access path (scan, index) should be used for each table reference?
How should joins be physically executed (nested loop, hash, merge)?
In what order should operations be performed?
How should data be physically read from storage?
How should intermediate results be managed (memory vs. spill to disk)?

conceptual_internal_mapping.txt

Text

┌─────────────────────────────────────────────────────────────────────────────┐
│                 QUERY AFTER VIEW RESOLUTION (Conceptual Level)              │
└─────────────────────────────────────────────────────────────────────────────┘
 
SELECT c.first_name, c.email
FROM Customer c
JOIN Address a ON c.customer_id = a.customer_id
WHERE c.status = 'active'
  AND c.loyalty_tier = 'premium'
  AND a.state = 'CA';
 
┌─────────────────────────────────────────────────────────────────────────────┐
│                         OPTIMIZER ANALYSIS                                   │
└─────────────────────────────────────────────────────────────────────────────┘
 
Available Physical Structures (from catalog):
├── Customer table: heap storage, 1,000,000 rows
│   ├── Primary key index: B-tree on customer_id
│   ├── Index: B-tree on status
│   └── Index: B-tree on loyalty_tier
├── Address table: heap storage, 1,200,000 rows
│   ├── Primary key index: B-tree on address_id
│   ├── Index: B-tree on customer_id
│   └── Index: B-tree on state
 
Statistics (from catalog):
├── Customer.status = 'active': ~70% selectivity (700,000 rows)
├── Customer.loyalty_tier = 'premium': ~5% selectivity (50,000 rows)
├── Address.state = 'CA': ~12% selectivity (144,000 rows)
└── Combined selectivity estimate: ~0.4% (4,000 result rows)
 
┌─────────────────────────────────────────────────────────────────────────────┐
│                    EXECUTION PLAN CANDIDATES                                │
└─────────────────────────────────────────────────────────────────────────────┘
 
Plan A: Full Table Scans + Hash Join
├── Scan Customer (full table scan, filter status='active' AND tier='premium')
├── Scan Address (full table scan, filter state='CA')
├── Hash join on customer_id
└── COST: 45,000 (high I/O)
 
Plan B: Index Scan on Customer + Nested Loop to Address
├── Index scan on Customer.loyalty_tier = 'premium' (50,000 rows)
├── Filter status = 'active' (35,000 rows after filter)
├── For each: index lookup on Address.customer_id
├── Filter state = 'CA'
└── COST: 38,000 (many index lookups)
 
Plan C: Index Scan on Address + Nested Loop to Customer
├── Index scan on Address.state = 'CA' (144,000 rows)
├── For each: index lookup on Customer.customer_id
├── Filter status='active' AND tier='premium'
└── COST: 52,000 (more lookups than Plan B)
 
Plan D: Bitmap Index Combine (PostgreSQL)
├── Bitmap scan Customer.loyalty_tier = 'premium'
├── Bitmap AND with Customer.status = 'active' rows
├── Heap scan matching bitmaps (35,000 rows)
├── Hash join to Address with filter state='CA'
└── COST: 12,000 (winner!)
 
SELECTED: Plan D
 
┌─────────────────────────────────────────────────────────────────────────────┐
│                    EXECUTION PLAN (Internal Level)                          │
└─────────────────────────────────────────────────────────────────────────────┘
 
1. Bitmap Index Scan on idx_customer_loyalty_tier
   → Physical: Read index pages for loyalty_tier='premium', build bitmap
   
2. Bitmap Index Scan on idx_customer_status  
   → Physical: Read index pages for status='active', build bitmap
   
3. BitmapAnd
   → Physical: AND the two bitmaps (pure CPU, no I/O)
   
4. Bitmap Heap Scan on Customer
   → Physical: Read heap pages indicated by bitmap
   → Output: 35,000 rows (c.customer_id, c.first_name, c.email)
   
5. Hash
   → Physical: Build hash table on a.customer_id from Address where state='CA'
   → Reads index, then heap pages for matching rows
   
6. Hash Join
   → Physical: Probe hash table with customer rows
   → Output: 4,000 matching rows
   
7. Project
   → Physical: Select only first_name, email from results

Storage Manager Operations:

Once the optimizer chooses a plan, the storage manager executes the physical operations:

Storage Manager Responsibilities

•Table/Index File Access — Translating table names to physical file locations. Customer table → /data/base/16384/52301 file
•Page-Based I/O — All disk access is in fixed-size pages (typically 4KB-16KB). Reading row 12,567 means reading the page containing it.
•Buffer Management — Caching recently accessed pages in memory. Deciding which pages to evict when memory is full.
•Record Format Translation — Converting between logical row representation and physical byte format on disk.
•Index Navigation — Traversing B-tree structures, following pointers, handling index page splits/merges.
•Compression/Decompression — If data is stored compressed, transparently decompress on read and compress on write.

Complete Abstraction

The application asked for 'first_name' and 'email' from 'Customer'. The storage manager reads specific byte ranges from specific disk blocks, navigates index structures, and manages memory caching. All of this is invisible to the application—the conceptual/internal mapping provides complete abstraction.

Query Flow: End to End

Let's trace a complete query through both mappings, from application to disk and back:

complete_query_flow.txt

Text

┌─────────────────────────────────────────────────────────────────────────────┐
│                      COMPLETE QUERY FLOW TRACE                               │
└─────────────────────────────────────────────────────────────────────────────┘
 
═══════════════════════════════════════════════════════════════════════════════
STEP 1: APPLICATION SUBMITS QUERY
═══════════════════════════════════════════════════════════════════════════════
 
Application (Python):
    cursor.execute("SELECT name, total FROM OrderSummary WHERE status = 'pending'")
 
Query arrives at DBMS:
    "SELECT name, total FROM OrderSummary WHERE status = 'pending'"
 
═══════════════════════════════════════════════════════════════════════════════
STEP 2: PARSING
═══════════════════════════════════════════════════════════════════════════════
 
Parser produces AST:
    SELECT
    ├── Columns: [name, total]
    ├── FROM: OrderSummary
    └── WHERE: status = 'pending'
 
═══════════════════════════════════════════════════════════════════════════════
STEP 3: VIEW RESOLUTION (External → Conceptual)
═══════════════════════════════════════════════════════════════════════════════
 
Catalog lookup for "OrderSummary":
    Type: VIEW
    Definition:
        SELECT 
            c.first_name || ' ' || c.last_name AS name,
            o.order_id,
            o.total_amount AS total,
            o.status
        FROM Orders o
        JOIN Customer c ON o.customer_id = c.customer_id
 
View expansion + merge:
    SELECT 
        c.first_name || ' ' || c.last_name AS name,
        o.total_amount AS total
    FROM Orders o
    JOIN Customer c ON o.customer_id = c.customer_id
    WHERE o.status = 'pending'
 
═══════════════════════════════════════════════════════════════════════════════
STEP 4: QUERY OPTIMIZATION (Conceptual → Internal)
═══════════════════════════════════════════════════════════════════════════════
 
Catalog lookup for physical structures:
    Orders: Heap, 5M rows
        Indexes: PK(order_id), status, customer_id, order_date
    Customer: Heap, 500K rows
        Indexes: PK(customer_id), email
 
Statistics:
    Orders where status='pending': 50,000 rows (1%)
    
Plan selected:
    1. Index Scan on Orders.idx_status = 'pending'
    2. Nested Loop Join with Customer using PK lookup
    3. Project name, total
 
═══════════════════════════════════════════════════════════════════════════════
STEP 5: EXECUTION (Physical Operations)
═══════════════════════════════════════════════════════════════════════════════
 
Storage Manager Operations:
 
    [1] Index Scan: idx_orders_status
        └── Read index root page (block 1045)
        └── Navigate to 'pending' leaf (blocks 1156, 1157, 1158...)
        └── Collect row pointers to Orders heap (50,000 TIDs)
 
    [2] Heap Access: Orders table
        └── For each TID: read heap page, extract row
        └── Heap pages accessed: ~3,000 pages (due to TID locality)
        └── Extract: customer_id, total_amount
 
    [3] Nested Loop: For each order row
        └── Index Lookup: Customer.PK(customer_id)
            └── Read index (1-3 pages per lookup, cached well)
        └── Heap Access: Customer row
            └── Extract: first_name, last_name
 
    [4] Expression Evaluation
        └── Concatenate: first_name || ' ' || last_name → name
        └── Pass through: total_amount → total
 
    [5] Result Buffer
        └── Accumulate result rows in memory buffer
        └── 50,000 result rows × ~50 bytes = ~2.5MB
 
═══════════════════════════════════════════════════════════════════════════════
STEP 6: RESULT RETURN
═══════════════════════════════════════════════════════════════════════════════
 
Results formatted per external schema:
    Columns: name (VARCHAR), total (DECIMAL)
    
    Row 1: "John Doe", 125.99
    Row 2: "Jane Smith", 89.50
    ...
    Row 50000: "Alex Johnson", 234.00
 
Network transfer to application with column names matching original query.
 
═══════════════════════════════════════════════════════════════════════════════
SUMMARY OF MAPPINGS APPLIED
═══════════════════════════════════════════════════════════════════════════════
 
External → Conceptual:
    - "OrderSummary" → Orders JOIN Customer
    - "name" → first_name || ' ' || last_name
    - "total" → total_amount
    - WHERE status='pending' merged into view query
 
Conceptual → Internal:
    - "Orders table" → heap file + idx_orders_status
    - "Customer table" → heap file + PK index
    - "JOIN" → Nested Loop with Index Lookup
    - "WHERE" → Index Scan condition

System Catalog Structure

The system catalog is where all mapping information is stored. Understanding catalog structure helps you understand how mappings are persisted and accessed.

Key Catalog Tables (PostgreSQL example):

catalog_tables.sql
SQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
-- ==============================================================
-- POSTGRESQL SYSTEM CATALOG: Key Tables for Mappings
-- ==============================================================
 
-- EXTERNAL LEVEL: View Definitions
-- pg_class: All relations (tables, views, indexes, etc.)
SELECT relname, relkind, reltuples, relpages
FROM pg_class
WHERE relname = 'ordersummary';
-- relkind = 'v' indicates a view
 
-- pg_views: View definitions
SELECT viewname, definition
FROM pg_views
WHERE viewname = 'ordersummary';
-- Returns the view's SQL definition for view resolution
 
-- pg_attribute: Column information for external schema
SELECT attname, atttypid, attnum
FROM pg_attribute
WHERE attrelid = 'ordersummary'::regclass;
 
 
-- CONCEPTUAL LEVEL: Table Definitions
-- pg_class: Base tables
SELECT relname, relkind, reltuples, relpages
FROM pg_class
WHERE relname IN ('orders', 'customer');
-- relkind = 'r' indicates a regular table
 
-- pg_attribute: Column definitions
SELECT attname, atttypid, attnum, attnotnull
FROM pg_attribute
WHERE attrelid = 'orders'::regclass AND attnum > 0;
 
-- pg_constraint: Integrity constraints
SELECT conname, contype, confrelid
FROM pg_constraint
WHERE conrelid = 'orders'::regclass;
 
 
-- INTERNAL LEVEL: Physical Structure Definitions
-- pg_class: Index information
SELECT relname, relkind, relam, relpages
FROM pg_class
WHERE relname LIKE 'idx_%' AND relkind = 'i';
 
-- pg_index: Index details
SELECT indexrelid::regclass, indrelid::regclass, indkey
FROM pg_index
WHERE indrelid = 'orders'::regclass;
 
-- pg_am: Access methods (B-tree, Hash, etc.)
SELECT amname FROM pg_am WHERE oid = (
    SELECT relam FROM pg_class WHERE relname = 'idx_orders_status'
);
 
-- pg_tablespace: Physical storage location
SELECT spcname, pg_tablespace_location(oid)
FROM pg_tablespace;
 
-- pg_statistic: Statistics for optimizer
SELECT starelid::regclass, staattnum, stanullfrac, stadistinct
FROM pg_statistic
WHERE starelid = 'orders'::regclass;

Catalog Tables by Mapping Level
Architecture Level	Catalog Tables	Information Stored
External	pg_views, pg_class (views), pg_depend	View definitions, column aliases, dependencies
Conceptual	pg_class (tables), pg_attribute, pg_constraint, pg_type	Table structures, columns, constraints, data types
Internal	pg_index, pg_am, pg_tablespace, pg_statistic	Indexes, access methods, storage, optimizer stats
Mappings	pg_depend, pg_rewrite	View-to-table dependencies, rewrite rules

Catalog is Self-Describing

The system catalog is itself stored in tables. You can query pg_class to find information about pg_class itself! This self-describing nature is powerful—tools, applications, and the DBMS itself all use the same mechanism to discover database structure.

Mapping Maintenance

Mappings aren't static—they evolve as schemas change. The DBMS must maintain mapping consistency, and schema changes must update the relevant mappings.

Mapping Updates During DDL Operations:

DDL Operations and Mapping Updates
DDL Operation	External/Conceptual Mapping	Conceptual/Internal Mapping
CREATE VIEW	New view definition added to catalog	No change (views don't have physical storage)
DROP VIEW	View definition removed; dependent views may become invalid	No change
ALTER TABLE ADD COLUMN	Existing views unaffected (unless they use SELECT *)	No physical mapping change; new column added to heap
ALTER TABLE DROP COLUMN	Views referencing column become invalid; revalidation needed	Column storage deallocated
CREATE INDEX	No change (indexes are transparent)	New index registered; optimizer considers it
DROP INDEX	No change	Index removed from consideration; optimizer adapts
ALTER TABLE ... PARTITION	No change (partitioning is physical)	Partition metadata updated; optimizer uses partition pruning

View Invalidation and Revalidation:

When underlying tables change, views may become invalid. The DBMS handles this in different ways:

view_invalidation.sql
SQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
-- ==============================================================
-- VIEW INVALIDATION SCENARIOS
-- ==============================================================
 
-- Original view
CREATE VIEW CustomerEmails AS
SELECT customer_id, email FROM Customer;
 
-- SCENARIO 1: Add column (usually OK)
ALTER TABLE Customer ADD COLUMN phone VARCHAR(20);
-- View still works—it doesn't reference 'phone'
 
-- SCENARIO 2: Drop column referenced by view
ALTER TABLE Customer DROP COLUMN email;
-- ERROR: cannot drop column email because other objects depend on it
-- PostgreSQL protects views by default
 
-- SCENARIO 3: Force drop (if CASCADE)
ALTER TABLE Customer DROP COLUMN email CASCADE;
-- View 'CustomerEmails' is also dropped automatically
-- All dependent views are invalidated and removed
 
-- SCENARIO 4: Rename column
ALTER TABLE Customer RENAME COLUMN email TO email_address;
-- PostgreSQL: View becomes invalid (references old name)
-- Oracle: View may auto-recompile with new name (depends on settings)
 
 
-- ==============================================================
-- ORACLE: DEFERRED VALIDATION MODEL
-- ==============================================================
 
-- Oracle marks views as INVALID rather than erroring immediately
-- View status in USER_OBJECTS:
SELECT object_name, status FROM user_objects 
WHERE object_type = 'VIEW' AND object_name = 'CUSTOMEREMAILS';
-- Returns: CUSTOMEREMAILS, INVALID
 
-- Next query against view triggers recompilation:
SELECT * FROM CustomerEmails;
-- If structure compatible: recompiles and executes
-- If incompatible: ORA-04063: view has errors
 
 
-- ==============================================================
-- SQL SERVER: SCHEMA BINDING OPTION
-- ==============================================================
 
-- SCHEMABINDING prevents changes that would break views
CREATE VIEW CustomerEmails WITH SCHEMABINDING AS
SELECT customer_id, email FROM dbo.Customer;
 
-- Now this fails:
ALTER TABLE Customer DROP COLUMN email;
-- Error: Cannot DROP COLUMN because 'CustomerEmails' depends on it
 
-- Provides protection at the cost of flexibility

Dependency Tracking is Critical

Before any schema change, always check dependencies using catalog queries or DBMS tools. PostgreSQL: pg_depend. Oracle: DBA_DEPENDENCIES. SQL Server: sys.sql_expression_dependencies. Unexpected view invalidation can break applications in production.

Performance Implications of Mappings

Mappings have performance costs. Every query must be parsed, views resolved, plans generated, and operations translated. Understanding these costs helps you design efficient systems.

Performance Considerations

•View Resolution Overhead — Deep view hierarchies (views on views on views) require recursive resolution. Each level adds catalog lookups and query rewriting. Keep view nesting shallow (3-4 levels max).
•Optimizer Complexity — More complex queries (after view expansion) give the optimizer more work. Exponentially more plan candidates exist for many-table joins. Optimizer time limits may cause suboptimal plans.
•Plan Caching Benefits — Most DBMS cache execution plans. The first execution pays the full optimization cost; subsequent executions reuse the plan. Prepared statements maximize caching benefits.
•Statistics Freshness — Conceptual/internal mapping quality depends on accurate statistics. Stale statistics cause poor plans. Regular ANALYZE is essential.
•View Materialization — For expensive views queried frequently, consider materialized views. Trade storage for query speed, accepting data freshness lag.

performance_tips.sql
SQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
-- ==============================================================
-- MEASURING MAPPING OVERHEAD
-- ==============================================================
 
-- PostgreSQL: Compare planning vs execution time
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT name, total FROM OrderSummary WHERE status = 'pending';
 
-- Look for:
-- Planning Time: 0.245 ms   (view resolution + optimization)
-- Execution Time: 45.123 ms (actual data access)
 
-- If Planning Time is high relative to Execution, consider:
-- 1. Simplifying view structure
-- 2. Using prepared statements
-- 3. Plan caching (PL/pgSQL functions cache plans)
 
 
-- ==============================================================
-- PREPARED STATEMENTS: Amortizing Optimization Cost
-- ==============================================================
 
-- Without preparation: optimize every time
SELECT * FROM OrderSummary WHERE status = $1;  -- Optimize
SELECT * FROM OrderSummary WHERE status = $2;  -- Optimize again
 
-- With preparation: optimize once
PREPARE order_query AS SELECT * FROM OrderSummary WHERE status = $1;
EXECUTE order_query('pending');   -- Uses cached plan
EXECUTE order_query('shipped');   -- Uses cached plan (fast!)
 
-- Application code (e.g., Python psycopg2) uses server-side prepared statements automatically
 
 
-- ==============================================================
-- MATERIALIZED VIEWS: Trading Freshness for Speed
-- ==============================================================
 
-- Complex aggregation view (slow to compute live)
CREATE VIEW DailySalesReport AS
SELECT 
    date_trunc('day', order_date) AS sale_date,
    region,
    COUNT(*) AS order_count,
    SUM(total_amount) AS revenue,
    AVG(total_amount) AS avg_order
FROM Orders o
JOIN Customer c ON o.customer_id = c.customer_id
JOIN Region r ON c.region_id = r.region_id
WHERE order_date >= CURRENT_DATE - INTERVAL '30 days'
GROUP BY 1, 2;
 
-- Query takes 15 seconds (scans millions of rows)
 
-- Convert to materialized view
CREATE MATERIALIZED VIEW DailySalesReport_MV AS
SELECT ...;  -- Same query
 
CREATE INDEX idx_sales_report_date ON DailySalesReport_MV(sale_date);
 
-- Query now takes milliseconds (reads pre-computed results)
 
-- Refresh during off-peak hours
-- Schedule: REFRESH MATERIALIZED VIEW CONCURRENTLY DailySalesReport_MV;

Modern Optimizers Are Good

Modern query optimizers are sophisticated. Don't avoid views for fear of performance—measure first. Often, the optimizer fully inlines and optimizes views, producing the same plan as hand-written queries. Trust but verify: use EXPLAIN ANALYZE to confirm.

Summary: Mapping Between Levels

We've explored how mappings implement data independence at a technical level. Here are the essential takeaways:

Key Takeaways

•The system catalog stores all mappings — View definitions, table structures, indexes, and statistics are all persisted in catalog tables.
•External/conceptual mapping uses view resolution — Views are expanded, predicates merged, and column aliases resolved to produce base table queries.
•Conceptual/internal mapping uses the optimizer — The query optimizer selects physical access paths based on available structures and statistics.
•Queries flow through both mappings — From application query to disk I/O becomes a graph of transformations maintained by the DBMS.
•Catalog structure supports all levels — pg_views for external, pg_class/pg_attribute for conceptual, pg_index/pg_statistic for internal.
•Schema changes update mappings — DDL operations modify catalog entries; dependent views may need revalidation.
•Mappings have performance costs — View resolution and optimization have overhead, but caching, preparation, and materialization mitigate it.
•Mappings enable data independence — This translation machinery is what makes it possible to change schemas without changing applications.

Module Complete:

Congratulations! You've completed the Data Independence module. You now understand:

Logical data independence: protecting applications from conceptual schema changes
Physical data independence: protecting logical schemas from storage changes
Why it matters: organizational agility, economics, and system sustainability
How to achieve it: views, triggers, schema evolution patterns, and governance
How it works mechanically: mappings implemented through catalogs, view resolution, and query optimization

This knowledge is fundamental to database architecture and will inform every database design decision you make.

Module Complete

You now have comprehensive knowledge of data independence—both the why and the how. From the theoretical foundations through practical implementation to mechanical details, you understand how modern database systems achieve the separation of concerns that enables decades of system evolution. This is core knowledge for any database professional.

5 / 5

Loading learning content...

Database Management SystemsData Independence

Data Independence: The Foundation of Database Flexibility

LevelBeginner

Duration60 mins

TopicData Independence

5 / 5

Mapping Between Levels

The Translation Layers That Make Independence Possible

These mechanisms are the mappings between the three levels of the ANSI-SPARC architecture:

External/Conceptual Mapping — Translates between application views and the logical schema
Conceptual/Internal Mapping — Translates between the logical schema and physical storage

What You Will Learn

The Architecture of Mappings

The three-level architecture isn't just a conceptual model—it's implemented through concrete DBMS components. Each mapping corresponds to specific subsystems that perform translation.

System Catalog: The Foundation of All Mappings

At the heart of all mappings is the system catalog (also called the data dictionary or metadata repository). The catalog stores:

External schema definitions: View definitions, column aliases, computed expressions
Conceptual schema definitions: Table structures, relationships, constraints, data types
Internal schema definitions: Storage structures, indexes, partitions, tablespaces
Mapping relationships: How each level relates to the others

Every query processing operation consults the catalog to understand how to translate between levels.

mapping_architecture.txt

Text

┌─────────────────────────────────────────────────────────────────────────────┐
│                      DBMS ARCHITECTURE: MAPPING COMPONENTS                  │
└─────────────────────────────────────────────────────────────────────────────┘
 
APPLICATION QUERY
    │
    │  "SELECT name, email FROM CustomerView WHERE region = 'West'"
    ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                         QUERY PROCESSOR                                     │
│  ┌──────────────────────────────────────────────────────────────────────┐  │
│  │                       PARSER                                          │  │
│  │  - Syntax validation                                                  │  │
│  │  - Lexical analysis                                                   │  │
│  │  - AST (Abstract Syntax Tree) generation                              │  │
│  └────────────────────────────────┬─────────────────────────────────────┘  │
│                                   ▼                                         │
│  ┌──────────────────────────────────────────────────────────────────────┐  │
│  │              VIEW RESOLVER (External/Conceptual Mapping)              │  │
│  │  - Looks up view definition in catalog                                │  │
│  │  - Expands view into base table references                            │  │
│  │  - Resolves column aliases to actual columns                          │  │
│  │  - Merges external predicates with view predicates                    │  │
│  └────────────────────────────────┬─────────────────────────────────────┘  │
│                                   ▼                                         │
│  ┌──────────────────────────────────────────────────────────────────────┐  │
│  │              QUERY REWRITER                                           │  │
│  │  - Applies transformation rules                                       │  │
│  │  - Simplifies expressions                                             │  │
│  │  - Flattens nested views                                              │  │
│  └────────────────────────────────┬─────────────────────────────────────┘  │
│                                   ▼                                         │
│  ┌──────────────────────────────────────────────────────────────────────┐  │
│  │               QUERY OPTIMIZER (Conceptual/Internal Mapping)           │  │
│  │  - Reads available physical structures from catalog                   │  │
│  │  - Generates execution plan candidates                                │  │
│  │  - Costs each plan based on statistics                                │  │
│  │  - Selects optimal plan                                               │  │
│  └────────────────────────────────┬─────────────────────────────────────┘  │
└───────────────────────────────────┼─────────────────────────────────────────┘
                                    ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                         EXECUTION ENGINE                                    │
│  ┌──────────────────────────────────────────────────────────────────────┐  │
│  │               PLAN EXECUTOR                                           │  │
│  │  - Executes operators (scan, join, filter, project)                   │  │
│  │  - Coordinates with storage manager                                   │  │
│  │  - Manages intermediate results                                       │  │
│  └────────────────────────────────┬─────────────────────────────────────┘  │
│                                   ▼                                         │
│  ┌──────────────────────────────────────────────────────────────────────┐  │
│  │               STORAGE MANAGER (Conceptual/Internal Mapping)           │  │
│  │  - Translates logical table access to physical I/O                    │  │
│  │  - Manages buffer pool                                                │  │
│  │  - Executes index operations                                          │  │
│  │  - Handles physical record format                                     │  │
│  └────────────────────────────────┬─────────────────────────────────────┘  │
└───────────────────────────────────┼─────────────────────────────────────────┘
                                    ▼
                              DISK STORAGE

Catalog as Truth Source

External/Conceptual Mapping in Detail

The external/conceptual mapping translates between application-visible views and the underlying logical schema. This mapping is implemented through view resolution and query rewriting.

View Resolution Process:

When a query references a view, the DBMS must:

Locate the view definition in the catalog
Substitute the view reference with the view's query
Merge any predicates, projections, and joins
Resolve column name mappings

view_resolution_example.sql
SQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
-- ==============================================================
-- VIEW DEFINITION (in catalog)
-- ==============================================================
 
CREATE VIEW WestCoastCustomers AS
SELECT 
    c.customer_id,
    c.first_name || ' ' || c.last_name AS full_name,
    c.email,
    a.city,
    a.state
FROM Customer c
JOIN Address a ON c.customer_id = a.customer_id AND a.address_type = 'primary'
WHERE a.state IN ('CA', 'OR', 'WA', 'NV', 'AZ');
 
 
-- ==============================================================
-- APPLICATION QUERY (what application submits)
-- ==============================================================
 
SELECT full_name, email 
FROM WestCoastCustomers 
WHERE city = 'Seattle' 
ORDER BY full_name;
 
 
-- ==============================================================
-- VIEW RESOLUTION (what DBMS transforms it to)
-- ==============================================================
 
-- Step 1: Substitute view reference with view definition
SELECT full_name, email
FROM (
    SELECT 
        c.customer_id,
        c.first_name || ' ' || c.last_name AS full_name,
        c.email,
        a.city,
        a.state
    FROM Customer c
    JOIN Address a ON c.customer_id = a.customer_id AND a.address_type = 'primary'
    WHERE a.state IN ('CA', 'OR', 'WA', 'NV', 'AZ')
) AS WestCoastCustomers
WHERE city = 'Seattle'
ORDER BY full_name;
 
-- Step 2: Merge predicates and flatten (query rewriting)
SELECT 
    c.first_name || ' ' || c.last_name AS full_name,
    c.email
FROM Customer c
JOIN Address a ON c.customer_id = a.customer_id AND a.address_type = 'primary'
WHERE a.state IN ('CA', 'OR', 'WA', 'NV', 'AZ')  -- From view
  AND a.city = 'Seattle'                          -- From query (merged!)
ORDER BY c.first_name || ' ' || c.last_name;
 
-- This final query references only base tables!
-- The external/conceptual mapping is complete.

Column Mapping and Aliases:

The external/conceptual mapping handles more than just view expansion—it maintains the correspondence between external names (what applications see) and internal names (what actually exists):

Column Mapping Example
External Name (View)	Expression/Mapping	Conceptual Source
`full_name`	`first_name \|\| ' ' \|\| last_name`	`Customer.first_name`, `Customer.last_name`
`email`	Direct pass-through	`Customer.email`
`city`	Direct pass-through	`Address.city`
`state`	Direct pass-through	`Address.state`
`customer_id`	Direct (but hidden by final SELECT)	`Customer.customer_id`

Predicate Pushdown Optimization

Nested View Resolution:

Views can reference other views, creating hierarchies. The view resolver handles this recursively:

nested_view_resolution.sql
SQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
-- Base view
CREATE VIEW ActiveCustomers AS
SELECT * FROM Customer WHERE status = 'active';
 
-- View on view
CREATE VIEW ActivePremiumCustomers AS
SELECT * FROM ActiveCustomers WHERE loyalty_tier = 'premium';
 
-- View on view on view
CREATE VIEW ActivePremiumWestCoast AS
SELECT ac.customer_id, ac.first_name, ac.last_name, ac.email, a.state
FROM ActivePremiumCustomers ac
JOIN Address a ON ac.customer_id = a.customer_id
WHERE a.state IN ('CA', 'OR', 'WA');
 
-- Query:
SELECT first_name, email FROM ActivePremiumWestCoast WHERE state = 'CA';
 
-- After recursive view resolution:
SELECT c.first_name, c.email
FROM Customer c
JOIN Address a ON c.customer_id = a.customer_id
WHERE c.status = 'active'           -- From ActiveCustomers
  AND c.loyalty_tier = 'premium'    -- From ActivePremiumCustomers
  AND a.state IN ('CA', 'OR', 'WA') -- From ActivePremiumWestCoast
  AND a.state = 'CA';               -- From query
 
-- Further optimized (redundant predicate elimination):
SELECT c.first_name, c.email
FROM Customer c
JOIN Address a ON c.customer_id = a.customer_id
WHERE c.status = 'active'
  AND c.loyalty_tier = 'premium'
  AND a.state = 'CA';  -- 'CA' satisfies IN ('CA','OR','WA') so simplify

Conceptual/Internal Mapping in Detail

Key Translation Decisions:

The conceptual/internal mapping must answer questions like:

Which physical access path (scan, index) should be used for each table reference?
How should joins be physically executed (nested loop, hash, merge)?
In what order should operations be performed?
How should data be physically read from storage?
How should intermediate results be managed (memory vs. spill to disk)?

conceptual_internal_mapping.txt

Text

┌─────────────────────────────────────────────────────────────────────────────┐
│                 QUERY AFTER VIEW RESOLUTION (Conceptual Level)              │
└─────────────────────────────────────────────────────────────────────────────┘
 
SELECT c.first_name, c.email
FROM Customer c
JOIN Address a ON c.customer_id = a.customer_id
WHERE c.status = 'active'
  AND c.loyalty_tier = 'premium'
  AND a.state = 'CA';
 
┌─────────────────────────────────────────────────────────────────────────────┐
│                         OPTIMIZER ANALYSIS                                   │
└─────────────────────────────────────────────────────────────────────────────┘
 
Available Physical Structures (from catalog):
├── Customer table: heap storage, 1,000,000 rows
│   ├── Primary key index: B-tree on customer_id
│   ├── Index: B-tree on status
│   └── Index: B-tree on loyalty_tier
├── Address table: heap storage, 1,200,000 rows
│   ├── Primary key index: B-tree on address_id
│   ├── Index: B-tree on customer_id
│   └── Index: B-tree on state
 
Statistics (from catalog):
├── Customer.status = 'active': ~70% selectivity (700,000 rows)
├── Customer.loyalty_tier = 'premium': ~5% selectivity (50,000 rows)
├── Address.state = 'CA': ~12% selectivity (144,000 rows)
└── Combined selectivity estimate: ~0.4% (4,000 result rows)
 
┌─────────────────────────────────────────────────────────────────────────────┐
│                    EXECUTION PLAN CANDIDATES                                │
└─────────────────────────────────────────────────────────────────────────────┘
 
Plan A: Full Table Scans + Hash Join
├── Scan Customer (full table scan, filter status='active' AND tier='premium')
├── Scan Address (full table scan, filter state='CA')
├── Hash join on customer_id
└── COST: 45,000 (high I/O)
 
Plan B: Index Scan on Customer + Nested Loop to Address
├── Index scan on Customer.loyalty_tier = 'premium' (50,000 rows)
├── Filter status = 'active' (35,000 rows after filter)
├── For each: index lookup on Address.customer_id
├── Filter state = 'CA'
└── COST: 38,000 (many index lookups)
 
Plan C: Index Scan on Address + Nested Loop to Customer
├── Index scan on Address.state = 'CA' (144,000 rows)
├── For each: index lookup on Customer.customer_id
├── Filter status='active' AND tier='premium'
└── COST: 52,000 (more lookups than Plan B)
 
Plan D: Bitmap Index Combine (PostgreSQL)
├── Bitmap scan Customer.loyalty_tier = 'premium'
├── Bitmap AND with Customer.status = 'active' rows
├── Heap scan matching bitmaps (35,000 rows)
├── Hash join to Address with filter state='CA'
└── COST: 12,000 (winner!)
 
SELECTED: Plan D
 
┌─────────────────────────────────────────────────────────────────────────────┐
│                    EXECUTION PLAN (Internal Level)                          │
└─────────────────────────────────────────────────────────────────────────────┘
 
1. Bitmap Index Scan on idx_customer_loyalty_tier
   → Physical: Read index pages for loyalty_tier='premium', build bitmap
   
2. Bitmap Index Scan on idx_customer_status  
   → Physical: Read index pages for status='active', build bitmap
   
3. BitmapAnd
   → Physical: AND the two bitmaps (pure CPU, no I/O)
   
4. Bitmap Heap Scan on Customer
   → Physical: Read heap pages indicated by bitmap
   → Output: 35,000 rows (c.customer_id, c.first_name, c.email)
   
5. Hash
   → Physical: Build hash table on a.customer_id from Address where state='CA'
   → Reads index, then heap pages for matching rows
   
6. Hash Join
   → Physical: Probe hash table with customer rows
   → Output: 4,000 matching rows
   
7. Project
   → Physical: Select only first_name, email from results

Storage Manager Operations:

Once the optimizer chooses a plan, the storage manager executes the physical operations:

Storage Manager Responsibilities

•Table/Index File Access — Translating table names to physical file locations. Customer table → /data/base/16384/52301 file
•Page-Based I/O — All disk access is in fixed-size pages (typically 4KB-16KB). Reading row 12,567 means reading the page containing it.
•Buffer Management — Caching recently accessed pages in memory. Deciding which pages to evict when memory is full.
•Record Format Translation — Converting between logical row representation and physical byte format on disk.
•Index Navigation — Traversing B-tree structures, following pointers, handling index page splits/merges.
•Compression/Decompression — If data is stored compressed, transparently decompress on read and compress on write.

Complete Abstraction

Query Flow: End to End

Let's trace a complete query through both mappings, from application to disk and back:

complete_query_flow.txt

Text

┌─────────────────────────────────────────────────────────────────────────────┐
│                      COMPLETE QUERY FLOW TRACE                               │
└─────────────────────────────────────────────────────────────────────────────┘
 
═══════════════════════════════════════════════════════════════════════════════
STEP 1: APPLICATION SUBMITS QUERY
═══════════════════════════════════════════════════════════════════════════════
 
Application (Python):
    cursor.execute("SELECT name, total FROM OrderSummary WHERE status = 'pending'")
 
Query arrives at DBMS:
    "SELECT name, total FROM OrderSummary WHERE status = 'pending'"
 
═══════════════════════════════════════════════════════════════════════════════
STEP 2: PARSING
═══════════════════════════════════════════════════════════════════════════════
 
Parser produces AST:
    SELECT
    ├── Columns: [name, total]
    ├── FROM: OrderSummary
    └── WHERE: status = 'pending'
 
═══════════════════════════════════════════════════════════════════════════════
STEP 3: VIEW RESOLUTION (External → Conceptual)
═══════════════════════════════════════════════════════════════════════════════
 
Catalog lookup for "OrderSummary":
    Type: VIEW
    Definition:
        SELECT 
            c.first_name || ' ' || c.last_name AS name,
            o.order_id,
            o.total_amount AS total,
            o.status
        FROM Orders o
        JOIN Customer c ON o.customer_id = c.customer_id
 
View expansion + merge:
    SELECT 
        c.first_name || ' ' || c.last_name AS name,
        o.total_amount AS total
    FROM Orders o
    JOIN Customer c ON o.customer_id = c.customer_id
    WHERE o.status = 'pending'
 
═══════════════════════════════════════════════════════════════════════════════
STEP 4: QUERY OPTIMIZATION (Conceptual → Internal)
═══════════════════════════════════════════════════════════════════════════════
 
Catalog lookup for physical structures:
    Orders: Heap, 5M rows
        Indexes: PK(order_id), status, customer_id, order_date
    Customer: Heap, 500K rows
        Indexes: PK(customer_id), email
 
Statistics:
    Orders where status='pending': 50,000 rows (1%)
    
Plan selected:
    1. Index Scan on Orders.idx_status = 'pending'
    2. Nested Loop Join with Customer using PK lookup
    3. Project name, total
 
═══════════════════════════════════════════════════════════════════════════════
STEP 5: EXECUTION (Physical Operations)
═══════════════════════════════════════════════════════════════════════════════
 
Storage Manager Operations:
 
    [1] Index Scan: idx_orders_status
        └── Read index root page (block 1045)
        └── Navigate to 'pending' leaf (blocks 1156, 1157, 1158...)
        └── Collect row pointers to Orders heap (50,000 TIDs)
 
    [2] Heap Access: Orders table
        └── For each TID: read heap page, extract row
        └── Heap pages accessed: ~3,000 pages (due to TID locality)
        └── Extract: customer_id, total_amount
 
    [3] Nested Loop: For each order row
        └── Index Lookup: Customer.PK(customer_id)
            └── Read index (1-3 pages per lookup, cached well)
        └── Heap Access: Customer row
            └── Extract: first_name, last_name
 
    [4] Expression Evaluation
        └── Concatenate: first_name || ' ' || last_name → name
        └── Pass through: total_amount → total
 
    [5] Result Buffer
        └── Accumulate result rows in memory buffer
        └── 50,000 result rows × ~50 bytes = ~2.5MB
 
═══════════════════════════════════════════════════════════════════════════════
STEP 6: RESULT RETURN
═══════════════════════════════════════════════════════════════════════════════
 
Results formatted per external schema:
    Columns: name (VARCHAR), total (DECIMAL)
    
    Row 1: "John Doe", 125.99
    Row 2: "Jane Smith", 89.50
    ...
    Row 50000: "Alex Johnson", 234.00
 
Network transfer to application with column names matching original query.
 
═══════════════════════════════════════════════════════════════════════════════
SUMMARY OF MAPPINGS APPLIED
═══════════════════════════════════════════════════════════════════════════════
 
External → Conceptual:
    - "OrderSummary" → Orders JOIN Customer
    - "name" → first_name || ' ' || last_name
    - "total" → total_amount
    - WHERE status='pending' merged into view query
 
Conceptual → Internal:
    - "Orders table" → heap file + idx_orders_status
    - "Customer table" → heap file + PK index
    - "JOIN" → Nested Loop with Index Lookup
    - "WHERE" → Index Scan condition

System Catalog Structure

The system catalog is where all mapping information is stored. Understanding catalog structure helps you understand how mappings are persisted and accessed.

Key Catalog Tables (PostgreSQL example):

catalog_tables.sql
SQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
-- ==============================================================
-- POSTGRESQL SYSTEM CATALOG: Key Tables for Mappings
-- ==============================================================
 
-- EXTERNAL LEVEL: View Definitions
-- pg_class: All relations (tables, views, indexes, etc.)
SELECT relname, relkind, reltuples, relpages
FROM pg_class
WHERE relname = 'ordersummary';
-- relkind = 'v' indicates a view
 
-- pg_views: View definitions
SELECT viewname, definition
FROM pg_views
WHERE viewname = 'ordersummary';
-- Returns the view's SQL definition for view resolution
 
-- pg_attribute: Column information for external schema
SELECT attname, atttypid, attnum
FROM pg_attribute
WHERE attrelid = 'ordersummary'::regclass;
 
 
-- CONCEPTUAL LEVEL: Table Definitions
-- pg_class: Base tables
SELECT relname, relkind, reltuples, relpages
FROM pg_class
WHERE relname IN ('orders', 'customer');
-- relkind = 'r' indicates a regular table
 
-- pg_attribute: Column definitions
SELECT attname, atttypid, attnum, attnotnull
FROM pg_attribute
WHERE attrelid = 'orders'::regclass AND attnum > 0;
 
-- pg_constraint: Integrity constraints
SELECT conname, contype, confrelid
FROM pg_constraint
WHERE conrelid = 'orders'::regclass;
 
 
-- INTERNAL LEVEL: Physical Structure Definitions
-- pg_class: Index information
SELECT relname, relkind, relam, relpages
FROM pg_class
WHERE relname LIKE 'idx_%' AND relkind = 'i';
 
-- pg_index: Index details
SELECT indexrelid::regclass, indrelid::regclass, indkey
FROM pg_index
WHERE indrelid = 'orders'::regclass;
 
-- pg_am: Access methods (B-tree, Hash, etc.)
SELECT amname FROM pg_am WHERE oid = (
    SELECT relam FROM pg_class WHERE relname = 'idx_orders_status'
);
 
-- pg_tablespace: Physical storage location
SELECT spcname, pg_tablespace_location(oid)
FROM pg_tablespace;
 
-- pg_statistic: Statistics for optimizer
SELECT starelid::regclass, staattnum, stanullfrac, stadistinct
FROM pg_statistic
WHERE starelid = 'orders'::regclass;

Catalog Tables by Mapping Level
Architecture Level	Catalog Tables	Information Stored
External	pg_views, pg_class (views), pg_depend	View definitions, column aliases, dependencies
Conceptual	pg_class (tables), pg_attribute, pg_constraint, pg_type	Table structures, columns, constraints, data types
Internal	pg_index, pg_am, pg_tablespace, pg_statistic	Indexes, access methods, storage, optimizer stats
Mappings	pg_depend, pg_rewrite	View-to-table dependencies, rewrite rules

Catalog is Self-Describing

Mapping Maintenance

Mappings aren't static—they evolve as schemas change. The DBMS must maintain mapping consistency, and schema changes must update the relevant mappings.

Mapping Updates During DDL Operations:

DDL Operations and Mapping Updates
DDL Operation	External/Conceptual Mapping	Conceptual/Internal Mapping
CREATE VIEW	New view definition added to catalog	No change (views don't have physical storage)
DROP VIEW	View definition removed; dependent views may become invalid	No change
ALTER TABLE ADD COLUMN	Existing views unaffected (unless they use SELECT *)	No physical mapping change; new column added to heap
ALTER TABLE DROP COLUMN	Views referencing column become invalid; revalidation needed	Column storage deallocated
CREATE INDEX	No change (indexes are transparent)	New index registered; optimizer considers it
DROP INDEX	No change	Index removed from consideration; optimizer adapts
ALTER TABLE ... PARTITION	No change (partitioning is physical)	Partition metadata updated; optimizer uses partition pruning

View Invalidation and Revalidation:

When underlying tables change, views may become invalid. The DBMS handles this in different ways:

view_invalidation.sql
SQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
-- ==============================================================
-- VIEW INVALIDATION SCENARIOS
-- ==============================================================
 
-- Original view
CREATE VIEW CustomerEmails AS
SELECT customer_id, email FROM Customer;
 
-- SCENARIO 1: Add column (usually OK)
ALTER TABLE Customer ADD COLUMN phone VARCHAR(20);
-- View still works—it doesn't reference 'phone'
 
-- SCENARIO 2: Drop column referenced by view
ALTER TABLE Customer DROP COLUMN email;
-- ERROR: cannot drop column email because other objects depend on it
-- PostgreSQL protects views by default
 
-- SCENARIO 3: Force drop (if CASCADE)
ALTER TABLE Customer DROP COLUMN email CASCADE;
-- View 'CustomerEmails' is also dropped automatically
-- All dependent views are invalidated and removed
 
-- SCENARIO 4: Rename column
ALTER TABLE Customer RENAME COLUMN email TO email_address;
-- PostgreSQL: View becomes invalid (references old name)
-- Oracle: View may auto-recompile with new name (depends on settings)
 
 
-- ==============================================================
-- ORACLE: DEFERRED VALIDATION MODEL
-- ==============================================================
 
-- Oracle marks views as INVALID rather than erroring immediately
-- View status in USER_OBJECTS:
SELECT object_name, status FROM user_objects 
WHERE object_type = 'VIEW' AND object_name = 'CUSTOMEREMAILS';
-- Returns: CUSTOMEREMAILS, INVALID
 
-- Next query against view triggers recompilation:
SELECT * FROM CustomerEmails;
-- If structure compatible: recompiles and executes
-- If incompatible: ORA-04063: view has errors
 
 
-- ==============================================================
-- SQL SERVER: SCHEMA BINDING OPTION
-- ==============================================================
 
-- SCHEMABINDING prevents changes that would break views
CREATE VIEW CustomerEmails WITH SCHEMABINDING AS
SELECT customer_id, email FROM dbo.Customer;
 
-- Now this fails:
ALTER TABLE Customer DROP COLUMN email;
-- Error: Cannot DROP COLUMN because 'CustomerEmails' depends on it
 
-- Provides protection at the cost of flexibility

Dependency Tracking is Critical

Performance Implications of Mappings

Mappings have performance costs. Every query must be parsed, views resolved, plans generated, and operations translated. Understanding these costs helps you design efficient systems.

Performance Considerations

•View Resolution Overhead — Deep view hierarchies (views on views on views) require recursive resolution. Each level adds catalog lookups and query rewriting. Keep view nesting shallow (3-4 levels max).
•Optimizer Complexity — More complex queries (after view expansion) give the optimizer more work. Exponentially more plan candidates exist for many-table joins. Optimizer time limits may cause suboptimal plans.
•Plan Caching Benefits — Most DBMS cache execution plans. The first execution pays the full optimization cost; subsequent executions reuse the plan. Prepared statements maximize caching benefits.
•Statistics Freshness — Conceptual/internal mapping quality depends on accurate statistics. Stale statistics cause poor plans. Regular ANALYZE is essential.
•View Materialization — For expensive views queried frequently, consider materialized views. Trade storage for query speed, accepting data freshness lag.

performance_tips.sql
SQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
-- ==============================================================
-- MEASURING MAPPING OVERHEAD
-- ==============================================================
 
-- PostgreSQL: Compare planning vs execution time
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT name, total FROM OrderSummary WHERE status = 'pending';
 
-- Look for:
-- Planning Time: 0.245 ms   (view resolution + optimization)
-- Execution Time: 45.123 ms (actual data access)
 
-- If Planning Time is high relative to Execution, consider:
-- 1. Simplifying view structure
-- 2. Using prepared statements
-- 3. Plan caching (PL/pgSQL functions cache plans)
 
 
-- ==============================================================
-- PREPARED STATEMENTS: Amortizing Optimization Cost
-- ==============================================================
 
-- Without preparation: optimize every time
SELECT * FROM OrderSummary WHERE status = $1;  -- Optimize
SELECT * FROM OrderSummary WHERE status = $2;  -- Optimize again
 
-- With preparation: optimize once
PREPARE order_query AS SELECT * FROM OrderSummary WHERE status = $1;
EXECUTE order_query('pending');   -- Uses cached plan
EXECUTE order_query('shipped');   -- Uses cached plan (fast!)
 
-- Application code (e.g., Python psycopg2) uses server-side prepared statements automatically
 
 
-- ==============================================================
-- MATERIALIZED VIEWS: Trading Freshness for Speed
-- ==============================================================
 
-- Complex aggregation view (slow to compute live)
CREATE VIEW DailySalesReport AS
SELECT 
    date_trunc('day', order_date) AS sale_date,
    region,
    COUNT(*) AS order_count,
    SUM(total_amount) AS revenue,
    AVG(total_amount) AS avg_order
FROM Orders o
JOIN Customer c ON o.customer_id = c.customer_id
JOIN Region r ON c.region_id = r.region_id
WHERE order_date >= CURRENT_DATE - INTERVAL '30 days'
GROUP BY 1, 2;
 
-- Query takes 15 seconds (scans millions of rows)
 
-- Convert to materialized view
CREATE MATERIALIZED VIEW DailySalesReport_MV AS
SELECT ...;  -- Same query
 
CREATE INDEX idx_sales_report_date ON DailySalesReport_MV(sale_date);
 
-- Query now takes milliseconds (reads pre-computed results)
 
-- Refresh during off-peak hours
-- Schedule: REFRESH MATERIALIZED VIEW CONCURRENTLY DailySalesReport_MV;

Modern Optimizers Are Good

Summary: Mapping Between Levels

We've explored how mappings implement data independence at a technical level. Here are the essential takeaways:

Key Takeaways

•The system catalog stores all mappings — View definitions, table structures, indexes, and statistics are all persisted in catalog tables.
•External/conceptual mapping uses view resolution — Views are expanded, predicates merged, and column aliases resolved to produce base table queries.
•Conceptual/internal mapping uses the optimizer — The query optimizer selects physical access paths based on available structures and statistics.
•Queries flow through both mappings — From application query to disk I/O becomes a graph of transformations maintained by the DBMS.
•Catalog structure supports all levels — pg_views for external, pg_class/pg_attribute for conceptual, pg_index/pg_statistic for internal.
•Schema changes update mappings — DDL operations modify catalog entries; dependent views may need revalidation.
•Mappings have performance costs — View resolution and optimization have overhead, but caching, preparation, and materialization mitigate it.
•Mappings enable data independence — This translation machinery is what makes it possible to change schemas without changing applications.

Module Complete:

Congratulations! You've completed the Data Independence module. You now understand:

Logical data independence: protecting applications from conceptual schema changes
Physical data independence: protecting logical schemas from storage changes
Why it matters: organizational agility, economics, and system sustainability
How to achieve it: views, triggers, schema evolution patterns, and governance
How it works mechanically: mappings implemented through catalogs, view resolution, and query optimization

This knowledge is fundamental to database architecture and will inform every database design decision you make.

Module Complete

5 / 5