DBMS Concepts - Learning Module

Data Abstraction Levels

Hiding Complexity Through Abstraction

When you execute a simple query like SELECT name FROM customers WHERE city = 'Paris', you think in terms of tables, columns, and conditions. You don't think about disk blocks, B-tree traversals, or page fetches. You certainly don't consider the physical byte layout of the name field on the SSD.

This separation between what you perceive and what actually happens is not accidental—it's the result of careful architectural design. Data abstraction is the principle that allows database users to work with logical concepts while the DBMS handles physical complexity beneath the surface.

In this page, we'll explore the three levels of data abstraction that form the ANSI-SPARC architecture, a foundational framework that has guided DBMS design since the 1970s.

What You Will Learn

By the end of this page, you'll understand the three levels of data abstraction (physical, logical, and view), how they relate to each other, why this separation matters, and how it enables data independence—one of the most important properties of database systems.

Why Data Abstraction Matters

Before diving into the levels themselves, let's understand why data abstraction is so important. Consider what happens without it:

The Pre-Abstraction World:

In early file-based systems, application programs dealt directly with physical storage details:

•Programs contained hardcoded knowledge of file formats, record layouts, and storage locations
•Changing how data was stored required modifying every application
•Each application maintained its own view of data, leading to inconsistencies
•Users needed to understand physical organization to access data

This tight coupling between applications and physical storage created enormous maintenance burdens. A simple change—like switching from fixed-length to variable-length records—could require rewriting dozens of programs.
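
To make that coupling concrete, here is a minimal sketch of an application that hardcodes a fixed-length record layout. It is not taken from any real system: the layouts `LAYOUT_V1` and `LAYOUT_V2` and both helper functions are hypothetical names chosen for illustration. Widening the name field from 10 to 20 bytes is enough to break a reader that still assumes the old layout.

```python
import struct

# Hypothetical layouts: 4-byte id, fixed-width name, 8-byte salary.
LAYOUT_V1 = struct.Struct("<i10sd")   # name stored in 10 bytes
LAYOUT_V2 = struct.Struct("<i20sd")   # storage change: name widened to 20 bytes


def write_record(layout, emp_id, name, salary):
    """Serialize one record using the given physical layout."""
    return layout.pack(emp_id, name.encode("ascii"), salary)


def read_record(layout, raw):
    """Deserialize one record; the caller must know the exact layout."""
    emp_id, name, salary = layout.unpack(raw)
    return emp_id, name.rstrip(b"\x00").decode("ascii"), salary


# The file is rewritten in the new layout...
raw_v2 = write_record(LAYOUT_V2, 1001, "Alice Chen", 85000.0)
new_reader_result = read_record(LAYOUT_V2, raw_v2)

# ...but a program compiled against the old layout can no longer read it.
try:
    read_record(LAYOUT_V1, raw_v2)
    old_reader_failed = False
except struct.error:
    old_reader_failed = True
```

Every program that embeds `LAYOUT_V1` would need the same edit, which is the maintenance burden described above.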

Benefits of Data Abstraction

•Simplified User Interface — Users work with familiar concepts (tables, rows, columns) rather than physical storage details (blocks, sectors, pointers).
•Physical Data Independence — Physical storage can be reorganized, reindexed, or migrated without changing applications.
•Logical Data Independence — The logical schema can evolve (new tables, modified relationships) with minimal application impact.
•Multiple User Views — Different users see different perspectives of the same underlying data, tailored to their needs and access rights.
•Security Through Abstraction — Sensitive details can be hidden at different levels, enforcing access control naturally.
•Parallel Development — Database administrators can optimize storage while developers focus on application logic, working independently.

The Core Insight

Abstraction decouples 'what' from 'how.' Users describe WHAT data they want; the DBMS determines HOW to retrieve it. This separation is central to database technology and distinguishes it from file-based data management.
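
The contrast between "what" and "how" can be sketched in a few lines. The example below uses Python's built-in `sqlite3` module purely as a convenient stand-in for a DBMS; the table contents and function names are made up for illustration. The procedural version spells out how to walk the records, while the declarative version only states which rows are wanted.

```python
import sqlite3

rows = [(1, "Alice", "Paris"), (2, "Bob", "Berlin"), (3, "Chloe", "Paris")]


# HOW: the application walks the records itself and must know their shape.
def names_in_city_procedural(records, city):
    result = []
    for _id, name, record_city in records:
        if record_city == city:
            result.append(name)
    return result


# WHAT: the application states the result it wants; the DBMS picks the access path.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", rows)


def names_in_city_declarative(connection, city):
    cursor = connection.execute(
        "SELECT name FROM customers WHERE city = ? ORDER BY id", (city,)
    )
    return [name for (name,) in cursor]


procedural = names_in_city_procedural(rows, "Paris")
declarative = names_in_city_declarative(conn, "Paris")
```

Both calls return the same names, but only the procedural one would need rewriting if the records moved to a different storage structure.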

The ANSI-SPARC Three-Level Architecture

In 1975, the ANSI (American National Standards Institute) SPARC (Standards Planning and Requirements Committee) proposed a three-level architecture for database systems that has become the standard framework for understanding data abstraction.

This architecture defines three distinct levels of abstraction, each describing the same data from a different perspective:

The Three Levels of Abstraction

| Level          | Also Called      | Describes             | Concerned With                         | Users                         |
| External Level | View Level       | Individual user views | What specific users/groups see         | End users, applications       |
| Logical Level  | Conceptual Level | Full logical schema   | What data exists and its relationships | DBAs, developers              |
| Physical Level | Internal Level   | Physical storage      | How data is stored on disk             | DBMS internals, system admins |

Key Architectural Principles:

Each level has its own schema — The physical schema describes storage structures. The logical schema describes tables and relationships. External schemas describe user views.
Mappings connect levels — The DBMS maintains mappings between levels. When a user accesses a view, the mapping translates it to logical operations, which are further mapped to physical operations.
Changes at one level don't necessarily affect others — This is data independence. Physical reorganization doesn't affect logical schema. Schema changes may not affect all views.
Only the physical level touches actual storage — All higher levels are logical constructs that the DBMS translates to physical operations.
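
As a rough illustration of "each level has its own schema", the sketch below creates one object per level in an in-memory SQLite database and then reads them back from SQLite's catalog. This is an approximation: SQLite keeps all three kinds of object in a single catalog table, and an index is only one small part of a physical schema. The object names are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Logical level: what data exists
    CREATE TABLE employees (
        employee_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        salary      REAL
    );
    -- Physical level: an access structure
    CREATE INDEX idx_employees_name ON employees(name);
    -- External level: what one user group sees
    CREATE VIEW staff_directory AS
        SELECT employee_id, name FROM employees;
""")

# SQLite's catalog table lists every schema object together with its type.
catalog = dict(
    conn.execute(
        "SELECT name, type FROM sqlite_master WHERE name NOT LIKE 'sqlite_%'"
    ).fetchall()
)
```

The resulting dictionary maps each object name to `table`, `index`, or `view`, one entry per level.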

Historical Context

The ANSI-SPARC architecture was influential when it was proposed in 1975 because it gave the field a shared framework for thinking about database organization. The three-level model remains the conceptual reference point for most modern DBMSs, even though modern systems have evolved significantly beyond the original proposal.

The Physical Level (Internal Level)

The Physical Level is the lowest level of abstraction, describing how data is actually stored on storage devices. This level is concerned with the physical implementation details that users and most developers never need to consider.

What the Physical Level Defines:

Physical Level Components

•Storage Structures — Whether data is stored in heap files, sorted files, hash files, or clustered together with related data.
•Record Formats — The byte-level layout of individual records: fixed vs. variable length fields, null bitmap location, record headers.
•Page Organization — How records are organized within pages, slot directories, free space management, page headers.
•Index Structures — B+ tree indexes, hash indexes, bitmap indexes—their implementation details and how they map to table data.
•File Allocation — How database files are allocated on disk, extent sizes, growth policies, file organization.
•Partitioning — How large tables are divided across multiple physical storage units for performance and manageability.
•Compression — Whether and how data is compressed at the page or record level.
•Encryption — How data is encrypted at rest, key management, and encryption scope.
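
The "Page Organization" item above mentions slot directories and free space management. The following toy class is one way to picture a slotted page, under simplifying assumptions that are mine rather than any product's: the slot directory is kept as a Python list instead of being serialized into the page, each directory entry is assumed to cost 4 bytes, and there is no page header or compaction.

```python
class SlottedPage:
    """Toy page: records are written from the end; the slot directory grows from the front."""

    SLOT_ENTRY_BYTES = 4  # assumed cost of one directory entry (offset + length)

    def __init__(self, size=128):
        self.size = size
        self.data = bytearray(size)
        self.slots = []        # slot number -> (offset, length), or None if deleted
        self.free_end = size   # records are written backwards from here

    def free_space(self):
        return self.free_end - self.SLOT_ENTRY_BYTES * len(self.slots)

    def insert(self, record: bytes) -> int:
        if len(record) + self.SLOT_ENTRY_BYTES > self.free_space():
            raise ValueError("page full")
        self.free_end -= len(record)
        self.data[self.free_end:self.free_end + len(record)] = record
        self.slots.append((self.free_end, len(record)))
        return len(self.slots) - 1   # slot number identifies the record within the page

    def read(self, slot: int) -> bytes:
        entry = self.slots[slot]
        if entry is None:
            raise KeyError(f"slot {slot} is empty")
        offset, length = entry
        return bytes(self.data[offset:offset + length])

    def delete(self, slot: int) -> None:
        self.slots[slot] = None   # bytes stay in place until a compaction pass


page = SlottedPage(64)
slot_a = page.insert(b"Alice Chen")
slot_b = page.insert(b"Bob")
space_after_inserts = page.free_space()
page.delete(slot_a)
```

Because other structures refer to a record by page and slot number, the record's bytes can move inside the page without invalidating those references, which is the main reason for the indirection.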

Physical Schema Example
SQL
-- PostgreSQL: Physical storage configuration
CREATE TABLE large_orders (
    order_id        BIGINT NOT NULL,
    customer_id     INT NOT NULL,
    order_date      DATE NOT NULL,
    total_amount    DECIMAL(12,2),
    status          VARCHAR(20),
    order_data      JSONB
) 
PARTITION BY RANGE (order_date);  -- Physical partitioning
 
-- Create partitions (physical storage units)
CREATE TABLE orders_2024_q1 PARTITION OF large_orders
    FOR VALUES FROM ('2024-01-01') TO ('2024-04-01')
    TABLESPACE fast_ssd;  -- Physical storage location
 
CREATE TABLE orders_2024_q2 PARTITION OF large_orders
    FOR VALUES FROM ('2024-04-01') TO ('2024-07-01')
    TABLESPACE standard_storage;
 
-- Physical indexing structures
CREATE INDEX idx_orders_customer 
    ON large_orders USING btree (customer_id);  -- B-tree structure
 
CREATE INDEX idx_orders_data 
    ON large_orders USING gin (order_data);     -- GIN for JSONB
 
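-- Oracle: Explicit physical organization
CREATE TABLE employees (
    emp_id    NUMBER(10),
    name      VARCHAR2(100),
    dept_id   NUMBER(5),
    CONSTRAINT employees_pk PRIMARY KEY (emp_id)  -- Required for ORGANIZATION INDEX
)
ORGANIZATION INDEX              -- Store rows in the primary-key B-tree, not a heap
TABLESPACE employee_data        -- Physical location
STORAGE (
    INITIAL 64K
    NEXT 64K
    PCTINCREASE 0
    BUFFER_POOL KEEP           -- Prefer the KEEP buffer pool
);
-- Table compression such as COMPRESS FOR OLTP applies to heap-organized
-- tables; index-organized tables support only key compression.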

Record Layout Example:

Consider how a simple record might be laid out at the physical level:

Record for: Employee (id=1001, name='Alice Chen', dept=5, salary=85000)

| Header (8 bytes) | id (4 bytes) | name (var)   | dept (4 bytes) | salary (8 bytes) |
| RID, flags, etc. |    1001      | len:10 +     |       5        |      85000       |
|                  |              | 'Alice Chen' |                |                  |

The physical level must track:

•Where the record starts and ends
•How to interpret each field's bytes
•Where to find variable-length data
•Which bits indicate NULL values
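
The four bookkeeping tasks above can be sketched with Python's `struct` module. The layout here is an assumption made for the example and is simpler than the diagram: a 3-byte header (2-byte total length plus a 1-byte null bitmap) followed by the fields in order, with the name stored as a 2-byte length prefix and its UTF-8 bytes. Real systems differ in header contents, alignment, and field ordering.

```python
import struct

FIELDS = ("id", "name", "dept", "salary")


def encode(record: dict) -> bytes:
    null_bitmap = 0
    body = b""
    for bit, field in enumerate(FIELDS):
        value = record[field]
        if value is None:
            null_bitmap |= 1 << bit                      # NULL: set the bit, store no bytes
        elif field == "name":
            raw = value.encode("utf-8")
            body += struct.pack("<H", len(raw)) + raw    # length prefix + bytes
        elif field == "salary":
            body += struct.pack("<d", value)             # 8-byte float
        else:
            body += struct.pack("<i", value)             # 4-byte integer
    # Header: 2-byte total record length + 1-byte null bitmap
    return struct.pack("<HB", 3 + len(body), null_bitmap) + body


def decode(raw: bytes) -> dict:
    total_length, null_bitmap = struct.unpack_from("<HB", raw, 0)
    offset = 3
    record = {}
    for bit, field in enumerate(FIELDS):
        if null_bitmap & (1 << bit):
            record[field] = None
        elif field == "name":
            (length,) = struct.unpack_from("<H", raw, offset)
            offset += 2
            record[field] = raw[offset:offset + length].decode("utf-8")
            offset += length
        elif field == "salary":
            (record[field],) = struct.unpack_from("<d", raw, offset)
            offset += 8
        else:
            (record[field],) = struct.unpack_from("<i", raw, offset)
            offset += 4
    assert offset == total_length   # the header tells us where the record ends
    return record


alice = {"id": 1001, "name": "Alice Chen", "dept": 5, "salary": 85000.0}
no_dept = {"id": 1002, "name": "Bob", "dept": None, "salary": 70000.0}
encoded_alice = encode(alice)
encoded_no_dept = encode(no_dept)
```

A NULL `dept` costs one bit in the bitmap and no field bytes, so the second record is shorter for two reasons: the shorter name and the omitted integer.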

Physical Level Complexity

Physical level decisions have profound performance implications. Choosing the right index type, partitioning strategy, or compression algorithm can mean the difference between queries taking seconds versus hours. This is why database administration is a specialized skill.

The Logical Level (Conceptual Level)

The Logical Level describes the entire database as a collection of logical structures—tables, columns, relationships, and constraints—without reference to physical implementation. This is what most developers and data modelers work with.

The Logical Schema describes WHAT data exists, not HOW it's stored.

Logical Level Components

•Tables/Relations — The fundamental structure for organizing data. Each table represents an entity type with named columns and typed attributes.
•Columns/Attributes — Named fields with data types (INTEGER, VARCHAR, DATE, etc.) that define what values can be stored.
•Primary Keys — Uniquely identify each row. The logical guarantee that no two rows have the same key value.
•Foreign Keys — Establish relationships between tables. The logical representation of references between entities.
•Constraints — Business rules encoded in the schema: NOT NULL, UNIQUE, CHECK conditions, domain restrictions.
•Relationships — One-to-one, one-to-many, many-to-many associations between entities.
•Derived Data — Computed columns, materialized views, and other data derived from base tables.

Logical Schema Example
SQL
-- Logical schema: Describes data structures and relationships
-- No physical implementation details
 
-- Entity: Department
CREATE TABLE departments (
    department_id   INT PRIMARY KEY,
    name            VARCHAR(100) NOT NULL UNIQUE,
    budget          DECIMAL(15,2) CHECK (budget >= 0),
    location        VARCHAR(100),
    created_date    DATE DEFAULT CURRENT_DATE
);
 
-- Entity: Employee
CREATE TABLE employees (
    employee_id     INT PRIMARY KEY,
    first_name      VARCHAR(50) NOT NULL,
    last_name       VARCHAR(50) NOT NULL,
    email           VARCHAR(100) UNIQUE,
    hire_date       DATE NOT NULL,
    salary          DECIMAL(10,2) CHECK (salary > 0),
    department_id   INT REFERENCES departments(department_id),
    manager_id      INT REFERENCES employees(employee_id),
    
    -- Self-reference constraint at logical level
    CONSTRAINT no_self_management CHECK (manager_id <> employee_id)
);
 
-- Entity: Project (Many-to-Many with Employees)
CREATE TABLE projects (
    project_id      INT PRIMARY KEY,
    name            VARCHAR(200) NOT NULL,
    start_date      DATE,
    end_date        DATE,
    budget          DECIMAL(15,2),
    
    CONSTRAINT valid_dates CHECK (end_date >= start_date)
);
 
-- Relationship: Employee <-> Project (Junction Table)
CREATE TABLE employee_projects (
    employee_id     INT REFERENCES employees(employee_id),
    project_id      INT REFERENCES projects(project_id),
    role            VARCHAR(50),
    hours_allocated DECIMAL(5,2),
    
    PRIMARY KEY (employee_id, project_id)
);
 
-- Notice: Nothing about indexes, storage, partitioning
-- Pure logical description of data structure
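
To see that these constraints are rules the DBMS enforces rather than documentation, the sketch below loads a cut-down version of the `departments` and `employees` tables into in-memory SQLite and tries a few inserts. Two caveats: SQLite checks foreign keys only after `PRAGMA foreign_keys = ON`, and the column types are simplified to SQLite's own. The `rejected` helper is a hypothetical name for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE departments (
        department_id INTEGER PRIMARY KEY,
        name          TEXT NOT NULL UNIQUE,
        budget        NUMERIC CHECK (budget >= 0)
    );
    CREATE TABLE employees (
        employee_id   INTEGER PRIMARY KEY,
        first_name    TEXT NOT NULL,
        salary        NUMERIC CHECK (salary > 0),
        department_id INTEGER REFERENCES departments(department_id)
    );
    INSERT INTO departments VALUES (1, 'Engineering', 500000);
""")


def rejected(sql):
    """Return True if the DBMS refuses the statement on integrity grounds."""
    try:
        with conn:   # commit on success, roll back on error
            conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


results = {
    "valid row": rejected("INSERT INTO employees VALUES (1, 'Alice', 85000, 1)"),
    "negative salary": rejected("INSERT INTO employees VALUES (2, 'Bob', -5, 1)"),
    "unknown department": rejected("INSERT INTO employees VALUES (3, 'Cara', 70000, 99)"),
    "duplicate department name": rejected("INSERT INTO departments VALUES (2, 'Engineering', 10)"),
    "missing first name": rejected("INSERT INTO employees VALUES (4, NULL, 70000, 1)"),
}
```

Only the first insert is accepted; the other four are refused regardless of which application issued them, because the rules live in the logical schema.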

Logical vs. Physical: The Separation

Notice what's absent from the logical schema:

•No mention of indexes (that's physical optimization)
•No tablespace or file references
•No partitioning schemes
•No compression settings
•No storage allocation parameters

The logical level answers: "What entities exist in our business domain, what attributes do they have, and how do they relate?"

The physical level (separately) answers: "How do we efficiently store and retrieve this data on our hardware?"

This separation means a DBA can add an index, change partitioning, or move data to faster storage—all without changing the logical schema that applications depend on.

Thinking at the Right Level

When designing databases, start at the logical level. Model your entities and relationships without worrying about performance. Once the logical design is stable, layer on physical optimizations. This approach produces cleaner designs that are easier to maintain and optimize.

The External Level (View Level)

The External Level is the highest level of abstraction, representing the database as seen by individual users or applications. While the logical level describes the complete database, external schemas describe subsets and transformations relevant to specific use cases.

Each user group sees only what they need to see.

Purposes of External Views

•Simplification — Hide complexity by presenting only the columns and tables relevant to a user's task. A sales representative doesn't need to see manufacturing data.
•Security — Restrict access to sensitive data. HR can see salary information; other departments see only non-sensitive employee data.
•Customization — Present data in forms convenient for specific applications. Compute derived values, rename columns, combine tables.
•Consistency — Provide stable interfaces to applications. If the logical schema changes, views can maintain the old interface.
•Legacy Support — When schemas evolve, views can present old structures to applications that haven't been updated.

External Views Example
SQL
-- Base tables exist at the logical level
-- Views provide tailored external perspectives
 
-- View for HR Department: Full employee details
CREATE VIEW hr_employee_details AS
SELECT 
    e.employee_id,
    e.first_name,
    e.last_name,
    e.email,
    e.hire_date,
    e.salary,          -- HR can see salary
    d.name AS department_name,
    m.first_name || ' ' || m.last_name AS manager_name,
    DATE_PART('year', AGE(CURRENT_DATE, e.hire_date)) AS years_employed
FROM employees e
LEFT JOIN departments d ON e.department_id = d.department_id
LEFT JOIN employees m ON e.manager_id = m.employee_id;
 
-- View for General Staff: Limited employee info (no salary)
CREATE VIEW staff_directory AS
SELECT 
    e.employee_id,
    e.first_name,
    e.last_name,
    e.email,
    d.name AS department
    -- Note: salary is hidden
FROM employees e
LEFT JOIN departments d ON e.department_id = d.department_id;
 
-- View for Finance: Department budget analysis
CREATE VIEW department_budget_analysis AS
SELECT 
    d.department_id,
    d.name,
    d.budget AS allocated_budget,
    COUNT(e.employee_id) AS employee_count,
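    COALESCE(SUM(e.salary), 0) AS total_salary_cost,   -- 0 for a department with no employees
    d.budget - COALESCE(SUM(e.salary), 0) AS remaining_budget,
    ROUND(COALESCE(SUM(e.salary), 0) / NULLIF(d.budget, 0) * 100, 2) AS budget_utilization_pct  -- NULL when budget is 0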
FROM departments d
LEFT JOIN employees e ON d.department_id = e.department_id
GROUP BY d.department_id, d.name, d.budget;
 
-- View for Reporting: Flattened project data
CREATE VIEW project_team_report AS
SELECT 
    p.name AS project_name,
    p.start_date,
    p.end_date,
    p.budget AS project_budget,
    e.first_name || ' ' || e.last_name AS team_member,
    ep.role,
    ep.hours_allocated,
    d.name AS member_department
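FROM projects p
JOIN employee_projects ep ON p.project_id = ep.project_id
JOIN employees e ON ep.employee_id = e.employee_id
LEFT JOIN departments d ON e.department_id = d.department_id;  -- Keep members with no department
-- Apply ORDER BY in the query that reads the view; some DBMSs reject
-- or ignore ORDER BY inside a view definition.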

View Advantages

•Simplified queries for users
•Built-in security through hiding
•Schema change insulation
•Computed columns readily available
•Consistent naming conventions

View Limitations

•Updates through views are restricted
•Complex views may hinder optimization
•Ordinary views don't store data; each query re-runs the view definition (materialized views are the stored alternative)
•Deeply nested views can hurt performance
•Schema changes may break views
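
Two of the points above, hiding columns and not storing data, can be checked with a small SQLite sketch. The table is a reduced version of `employees` and the data is invented. Note that hiding a column in a view only acts as a security measure when users are also denied access to the base table; SQLite has no permission system, so this example shows the mechanism only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (
        employee_id INTEGER PRIMARY KEY,
        first_name  TEXT NOT NULL,
        last_name   TEXT NOT NULL,
        salary      NUMERIC
    );
    INSERT INTO employees VALUES (1, 'Alice', 'Chen', 85000);

    CREATE VIEW staff_directory AS
        SELECT employee_id, first_name, last_name FROM employees;
""")

cursor = conn.execute("SELECT * FROM staff_directory")
view_columns = [column[0] for column in cursor.description]
rows_before = cursor.fetchall()

# The view stores nothing: a change to the base table is visible immediately.
conn.execute("INSERT INTO employees VALUES (2, 'Bob', 'Diaz', 70000)")
rows_after = conn.execute(
    "SELECT * FROM staff_directory ORDER BY employee_id"
).fetchall()
```

The `salary` column never appears in `view_columns`, and the second query sees the new employee without any refresh step.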

Updatable Views

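Simple views (a single base table, no aggregation, no DISTINCT, no GROUP BY) can often support INSERT, UPDATE, and DELETE operations directly. Complex views typically support only SELECT. The DBMS determines updatability from the view definition, and the exact rules vary by product. Where a view is not automatically updatable, many systems let you define INSTEAD OF triggers that spell out how a write against the view should be applied to the base tables.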
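
Here is a small sketch of the INSTEAD OF approach using SQLite. SQLite is a slightly special case because it treats every view as read-only, even a simple one, so it makes the trigger's role easy to see. The view, trigger, and column names are invented for the example, and the name-splitting logic assumes exactly one space between first and last name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (
        employee_id INTEGER PRIMARY KEY,
        first_name  TEXT NOT NULL,
        last_name   TEXT NOT NULL
    );
    INSERT INTO employees VALUES (1, 'Alice', 'Chen');

    CREATE VIEW employee_names AS
        SELECT employee_id, first_name || ' ' || last_name AS full_name
        FROM employees;
""")

UPDATE_VIA_VIEW = (
    "UPDATE employee_names SET full_name = 'Alice Wong' WHERE employee_id = 1"
)

# Without a trigger, SQLite rejects writes to a view.
try:
    conn.execute(UPDATE_VIA_VIEW)
    direct_update_allowed = True
except sqlite3.OperationalError:
    direct_update_allowed = False

# The trigger states how an update to the derived column maps to base columns.
conn.executescript("""
    CREATE TRIGGER employee_names_update
    INSTEAD OF UPDATE ON employee_names
    BEGIN
        UPDATE employees
        SET first_name = substr(NEW.full_name, 1, instr(NEW.full_name, ' ') - 1),
            last_name  = substr(NEW.full_name, instr(NEW.full_name, ' ') + 1)
        WHERE employee_id = OLD.employee_id;
    END;
""")
conn.execute(UPDATE_VIA_VIEW)
updated = conn.execute(
    "SELECT first_name, last_name FROM employees WHERE employee_id = 1"
).fetchone()
```

The same UPDATE statement fails before the trigger exists and succeeds afterwards, with the change landing in the base table.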

Mappings Between Levels

The three levels don't exist in isolation—they are connected by mappings that allow the DBMS to translate between representations. These mappings are what enable data independence.

Types of Mappings

•External/Conceptual Mapping — Defines how each view maps to the logical schema. This includes: which logical columns map to view columns, how column names are translated, what derived values are computed, and which rows are visible (row-level security).
•Conceptual/Internal Mapping — Defines how logical structures map to physical storage. This includes: which files store which tables, how columns map to record fields, which indexes exist for which columns, and how tables are partitioned across storage.

How Mappings Enable Query Translation:

When a user queries a view, the DBMS uses mappings to translate:

User Query (External Level):
SELECT department, total_sales 
FROM sales_report 
WHERE region = 'West';

↓ External/Conceptual Mapping ↓

Logical Query (Conceptual Level):
SELECT d.name, SUM(o.amount)
FROM orders o
JOIN departments d ON o.dept_id = d.id
WHERE d.region = 'West'
GROUP BY d.name;

↓ Conceptual/Internal Mapping ↓

Physical Operations (Internal Level):
1. Use index idx_dept_region to find West departments
2. For each dept_id, probe hash index on orders.dept_id
3. Sum amounts, aggregate by department
4. Return results

The user wrote a simple query against a view. The DBMS, using its mappings, translated this into optimized physical operations.
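
The translation chain can be reproduced in miniature with SQLite. In this sketch the view definition plays the role of the external/conceptual mapping, and `EXPLAIN QUERY PLAN` exposes the access steps SQLite chose. The schema and data are invented to match the `sales_report` example, and the plan text varies between SQLite versions, so the code only collects it rather than relying on its wording.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE departments (id INTEGER PRIMARY KEY, name TEXT, region TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, dept_id INTEGER, amount NUMERIC);
    CREATE INDEX idx_orders_dept ON orders(dept_id);

    INSERT INTO departments VALUES (1, 'Hardware', 'West'), (2, 'Software', 'East');
    INSERT INTO orders VALUES (1, 1, 100), (2, 1, 250), (3, 2, 75);

    -- External/conceptual mapping: the view definition itself
    CREATE VIEW sales_report AS
        SELECT d.name AS department, d.region AS region, SUM(o.amount) AS total_sales
        FROM orders o
        JOIN departments d ON o.dept_id = d.id
        GROUP BY d.name, d.region;
""")

view_query = "SELECT department, total_sales FROM sales_report WHERE region = 'West'"
logical_query = """
    SELECT d.name, SUM(o.amount)
    FROM orders o
    JOIN departments d ON o.dept_id = d.id
    WHERE d.region = 'West'
    GROUP BY d.name
"""

view_rows = conn.execute(view_query).fetchall()
logical_rows = conn.execute(logical_query).fetchall()

# Conceptual/internal mapping: the fourth column of each plan row
# describes one access step against the stored tables.
plan_steps = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + view_query)]
```

Querying the view and querying the base tables directly return the same rows, which is what "transparent translation" means in practice.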

Transparent Translation

Users don't see these mappings—they're invisible. A query against a view feels just like a query against a table. This transparency is the essence of good abstraction.

Data Independence: The Ultimate Goal

Data Independence is the ability to modify the schema at one level without affecting the schema at the next higher level. It's the fundamental benefit that the three-level architecture provides.

Data independence comes in two forms:

Physical Data Independence

The ability to modify the physical schema without changing the logical schema or applications.

What You Can Change:

•Add or remove indexes
•Change storage formats (row-store to column-store)
•Reorganize data across files
•Partition tables differently
•Move data to different storage devices
•Change compression algorithms
•Migrate to different hardware

What Doesn't Change:

•Table definitions
•Application queries
•User views
•Business logic
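
The two lists above can be checked directly. This sketch runs one fixed query against in-memory SQLite before and after an index is added, and records both the rows and the query plan. The table, data, and index name are invented; with three rows the speed difference is nil, so the point is only that the result is identical while the access path changes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, region TEXT);
    INSERT INTO customers VALUES
        (1, 'Acme', 'West'), (2, 'Globex', 'East'), (3, 'Initech', 'West');
""")

QUERY = "SELECT name FROM customers WHERE region = 'West' ORDER BY name"


def run_and_explain(connection):
    """Return the query's rows and a one-line summary of its plan."""
    rows = connection.execute(QUERY).fetchall()
    plan = " | ".join(
        row[3] for row in connection.execute("EXPLAIN QUERY PLAN " + QUERY)
    )
    return rows, plan


rows_before, plan_before = run_and_explain(conn)

# Physical change only: the table definition and the query text stay the same.
conn.execute("CREATE INDEX idx_customers_region ON customers(region)")

rows_after, plan_after = run_and_explain(conn)
```

The application-side constant `QUERY` is never edited; only the plan reported by the DBMS differs between the two runs.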

Physical Independence Example
SQL
-- Application query (unchanged throughout)
SELECT c.name, SUM(o.total)
FROM customers c
JOIN orders o ON c.id = o.customer_id
WHERE c.region = 'West'
GROUP BY c.name;
 
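-- Timings below are illustrative, not measurements.
-- DAY 1: No indexes, full table scans
-- Query works, but slow (e.g., 10 seconds)
 
-- DAY 30: DBA adds indexes
CREATE INDEX idx_cust_region ON customers(region);
CREATE INDEX idx_orders_cust ON orders(customer_id);
-- Query unchanged, now faster (e.g., 100 ms)
 
-- DAY 90: DBA partitions the orders table by date.
-- PostgreSQL cannot convert an existing table into a partitioned one
-- in place, so the DBA builds a partitioned copy and swaps the names.
CREATE TABLE orders_part (LIKE orders INCLUDING DEFAULTS)
    PARTITION BY RANGE (order_date);
-- (create partitions and indexes, copy the rows, then:)
ALTER TABLE orders RENAME TO orders_old;
ALTER TABLE orders_part RENAME TO orders;
-- Query unchanged, still works
 
-- DAY 180: Move a partition to SSD storage
-- (orders_2024_q1 and fast_ssd are placeholder names)
ALTER TABLE orders_2024_q1 SET TABLESPACE fast_ssd;
-- Compression is set per column in PostgreSQL 14+ with
-- ALTER TABLE ... ALTER COLUMN ... SET COMPRESSION (pglz or lz4).
-- Query unchanged, possibly faster again (e.g., 50 ms)
 
-- Application code never changed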

Logical Data Independence

The ability to modify the logical schema without changing external schemas or the applications built on them.

What You Can Change:

•Add new tables or columns
•Split, merge, or rename tables behind views that keep the old interface

What Doesn't Change:

•Existing user views
•Application queries written against those views

Logical data independence is generally harder to achieve than physical data independence, because applications refer to logical structures directly.

Independence Has Limits

Data independence is not absolute. Major schema restructuring may require application changes. Removing tables or columns breaks dependent queries. The goal is to minimize, not eliminate, the impact of changes. Good design maximizes independence; poor design creates tight coupling.
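
Both halves of that statement can be shown in one SQLite sketch: a view insulating an application from a logical schema change, and the same application breaking once the insulation is removed. The scenario is hypothetical: an `employees` table with a single `name` column is replaced by a `staff` table with separate first and last names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (employee_id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO employees VALUES (1, 'Alice Chen');
""")

APP_QUERY = "SELECT name FROM employees WHERE employee_id = 1"
before = conn.execute(APP_QUERY).fetchone()

# Logical change: the name column is split and the table is replaced.
conn.executescript("""
    CREATE TABLE staff (
        employee_id INTEGER PRIMARY KEY,
        first_name  TEXT,
        last_name   TEXT
    );
    INSERT INTO staff VALUES (1, 'Alice', 'Chen');
    DROP TABLE employees;

    -- A view with the old name restores the old interface for readers.
    CREATE VIEW employees AS
        SELECT employee_id, first_name || ' ' || last_name AS name
        FROM staff;
""")
after = conn.execute(APP_QUERY).fetchone()

# The limit: without the view, the same application query fails.
conn.executescript("DROP VIEW employees;")
try:
    conn.execute(APP_QUERY)
    broke_without_view = False
except sqlite3.OperationalError:
    broke_without_view = True
```

The view covers read access only; writes through it would need the kind of trigger-based handling that complex views require.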

Summary: Data Abstraction Levels

We've explored the three-level architecture that enables data abstraction in database systems. Let's consolidate the key insights:

Key Takeaways

•Three levels of abstraction exist: Physical (storage), Logical (schema), and External (views). Each serves different users and concerns.
•Physical level handles storage structures, record formats, indexes, and file organization—invisible to users but crucial for performance.
•Logical level describes the complete database structure—tables, columns, relationships, constraints—what developers typically work with.
•External level provides user-specific views, hiding irrelevant or sensitive data and presenting customized perspectives.
•Mappings connect levels, translating between representations and enabling transparent query processing across abstraction boundaries.
•Physical data independence means storage can change without affecting logical schema or applications.
•Logical data independence means schema can evolve with minimal application impact, especially when views provide insulation.
•The ANSI-SPARC architecture established this framework in 1975 and remains the conceptual foundation for modern DBMS design.

What's Next:

With a solid understanding of abstraction levels, we'll explore Data Models—the formal frameworks for describing data structures. Data models provide the vocabulary and rules for defining logical schemas, from the relational model that dominates today to historical and emerging alternatives.

Page Complete

You now understand how data abstraction works in database systems. This knowledge is fundamental—every time you write a query, create a view, or configure storage, you're operating at one of these abstraction levels. Understanding their separation enables cleaner designs and more effective database usage.

Data Abstraction Levels

Hiding Complexity Through Abstraction

In this page, we'll explore the three levels of data abstraction that form the ANSI-SPARC architecture, a foundational framework that has guided DBMS design since the 1970s.

What You Will Learn

Why Data Abstraction Matters

Before diving into the levels themselves, let's understand why data abstraction is so important. Consider what happens without it:

The Pre-Abstraction World:

In early file-based systems, application programs dealt directly with physical storage details:

Programs contained hardcoded knowledge of file formats, record layouts, and storage locations
Changing how data was stored required modifying every application
Each application maintained its own view of data, leading to inconsistencies
Users needed to understand physical organization to access data

Benefits of Data Abstraction

•Simplified User Interface — Users work with familiar concepts (tables, rows, columns) rather than physical storage details (blocks, sectors, pointers).
•Physical Data Independence — Physical storage can be reorganized, reindexed, or migrated without changing applications.
•Logical Data Independence — The logical schema can evolve (new tables, modified relationships) with minimal application impact.
•Multiple User Views — Different users see different perspectives of the same underlying data, tailored to their needs and access rights.
•Security Through Abstraction — Sensitive details can be hidden at different levels, enforcing access control naturally.
•Parallel Development — Database administrators can optimize storage while developers focus on application logic, working independently.

The Core Insight

The ANSI-SPARC Three-Level Architecture

This architecture defines three distinct levels of abstraction, each describing the same data from a different perspective:

Converting Mermaid diagram...

The Three Levels of Abstraction
Level	Also Called	Describes	Concerned With	Users
External Level	View Level	Individual user views	What specific users/groups see	End users, applications
Logical Level	Conceptual Level	Full logical schema	What data exists and its relationships	DBAs, developers
Physical Level	Internal Level	Physical storage	How data is stored on disk	DBMS internals, system admins

Key Architectural Principles:

Each level has its own schema — The physical schema describes storage structures. The logical schema describes tables and relationships. External schemas describe user views.
Mappings connect levels — The DBMS maintains mappings between levels. When a user accesses a view, the mapping translates it to logical operations, which are further mapped to physical operations.
Changes at one level don't necessarily affect others — This is data independence. Physical reorganization doesn't affect logical schema. Schema changes may not affect all views.
Only the physical level touches actual storage — All higher levels are logical constructs that the DBMS translates to physical operations.

Historical Context

The Physical Level (Internal Level)

What the Physical Level Defines:

Physical Level Components

•Storage Structures — Whether data is stored in heap files, sorted files, hash files, or clustered together with related data.
•Record Formats — The byte-level layout of individual records: fixed vs. variable length fields, null bitmap location, record headers.
•Page Organization — How records are organized within pages, slot directories, free space management, page headers.
•Index Structures — B+ tree indexes, hash indexes, bitmap indexes—their implementation details and how they map to table data.
•File Allocation — How database files are allocated on disk, extent sizes, growth policies, file organization.
•Partitioning — How large tables are divided across multiple physical storage units for performance and manageability.
•Compression — Whether and how data is compressed at the page or record level.
•Encryption — How data is encrypted at rest, key management, and encryption scope.

Physical Schema Example
SQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
-- PostgreSQL: Physical storage configuration
CREATE TABLE large_orders (
    order_id        BIGINT NOT NULL,
    customer_id     INT NOT NULL,
    order_date      DATE NOT NULL,
    total_amount    DECIMAL(12,2),
    status          VARCHAR(20),
    order_data      JSONB
) 
PARTITION BY RANGE (order_date);  -- Physical partitioning
 
-- Create partitions (physical storage units)
CREATE TABLE orders_2024_q1 PARTITION OF large_orders
    FOR VALUES FROM ('2024-01-01') TO ('2024-04-01')
    TABLESPACE fast_ssd;  -- Physical storage location
 
CREATE TABLE orders_2024_q2 PARTITION OF large_orders
    FOR VALUES FROM ('2024-04-01') TO ('2024-07-01')
    TABLESPACE standard_storage;
 
-- Physical indexing structures
CREATE INDEX idx_orders_customer 
    ON large_orders USING btree (customer_id);  -- B-tree structure
 
CREATE INDEX idx_orders_data 
    ON large_orders USING gin (order_data);     -- GIN for JSONB
 
-- Oracle: Explicit physical organization
CREATE TABLE employees (
    emp_id    NUMBER(10),
    name      VARCHAR2(100),
    dept_id   NUMBER(5)
)
ORGANIZATION INDEX              -- Store in B-tree, not heap
TABLESPACE employee_data        -- Physical location
STORAGE (
    INITIAL 64K
    NEXT 64K
    PCTINCREASE 0
    BUFFER_POOL KEEP           -- Keep in buffer cache
)
COMPRESS FOR OLTP;             -- Physical compression

Record Layout Example:

Consider how a simple record might be laid out at the physical level:

Record for: Employee (id=1001, name='Alice Chen', dept=5, salary=85000)

| Header (8 bytes) | id (4 bytes) | name (var) | dept (4 bytes) | salary (8 bytes) |
| RID, flags, etc  |    1001      |  len:10 +  |       5        |      85000       |
|                  |              | 'Alice Chen'|                |                  |

The physical level must track:

Where the record starts and ends
How to interpret each field's bytes
Where to find variable-length data
Which bits indicate NULL values

Physical Level Complexity

The Logical Level (Conceptual Level)

The Logical Schema describes WHAT data exists, not HOW it's stored.

Logical Level Components

•Tables/Relations — The fundamental structure for organizing data. Each table represents an entity type with named columns and typed attributes.
•Columns/Attributes — Named fields with data types (INTEGER, VARCHAR, DATE, etc.) that define what values can be stored.
•Primary Keys — Uniquely identify each row. The logical guarantee that no two rows have the same key value.
•Foreign Keys — Establish relationships between tables. The logical representation of references between entities.
•Constraints — Business rules encoded in the schema: NOT NULL, UNIQUE, CHECK conditions, domain restrictions.
•Relationships — One-to-one, one-to-many, many-to-many associations between entities.
•Derived Data — Computed columns, materialized views, and other data derived from base tables.

Logical Schema Example
SQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
-- Logical schema: Describes data structures and relationships
-- No physical implementation details
 
-- Entity: Department
CREATE TABLE departments (
    department_id   INT PRIMARY KEY,
    name            VARCHAR(100) NOT NULL UNIQUE,
    budget          DECIMAL(15,2) CHECK (budget >= 0),
    location        VARCHAR(100),
    created_date    DATE DEFAULT CURRENT_DATE
);
 
-- Entity: Employee
CREATE TABLE employees (
    employee_id     INT PRIMARY KEY,
    first_name      VARCHAR(50) NOT NULL,
    last_name       VARCHAR(50) NOT NULL,
    email           VARCHAR(100) UNIQUE,
    hire_date       DATE NOT NULL,
    salary          DECIMAL(10,2) CHECK (salary > 0),
    department_id   INT REFERENCES departments(department_id),
    manager_id      INT REFERENCES employees(employee_id),
    
    -- Self-reference constraint at logical level
    CONSTRAINT no_self_management CHECK (manager_id <> employee_id)
);
 
-- Entity: Project (Many-to-Many with Employees)
CREATE TABLE projects (
    project_id      INT PRIMARY KEY,
    name            VARCHAR(200) NOT NULL,
    start_date      DATE,
    end_date        DATE,
    budget          DECIMAL(15,2),
    
    CONSTRAINT valid_dates CHECK (end_date >= start_date)
);
 
-- Relationship: Employee <-> Project (Junction Table)
CREATE TABLE employee_projects (
    employee_id     INT REFERENCES employees(employee_id),
    project_id      INT REFERENCES projects(project_id),
    role            VARCHAR(50),
    hours_allocated DECIMAL(5,2),
    
    PRIMARY KEY (employee_id, project_id)
);
 
-- Notice: Nothing about indexes, storage, partitioning
-- Pure logical description of data structure

Logical vs. Physical: The Separation

Notice what's absent from the logical schema:

No mention of indexes (that's physical optimization)
No tablespace or file references
No partitioning schemes
No compression settings
No storage allocation parameters

The logical level answers: "What entities exist in our business domain, what attributes do they have, and how do they relate?"

The physical level (separately) answers: "How do we efficiently store and retrieve this data on our hardware?"

This separation means a DBA can add an index, change partitioning, or move data to faster storage—all without changing the logical schema that applications depend on.

Thinking at the Right Level

The External Level (View Level)

Each user group sees only what they need to see.

Purposes of External Views

•Simplification — Hide complexity by presenting only the columns and tables relevant to a user's task. A sales representative doesn't need to see manufacturing data.
•Security — Restrict access to sensitive data. HR can see salary information; other departments see only non-sensitive employee data.
•Customization — Present data in forms convenient for specific applications. Compute derived values, rename columns, combine tables.
•Consistency — Provide stable interfaces to applications. If the logical schema changes, views can maintain the old interface.
•Legacy Support — When schemas evolve, views can present old structures to applications that haven't been updated.
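
The Consistency and Legacy Support points can be illustrated with a minimal sketch. It assumes SQLite through Python's sqlite3 module, and the tables customers, customer_core, and customer_address are hypothetical. A table is split in two, and a view published under the old name keeps the application's query text unchanged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Original logical schema: one wide table (hypothetical names).
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
    INSERT INTO customers VALUES (1, 'Ada', 'Paris'), (2, 'Linus', 'Oslo');
""")

APP_QUERY = "SELECT name, city FROM customers ORDER BY id"
before = conn.execute(APP_QUERY).fetchall()

# Schema evolution: split the table, then expose a view under the old name.
conn.executescript("""
    CREATE TABLE customer_core (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE customer_address (
        customer_id INTEGER REFERENCES customer_core(id),
        city        TEXT
    );
    INSERT INTO customer_core SELECT id, name FROM customers;
    INSERT INTO customer_address SELECT id, city FROM customers;
    DROP TABLE customers;
    CREATE VIEW customers AS
        SELECT c.id, c.name, a.city
        FROM customer_core c
        JOIN customer_address a ON a.customer_id = c.id;
""")

# The application still runs the same query text against "customers".
after = conn.execute(APP_QUERY).fetchall()
print(before == after)
```

Only read access is preserved this way. Writes against the old name would need additional work, since a view is not automatically writable in every DBMS.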

External Views Example
SQL
-- Base tables exist at the logical level
-- Views provide tailored external perspectives
-- Dialect: PostgreSQL (uses ||, DATE_PART, and AGE)
 
-- View for HR Department: Full employee details
CREATE VIEW hr_employee_details AS
SELECT 
    e.employee_id,
    e.first_name,
    e.last_name,
    e.email,
    e.hire_date,
    e.salary,          -- HR can see salary
    d.name AS department_name,
    m.first_name || ' ' || m.last_name AS manager_name,
    DATE_PART('year', AGE(CURRENT_DATE, e.hire_date)) AS years_employed
FROM employees e
LEFT JOIN departments d ON e.department_id = d.department_id
LEFT JOIN employees m ON e.manager_id = m.employee_id;
 
-- View for General Staff: Limited employee info (no salary)
CREATE VIEW staff_directory AS
SELECT 
    e.employee_id,
    e.first_name,
    e.last_name,
    e.email,
    d.name AS department
    -- Note: salary is hidden
FROM employees e
LEFT JOIN departments d ON e.department_id = d.department_id;
 
-- View for Finance: Department budget analysis
CREATE VIEW department_budget_analysis AS
SELECT 
    d.department_id,
    d.name,
    d.budget AS allocated_budget,
    COUNT(e.employee_id) AS employee_count,
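    COALESCE(SUM(e.salary), 0) AS total_salary_cost,   -- 0 for departments with no employees
    d.budget - COALESCE(SUM(e.salary), 0) AS remaining_budget,
    ROUND(COALESCE(SUM(e.salary), 0) / NULLIF(d.budget, 0) * 100, 2) AS budget_utilization_pct  -- NULL when budget is 0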
FROM departments d
LEFT JOIN employees e ON d.department_id = e.department_id
GROUP BY d.department_id, d.name, d.budget;
 
-- View for Reporting: Flattened project data
CREATE VIEW project_team_report AS
SELECT 
    p.name AS project_name,
    p.start_date,
    p.end_date,
    p.budget AS project_budget,
    e.first_name || ' ' || e.last_name AS team_member,
    ep.role,
    ep.hours_allocated,
    d.name AS member_department
FROM projects p
JOIN employee_projects ep ON p.project_id = ep.project_id
JOIN employees e ON ep.employee_id = e.employee_id
JOIN departments d ON e.department_id = d.department_id
ORDER BY p.name, ep.role;
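
To see the hiding effect concretely, here is a minimal sketch of the staff_directory view. It assumes SQLite through Python's sqlite3 module rather than the PostgreSQL dialect used above, with a trimmed-down version of the employees and departments tables and made-up sample rows. It inspects which columns a user of the view can actually see.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE departments (department_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE employees (
        employee_id   INTEGER PRIMARY KEY,
        first_name    TEXT,
        last_name     TEXT,
        email         TEXT,
        salary        NUMERIC,
        department_id INTEGER REFERENCES departments(department_id)
    );
    INSERT INTO departments VALUES (1, 'Engineering');
    INSERT INTO employees VALUES (1, 'Ada', 'Lovelace', 'ada@example.com', 120000, 1);

    -- Same shape as the staff_directory view above: no salary column.
    CREATE VIEW staff_directory AS
    SELECT e.employee_id, e.first_name, e.last_name, e.email, d.name AS department
    FROM employees e
    LEFT JOIN departments d ON e.department_id = d.department_id;
""")

cursor = conn.execute("SELECT * FROM staff_directory")
view_columns = [col[0] for col in cursor.description]
rows = cursor.fetchall()

print(view_columns)
print(rows)
```

A view only restricts what is reachable through the view. In a multi-user DBMS, the security benefit depends on granting users access to the view and not to the base tables. SQLite has no privilege system, so that part is not shown here.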

View Advantages

•Simplified queries for users
•Built-in security through hiding
•Schema change insulation
•Computed columns readily available
•Consistent naming conventions

View Limitations

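•Updates through views are restricted (typically to simple, single-table views)
•Complex views may hinder optimization
•Standard views don't store data; the defining query runs on each access (materialized views are the exception)
•Deeply nested views hurt performance
•Changes to the underlying tables may break views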

Updatable Views
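
How restricted view updates are depends on the DBMS. As one data point, the sketch below assumes SQLite through Python's sqlite3 module, where views are read-only unless an INSTEAD OF trigger says how to apply the change. Several server DBMSs behave differently and accept updates through simple single-table views directly. The view staff_names and the trigger name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (employee_id INTEGER PRIMARY KEY, first_name TEXT, salary NUMERIC);
    INSERT INTO employees VALUES (1, 'Ada', 120000);
    CREATE VIEW staff_names AS SELECT employee_id, first_name FROM employees;
""")

# 1. A direct UPDATE through the view is rejected by SQLite.
try:
    conn.execute("UPDATE staff_names SET first_name = 'Augusta' WHERE employee_id = 1")
    direct_update_ok = True
except sqlite3.OperationalError:
    direct_update_ok = False

# 2. An INSTEAD OF trigger maps the view update onto the base table.
conn.executescript("""
    CREATE TRIGGER staff_names_update INSTEAD OF UPDATE ON staff_names
    BEGIN
        UPDATE employees
        SET first_name = NEW.first_name
        WHERE employee_id = OLD.employee_id;
    END;
""")
conn.execute("UPDATE staff_names SET first_name = 'Augusta' WHERE employee_id = 1")

name_after = conn.execute(
    "SELECT first_name FROM employees WHERE employee_id = 1"
).fetchone()[0]

print(direct_update_ok, name_after)
```

The trigger is an explicit, hand-written piece of the external/conceptual mapping: it tells the DBMS how a change to the view translates into a change to the base table.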

Mappings Between Levels

The three levels don't exist in isolation—they are connected by mappings that allow the DBMS to translate between representations. These mappings are what enable data independence.

Types of Mappings

•External/Conceptual Mapping — Defines how each view maps to the logical schema. This includes: which logical columns map to view columns, how column names are translated, what derived values are computed, and which rows are visible (row-level security).
•Conceptual/Internal Mapping — Defines how logical structures map to physical storage. This includes: which files store which tables, how columns map to record fields, which indexes exist for which columns, and how tables are partitioned across storage.

How Mappings Enable Query Translation:

When a user queries a view, the DBMS uses mappings to translate:

User Query (External Level):
SELECT department, total_sales 
FROM sales_report 
WHERE region = 'West';

↓ External/Conceptual Mapping ↓

Logical Query (Conceptual Level):
SELECT d.name, SUM(o.amount)
FROM orders o
JOIN departments d ON o.dept_id = d.id
WHERE d.region = 'West'
GROUP BY d.name;

↓ Conceptual/Internal Mapping ↓

Physical Operations (Internal Level):
1. Use index idx_dept_region to find West departments
2. For each dept_id, probe hash index on orders.dept_id
3. Sum amounts, aggregate by department
4. Return results

The user wrote a simple query against a view. The DBMS, using its mappings, translated this into optimized physical operations.
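
The first translation step can be reproduced in a small sketch. It assumes SQLite through Python's sqlite3 module, a hypothetical definition of the sales_report view (the page does not show one), and made-up sample rows. It checks that the query against the view and the hand-expanded query against the base tables return the same rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE departments (id INTEGER PRIMARY KEY, name TEXT, region TEXT);
    CREATE TABLE orders (
        id      INTEGER PRIMARY KEY,
        dept_id INTEGER REFERENCES departments(id),
        amount  NUMERIC
    );
    INSERT INTO departments VALUES
        (1, 'Hardware', 'West'), (2, 'Software', 'West'), (3, 'Services', 'East');
    INSERT INTO orders VALUES (1, 1, 100), (2, 1, 50), (3, 2, 70), (4, 3, 999);

    -- Assumed view definition; this is the external/conceptual mapping.
    CREATE VIEW sales_report AS
    SELECT d.name AS department, d.region AS region, SUM(o.amount) AS total_sales
    FROM orders o
    JOIN departments d ON o.dept_id = d.id
    GROUP BY d.name, d.region;
""")

# External level: the user's query against the view.
via_view = conn.execute(
    "SELECT department, total_sales FROM sales_report "
    "WHERE region = 'West' ORDER BY department"
).fetchall()

# Conceptual level: the same request written against base tables.
expanded = conn.execute("""
    SELECT d.name, SUM(o.amount)
    FROM orders o
    JOIN departments d ON o.dept_id = d.id
    WHERE d.region = 'West'
    GROUP BY d.name
    ORDER BY d.name
""").fetchall()

print(via_view == expanded)
```

The second step, from logical query to physical operations, is chosen by the query planner and varies by DBMS and by which indexes exist.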

Transparent Translation

Users never see these mappings. A query against a view feels just like a query against a table, and this transparency is the essence of good abstraction.

Data Independence: The Ultimate Goal

Data Independence is the ability to modify the schema at one level without affecting the schema at the next higher level. It's the fundamental benefit that the three-level architecture provides.

Data independence comes in two forms: physical and logical.

Physical Data Independence

The ability to modify the physical schema without changing the logical schema or applications.

What You Can Change:

•Add or remove indexes
•Change storage formats (row-store to column-store)
•Reorganize data across files
•Partition tables differently
•Move data to different storage devices
•Change compression algorithms
•Migrate to different hardware

What Doesn't Change:

•Table definitions
•Application queries
•User views
•Business logic

Physical Independence Example
SQL
-- Application query (unchanged throughout)
SELECT c.name, SUM(o.total)
FROM customers c
JOIN orders o ON c.id = o.customer_id
WHERE c.region = 'West'
GROUP BY c.name;
 
-- DAY 1: No indexes, full table scans
-- Query works, but slow (10 seconds)
 
-- DAY 30: DBA adds indexes
CREATE INDEX idx_cust_region ON customers(region);
CREATE INDEX idx_orders_cust ON orders(customer_id);
-- Query unchanged, now 100ms
 
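-- DAY 90: DBA partitions orders table by date
-- (illustrative: partitioning DDL is vendor-specific, and some systems
--  require rebuilding the table instead of a single ALTER)
ALTER TABLE orders PARTITION BY RANGE (order_date);
-- Query unchanged, still works
 
-- DAY 180: Move to SSD storage, add compression
ALTER TABLE orders SET TABLESPACE fast_ssd;      -- PostgreSQL syntax
ALTER TABLE orders SET (COMPRESSION = 'zstd');   -- illustrative: compression options vary by DBMS
-- Query unchanged, now 50ms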
 
-- Application code NEVER changed!
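
The index step of this timeline can be tried out in a minimal sketch. It assumes SQLite through Python's sqlite3 module, a simplified single-table version of the query, and made-up sample rows. It runs the same query text before and after the index is created and compares both the results and the planner's access path.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, region TEXT);
    INSERT INTO customers VALUES (1, 'Ada', 'West'), (2, 'Linus', 'East'), (3, 'Grace', 'West');
""")

# The application's query text; it is never edited below.
APP_QUERY = "SELECT name FROM customers WHERE region = 'West' ORDER BY name"


def run_and_explain(connection):
    rows = connection.execute(APP_QUERY).fetchall()
    # The fourth column of each EXPLAIN QUERY PLAN row is the human-readable detail.
    plan = [row[3] for row in connection.execute("EXPLAIN QUERY PLAN " + APP_QUERY)]
    return rows, plan


rows_before, plan_before = run_and_explain(conn)

# Physical change only: add an index on the filter column.
conn.execute("CREATE INDEX idx_cust_region ON customers(region)")

rows_after, plan_after = run_and_explain(conn)

print(rows_before == rows_after)
print(plan_before)
print(plan_after)
```

The rows are identical in both runs, while the plan switches from a table scan to an index search. The exact plan wording differs between SQLite versions, so it is printed here rather than quoted.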

Independence Has Limits

Summary: Data Abstraction Levels

We've explored the three-level architecture that enables data abstraction in database systems. Let's consolidate the key insights:

Key Takeaways

•Three levels of abstraction exist: Physical (storage), Logical (schema), and External (views). Each serves different users and concerns.
•Physical level handles storage structures, record formats, indexes, and file organization—invisible to users but crucial for performance.
•Logical level describes the complete database structure—tables, columns, relationships, constraints—what developers typically work with.
•External level provides user-specific views, hiding irrelevant or sensitive data and presenting customized perspectives.
•Mappings connect levels, translating between representations and enabling transparent query processing across abstraction boundaries.
•Physical data independence means storage can change without affecting logical schema or applications.
•Logical data independence means schema can evolve with minimal application impact, especially when views provide insulation.
•The ANSI-SPARC architecture established this framework in 1975 and remains the conceptual foundation for modern DBMS design.
