Entity Mapping - Learning Module

Loading content...

0/241

Derived Attributes

Calculated, Not Stored

How old is a person? The answer depends on when you ask. Today, Jane might be 32 years old. Tomorrow, she might be 33. Her age is not an intrinsic value stored somewhere—it's derived from her date of birth and the current date.

This is the essence of a derived attribute: an attribute whose value is computed from other attributes rather than explicitly stored. The value can change even when no update occurs to the base data. Age changes daily. An order's total changes if item prices are updated. An employee's tenure grows with each passing day.

In Entity-Relationship modeling, derived attributes are shown with a dashed oval, indicating their computed nature. But how do we handle them in the relational model? Unlike simple, composite, or multivalued attributes that have clear mapping rules, derived attributes require a decision: compute on demand or store and maintain?

This seemingly simple question opens a rich design space involving computation vs. storage tradeoffs, consistency challenges, and various implementation strategies. Making the wrong choice can lead to stale data, performance bottlenecks, or maintenance nightmares.

What You Will Learn

By the end of this page, you will understand the theoretical foundation of derived attributes, master multiple implementation strategies (computed at query time, computed columns, views, triggers, materialized views), know when to store vs. compute, and be able to make informed design decisions for various scenarios.

Understanding Derived Attributes

A derived attribute (also called a computed attribute or calculated attribute) is an attribute whose value can be determined from other attributes in the database. Unlike stored attributes that represent independently entered data, derived attributes are functionally dependent on other data.

Formal Definition:

An attribute D is derived if there exists a function f such that:

D = f(A₁, A₂, ..., Aₙ, T)

where A₁...Aₙ are stored attributes and T is optionally the current time/date.

Categories of Derived Attributes:

Types of Derived Attributes
Category	Description	Examples
Time-Based	Computed from stored date/time and current time	age (from birth_date), tenure (from hire_date), days_until_expiry
Arithmetic	Mathematical computation from numeric attributes	total_price (quantity × unit_price), profit_margin (revenue - cost)
Aggregate	Summary of multiple related records	order_total (sum of line items), student_gpa (average of grades)
Conditional	Value determined by logic/conditions on other attributes	grade_letter (from numeric grade), status (from date comparisons)
String-Based	Derived from string manipulation	full_name (first + ' ' + last), initials, email domain
Existence-Based	Based on presence/absence of related data	has_orders (EXISTS check), is_premium (based on purchase history)

Common Examples of Derived Attributes

•age — Derived from date_of_birth and current date
•total_order_amount — Sum of (quantity × price) for all order items
•account_balance — Sum of all transaction amounts (deposits - withdrawals)
•full_name — Concatenation of first_name and last_name
•number_of_employees — Count of employees in a department
•years_of_service — Derived from hire_date and current date
•discounted_price — original_price × (1 - discount_percentage)
•is_overdue — due_date < current_date (boolean)
•average_rating — Mean of all review ratings for a product
•tax_amount — subtotal × tax_rate

ER Diagram Notation

In ER diagrams, derived attributes are shown with a dashed (dotted) oval boundary, distinguishing them from stored attributes (solid oval). This visual cue indicates that the attribute should not be directly stored but calculated. When mapping to relational schema, this notation guides our implementation decision.

The Storage Decision: Compute vs. Store

The fundamental question for derived attributes: Should we store the computed value or calculate it on demand?

This is not a trivial decision. Each approach has significant implications for correctness, performance, storage, and maintenance. The choice depends on multiple factors:

Compute on Demand

•Always Current: Value is always up-to-date
•No Redundancy: Single source of truth
•No Update Anomalies: Base data changes automatically reflected
•Less Storage: No column for derived value
•Computation Cost: CPU cycles on every query
•Complex Queries: Joins may be needed for aggregates

Store (Materialize)

•Fast Reads: Pre-computed, no calculation
•Simple Queries: Direct column access, indexable
•Historical Accuracy: Preserves point-in-time values
•Staleness Risk: May become outdated
•Update Overhead: Must maintain on base data changes
•Redundancy: Violates normalized form

Decision Framework:

When to Compute on Demand

•Time-dependent values: Age, tenure, days_remaining—these change without any data update. Storing would require constant refresh.
•Simple calculations: Arithmetic on readily available columns. Little computation cost.
•Infrequently accessed: If the derived value is rarely queried, computing on demand avoids storage overhead.
•Consistency is critical: When staleness is unacceptable, always compute.
•Base data changes frequently: If source attributes update often and maintaining a stored value would be expensive.

When to Store (Materialize)

•Expensive computation: Complex aggregations over large datasets that would slow queries.
•Frequent reads, rare writes: When the derived value is read constantly but base data changes rarely.
•Historical preservation: When you need to know the value at a point in time (what was the order total when placed?).
•Indexing requirement: If you need to search/sort by the derived value, storing enables indexing.
•Reporting performance: Dashboard queries that aggregate across large tables benefit from pre-computation.

The Default Is Not To Store

Unless you have a compelling reason to store a derived value, compute it on demand. Storage introduces redundancy that must be maintained. Stale derived values are a common source of bugs and data inconsistencies. Reserve materialization for cases where performance requirements justify the complexity.

Implementation: Query-Time Computation

The simplest approach to derived attributes: compute them in SELECT statements as needed. The derived value exists only in query results, not in any stored column. This is the recommended default approach.

query_time_computation.sql
SQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
-- Table stores only base attributes (no derived columns)
CREATE TABLE Employee (
    employee_id     SERIAL PRIMARY KEY,
    first_name      VARCHAR(50) NOT NULL,
    last_name       VARCHAR(50) NOT NULL,
    date_of_birth   DATE NOT NULL,
    hire_date       DATE NOT NULL,
    base_salary     DECIMAL(12, 2) NOT NULL,
    bonus_percent   DECIMAL(5, 2) DEFAULT 0
);
 
-- Derived attributes computed at query time:
 
-- Age (time-based derived attribute)
SELECT 
    employee_id,
    first_name,
    last_name,
    date_of_birth,
    EXTRACT(YEAR FROM AGE(CURRENT_DATE, date_of_birth)) AS age
FROM Employee;
 
-- Years of tenure (time-based)
SELECT 
    employee_id,
    first_name || ' ' || last_name AS full_name,  -- String-derived
    hire_date,
    EXTRACT(YEAR FROM AGE(CURRENT_DATE, hire_date)) AS years_of_service,
    -- More precise: total months
    EXTRACT(YEAR FROM AGE(CURRENT_DATE, hire_date)) * 12 +
        EXTRACT(MONTH FROM AGE(CURRENT_DATE, hire_date)) AS months_of_service
FROM Employee;
 
-- Total compensation (arithmetic derived)
SELECT 
    employee_id,
    first_name,
    base_salary,
    bonus_percent,
    base_salary * (1 + bonus_percent / 100) AS total_compensation
FROM Employee;
 
-- Conditional derived: Salary grade
SELECT 
    employee_id,
    first_name,
    base_salary,
    CASE 
        WHEN base_salary >= 150000 THEN 'Executive'
        WHEN base_salary >= 100000 THEN 'Senior'
        WHEN base_salary >= 70000 THEN 'Mid-Level'
        WHEN base_salary >= 45000 THEN 'Junior'
        ELSE 'Entry'
    END AS salary_grade
FROM Employee;
 
-- Aggregate derived: Department employee count
SELECT 
    d.department_id,
    d.department_name,
    COUNT(e.employee_id) AS employee_count,
    AVG(e.base_salary) AS average_salary,
    SUM(e.base_salary) AS total_payroll
FROM Department d
LEFT JOIN Employee e ON d.department_id = e.department_id
GROUP BY d.department_id, d.department_name;
 
-- Order total (aggregate derived from line items)
SELECT 
    o.order_id,
    o.order_date,
    o.customer_id,
    SUM(li.quantity * li.unit_price) AS subtotal,
    SUM(li.quantity * li.unit_price) * 0.08 AS tax,
    SUM(li.quantity * li.unit_price) * 1.08 AS total
FROM Order o
JOIN Line_Item li ON o.order_id = li.order_id
GROUP BY o.order_id, o.order_date, o.customer_id;

Query Reuse

If you find yourself copying the same derived attribute calculation across many queries, consider encapsulating it in a VIEW or a user-defined function. This centralizes the logic and prevents inconsistencies from slightly different implementations.

Implementation: Views

Views provide a way to encapsulate derived attribute calculations into a reusable, named query. Applications can query the view as if it were a table, and the derived values appear as regular columns—but they're computed on demand, not stored.

Benefits of Views:

Centralized logic: Definition in one place, used everywhere
Abstraction: Applications don't need to know the calculation
No storage overhead: Computed per query
Always current: No staleness concerns
Security: Can expose derived values without revealing base tables

views_derived.sql
SQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
-- View that includes derived attributes
CREATE VIEW Employee_Extended AS
SELECT 
    e.employee_id,
    e.first_name,
    e.last_name,
    e.first_name || ' ' || e.last_name AS full_name,  -- Derived
    e.date_of_birth,
    EXTRACT(YEAR FROM AGE(CURRENT_DATE, e.date_of_birth)) AS age,  -- Derived
    e.hire_date,
    EXTRACT(YEAR FROM AGE(CURRENT_DATE, e.hire_date)) AS years_of_service,  -- Derived
    e.base_salary,
    e.bonus_percent,
    e.base_salary * (1 + COALESCE(e.bonus_percent, 0) / 100) AS total_compensation,  -- Derived
    CASE 
        WHEN e.base_salary >= 100000 THEN 'Senior'
        WHEN e.base_salary >= 70000 THEN 'Mid'
        ELSE 'Junior'
    END AS salary_tier,  -- Derived conditional
    d.department_name
FROM Employee e
LEFT JOIN Department d ON e.department_id = d.department_id;
 
-- Application queries the view like a table
SELECT employee_id, full_name, age, years_of_service, salary_tier
FROM Employee_Extended
WHERE age > 30 AND years_of_service >= 5;
 
-- Order view with computed totals
CREATE VIEW Order_Summary AS
SELECT 
    o.order_id,
    o.order_date,
    o.customer_id,
    c.customer_name,
    COUNT(li.line_item_id) AS item_count,
    SUM(li.quantity) AS total_units,
    SUM(li.quantity * li.unit_price) AS subtotal,
    SUM(li.quantity * li.unit_price * li.discount_percent / 100) AS total_discount,
    SUM(li.quantity * li.unit_price * (1 - li.discount_percent / 100)) AS total_after_discount,
    SUM(li.quantity * li.unit_price * (1 - li.discount_percent / 100)) * o.tax_rate AS tax_amount,
    SUM(li.quantity * li.unit_price * (1 - li.discount_percent / 100)) * (1 + o.tax_rate) AS grand_total,
    o.status,
    CASE 
        WHEN o.status = 'pending' AND o.order_date < CURRENT_DATE - INTERVAL '7 days' 
        THEN TRUE ELSE FALSE 
    END AS is_stale
FROM Order o
JOIN Customer c ON o.customer_id = c.customer_id
LEFT JOIN Line_Item li ON o.order_id = li.order_id
GROUP BY o.order_id, o.order_date, o.customer_id, c.customer_name, o.tax_rate, o.status;
 
-- Product view with inventory status
CREATE VIEW Product_Status AS
SELECT 
    p.product_id,
    p.product_name,
    p.unit_price,
    p.units_in_stock,
    p.reorder_level,
    CASE 
        WHEN p.units_in_stock = 0 THEN 'Out of Stock'
        WHEN p.units_in_stock <= p.reorder_level THEN 'Low Stock'
        ELSE 'In Stock'
    END AS stock_status,  -- Derived
    p.units_in_stock > p.reorder_level AS needs_reorder  -- Derived
FROM Product p;

View Limitations

Standard views cannot be indexed on derived columns (they're not stored). For high-performance filtering on derived values, consider computed columns (next section) or materialized views. Also, complex views with many joins may have performance issues on large datasets.

Implementation: Computed (Generated) Columns

Many modern databases support computed columns (also called generated columns): columns defined by an expression rather than stored data. They appear as regular columns but are computed (either on read or stored automatically). This provides the convenience of a column with the correctness of on-demand computation.

Two Types:

Virtual (VIRTUAL): Computed on read, not stored. No storage overhead.
Stored (STORED): Computed and stored on write. Requires storage but enables indexing.

computed_columns.sql
SQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
-- PostgreSQL: GENERATED columns (v12+)
CREATE TABLE Employee (
    employee_id     SERIAL PRIMARY KEY,
    first_name      VARCHAR(50) NOT NULL,
    last_name       VARCHAR(50) NOT NULL,
    date_of_birth   DATE NOT NULL,
    hire_date       DATE NOT NULL,
    base_salary     DECIMAL(12, 2) NOT NULL,
    bonus_percent   DECIMAL(5, 2) DEFAULT 0,
    
    -- Virtual generated column (computed on read)
    -- PostgreSQL requires STORED for generated columns currently
    full_name       VARCHAR(101) GENERATED ALWAYS AS (first_name || ' ' || last_name) STORED
);
 
-- MySQL: VIRTUAL and STORED generated columns
CREATE TABLE Employee_MySQL (
    employee_id     INT AUTO_INCREMENT PRIMARY KEY,
    first_name      VARCHAR(50) NOT NULL,
    last_name       VARCHAR(50) NOT NULL,
    date_of_birth   DATE NOT NULL,
    quantity        INT NOT NULL,
    unit_price      DECIMAL(10, 2) NOT NULL,
    
    -- Virtual (computed on read, not stored, not indexable)
    full_name       VARCHAR(101) AS (CONCAT(first_name, ' ', last_name)) VIRTUAL,
    
    -- Stored (computed on write, stored, indexable)
    line_total      DECIMAL(12, 2) AS (quantity * unit_price) STORED
);
 
-- SQL Server: Computed columns
CREATE TABLE Employee_SQLServer (
    employee_id     INT IDENTITY(1,1) PRIMARY KEY,
    first_name      NVARCHAR(50) NOT NULL,
    last_name       NVARCHAR(50) NOT NULL,
    date_of_birth   DATE NOT NULL,
    
    -- Non-persisted (computed on read)
    full_name       AS (first_name + ' ' + last_name),
    
    -- Persisted (stored, can be indexed)
    age             AS (DATEDIFF(YEAR, date_of_birth, GETDATE())) PERSISTED,
    
    -- Can create index on PERSISTED computed column
    -- CREATE INDEX idx_age ON Employee_SQLServer(age);
);
 
-- Oracle: Virtual columns (11g+)
CREATE TABLE Employee_Oracle (
    employee_id     NUMBER PRIMARY KEY,
    first_name      VARCHAR2(50) NOT NULL,
    last_name       VARCHAR2(50) NOT NULL,
    base_salary     NUMBER(12, 2) NOT NULL,
    bonus_percent   NUMBER(5, 2) DEFAULT 0,
    
    -- Virtual column
    full_name       VARCHAR2(101) GENERATED ALWAYS AS (first_name || ' ' || last_name) VIRTUAL,
    total_comp      NUMBER(12, 2) GENERATED ALWAYS AS (base_salary * (1 + NVL(bonus_percent, 0) / 100)) VIRTUAL
);
 
-- Computed columns with expressions
CREATE TABLE Order_Line (
    line_id         SERIAL PRIMARY KEY,
    order_id        INTEGER NOT NULL,
    product_id      INTEGER NOT NULL,
    quantity        INTEGER NOT NULL CHECK (quantity > 0),
    unit_price      DECIMAL(10, 2) NOT NULL,
    discount_pct    DECIMAL(5, 2) DEFAULT 0,
    
    -- Computed: Gross line total
    gross_total     DECIMAL(12, 2) GENERATED ALWAYS AS (quantity * unit_price) STORED,
    
    -- Computed: Discount amount
    discount_amount DECIMAL(12, 2) GENERATED ALWAYS AS (quantity * unit_price * discount_pct / 100) STORED,
    
    -- Computed: Net total after discount
    net_total       DECIMAL(12, 2) GENERATED ALWAYS AS (quantity * unit_price * (1 - discount_pct / 100)) STORED
);

Computed Column Support by Database
Database	Feature Name	Virtual Support	Stored/Persisted	Indexable
PostgreSQL	GENERATED ALWAYS AS	STORED only (v12+)	Yes	Yes (stored)
MySQL 5.7+	GENERATED COLUMNS	VIRTUAL or STORED	Yes	Yes (stored)
SQL Server	Computed Columns	Default (non-persisted)	PERSISTED keyword	Yes (persisted)
Oracle 11g+	Virtual Columns	VIRTUAL (default)	Limited	Yes
SQLite 3.31+	Generated Columns	VIRTUAL or STORED	Yes	Yes (stored)

When to Use Computed Columns

Use computed columns when: (1) the derived value depends only on columns in the same row (no aggregates, no other table data), (2) you want the value to appear as a regular column without view overhead, (3) you need index support on the derived value (use STORED/PERSISTED). They're ideal for simple arithmetic, string concatenation, and conditional expressions.

Implementation: Triggers for Stored Derived Values

When you decide to store a derived value—for performance, indexing, or historical reasons—you must ensure it stays synchronized with the base data. Triggers provide automatic maintenance: whenever base data changes, the trigger recalculates and updates the derived value.

triggers_derived.sql
PostgreSQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
-- Scenario: Order table with stored total (aggregate derived)
-- Order total = SUM of all line item totals
-- We store it for fast access and indexing, maintain via triggers
 
CREATE TABLE Order (
    order_id        SERIAL PRIMARY KEY,
    customer_id     INTEGER NOT NULL,
    order_date      DATE NOT NULL DEFAULT CURRENT_DATE,
    
    -- Stored derived attribute (aggregate from Line_Item)
    order_total     DECIMAL(12, 2) NOT NULL DEFAULT 0,
    item_count      INTEGER NOT NULL DEFAULT 0,
    
    status          VARCHAR(20) DEFAULT 'pending'
);
 
CREATE TABLE Line_Item (
    line_item_id    SERIAL PRIMARY KEY,
    order_id        INTEGER NOT NULL REFERENCES Order(order_id) ON DELETE CASCADE,
    product_id      INTEGER NOT NULL,
    quantity        INTEGER NOT NULL,
    unit_price      DECIMAL(10, 2) NOT NULL,
    line_total      DECIMAL(12, 2) GENERATED ALWAYS AS (quantity * unit_price) STORED
);
 
-- Trigger function to update order totals
CREATE OR REPLACE FUNCTION update_order_totals()
RETURNS TRIGGER AS $$
DECLARE
    v_order_id INTEGER;
BEGIN
    -- Determine which order was affected
    IF TG_OP = 'DELETE' THEN
        v_order_id := OLD.order_id;
    ELSE
        v_order_id := NEW.order_id;
    END IF;
    
    -- Recalculate and update the order totals
    UPDATE Order
    SET 
        order_total = COALESCE((
            SELECT SUM(quantity * unit_price)
            FROM Line_Item
            WHERE order_id = v_order_id
        ), 0),
        item_count = (
            SELECT COUNT(*)
            FROM Line_Item
            WHERE order_id = v_order_id
        )
    WHERE order_id = v_order_id;
    
    RETURN NULL;  -- Result is ignored for AFTER triggers
END;
$$ LANGUAGE plpgsql;
 
-- Attach trigger to Line_Item for INSERT, UPDATE, DELETE
CREATE TRIGGER trg_update_order_totals
AFTER INSERT OR UPDATE OR DELETE ON Line_Item
FOR EACH ROW EXECUTE FUNCTION update_order_totals();
 
-- Now any change to Line_Item automatically updates Order.order_total
 
-- Test:
INSERT INTO Order (customer_id) VALUES (101) RETURNING order_id;  -- Returns 1
INSERT INTO Line_Item (order_id, product_id, quantity, unit_price) VALUES (1, 1001, 2, 29.99);
INSERT INTO Line_Item (order_id, product_id, quantity, unit_price) VALUES (1, 1002, 1, 49.99);
 
SELECT order_id, order_total, item_count FROM Order WHERE order_id = 1;
-- order_total = 109.97, item_count = 2
 
--------------------------------------------------------------------------------
-- Example 2: Department employee count and total salary
--------------------------------------------------------------------------------
 
CREATE TABLE Department (
    department_id   SERIAL PRIMARY KEY,
    department_name VARCHAR(100) NOT NULL,
    
    -- Stored derived attributes (maintained by triggers)
    employee_count  INTEGER NOT NULL DEFAULT 0,
    total_salary    DECIMAL(14, 2) NOT NULL DEFAULT 0
);
 
CREATE OR REPLACE FUNCTION update_department_stats()
RETURNS TRIGGER AS $$
DECLARE
    v_old_dept INTEGER;
    v_new_dept INTEGER;
BEGIN
    v_old_dept := COALESCE(OLD.department_id, NULL);
    v_new_dept := COALESCE(NEW.department_id, NULL);
    
    -- Update old department (if employee left it)
    IF v_old_dept IS NOT NULL THEN
        UPDATE Department
        SET 
            employee_count = (SELECT COUNT(*) FROM Employee WHERE department_id = v_old_dept),
            total_salary = COALESCE((SELECT SUM(base_salary) FROM Employee WHERE department_id = v_old_dept), 0)
        WHERE department_id = v_old_dept;
    END IF;
    
    -- Update new department (if employee joined or salary changed)
    IF v_new_dept IS NOT NULL AND (v_old_dept IS NULL OR v_old_dept != v_new_dept OR TG_OP = 'UPDATE') THEN
        UPDATE Department
        SET 
            employee_count = (SELECT COUNT(*) FROM Employee WHERE department_id = v_new_dept),
            total_salary = COALESCE((SELECT SUM(base_salary) FROM Employee WHERE department_id = v_new_dept), 0)
        WHERE department_id = v_new_dept;
    END IF;
    
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;
 
CREATE TRIGGER trg_update_dept_stats
AFTER INSERT OR UPDATE OR DELETE ON Employee
FOR EACH ROW EXECUTE FUNCTION update_department_stats();

Trigger Overhead

Triggers add overhead to every INSERT, UPDATE, DELETE on the base table. For high-transaction tables, this can impact performance. Consider: (1) batch updates instead of per-row triggers where possible, (2) async refresh for non-critical derived values, (3) whether the read performance gain justifies the write performance cost.

Implementation: Materialized Views

Materialized views (MViews) are stored query results that can be refreshed periodically or on demand. Unlike regular views (computed on every query), materialized views persist their results, enabling indexing and fast reads. They're ideal for expensive aggregate computations that don't need real-time accuracy.

materialized_views.sql
PostgreSQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
-- Materialized view for expensive aggregate (PostgreSQL)
CREATE MATERIALIZED VIEW Sales_Summary AS
SELECT 
    DATE_TRUNC('month', o.order_date) AS month,
    p.category_id,
    c.category_name,
    COUNT(DISTINCT o.order_id) AS order_count,
    COUNT(DISTINCT o.customer_id) AS customer_count,
    SUM(li.quantity) AS units_sold,
    SUM(li.quantity * li.unit_price) AS gross_revenue,
    SUM(li.quantity * li.unit_price * (1 - li.discount_pct / 100)) AS net_revenue,
    AVG(li.quantity * li.unit_price) AS avg_order_value
FROM Order o
JOIN Line_Item li ON o.order_id = li.order_id
JOIN Product p ON li.product_id = p.product_id
JOIN Category c ON p.category_id = c.category_id
WHERE o.status = 'completed'
GROUP BY DATE_TRUNC('month', o.order_date), p.category_id, c.category_name;
 
-- Create indexes on the materialized view (huge performance boost)
CREATE INDEX idx_sales_summary_month ON Sales_Summary(month);
CREATE INDEX idx_sales_summary_category ON Sales_Summary(category_id);
 
-- Query the materialized view (very fast, like a table)
SELECT month, category_name, net_revenue
FROM Sales_Summary
WHERE month >= '2024-01-01'
ORDER BY month, net_revenue DESC;
 
-- Refresh the materialized view (can be scheduled)
REFRESH MATERIALIZED VIEW Sales_Summary;
 
-- Concurrent refresh (doesn't lock for reads during refresh)
REFRESH MATERIALIZED VIEW CONCURRENTLY Sales_Summary;
-- Note: Requires unique index for CONCURRENTLY
 
--------------------------------------------------------------------------------
-- Materialized view with derived attributes per entity
--------------------------------------------------------------------------------
 
CREATE MATERIALIZED VIEW Customer_Stats AS
SELECT 
    c.customer_id,
    c.customer_name,
    c.signup_date,
    
    -- Derived: Customer lifetime
    EXTRACT(YEAR FROM AGE(CURRENT_DATE, c.signup_date)) AS years_as_customer,
    
    -- Derived aggregates
    COUNT(o.order_id) AS total_orders,
    COALESCE(SUM(o.order_total), 0) AS lifetime_value,
    AVG(o.order_total) AS avg_order_value,
    MAX(o.order_date) AS last_order_date,
    MIN(o.order_date) AS first_order_date,
    
    -- Derived: Days since last order
    EXTRACT(DAY FROM AGE(CURRENT_DATE, MAX(o.order_date))) AS days_since_last_order,
    
    -- Derived: Customer tier
    CASE 
        WHEN SUM(o.order_total) >= 10000 THEN 'Platinum'
        WHEN SUM(o.order_total) >= 5000 THEN 'Gold'
        WHEN SUM(o.order_total) >= 1000 THEN 'Silver'
        ELSE 'Bronze'
    END AS customer_tier
    
FROM Customer c
LEFT JOIN Order o ON c.customer_id = o.customer_id AND o.status = 'completed'
GROUP BY c.customer_id, c.customer_name, c.signup_date;
 
-- Fast lookups for customer segmentation
CREATE INDEX idx_customer_stats_tier ON Customer_Stats(customer_tier);
CREATE INDEX idx_customer_stats_ltv ON Customer_Stats(lifetime_value);
 
-- Schedule refresh (using pg_cron extension or external scheduler)
-- Example: Refresh every night at 2 AM
-- SELECT cron.schedule('0 2 * * *', 'REFRESH MATERIALIZED VIEW CONCURRENTLY Customer_Stats');

Materialized View Support by Database
Database	Feature	Refresh Options	Indexable
PostgreSQL	MATERIALIZED VIEW	Manual, Concurrent	Yes
Oracle	MATERIALIZED VIEW	On Commit, On Demand, Scheduled	Yes
SQL Server	Indexed Views	Auto-maintained	Yes (clustered index required)
MySQL	Not native	Simulate with tables + events	N/A
SQLite	Not supported	N/A	N/A

Staleness Tolerance

Materialized views are best when some staleness is acceptable. If your reporting dashboard shows 'data as of last night,' materialized views are perfect. If you need sub-second accuracy, you'll need either real-time computation or more sophisticated incremental refresh strategies.

Summary: Derived Attribute Mapping

Derived attributes are computed from other data rather than independently stored. The key design decision is whether to compute on demand or store and maintain. Let's consolidate what we've learned:

Key Takeaways

•Default: Don't Store: Unless there's a compelling reason, compute derived values on demand to ensure correctness and avoid maintenance overhead.
•Types Vary: Time-based (age), arithmetic (totals), aggregate (counts, sums), conditional (status), and string-based (full_name) all require different handling.
•Query-Time Computation: Simplest approach—compute in SELECT statements. Always current, no storage, but repeated in every query.
•Views Encapsulate Logic: Provide reusable, named queries that present derived values as columns. No storage; computed per query.
•Computed Columns: Database-level support for derived expressions. Virtual (no storage) or stored (indexable). Limited to same-row data.
•Triggers Maintain Stored Values: When you choose to store, triggers ensure the derived value updates when base data changes.
•Materialized Views for Aggregates: Pre-compute expensive queries with periodic refresh. Indexable for fast reads; tolerate staleness.
•Storage Enables Indexing: If you need to filter/sort by derived values efficiently, storage (computed column STORED, materialized view) is required.

What's Next:

We've now covered all attribute types: simple, composite (flattened), multivalued (separate tables), and derived (computed). The final piece of entity mapping is Key Attributes—understanding how to properly map and implement primary keys, candidate keys, and alternate keys in the relational schema.

Page Complete

You now understand when and how to handle derived attributes. You can make informed decisions about computing vs. storing, implement solutions using views, computed columns, triggers, or materialized views, and understand the tradeoffs of each approach. Next, we complete the entity mapping module with key attributes.