Loading content...
How old is a person? The answer depends on when you ask. Today, Jane might be 32 years old. Tomorrow, she might be 33. Her age is not an intrinsic value stored somewhere—it's derived from her date of birth and the current date.
This is the essence of a derived attribute: an attribute whose value is computed from other attributes rather than explicitly stored. The value can change even when no update occurs to the base data. Age changes daily. An order's total changes if item prices are updated. An employee's tenure grows with each passing day.
In Entity-Relationship modeling, derived attributes are shown with a dashed oval, indicating their computed nature. But how do we handle them in the relational model? Unlike simple, composite, or multivalued attributes that have clear mapping rules, derived attributes require a decision: compute on demand or store and maintain?
This seemingly simple question opens a rich design space involving computation vs. storage tradeoffs, consistency challenges, and various implementation strategies. Making the wrong choice can lead to stale data, performance bottlenecks, or maintenance nightmares.
By the end of this page, you will understand the theoretical foundation of derived attributes, master multiple implementation strategies (computed at query time, computed columns, views, triggers, materialized views), know when to store vs. compute, and be able to make informed design decisions for various scenarios.
A derived attribute (also called a computed attribute or calculated attribute) is an attribute whose value can be determined from other attributes in the database. Unlike stored attributes that represent independently entered data, derived attributes are functionally dependent on other data.
Formal Definition:
An attribute D is derived if there exists a function f such that:
D = f(A₁, A₂, ..., Aₙ, T)
where A₁...Aₙ are stored attributes and T is optionally the current time/date.
Categories of Derived Attributes:
| Category | Description | Examples |
|---|---|---|
| Time-Based | Computed from stored date/time and current time | age (from birth_date), tenure (from hire_date), days_until_expiry |
| Arithmetic | Mathematical computation from numeric attributes | total_price (quantity × unit_price), profit_margin (revenue - cost) |
| Aggregate | Summary of multiple related records | order_total (sum of line items), student_gpa (average of grades) |
| Conditional | Value determined by logic/conditions on other attributes | grade_letter (from numeric grade), status (from date comparisons) |
| String-Based | Derived from string manipulation | full_name (first + ' ' + last), initials, email domain |
| Existence-Based | Based on presence/absence of related data | has_orders (EXISTS check), is_premium (based on purchase history) |
In ER diagrams, derived attributes are shown with a dashed (dotted) oval boundary, distinguishing them from stored attributes (solid oval). This visual cue indicates that the attribute should not be directly stored but calculated. When mapping to relational schema, this notation guides our implementation decision.
The fundamental question for derived attributes: Should we store the computed value or calculate it on demand?
This is not a trivial decision. Each approach has significant implications for correctness, performance, storage, and maintenance. The choice depends on multiple factors:
Decision Framework:
Unless you have a compelling reason to store a derived value, compute it on demand. Storage introduces redundancy that must be maintained. Stale derived values are a common source of bugs and data inconsistencies. Reserve materialization for cases where performance requirements justify the complexity.
The simplest approach to derived attributes: compute them in SELECT statements as needed. The derived value exists only in query results, not in any stored column. This is the recommended default approach.
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778
-- Table stores only base attributes (no derived columns)CREATE TABLE Employee ( employee_id SERIAL PRIMARY KEY, first_name VARCHAR(50) NOT NULL, last_name VARCHAR(50) NOT NULL, date_of_birth DATE NOT NULL, hire_date DATE NOT NULL, base_salary DECIMAL(12, 2) NOT NULL, bonus_percent DECIMAL(5, 2) DEFAULT 0); -- Derived attributes computed at query time: -- Age (time-based derived attribute)SELECT employee_id, first_name, last_name, date_of_birth, EXTRACT(YEAR FROM AGE(CURRENT_DATE, date_of_birth)) AS ageFROM Employee; -- Years of tenure (time-based)SELECT employee_id, first_name || ' ' || last_name AS full_name, -- String-derived hire_date, EXTRACT(YEAR FROM AGE(CURRENT_DATE, hire_date)) AS years_of_service, -- More precise: total months EXTRACT(YEAR FROM AGE(CURRENT_DATE, hire_date)) * 12 + EXTRACT(MONTH FROM AGE(CURRENT_DATE, hire_date)) AS months_of_serviceFROM Employee; -- Total compensation (arithmetic derived)SELECT employee_id, first_name, base_salary, bonus_percent, base_salary * (1 + bonus_percent / 100) AS total_compensationFROM Employee; -- Conditional derived: Salary gradeSELECT employee_id, first_name, base_salary, CASE WHEN base_salary >= 150000 THEN 'Executive' WHEN base_salary >= 100000 THEN 'Senior' WHEN base_salary >= 70000 THEN 'Mid-Level' WHEN base_salary >= 45000 THEN 'Junior' ELSE 'Entry' END AS salary_gradeFROM Employee; -- Aggregate derived: Department employee countSELECT d.department_id, d.department_name, COUNT(e.employee_id) AS employee_count, AVG(e.base_salary) AS average_salary, SUM(e.base_salary) AS total_payrollFROM Department dLEFT JOIN Employee e ON d.department_id = e.department_idGROUP BY d.department_id, d.department_name; -- Order total (aggregate derived from line items)SELECT o.order_id, o.order_date, o.customer_id, SUM(li.quantity * li.unit_price) AS subtotal, SUM(li.quantity * li.unit_price) * 0.08 AS tax, SUM(li.quantity * li.unit_price) * 1.08 AS totalFROM Order oJOIN Line_Item li ON o.order_id = li.order_idGROUP BY o.order_id, o.order_date, o.customer_id;If you find yourself copying the same derived attribute calculation across many queries, consider encapsulating it in a VIEW or a user-defined function. This centralizes the logic and prevents inconsistencies from slightly different implementations.
Views provide a way to encapsulate derived attribute calculations into a reusable, named query. Applications can query the view as if it were a table, and the derived values appear as regular columns—but they're computed on demand, not stored.
Benefits of Views:
12345678910111213141516171819202122232425262728293031323334353637383940414243444546474849505152535455565758596061626364656667
-- View that includes derived attributesCREATE VIEW Employee_Extended ASSELECT e.employee_id, e.first_name, e.last_name, e.first_name || ' ' || e.last_name AS full_name, -- Derived e.date_of_birth, EXTRACT(YEAR FROM AGE(CURRENT_DATE, e.date_of_birth)) AS age, -- Derived e.hire_date, EXTRACT(YEAR FROM AGE(CURRENT_DATE, e.hire_date)) AS years_of_service, -- Derived e.base_salary, e.bonus_percent, e.base_salary * (1 + COALESCE(e.bonus_percent, 0) / 100) AS total_compensation, -- Derived CASE WHEN e.base_salary >= 100000 THEN 'Senior' WHEN e.base_salary >= 70000 THEN 'Mid' ELSE 'Junior' END AS salary_tier, -- Derived conditional d.department_nameFROM Employee eLEFT JOIN Department d ON e.department_id = d.department_id; -- Application queries the view like a tableSELECT employee_id, full_name, age, years_of_service, salary_tierFROM Employee_ExtendedWHERE age > 30 AND years_of_service >= 5; -- Order view with computed totalsCREATE VIEW Order_Summary ASSELECT o.order_id, o.order_date, o.customer_id, c.customer_name, COUNT(li.line_item_id) AS item_count, SUM(li.quantity) AS total_units, SUM(li.quantity * li.unit_price) AS subtotal, SUM(li.quantity * li.unit_price * li.discount_percent / 100) AS total_discount, SUM(li.quantity * li.unit_price * (1 - li.discount_percent / 100)) AS total_after_discount, SUM(li.quantity * li.unit_price * (1 - li.discount_percent / 100)) * o.tax_rate AS tax_amount, SUM(li.quantity * li.unit_price * (1 - li.discount_percent / 100)) * (1 + o.tax_rate) AS grand_total, o.status, CASE WHEN o.status = 'pending' AND o.order_date < CURRENT_DATE - INTERVAL '7 days' THEN TRUE ELSE FALSE END AS is_staleFROM Order oJOIN Customer c ON o.customer_id = c.customer_idLEFT JOIN Line_Item li ON o.order_id = li.order_idGROUP BY o.order_id, o.order_date, o.customer_id, c.customer_name, o.tax_rate, o.status; -- Product view with inventory statusCREATE VIEW Product_Status ASSELECT p.product_id, p.product_name, p.unit_price, p.units_in_stock, p.reorder_level, CASE WHEN p.units_in_stock = 0 THEN 'Out of Stock' WHEN p.units_in_stock <= p.reorder_level THEN 'Low Stock' ELSE 'In Stock' END AS stock_status, -- Derived p.units_in_stock > p.reorder_level AS needs_reorder -- DerivedFROM Product p;Standard views cannot be indexed on derived columns (they're not stored). For high-performance filtering on derived values, consider computed columns (next section) or materialized views. Also, complex views with many joins may have performance issues on large datasets.
Many modern databases support computed columns (also called generated columns): columns defined by an expression rather than stored data. They appear as regular columns but are computed (either on read or stored automatically). This provides the convenience of a column with the correctness of on-demand computation.
Two Types:
12345678910111213141516171819202122232425262728293031323334353637383940414243444546474849505152535455565758596061626364656667686970717273747576777879
-- PostgreSQL: GENERATED columns (v12+)CREATE TABLE Employee ( employee_id SERIAL PRIMARY KEY, first_name VARCHAR(50) NOT NULL, last_name VARCHAR(50) NOT NULL, date_of_birth DATE NOT NULL, hire_date DATE NOT NULL, base_salary DECIMAL(12, 2) NOT NULL, bonus_percent DECIMAL(5, 2) DEFAULT 0, -- Virtual generated column (computed on read) -- PostgreSQL requires STORED for generated columns currently full_name VARCHAR(101) GENERATED ALWAYS AS (first_name || ' ' || last_name) STORED); -- MySQL: VIRTUAL and STORED generated columnsCREATE TABLE Employee_MySQL ( employee_id INT AUTO_INCREMENT PRIMARY KEY, first_name VARCHAR(50) NOT NULL, last_name VARCHAR(50) NOT NULL, date_of_birth DATE NOT NULL, quantity INT NOT NULL, unit_price DECIMAL(10, 2) NOT NULL, -- Virtual (computed on read, not stored, not indexable) full_name VARCHAR(101) AS (CONCAT(first_name, ' ', last_name)) VIRTUAL, -- Stored (computed on write, stored, indexable) line_total DECIMAL(12, 2) AS (quantity * unit_price) STORED); -- SQL Server: Computed columnsCREATE TABLE Employee_SQLServer ( employee_id INT IDENTITY(1,1) PRIMARY KEY, first_name NVARCHAR(50) NOT NULL, last_name NVARCHAR(50) NOT NULL, date_of_birth DATE NOT NULL, -- Non-persisted (computed on read) full_name AS (first_name + ' ' + last_name), -- Persisted (stored, can be indexed) age AS (DATEDIFF(YEAR, date_of_birth, GETDATE())) PERSISTED, -- Can create index on PERSISTED computed column -- CREATE INDEX idx_age ON Employee_SQLServer(age);); -- Oracle: Virtual columns (11g+)CREATE TABLE Employee_Oracle ( employee_id NUMBER PRIMARY KEY, first_name VARCHAR2(50) NOT NULL, last_name VARCHAR2(50) NOT NULL, base_salary NUMBER(12, 2) NOT NULL, bonus_percent NUMBER(5, 2) DEFAULT 0, -- Virtual column full_name VARCHAR2(101) GENERATED ALWAYS AS (first_name || ' ' || last_name) VIRTUAL, total_comp NUMBER(12, 2) GENERATED ALWAYS AS (base_salary * (1 + NVL(bonus_percent, 0) / 100)) VIRTUAL); -- Computed columns with expressionsCREATE TABLE Order_Line ( line_id SERIAL PRIMARY KEY, order_id INTEGER NOT NULL, product_id INTEGER NOT NULL, quantity INTEGER NOT NULL CHECK (quantity > 0), unit_price DECIMAL(10, 2) NOT NULL, discount_pct DECIMAL(5, 2) DEFAULT 0, -- Computed: Gross line total gross_total DECIMAL(12, 2) GENERATED ALWAYS AS (quantity * unit_price) STORED, -- Computed: Discount amount discount_amount DECIMAL(12, 2) GENERATED ALWAYS AS (quantity * unit_price * discount_pct / 100) STORED, -- Computed: Net total after discount net_total DECIMAL(12, 2) GENERATED ALWAYS AS (quantity * unit_price * (1 - discount_pct / 100)) STORED);| Database | Feature Name | Virtual Support | Stored/Persisted | Indexable |
|---|---|---|---|---|
| PostgreSQL | GENERATED ALWAYS AS | STORED only (v12+) | Yes | Yes (stored) |
| MySQL 5.7+ | GENERATED COLUMNS | VIRTUAL or STORED | Yes | Yes (stored) |
| SQL Server | Computed Columns | Default (non-persisted) | PERSISTED keyword | Yes (persisted) |
| Oracle 11g+ | Virtual Columns | VIRTUAL (default) | Limited | Yes |
| SQLite 3.31+ | Generated Columns | VIRTUAL or STORED | Yes | Yes (stored) |
Use computed columns when: (1) the derived value depends only on columns in the same row (no aggregates, no other table data), (2) you want the value to appear as a regular column without view overhead, (3) you need index support on the derived value (use STORED/PERSISTED). They're ideal for simple arithmetic, string concatenation, and conditional expressions.
When you decide to store a derived value—for performance, indexing, or historical reasons—you must ensure it stays synchronized with the base data. Triggers provide automatic maintenance: whenever base data changes, the trigger recalculates and updates the derived value.
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119
-- Scenario: Order table with stored total (aggregate derived)-- Order total = SUM of all line item totals-- We store it for fast access and indexing, maintain via triggers CREATE TABLE Order ( order_id SERIAL PRIMARY KEY, customer_id INTEGER NOT NULL, order_date DATE NOT NULL DEFAULT CURRENT_DATE, -- Stored derived attribute (aggregate from Line_Item) order_total DECIMAL(12, 2) NOT NULL DEFAULT 0, item_count INTEGER NOT NULL DEFAULT 0, status VARCHAR(20) DEFAULT 'pending'); CREATE TABLE Line_Item ( line_item_id SERIAL PRIMARY KEY, order_id INTEGER NOT NULL REFERENCES Order(order_id) ON DELETE CASCADE, product_id INTEGER NOT NULL, quantity INTEGER NOT NULL, unit_price DECIMAL(10, 2) NOT NULL, line_total DECIMAL(12, 2) GENERATED ALWAYS AS (quantity * unit_price) STORED); -- Trigger function to update order totalsCREATE OR REPLACE FUNCTION update_order_totals()RETURNS TRIGGER AS $$DECLARE v_order_id INTEGER;BEGIN -- Determine which order was affected IF TG_OP = 'DELETE' THEN v_order_id := OLD.order_id; ELSE v_order_id := NEW.order_id; END IF; -- Recalculate and update the order totals UPDATE Order SET order_total = COALESCE(( SELECT SUM(quantity * unit_price) FROM Line_Item WHERE order_id = v_order_id ), 0), item_count = ( SELECT COUNT(*) FROM Line_Item WHERE order_id = v_order_id ) WHERE order_id = v_order_id; RETURN NULL; -- Result is ignored for AFTER triggersEND;$$ LANGUAGE plpgsql; -- Attach trigger to Line_Item for INSERT, UPDATE, DELETECREATE TRIGGER trg_update_order_totalsAFTER INSERT OR UPDATE OR DELETE ON Line_ItemFOR EACH ROW EXECUTE FUNCTION update_order_totals(); -- Now any change to Line_Item automatically updates Order.order_total -- Test:INSERT INTO Order (customer_id) VALUES (101) RETURNING order_id; -- Returns 1INSERT INTO Line_Item (order_id, product_id, quantity, unit_price) VALUES (1, 1001, 2, 29.99);INSERT INTO Line_Item (order_id, product_id, quantity, unit_price) VALUES (1, 1002, 1, 49.99); SELECT order_id, order_total, item_count FROM Order WHERE order_id = 1;-- order_total = 109.97, item_count = 2 ---------------------------------------------------------------------------------- Example 2: Department employee count and total salary-------------------------------------------------------------------------------- CREATE TABLE Department ( department_id SERIAL PRIMARY KEY, department_name VARCHAR(100) NOT NULL, -- Stored derived attributes (maintained by triggers) employee_count INTEGER NOT NULL DEFAULT 0, total_salary DECIMAL(14, 2) NOT NULL DEFAULT 0); CREATE OR REPLACE FUNCTION update_department_stats()RETURNS TRIGGER AS $$DECLARE v_old_dept INTEGER; v_new_dept INTEGER;BEGIN v_old_dept := COALESCE(OLD.department_id, NULL); v_new_dept := COALESCE(NEW.department_id, NULL); -- Update old department (if employee left it) IF v_old_dept IS NOT NULL THEN UPDATE Department SET employee_count = (SELECT COUNT(*) FROM Employee WHERE department_id = v_old_dept), total_salary = COALESCE((SELECT SUM(base_salary) FROM Employee WHERE department_id = v_old_dept), 0) WHERE department_id = v_old_dept; END IF; -- Update new department (if employee joined or salary changed) IF v_new_dept IS NOT NULL AND (v_old_dept IS NULL OR v_old_dept != v_new_dept OR TG_OP = 'UPDATE') THEN UPDATE Department SET employee_count = (SELECT COUNT(*) FROM Employee WHERE department_id = v_new_dept), total_salary = COALESCE((SELECT SUM(base_salary) FROM Employee WHERE department_id = v_new_dept), 0) WHERE department_id = v_new_dept; END IF; RETURN NULL;END;$$ LANGUAGE plpgsql; CREATE TRIGGER trg_update_dept_statsAFTER INSERT OR UPDATE OR DELETE ON EmployeeFOR EACH ROW EXECUTE FUNCTION update_department_stats();Triggers add overhead to every INSERT, UPDATE, DELETE on the base table. For high-transaction tables, this can impact performance. Consider: (1) batch updates instead of per-row triggers where possible, (2) async refresh for non-critical derived values, (3) whether the read performance gain justifies the write performance cost.
Materialized views (MViews) are stored query results that can be refreshed periodically or on demand. Unlike regular views (computed on every query), materialized views persist their results, enabling indexing and fast reads. They're ideal for expensive aggregate computations that don't need real-time accuracy.
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778
-- Materialized view for expensive aggregate (PostgreSQL)CREATE MATERIALIZED VIEW Sales_Summary ASSELECT DATE_TRUNC('month', o.order_date) AS month, p.category_id, c.category_name, COUNT(DISTINCT o.order_id) AS order_count, COUNT(DISTINCT o.customer_id) AS customer_count, SUM(li.quantity) AS units_sold, SUM(li.quantity * li.unit_price) AS gross_revenue, SUM(li.quantity * li.unit_price * (1 - li.discount_pct / 100)) AS net_revenue, AVG(li.quantity * li.unit_price) AS avg_order_valueFROM Order oJOIN Line_Item li ON o.order_id = li.order_idJOIN Product p ON li.product_id = p.product_idJOIN Category c ON p.category_id = c.category_idWHERE o.status = 'completed'GROUP BY DATE_TRUNC('month', o.order_date), p.category_id, c.category_name; -- Create indexes on the materialized view (huge performance boost)CREATE INDEX idx_sales_summary_month ON Sales_Summary(month);CREATE INDEX idx_sales_summary_category ON Sales_Summary(category_id); -- Query the materialized view (very fast, like a table)SELECT month, category_name, net_revenueFROM Sales_SummaryWHERE month >= '2024-01-01'ORDER BY month, net_revenue DESC; -- Refresh the materialized view (can be scheduled)REFRESH MATERIALIZED VIEW Sales_Summary; -- Concurrent refresh (doesn't lock for reads during refresh)REFRESH MATERIALIZED VIEW CONCURRENTLY Sales_Summary;-- Note: Requires unique index for CONCURRENTLY ---------------------------------------------------------------------------------- Materialized view with derived attributes per entity-------------------------------------------------------------------------------- CREATE MATERIALIZED VIEW Customer_Stats ASSELECT c.customer_id, c.customer_name, c.signup_date, -- Derived: Customer lifetime EXTRACT(YEAR FROM AGE(CURRENT_DATE, c.signup_date)) AS years_as_customer, -- Derived aggregates COUNT(o.order_id) AS total_orders, COALESCE(SUM(o.order_total), 0) AS lifetime_value, AVG(o.order_total) AS avg_order_value, MAX(o.order_date) AS last_order_date, MIN(o.order_date) AS first_order_date, -- Derived: Days since last order EXTRACT(DAY FROM AGE(CURRENT_DATE, MAX(o.order_date))) AS days_since_last_order, -- Derived: Customer tier CASE WHEN SUM(o.order_total) >= 10000 THEN 'Platinum' WHEN SUM(o.order_total) >= 5000 THEN 'Gold' WHEN SUM(o.order_total) >= 1000 THEN 'Silver' ELSE 'Bronze' END AS customer_tier FROM Customer cLEFT JOIN Order o ON c.customer_id = o.customer_id AND o.status = 'completed'GROUP BY c.customer_id, c.customer_name, c.signup_date; -- Fast lookups for customer segmentationCREATE INDEX idx_customer_stats_tier ON Customer_Stats(customer_tier);CREATE INDEX idx_customer_stats_ltv ON Customer_Stats(lifetime_value); -- Schedule refresh (using pg_cron extension or external scheduler)-- Example: Refresh every night at 2 AM-- SELECT cron.schedule('0 2 * * *', 'REFRESH MATERIALIZED VIEW CONCURRENTLY Customer_Stats');| Database | Feature | Refresh Options | Indexable |
|---|---|---|---|
| PostgreSQL | MATERIALIZED VIEW | Manual, Concurrent | Yes |
| Oracle | MATERIALIZED VIEW | On Commit, On Demand, Scheduled | Yes |
| SQL Server | Indexed Views | Auto-maintained | Yes (clustered index required) |
| MySQL | Not native | Simulate with tables + events | N/A |
| SQLite | Not supported | N/A | N/A |
Materialized views are best when some staleness is acceptable. If your reporting dashboard shows 'data as of last night,' materialized views are perfect. If you need sub-second accuracy, you'll need either real-time computation or more sophisticated incremental refresh strategies.
Derived attributes are computed from other data rather than independently stored. The key design decision is whether to compute on demand or store and maintain. Let's consolidate what we've learned:
What's Next:
We've now covered all attribute types: simple, composite (flattened), multivalued (separate tables), and derived (computed). The final piece of entity mapping is Key Attributes—understanding how to properly map and implement primary keys, candidate keys, and alternate keys in the relational schema.
You now understand when and how to handle derived attributes. You can make informed decisions about computing vs. storing, implement solutions using views, computed columns, triggers, or materialized views, and understand the tradeoffs of each approach. Next, we complete the entity mapping module with key attributes.