Loading content...
Consider a Person entity with a date_of_birth attribute. Now imagine you also need to display the person's age. Should you store the age?
At first glance, storing age seems reasonable—it's a useful piece of information. But think deeper:
Age isn't really independent data—it's a function of date_of_birth and the current date. It can be derived on demand. This is the essence of a derived attribute: a value that can be computed from other stored data whenever needed.
By the end of this page, you will understand what makes an attribute derived, the fundamental 'store vs. compute' trade-off, how to represent derived attributes in ER diagrams, and multiple implementation strategies—from computed columns to materialized views to application-layer calculation.
A derived attribute (also called a computed attribute or calculated attribute) is an attribute whose value can be determined from other attributes in the database—either from the same entity or from related entities.
Formal Definition:
A derived attribute is an attribute whose value at any point in time can be computed from the values of other stored attributes using a defined formula or algorithm.
Key Characteristics:
The Source of Derivation:
Derived attributes can be computed from:
| Source Type | Example Attribute | Derivation |
|---|---|---|
| Same entity, single attribute | age | From date_of_birth |
| Same entity, multiple attributes | full_name | From first_name + last_name |
| Related entities (aggregation) | total_order_value | Sum of line_items.subtotal |
| Related entities (count) | employee_count | Count of related employees |
| External context | age | date_of_birth + current date |
| Complex formula | grade_point_average | Weighted average of course grades |
Derived attributes can form chains: Attribute C is derived from B, which is derived from A. While conceptually valid, deep derivation chains increase computation complexity and can make reasoning about data dependencies difficult. Document these chains clearly.
During requirements analysis, derived attributes often appear alongside stored attributes. Recognizing them requires asking the right questions:
| Entity | Derived Attribute | Source Attributes | Derivation Formula |
|---|---|---|---|
| Person | age | date_of_birth | YEAR(NOW()) - YEAR(dob) adjusted |
| Person | full_name | first, middle, last | CONCAT with spaces |
| Order | total_amount | line_items | SUM(quantity × unit_price) |
| Order | line_count | line_items | COUNT(*) |
| Employee | tenure_years | hire_date | YEAR(NOW()) - YEAR(hire_date) |
| Product | average_rating | reviews | AVG(rating) |
| Product | review_count | reviews | COUNT(*) |
| Course | student_count | enrollments | COUNT(DISTINCT student_id) |
| Invoice | tax_amount | subtotal, tax_rate | subtotal × tax_rate |
| Invoice | grand_total | subtotal, tax, discount | subtotal + tax - discount |
| Rectangle | area | width, height | width × height |
| Circle | circumference | radius | 2 × π × radius |
Not everything that could be calculated should be derived at query time. Historical snapshots are NOT derived: the price of a product at the time of order should be STORED because the current price may change. The 'derivation' logic would give wrong historical data. Always consider whether derivation produces the semantically correct value.
ER diagrams use specific notation to distinguish derived attributes from stored attributes. This distinction is crucial for later implementation decisions.
Chen Notation (Original ER):
In Chen notation, derived attributes are represented with a dashed (dotted) oval:
┌─────────────────┐
│ PERSON │
└────────┬────────┘
│
┌─────────────┼─────────────┐
│ │ │
╱───┴───╲ ╱───┴───╲ ╭┄┄┄┴┄┄┄╮
( Name ) ( DOB ) ┊ Age ┊
╲───────╱ ╲───────╱ ╰┄┄┄┄┄┄┄╯
Solid Oval Solid Oval Dashed Oval
(stored) (stored) (derived)
The dashed border immediately signals that this attribute is computed, not stored directly.
Extended Notation with Formula:
Some documentation styles include the derivation formula:
╭┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄╮
┊ Age ┊
┊ = YEAR(NOW) - YEAR(DOB) ┊
╰┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄╯
This makes the derivation explicit and serves as documentation.
Many modern diagramming tools use alternate notations: '/age' (slash prefix), 'age [derived]', 'age {computed}', or color coding (gray for derived). Whatever notation your team uses, ensure it's clearly documented and consistently applied.
Crow's Foot (IE) and UML Notation:
In Crow's Foot and UML class diagrams, derived attributes are typically marked with a prefix or annotation:
┌─────────────────────────────┐
│ PERSON │
├─────────────────────────────┤
│ person_id (PK) │
│ first_name VARCHAR │
│ last_name VARCHAR │
│ date_of_birth DATE │
│ /age INTEGER │ ← slash prefix indicates derived
│ /full_name VARCHAR │
├─────────────────────────────┤
│ Derived: │
│ age = DATEDIFF(years, dob, NOW) │
│ full_name = first + ' ' + last │
└─────────────────────────────┘
The detailed derivation formulas can appear in a separate section or linked documentation.
The fundamental question for any derived attribute: Should we store the computed value, or compute it on demand?
This is one of the most important trade-offs in database design, with implications for performance, consistency, and complexity.
Decision Framework:
| Factor | Favor Compute | Favor Store |
|---|---|---|
| Read frequency | Low reads | High reads, critical latency |
| Computation cost | Simple, fast | Complex, expensive (aggregates) |
| Source updates | Frequent | Rare |
| Staleness tolerance | None (must be current) | Some lag acceptable |
| Time dependency | Current time matters | Point-in-time snapshot |
| Need to index/sort | Never | Frequently |
| Query complexity | Isolated calculation | Cross-table aggregation |
Some cases are clear: (1) Age from DOB → ALWAYS compute (time-dependent, trivial computation). (2) Average rating over millions of reviews → ALWAYS store/cache (expensive aggregation). (3) Line item subtotal (qty × price) → usually compute or use generated column (cheap, frequent updates). The interesting decisions are in the middle ground.
There are multiple ways to implement derived attributes, each with different trade-offs. Understanding these options allows you to choose the right approach for each situation.
Strategy 1: Database Computed/Generated Columns
Modern databases support columns that are defined by expressions and computed automatically.
12345678910111213141516171819202122232425262728293031
-- PostgreSQL Generated Columns (v12+)CREATE TABLE orders ( order_id SERIAL PRIMARY KEY, customer_id INTEGER NOT NULL, subtotal DECIMAL(12,2) NOT NULL, tax_rate DECIMAL(5,4) NOT NULL DEFAULT 0.0825, shipping DECIMAL(8,2) NOT NULL DEFAULT 0, -- STORED generated column (persisted) tax_amount DECIMAL(12,2) GENERATED ALWAYS AS (subtotal * tax_rate) STORED, -- STORED generated column (persisted) grand_total DECIMAL(12,2) GENERATED ALWAYS AS (subtotal + (subtotal * tax_rate) + shipping) STORED, order_date TIMESTAMP DEFAULT CURRENT_TIMESTAMP); -- The computed columns work like regular columnsINSERT INTO orders (customer_id, subtotal, tax_rate, shipping)VALUES (101, 100.00, 0.0825, 9.99); SELECT order_id, subtotal, tax_amount, grand_total FROM orders;-- Returns: 1, 100.00, 8.25, 118.24 -- Can be indexed!CREATE INDEX idx_orders_total ON orders(grand_total); -- Can be used in WHERE clausesSELECT * FROM orders WHERE grand_total > 100;Advantages: Database handles computation automatically; consistent; can be indexed (if persisted); transparent to applications.
Limitations: Limited to expressions using same-row data; no cross-table aggregations; no current-time functions in some DBs for persisted columns.
Let's examine complete scenarios showing how derived attributes are handled in practice.
Scenario: E-Commerce Order Calculations
An order has many line items. We need: line subtotals, order subtotal, tax, and grand total.
1234567891011121314151617181920212223242526272829303132333435363738394041424344454647484950515253
-- Line items with derived subtotal (computed column)CREATE TABLE order_items ( item_id SERIAL PRIMARY KEY, order_id INTEGER NOT NULL, product_id INTEGER NOT NULL, quantity INTEGER NOT NULL CHECK (quantity > 0), unit_price DECIMAL(10,2) NOT NULL, -- Derived: line subtotal (stored computed column) subtotal DECIMAL(12,2) GENERATED ALWAYS AS (quantity * unit_price) STORED); -- Orders tableCREATE TABLE orders ( order_id SERIAL PRIMARY KEY, customer_id INTEGER NOT NULL, order_date TIMESTAMP DEFAULT CURRENT_TIMESTAMP, tax_rate DECIMAL(5,4) DEFAULT 0.0825, shipping DECIMAL(8,2) DEFAULT 0, -- Note: We DON'T store order totals here status VARCHAR(20) DEFAULT 'pending'); -- View for order totals (derived from line items)CREATE VIEW order_totals ASSELECT o.order_id, o.customer_id, o.order_date, o.tax_rate, o.shipping, -- Derived: subtotal (sum of line item subtotals) COALESCE(SUM(oi.subtotal), 0) AS subtotal, -- Derived: tax amount COALESCE(SUM(oi.subtotal), 0) * o.tax_rate AS tax_amount, -- Derived: grand total COALESCE(SUM(oi.subtotal), 0) * (1 + o.tax_rate) + o.shipping AS grand_total, -- Derived: item count COUNT(oi.item_id) AS item_countFROM orders oLEFT JOIN order_items oi ON o.order_id = oi.order_idGROUP BY o.order_id; -- Query: Orders over $100SELECT * FROM order_totals WHERE grand_total > 100; -- If performance is critical, materialize itCREATE MATERIALIZED VIEW order_totals_mv ASSELECT * FROM order_totals; CREATE UNIQUE INDEX ON order_totals_mv(order_id);-- Refresh after order changesDerived attribute handling is a common source of bugs and design problems. Here are the pitfalls to avoid:
rating_sum / rating_count and count is 0, you get division by zero. Always handle edge cases in derivation formulas.A derived attribute computed today may give a different result than the same computation run yesterday—especially for time-based or aggregate values. If you need historical values, you need temporal tables or event sourcing, not derived attributes. Derivation gives you 'as of now,' not 'as of then.'
Derived attributes represent information that can be computed rather than stored—a fundamental concept that affects data integrity, performance, and system design. Let's consolidate the key concepts:
What's Next:
We've explored the full taxonomy of attribute types based on value structure. But there's one more crucial attribute characteristic: key attributes. Key attributes uniquely identify entity instances and form the foundation of entity integrity and relationships. Understanding keys is essential for proper entity modeling—and that's where we'll conclude this module.
You now understand derived attributes—their nature, notation, trade-offs, and implementation strategies. You can identify when an attribute should be computed vs. stored, choose appropriate implementation mechanisms, and avoid common pitfalls. Next: key attributes and entity identification.