Attributes - Learning Module

Derived Attributes

Values That Compute Themselves

Consider a Person entity with a date_of_birth attribute. Now imagine you also need to display the person's age. Should you store the age?

At first glance, storing age seems reasonable—it's a useful piece of information. But think deeper:

•Age changes every year on the person's birthday
•A stored age becomes stale the moment the clock strikes midnight on their birthday
•You'd need a background job to update millions of ages daily
•Or worse, you display '35' for someone who turned 36 last week

Age isn't really independent data—it's a function of date_of_birth and the current date. It can be derived on demand. This is the essence of a derived attribute: a value that can be computed from other stored data whenever needed.
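To make "a function of date_of_birth and the current date" concrete, here is a minimal Python sketch. The function name `age_on` and the explicit `today` parameter are illustrative assumptions, not part of the original text; passing the date in keeps the derivation testable.

```python
from datetime import date


def age_on(date_of_birth: date, today: date) -> int:
    """Return the number of completed years between date_of_birth and today."""
    years = today.year - date_of_birth.year
    # Subtract one year if this year's birthday has not happened yet.
    if (today.month, today.day) < (date_of_birth.month, date_of_birth.day):
        years -= 1
    return years


dob = date(1990, 6, 15)
print(age_on(dob, date(2025, 6, 14)))  # 34 (day before the birthday)
print(age_on(dob, date(2025, 6, 15)))  # 35 (on the birthday)
```

Only `date_of_birth` needs to be stored; the age is always correct for whatever date is supplied. Note that under this comparison a 29 February birthday counts as reached on 1 March in non-leap years, which is one possible convention among several.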

What You Will Learn

By the end of this page, you will understand what makes an attribute derived, the fundamental 'store vs. compute' trade-off, how to represent derived attributes in ER diagrams, and multiple implementation strategies—from computed columns to materialized views to application-layer calculation.

What is a Derived Attribute?

A derived attribute (also called a computed attribute or calculated attribute) is an attribute whose value can be determined from other attributes in the database—either from the same entity or from related entities.

Formal Definition:

A derived attribute is an attribute whose value at any point in time can be computed from the values of other stored attributes using a defined formula or algorithm.

Key Characteristics:

Properties of Derived Attributes

•No independent value — The attribute's value is determined entirely by other data; it brings no new information
•Deterministic computation — Given the same source values (and, for time-dependent derivations, the same reference date), the derivation always produces the same result
•Time-dependence possible — Some derivations involve the current timestamp (age from birth date)
•Potential redundancy — Storing derived values duplicates information that exists elsewhere
•Consistency implications — Stored derived values can become inconsistent with source data if not synchronized

The Source of Derivation:

Derived attributes can be computed from:

Source Type	Example Attribute	Derivation
Same entity, single attribute	`circumference`	From `radius`
Same entity, multiple attributes	`full_name`	From `first_name` + `last_name`
Related entities (aggregation)	`total_order_value`	Sum of `line_items.subtotal`
Related entities (count)	`employee_count`	Count of related `employees`
External context	`age`	`date_of_birth` + current date
Complex formula	`grade_point_average`	Weighted average of course grades

Derivation Chain

Derived attributes can form chains: Attribute C is derived from B, which is derived from A. While conceptually valid, deep derivation chains increase computation complexity and can make reasoning about data dependencies difficult. Document these chains clearly.
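A short chain can be sketched in Python as follows. The `Invoice` class and its field names are assumptions chosen to mirror the invoice examples on this page: `grand_total` is derived from `tax_amount`, which is itself derived from the stored `subtotal` and `tax_rate`.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class Invoice:
    # Stored attributes: the only values that would be persisted.
    subtotal: Decimal
    tax_rate: Decimal
    discount: Decimal

    @property
    def tax_amount(self) -> Decimal:
        # Step 1 of the chain: derived from stored attributes only.
        return (self.subtotal * self.tax_rate).quantize(Decimal("0.01"))

    @property
    def grand_total(self) -> Decimal:
        # Step 2 of the chain: derived from another derived attribute.
        return self.subtotal + self.tax_amount - self.discount


invoice = Invoice(Decimal("100.00"), Decimal("0.0825"), Decimal("5.00"))
print(invoice.tax_amount)   # 8.25
print(invoice.grand_total)  # 103.25
```

Changing `subtotal` changes both derived values on the next read, which is why each link in a chain is worth documenting.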

Identifying Derived Attributes

During requirements analysis, derived attributes often appear alongside stored attributes. Recognizing them requires asking the right questions:

Questions to Identify Derived Attributes

•Can this value be calculated from other data? If yes, it's a candidate for derivation.
•Does storing this create redundancy? If the same information exists in another form, derivation may be appropriate.
•Does this value need to be 'refreshed'? Values that go stale (age, time-since-event) are typically derived.
•Is this a summary of related data? Counts, sums, averages of related entities are derived.
•Would an update elsewhere invalidate this value? Tight coupling to other data suggests derivation.

Common Derived Attributes Across Domains
Entity	Derived Attribute	Source Attributes	Derivation Formula
Person	age	date_of_birth	YEAR(NOW()) - YEAR(dob), minus 1 if this year's birthday has not yet occurred
Person	full_name	first, middle, last	CONCAT with spaces
Order	total_amount	line_items	SUM(quantity × unit_price)
Order	line_count	line_items	COUNT(*)
Employee	tenure_years	hire_date	YEAR(NOW()) - YEAR(hire_date), minus 1 if this year's anniversary has not yet occurred
Product	average_rating	reviews	AVG(rating)
Product	review_count	reviews	COUNT(*)
Course	student_count	enrollments	COUNT(DISTINCT student_id)
Invoice	tax_amount	subtotal, tax_rate	subtotal × tax_rate
Invoice	grand_total	subtotal, tax, discount	subtotal + tax - discount
Rectangle	area	width, height	width × height
Circle	circumference	radius	2 × π × radius

Beware of False Derivation

Not everything that could be calculated should be derived at query time. Historical snapshots are NOT derived: the price of a product at the time of order should be STORED because the current price may change. The 'derivation' logic would give wrong historical data. Always consider whether derivation produces the semantically correct value.
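The difference between a snapshot and a derivation can be illustrated with a small Python sketch. The `current_prices` dictionary, `OrderLine` class, and `place_line` helper are hypothetical stand-ins for a products table and an order-items table.

```python
from dataclasses import dataclass
from decimal import Decimal

# Hypothetical catalog; stands in for the current price in a products table.
current_prices = {"widget": Decimal("10.00")}


@dataclass(frozen=True)
class OrderLine:
    product: str
    quantity: int
    unit_price: Decimal  # snapshot copied at order time, never recomputed

    @property
    def subtotal(self) -> Decimal:
        # Safe to derive: both inputs are fixed facts about this order line.
        return self.quantity * self.unit_price


def place_line(product: str, quantity: int) -> OrderLine:
    # Copy the price as it is right now into the order line.
    return OrderLine(product, quantity, current_prices[product])


line = place_line("widget", 3)
current_prices["widget"] = Decimal("12.00")  # price changes after the order

stored_total = line.subtotal                                # 30.00, what the customer paid
rederived_total = line.quantity * current_prices["widget"]  # 36.00, historically wrong
print(stored_total, rederived_total)
```

Re-deriving from the current price silently rewrites history, while the stored `unit_price` preserves what was actually charged.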

Representing Derived Attributes in ER Diagrams

ER diagrams use specific notation to distinguish derived attributes from stored attributes. This distinction is crucial for later implementation decisions.

Chen Notation (Original ER):

In Chen notation, derived attributes are represented with a dashed (dotted) oval:

              ┌─────────────────┐
              │     PERSON      │
              └────────┬────────┘
                       │
         ┌─────────────┼─────────────┐
         │             │             │
     ╱───┴───╲     ╱───┴───╲     ╭┄┄┄┴┄┄┄╮
    (   Name  )   (   DOB   )   ┊  Age   ┊
     ╲───────╱     ╲───────╱     ╰┄┄┄┄┄┄┄╯

    Solid Oval      Solid Oval    Dashed Oval
    (stored)        (stored)      (derived)

The dashed border immediately signals that this attribute is computed, not stored directly.

Extended Notation with Formula:

Some documentation styles include the derivation formula:

     ╭┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄╮
     ┊  Age                      ┊
     ┊  = YEAR(NOW) - YEAR(DOB)  ┊
     ╰┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄╯

This makes the derivation explicit and serves as documentation.

Modern Tool Variations

Many modern diagramming tools use alternate notations: '/age' (slash prefix), 'age [derived]', 'age {computed}', or color coding (gray for derived). Whatever notation your team uses, ensure it's clearly documented and consistently applied.

Crow's Foot (IE) and UML Notation:

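In UML class diagrams, a derived attribute is marked with a slash prefix (/age). Crow's Foot notation has no standard symbol for derivation, so tools and teams typically borrow the slash prefix or add an annotation: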

┌────────────────────────────────────┐
│               PERSON               │
├────────────────────────────────────┤
│ person_id (PK)                     │
│ first_name       VARCHAR           │
│ last_name        VARCHAR           │
│ date_of_birth    DATE              │
│ /age             INTEGER           │  ← slash prefix indicates derived
│ /full_name       VARCHAR           │
├────────────────────────────────────┤
│ Derived:                           │
│   age = DATEDIFF(years, dob, NOW)  │
│   full_name = first + ' ' + last   │
└────────────────────────────────────┘

The detailed derivation formulas can appear in a separate section or linked documentation.

The Store vs. Compute Trade-off

The fundamental question for any derived attribute: Should we store the computed value, or compute it on demand?

This is one of the most important trade-offs in database design, with implications for performance, consistency, and complexity.

Compute on Demand

•Always fresh — Never stale; reflects current source values
•No synchronization — Can't be inconsistent with source
•Less storage — Only source data is stored
•Simpler updates — No derived values to maintain
•But: Query-time cost for every access
•But: Complex queries for aggregations
•But: May require JOINs or subqueries

Store (Materialize)

•Fast reads — Pre-computed; no calculation at read time
•Simpler queries — Direct column access
•Indexable — Can index derived values
•But: Can become stale
•But: Synchronization complexity
•But: Storage overhead
•But: Update anomalies possible

Decision Framework:

Factor	Favor Compute	Favor Store
Read frequency	Low reads	High reads, critical latency
Computation cost	Simple, fast	Complex, expensive (aggregates)
Source updates	Frequent	Rare
Staleness tolerance	None (must be current)	Some lag acceptable
Time dependency	Current time matters	Point-in-time snapshot
Need to index/sort	Never	Frequently
Query complexity	Isolated calculation	Cross-table aggregation

The 'Obvious' Rules

Some cases are clear: (1) Age from DOB → ALWAYS compute (time-dependent, trivial computation). (2) Average rating over millions of reviews → ALWAYS store/cache (expensive aggregation). (3) Line item subtotal (qty × price) → usually compute or use generated column (cheap, frequent updates). The interesting decisions are in the middle ground.
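The average-rating case can be sketched in Python to show both sides of the trade-off. The `ProductRatings` class is a hypothetical illustration: it keeps the raw ratings (the source data) and also maintains a stored running sum and count that must be updated on every write.

```python
from typing import Optional


class ProductRatings:
    def __init__(self) -> None:
        self.ratings: list[int] = []  # source data
        self.rating_sum = 0           # stored derived value
        self.rating_count = 0         # stored derived value

    def add_rating(self, rating: int) -> None:
        self.ratings.append(rating)
        # Synchronization cost: every write must also maintain the stored values.
        self.rating_sum += rating
        self.rating_count += 1

    def average_computed(self) -> Optional[float]:
        # Compute on demand: always fresh, but scans every rating on each read.
        if not self.ratings:
            return None
        return sum(self.ratings) / len(self.ratings)

    def average_stored(self) -> Optional[float]:
        # Read from stored values: constant work, but only correct if
        # add_rating is the sole write path.
        if self.rating_count == 0:
            return None
        return self.rating_sum / self.rating_count


product = ProductRatings()
for rating in (5, 4, 3):
    product.add_rating(rating)
print(product.average_computed(), product.average_stored())  # 4.0 4.0
```

Both methods return None for a product with no ratings rather than dividing by zero. If any code path appends to `ratings` without going through `add_rating`, the stored values drift from the source, which is the consistency risk of materializing.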

Implementation Strategies

There are multiple ways to implement derived attributes, each with different trade-offs. Understanding these options allows you to choose the right approach for each situation.

Strategy 1: Database Computed/Generated Columns

Modern databases support columns that are defined by expressions and computed automatically.

computed-columns.sql
-- PostgreSQL Generated Columns (v12+)
CREATE TABLE orders (
    order_id        SERIAL PRIMARY KEY,
    customer_id     INTEGER NOT NULL,
    subtotal        DECIMAL(12,2) NOT NULL,
    tax_rate        DECIMAL(5,4) NOT NULL DEFAULT 0.0825,
    shipping        DECIMAL(8,2) NOT NULL DEFAULT 0,
    
    -- STORED generated column (persisted)
    tax_amount      DECIMAL(12,2) GENERATED ALWAYS AS 
                    (subtotal * tax_rate) STORED,
    
    -- STORED generated column (persisted)
    grand_total     DECIMAL(12,2) GENERATED ALWAYS AS 
                    (subtotal + (subtotal * tax_rate) + shipping) STORED,
    
    order_date      TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
 
-- The computed columns work like regular columns
INSERT INTO orders (customer_id, subtotal, tax_rate, shipping)
VALUES (101, 100.00, 0.0825, 9.99);
 
SELECT order_id, subtotal, tax_amount, grand_total FROM orders;
-- Returns: 1, 100.00, 8.25, 118.24
 
-- Can be indexed!
CREATE INDEX idx_orders_total ON orders(grand_total);
 
-- Can be used in WHERE clauses
SELECT * FROM orders WHERE grand_total > 100;

Advantages: Database handles computation automatically; consistent; can be indexed (if persisted); transparent to applications.

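Limitations: Limited to expressions over columns of the same row, so no cross-table aggregations. Persisted (stored) columns also generally require a deterministic expression, which rules out current-time functions such as NOW(); PostgreSQL, for example, only accepts immutable functions in a generation expression. A time-dependent value like age therefore cannot be a stored generated column.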

Real-World Derived Attribute Scenarios

Let's examine complete scenarios showing how derived attributes are handled in practice.

Scenario: E-Commerce Order Calculations

An order has many line items. We need: line subtotals, order subtotal, tax, and grand total.

order-derivations.sql
-- Line items with derived subtotal (computed column)
CREATE TABLE order_items (
    item_id         SERIAL PRIMARY KEY,
    order_id        INTEGER NOT NULL,
    product_id      INTEGER NOT NULL,
    quantity        INTEGER NOT NULL CHECK (quantity > 0),
    unit_price      DECIMAL(10,2) NOT NULL,
    
    -- Derived: line subtotal (stored computed column)
    subtotal        DECIMAL(12,2) GENERATED ALWAYS AS 
                    (quantity * unit_price) STORED
);
 
-- Orders table
CREATE TABLE orders (
    order_id        SERIAL PRIMARY KEY,
    customer_id     INTEGER NOT NULL,
    order_date      TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    tax_rate        DECIMAL(5,4) DEFAULT 0.0825,
    shipping        DECIMAL(8,2) DEFAULT 0,
    -- Note: We DON'T store order totals here
    status          VARCHAR(20) DEFAULT 'pending'
);
 
-- View for order totals (derived from line items)
CREATE VIEW order_totals AS
SELECT 
    o.order_id,
    o.customer_id,
    o.order_date,
    o.tax_rate,
    o.shipping,
    -- Derived: subtotal (sum of line item subtotals)
    COALESCE(SUM(oi.subtotal), 0) AS subtotal,
    -- Derived: tax amount
    COALESCE(SUM(oi.subtotal), 0) * o.tax_rate AS tax_amount,
    -- Derived: grand total
    COALESCE(SUM(oi.subtotal), 0) * (1 + o.tax_rate) + o.shipping AS grand_total,
    -- Derived: item count
    COUNT(oi.item_id) AS item_count
FROM orders o
LEFT JOIN order_items oi ON o.order_id = oi.order_id
GROUP BY o.order_id;
 
-- Query: Orders over $100
SELECT * FROM order_totals WHERE grand_total > 100;
 
-- If performance is critical, materialize it
CREATE MATERIALIZED VIEW order_totals_mv AS
SELECT * FROM order_totals;
 
CREATE UNIQUE INDEX ON order_totals_mv(order_id);
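-- Refresh after order changes (CONCURRENTLY relies on the unique index above)
REFRESH MATERIALIZED VIEW CONCURRENTLY order_totals_mv;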
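The view-based approach above can be tried without a PostgreSQL server. The following is a simplified sketch using Python's built-in `sqlite3` module. It is an assumption-laden reduction of the schema: prices are held as integer cents (`unit_price_cents`), tax and shipping are omitted, and the helper name `order_totals` is illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY
);
CREATE TABLE order_items (
    item_id          INTEGER PRIMARY KEY,
    order_id         INTEGER NOT NULL REFERENCES orders(order_id),
    quantity         INTEGER NOT NULL,
    unit_price_cents INTEGER NOT NULL
);
-- Derived totals live in a view, so nothing is stored on the orders table.
CREATE VIEW order_totals AS
SELECT o.order_id,
       COALESCE(SUM(oi.quantity * oi.unit_price_cents), 0) AS subtotal_cents,
       COUNT(oi.item_id) AS item_count
FROM orders o
LEFT JOIN order_items oi ON oi.order_id = o.order_id
GROUP BY o.order_id;
""")


def order_totals(order_id: int) -> tuple:
    """Return (subtotal_cents, item_count) for one order, computed at query time."""
    return conn.execute(
        "SELECT subtotal_cents, item_count FROM order_totals WHERE order_id = ?",
        (order_id,),
    ).fetchone()


conn.execute("INSERT INTO orders (order_id) VALUES (1)")
print(order_totals(1))  # (0, 0): an order with no items still appears

conn.executemany(
    "INSERT INTO order_items (order_id, quantity, unit_price_cents) VALUES (?, ?, ?)",
    [(1, 2, 1500), (1, 1, 999)],
)
print(order_totals(1))  # (3999, 2): reflects the new rows with no refresh step
```

Because the view is evaluated on each query, it cannot go stale; the cost is that the aggregation runs on every read, which is the case the materialized view is meant to address.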

Common Mistakes with Derived Attributes

Derived attribute handling is a common source of bugs and design problems. Here are the pitfalls to avoid:

Common Mistakes to Avoid

•Storing time-dependent values — Never store age, tenure, or 'days since X.' These go stale immediately. Always compute from base dates.
•Confusing snapshot vs. derivation — A historical 'order total' at time of purchase should be STORED (it's a snapshot). Current 'cart total' should be computed (it's dynamic).
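•Not handling NULL or zero sources — Computing rating_sum / rating_count when rating_count is 0 fails with a division-by-zero error (or silently yields NULL, depending on the database), and a NULL source makes the whole result NULL. Always handle these edge cases in derivation formulas.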
•Inconsistent derivation logic — Computing the same derived value differently in different queries leads to mismatched results. Centralize logic in views or functions.
•Over-storing for premature optimization — Storing derived values 'for performance' when query volume doesn't warrant it. Measure first.
•Under-caching expensive aggregations — Computing COUNT(*) over millions of rows on every page load. Consider materialized views.
•Forgetting to refresh materialized views — Materialized data goes stale. Ensure refresh mechanisms are in place and monitored.
•Circular derivations — A is derived from B, which is in turn derived from A. Map derivations as a dependency graph to detect and prevent cycles.
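One way to check a set of derivations for cycles is to treat them as a dependency graph and attempt a topological sort. The sketch below uses `graphlib.TopologicalSorter` from the Python standard library (3.9+); the attribute names and the `derivation_order` helper are illustrative assumptions.

```python
from graphlib import CycleError, TopologicalSorter


def derivation_order(dependencies: dict[str, set[str]]) -> list[str]:
    """Return attributes ordered so that every source precedes what it feeds.

    `dependencies` maps each derived attribute to the attributes it reads.
    Raises CycleError if the derivations are circular.
    """
    return list(TopologicalSorter(dependencies).static_order())


valid = {
    "tax_amount": {"subtotal", "tax_rate"},
    "grand_total": {"subtotal", "tax_amount", "discount"},
}
order = derivation_order(valid)
print(order)  # sources first, grand_total last

cyclic = {"a": {"b"}, "b": {"a"}}
try:
    derivation_order(cyclic)
    has_cycle = False
except CycleError:
    has_cycle = True
print(has_cycle)  # True
```

The resulting order also doubles as documentation of the derivation chain: it is a valid sequence in which to compute or refresh the derived values.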

The Historical Trap

A derived attribute computed today may give a different result than the same computation run yesterday—especially for time-based or aggregate values. If you need historical values, you need temporal tables or event sourcing, not derived attributes. Derivation gives you 'as of now,' not 'as of then.'

Summary: Derived Attributes

Derived attributes represent information that can be computed rather than stored—a fundamental concept that affects data integrity, performance, and system design. Let's consolidate the key concepts:

Key Takeaways

•Derived = Computed from other data — Derived attributes bring no new information; their values are determined by other stored attributes.
•Store vs. compute is the core trade-off — Fast reads (store) vs. guaranteed freshness (compute). Choose based on query patterns and update frequency.
•Dashed oval notation — In ER diagrams, derived attributes use dashed/dotted borders to distinguish from stored attributes.
•Multiple implementation strategies — Computed columns, views, materialized views, triggers, application layer—each has different trade-offs.
•Time-dependency matters — Values depending on current time (age, tenure) should virtually always be computed, not stored.
•Beware of staleness — Stored derived values can become inconsistent. Choose refresh strategies carefully.

What's Next:

We've explored the full taxonomy of attribute types based on value structure. But there's one more crucial attribute characteristic: key attributes. Key attributes uniquely identify entity instances and form the foundation of entity integrity and relationships. Understanding keys is essential for proper entity modeling—and that's where we'll conclude this module.

Page Complete

You now understand derived attributes—their nature, notation, trade-offs, and implementation strategies. You can identify when an attribute should be computed vs. stored, choose appropriate implementation mechanisms, and avoid common pitfalls. Next: key attributes and entity identification.