Database Management SystemsAggregate Functions

SQL Aggregate Functions: Summarizing Data with Precision

LevelIntermediate

Duration75 mins

TopicAggregate Functions

2 / 5

SUM: Calculating Totals with Precision

The Mathematics of Aggregation

If COUNT tells us how many, SUM tells us how much. The SUM aggregate function transforms individual values into totals—revenue into total revenue, quantities into inventory levels, costs into budgets. It's the workhorse of financial reporting, inventory management, and virtually every domain that deals with quantifiable measures.

Yet SUM carries subtleties that can lead to incorrect results, unexpected NULLs, precision loss, or even overflow errors. Mastering SUM means understanding not just its syntax, but its interaction with NULL values, numeric data types, and the mathematical properties that databases exploit for optimization.

What You Will Learn

By the end of this page, you will understand SUM's precise semantics including NULL handling, type coercion, and overflow behavior. You'll master conditional summing techniques, understand precision considerations for financial calculations, and learn how to combine SUM with other SQL constructs for sophisticated analysis.

SUM Fundamentals

The SUM() function adds together all non-NULL values in a column or expression, producing a single total. It operates only on numeric data types—attempting to SUM a string column results in an error (or, in permissive databases, unexpected type coercion).

Basic syntax:

SUM(expression)
SUM(DISTINCT expression)

The expression is typically a column name but can be any valid numeric expression, including arithmetic operations, CASE expressions, or function calls that return numeric values.

sum_basics.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
-- Total of all salaries
SELECT SUM(salary) AS total_payroll
FROM employees;
 
-- Total with arithmetic expression
SELECT SUM(quantity * unit_price) AS total_revenue
FROM order_items;
 
-- SUM with WHERE clause
SELECT SUM(amount) AS q1_revenue
FROM transactions
WHERE transaction_date >= '2024-01-01'
  AND transaction_date < '2024-04-01';
 
-- Multiple SUMs in one query
SELECT 
    SUM(revenue) AS total_revenue,
    SUM(cost) AS total_cost,
    SUM(revenue) - SUM(cost) AS total_profit,
    SUM(revenue - cost) AS total_profit_alt  -- Mathematically equivalent
FROM monthly_financials;

Mathematical Equivalence

SUM(a) - SUM(b) equals SUM(a - b) mathematically—the sum of differences equals the difference of sums. However, with NULL values, they may differ. If a row has revenue=100 but cost=NULL, SUM(revenue - cost) excludes that row (expression is NULL), but SUM(revenue) - SUM(cost) includes the 100 in revenue sum.

Return type considerations:

The return type of SUM depends on the input type and the database system:

Input Type	Typical Return Type	Notes
INTEGER/INT	BIGINT or DECIMAL	Widened to prevent overflow
SMALLINT	BIGINT or DECIMAL	Significantly widened
DECIMAL(p,s)	DECIMAL (higher precision)	Precision may increase
FLOAT/DOUBLE	DOUBLE	Maintains floating point

Most modern databases automatically widen the return type to reduce overflow risk, but this behavior isn't universal. Always check your database's documentation for exact behavior.

NULL Handling in SUM

SUM's NULL handling follows a critical rule: NULL values are ignored during aggregation. This is consistent with COUNT(column) behavior but carries unique implications for totaling.

The NULL-ignoring behavior:

NULL values don't contribute to the sum (they're not treated as zero)
If all values are NULL, SUM returns NULL (not zero)
Rows with NULL in the summed column are silently excluded

This behavior is mathematically sound—NULL means "unknown," and adding an unknown quantity to a known total should preserve uncertainty. However, it often surprises developers who expect NULL to behave like zero.

SUM NULL Behavior Examples
Sample Data	Query	Result	Explanation
100, 200, 300	SUM(val)	600	All non-NULL, simple addition
100, NULL, 300	SUM(val)	400	NULL ignored, 100+300
NULL, NULL, NULL	SUM(val)	NULL	All NULL returns NULL
(empty table)	SUM(val)	NULL	No rows returns NULL

sum_null_handling.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
-- Sample table for examples
-- bonuses: employee_id, bonus_amount (some NULL)
 
-- Basic SUM ignores NULLs
SELECT SUM(bonus_amount) AS total_bonuses
FROM bonuses;
-- If some bonuses are NULL, they don't reduce the total
 
-- Check for completely NULL data
SELECT 
    SUM(bonus_amount) AS raw_sum,           -- NULL if all NULL
    COALESCE(SUM(bonus_amount), 0) AS safe_sum  -- 0 if all NULL
FROM bonuses;
 
-- Understanding the difference: NULL vs zero
-- Scenario: Employees with no bonus record vs $0 bonus
SELECT 
    e.name,
    b.bonus_amount,
    COALESCE(b.bonus_amount, 0) AS effective_bonus
FROM employees e
LEFT JOIN bonuses b ON e.id = b.employee_id;
 
-- SUM with COALESCE inside (treats NULL as 0)
SELECT SUM(COALESCE(bonus_amount, 0)) AS total_with_nulls_as_zero
FROM bonuses;

The Empty Set Trap

SUM over an empty result set returns NULL, not zero. This can cause issues in applications: SELECT SUM(amount) FROM orders WHERE order_date = '2099-01-01' returns NULL if no orders exist for that date. Always use COALESCE when you need zero for no-data scenarios.

When to treat NULL as zero:

The decision to use COALESCE(SUM(...), 0) or SUM(COALESCE(..., 0)) depends on your domain:

Use COALESCE outside SUM when NULL legitimately means "no data yet" and you want zero for display/calculation purposes
Use COALESCE inside SUM when NULL conceptually means "zero" in your domain (e.g., missing optional discounts)
Keep NULL as NULL when unknown values should propagate and indicate data quality issues

The right choice depends on whether NULL represents "unknown" or "not applicable/zero" in your specific context.

SUM(DISTINCT): Summing Unique Values

SUM(DISTINCT expression) adds only unique values, ignoring duplicates. Like COUNT(DISTINCT), it first deduplicates, then aggregates.

This is rarely used compared to COUNT(DISTINCT) because summing unique values has fewer practical applications. The most common use case involves eliminating accidental duplicates from joins before summing.

sum_distinct.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
-- Sample data: Values are 10, 20, 20, 30
SELECT 
    SUM(amount) AS regular_sum,              -- 80 (10+20+20+30)
    SUM(DISTINCT amount) AS distinct_sum     -- 60 (10+20+30)
FROM sample_values;
 
-- Practical use: Avoiding double-counting in joins
-- Problem: Order joined to multiple line items
SELECT 
    SUM(o.shipping_cost) AS wrong_total  -- Counted once per line item!
FROM orders o
JOIN order_items oi ON o.id = oi.order_id;
 
-- Solution 1: SUM(DISTINCT) if shipping costs are unique per order
SELECT 
    SUM(DISTINCT o.shipping_cost) AS shipping_total
FROM orders o
JOIN order_items oi ON o.id = oi.order_id;
 
-- Solution 2 (more reliable): Aggregate separately
SELECT 
    (SELECT SUM(shipping_cost) FROM orders) AS shipping_total,
    (SELECT SUM(line_total) FROM order_items) AS items_total;
 
-- Solution 3: Use a subquery or CTE
WITH order_totals AS (
    SELECT id, shipping_cost FROM orders
)
SELECT SUM(shipping_cost) FROM order_totals;

Prefer Avoiding Duplicates to SUM(DISTINCT)

While SUM(DISTINCT) can fix duplicate issues, it's often a code smell indicating incorrect joins. If you find yourself needing SUM(DISTINCT), reconsider your query structure. It's usually better to fix the join logic or restructure the query to avoid duplicates in the first place.

When SUM(DISTINCT) is legitimately useful:

Distinct values in analysis: Summing unique discount tiers to see total possible discount range
Quick fixes for known duplicate issues: When the data model creates unavoidable duplicates
Set-based calculations: When you need the sum of all distinct values in a mathematical sense

However, in production code, prefer restructuring queries over relying on DISTINCT to mask duplicate issues.

Conditional Summing with CASE and FILTER

One of SUM's most powerful applications is conditional aggregation—summing values only when certain conditions are met. This enables pivoting data, calculating segmented totals, and building complex reports in a single query.

conditional_sum.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
-- Conditional sum using CASE
SELECT 
    SUM(CASE WHEN region = 'North' THEN revenue ELSE 0 END) AS north_revenue,
    SUM(CASE WHEN region = 'South' THEN revenue ELSE 0 END) AS south_revenue,
    SUM(CASE WHEN region = 'East' THEN revenue ELSE 0 END) AS east_revenue,
    SUM(CASE WHEN region = 'West' THEN revenue ELSE 0 END) AS west_revenue
FROM sales;
 
-- Alternative: Use NULL instead of 0 (SUM ignores NULLs anyway)
SELECT 
    SUM(CASE WHEN region = 'North' THEN revenue END) AS north_revenue,
    SUM(CASE WHEN region = 'South' THEN revenue END) AS south_revenue
FROM sales;
-- Same result, slightly cleaner syntax
 
-- Calculate percentages of total
SELECT 
    SUM(revenue) AS total,
    SUM(CASE WHEN product_type = 'Electronics' THEN revenue END) AS electronics,
    SUM(CASE WHEN product_type = 'Electronics' THEN revenue END) * 100.0 
        / NULLIF(SUM(revenue), 0) AS electronics_pct
FROM sales;

PostgreSQL's FILTER clause:

PostgreSQL (and SQL Server 2022+) offers a cleaner syntax for conditional aggregation with the FILTER clause, which is part of the SQL:2003 standard.

sum_filter_clause.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
-- PostgreSQL: FILTER clause for conditional aggregation
SELECT 
    SUM(revenue) AS total_revenue,
    SUM(revenue) FILTER (WHERE region = 'North') AS north_revenue,
    SUM(revenue) FILTER (WHERE region = 'South') AS south_revenue,
    SUM(revenue) FILTER (WHERE transaction_date >= '2024-01-01') AS ytd_revenue
FROM sales;
 
-- Complex conditions with FILTER
SELECT 
    customer_id,
    SUM(amount) AS total_spent,
    SUM(amount) FILTER (WHERE status = 'completed') AS completed_orders,
    SUM(amount) FILTER (WHERE status = 'refunded') AS refunded_amount,
    SUM(amount) FILTER (WHERE category = 'Premium') AS premium_purchases
FROM orders
GROUP BY customer_id;
 
-- FILTER vs CASE performance
-- Generally equivalent, but FILTER can be more readable
-- Some query planners may optimize FILTER differently

CASE vs FILTER

FILTER (WHERE ...) is more readable than CASE WHEN ... THEN value END for conditional aggregation. If your database supports it, prefer FILTER. For portability, CASE works across all SQL databases. Performance is typically identical.

conditional_sum_advanced.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
-- Running totals with conditional sums
SELECT 
    month,
    revenue,
    SUM(CASE WHEN revenue > 0 THEN revenue ELSE 0 END) 
        OVER (ORDER BY month) AS cumulative_positive,
    SUM(CASE WHEN revenue < 0 THEN revenue ELSE 0 END) 
        OVER (ORDER BY month) AS cumulative_losses
FROM monthly_performance;
 
-- Pivot-style reporting
SELECT 
    product_category,
    SUM(CASE WHEN EXTRACT(QUARTER FROM sale_date) = 1 THEN amount END) AS q1,
    SUM(CASE WHEN EXTRACT(QUARTER FROM sale_date) = 2 THEN amount END) AS q2,
    SUM(CASE WHEN EXTRACT(QUARTER FROM sale_date) = 3 THEN amount END) AS q3,
    SUM(CASE WHEN EXTRACT(QUARTER FROM sale_date) = 4 THEN amount END) AS q4,
    SUM(amount) AS annual_total
FROM sales
WHERE EXTRACT(YEAR FROM sale_date) = 2024
GROUP BY product_category;

Precision, Overflow, and Numeric Safety

When summing large numbers or dealing with financial data, precision and overflow become critical concerns. Understanding how your database handles numeric types during aggregation can prevent subtle bugs and data corruption.

Overflow risks:

Every numeric type has a maximum value. When SUM exceeds this maximum, the result is either:

An error (safer, found in PostgreSQL with integer overflow)
Silent wraparound or truncation (dangerous, found in some systems)
Automatic promotion to a larger type (common for SUM)

Most databases widen the return type of SUM to reduce overflow risk, but you should never assume this without verification.

Integer Type Ranges and SUM Behavior
Type	Max Value	Typical SUM Return Type
SMALLINT	32,767	BIGINT or DECIMAL
INTEGER	2,147,483,647	BIGINT or DECIMAL
BIGINT	9.2 × 10^18	DECIMAL or NUMERIC
DECIMAL(10,2)	99,999,999.99	Higher precision DECIMAL

sum_precision.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
-- Potential overflow scenario
-- If salary is INTEGER and sum exceeds 2 billion:
SELECT SUM(salary) FROM huge_employee_table;
-- Modern databases typically return BIGINT, avoiding overflow
 
-- Explicit casting for safety
SELECT SUM(salary::BIGINT) FROM employees;  -- PostgreSQL
SELECT SUM(CAST(salary AS BIGINT)) FROM employees;  -- Standard SQL
 
-- Financial precision: Use DECIMAL, not FLOAT
-- WRONG: Float precision loss
SELECT SUM(price * 0.1) FROM items;  -- Floating point errors possible
 
-- CORRECT: DECIMAL for exact arithmetic
SELECT SUM(CAST(price AS DECIMAL(15,4)) * 0.1) FROM items;
 
-- Check actual types returned
-- PostgreSQL:
SELECT pg_typeof(SUM(int_column)) FROM sample;  -- Usually returns 'bigint'
SELECT pg_typeof(SUM(numeric_column)) FROM sample;  -- Returns 'numeric'

Financial Calculations Require DECIMAL

Never use FLOAT or DOUBLE for money. Floating-point representation can't precisely represent many decimal values (like 0.10), leading to cumulative rounding errors. For financial calculations, use DECIMAL/NUMERIC with appropriate precision (e.g., DECIMAL(15,2) for currency).

Precision best practices:

Use DECIMAL for currency: DECIMAL(15,2) for values up to trillions with cent precision
Cast before summing if concerned: SUM(CAST(col AS DECIMAL(20,4)))
Be aware of intermediate calculations: SUM(price * quantity) may have different precision than SUM(price) * SUM(quantity)
Test with extreme values: Sum the maximum value of your type millions of times in development
Consider using monetary types: PostgreSQL's money type, or application-layer decimal libraries

SUM with GROUP BY: Segmented Totals

Like COUNT, SUM becomes especially powerful when combined with GROUP BY for segmented analysis. You can calculate totals per category, per time period, per region, or any other dimensional breakdown.

sum_group_by.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
-- Total revenue per product category
SELECT 
    category,
    SUM(revenue) AS category_revenue
FROM sales
GROUP BY category
ORDER BY category_revenue DESC;
 
-- Multiple levels of grouping
SELECT 
    region,
    product_line,
    SUM(units_sold) AS total_units,
    SUM(revenue) AS total_revenue,
    SUM(revenue) / NULLIF(SUM(units_sold), 0) AS avg_price
FROM sales
GROUP BY region, product_line
ORDER BY region, total_revenue DESC;
 
-- Time-series aggregation
SELECT 
    DATE_TRUNC('month', order_date) AS month,
    SUM(order_total) AS monthly_revenue,
    SUM(SUM(order_total)) OVER (ORDER BY DATE_TRUNC('month', order_date)) AS cumulative_revenue
FROM orders
GROUP BY DATE_TRUNC('month', order_date)
ORDER BY month;

Advanced grouping with ROLLUP and CUBE:

For summary reports that need subtotals and grand totals, SQL provides ROLLUP and CUBE extensions to GROUP BY.

sum_rollup_cube.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
-- ROLLUP: Hierarchical subtotals
SELECT 
    COALESCE(region, 'ALL REGIONS') AS region,
    COALESCE(product_category, 'ALL CATEGORIES') AS category,
    SUM(revenue) AS total_revenue
FROM sales
GROUP BY ROLLUP(region, product_category)
ORDER BY region NULLS FIRST, product_category NULLS FIRST;
-- Results include: each region/category combo, region subtotals, grand total
 
-- CUBE: All possible subtotal combinations
SELECT 
    COALESCE(region, 'ALL') AS region,
    COALESCE(channel, 'ALL') AS channel,
    SUM(revenue) AS total_revenue
FROM sales
GROUP BY CUBE(region, channel);
-- Results include: each combo, region totals, channel totals, grand total
 
-- GROUPING function to identify subtotal rows
SELECT 
    region,
    product_category,
    SUM(revenue) AS total_revenue,
    GROUPING(region) AS is_region_total,
    GROUPING(product_category) AS is_category_total
FROM sales
GROUP BY ROLLUP(region, product_category);

GROUPING() Function

Use GROUPING(column) to programmatically distinguish between actual NULL values in your data and NULLs that represent "all values" in subtotal rows. GROUPING() returns 1 for subtotal rows, 0 for regular rows.

Common Patterns and Best Practices

Let's examine production-ready patterns that leverage SUM effectively while avoiding common pitfalls.

sum_patterns.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
-- Pattern: Running total (cumulative sum)
SELECT 
    transaction_date,
    amount,
    SUM(amount) OVER (ORDER BY transaction_date) AS running_total
FROM transactions;
 
-- Pattern: Running total with partition reset
SELECT 
    account_id,
    transaction_date,
    amount,
    SUM(amount) OVER (
        PARTITION BY account_id 
        ORDER BY transaction_date
    ) AS account_running_total
FROM transactions;
 
-- Pattern: Percentage of total
SELECT 
    category,
    SUM(revenue) AS category_revenue,
    SUM(revenue) * 100.0 / SUM(SUM(revenue)) OVER () AS pct_of_total
FROM sales
GROUP BY category;
 
-- Pattern: Year-over-year comparison
SELECT 
    product_id,
    SUM(CASE WHEN EXTRACT(YEAR FROM sale_date) = 2023 THEN amount END) AS sales_2023,
    SUM(CASE WHEN EXTRACT(YEAR FROM sale_date) = 2024 THEN amount END) AS sales_2024,
    SUM(CASE WHEN EXTRACT(YEAR FROM sale_date) = 2024 THEN amount END) -
    SUM(CASE WHEN EXTRACT(YEAR FROM sale_date) = 2023 THEN amount END) AS yoy_change
FROM sales
GROUP BY product_id;

SUM Best Practices

•Use COALESCE for safe defaults: COALESCE(SUM(amount), 0) prevents NULL results from empty sets
•Prefer DECIMAL for financial data: Avoid floating-point precision issues in money calculations
•Watch for join fan-out: If a join duplicates rows, SUM will count values multiple times
•Use window functions for running totals: More efficient and readable than self-joins
•Consider materialized views for expensive sums: Pre-aggregate frequently queried totals
•Test with NULL data: Ensure your queries handle missing values correctly

Common Mistakes to Avoid

•Summing after a fan-out join: Values counted multiple times give inflated totals
•Expecting zero from empty sets: SUM returns NULL, not 0, when no rows match
•Using FLOAT for currency: Leads to subtle rounding errors over time
•Ignoring NULL propagation: SUM(a + b) is NULL if either a or b is NULL in any row
•Mixing aggregated and non-aggregated columns: Causes errors or undefined behavior

Summary: Mastering SUM

The SUM function is fundamental to data analysis and reporting. Let's consolidate the key concepts:

Key Takeaways

•SUM adds all non-NULL values — NULL values are silently ignored. An all-NULL or empty set returns NULL, not zero.
•Use COALESCE for safe defaults — Wrap SUM in COALESCE when you need zero instead of NULL for empty results.
•SUM(DISTINCT) deduplicates first — Useful for specific scenarios but often indicates join issues that should be fixed structurally.
•Conditional summing with CASE — Enables pivot-style queries and segmented totals in a single pass.
•Use DECIMAL for financial data — FLOAT precision issues compound over time; DECIMAL ensures exact arithmetic.
•Combine with GROUP BY for segmented analysis — Use ROLLUP/CUBE for hierarchical subtotals in reports.

What's next:

Having mastered counting and summing, we'll explore AVG—the average function that combines aspects of both. You'll learn how AVG handles NULLs differently than you might expect and master techniques for weighted averages and safe division.

Page Complete

You now understand SUM in depth—its NULL handling, precision considerations, conditional summing patterns, and integration with GROUP BY. These skills form the foundation for financial reporting, inventory management, and virtually any quantitative analysis.

2 / 5

Loading learning content...

Database Management SystemsAggregate Functions

SQL Aggregate Functions: Summarizing Data with Precision

LevelIntermediate

Duration75 mins

TopicAggregate Functions

2 / 5

SUM: Calculating Totals with Precision

The Mathematics of Aggregation

What You Will Learn

SUM Fundamentals

Basic syntax:

SUM(expression)
SUM(DISTINCT expression)

The expression is typically a column name but can be any valid numeric expression, including arithmetic operations, CASE expressions, or function calls that return numeric values.

sum_basics.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
-- Total of all salaries
SELECT SUM(salary) AS total_payroll
FROM employees;
 
-- Total with arithmetic expression
SELECT SUM(quantity * unit_price) AS total_revenue
FROM order_items;
 
-- SUM with WHERE clause
SELECT SUM(amount) AS q1_revenue
FROM transactions
WHERE transaction_date >= '2024-01-01'
  AND transaction_date < '2024-04-01';
 
-- Multiple SUMs in one query
SELECT 
    SUM(revenue) AS total_revenue,
    SUM(cost) AS total_cost,
    SUM(revenue) - SUM(cost) AS total_profit,
    SUM(revenue - cost) AS total_profit_alt  -- Mathematically equivalent
FROM monthly_financials;

Mathematical Equivalence

Return type considerations:

The return type of SUM depends on the input type and the database system:

Input Type	Typical Return Type	Notes
INTEGER/INT	BIGINT or DECIMAL	Widened to prevent overflow
SMALLINT	BIGINT or DECIMAL	Significantly widened
DECIMAL(p,s)	DECIMAL (higher precision)	Precision may increase
FLOAT/DOUBLE	DOUBLE	Maintains floating point

Most modern databases automatically widen the return type to reduce overflow risk, but this behavior isn't universal. Always check your database's documentation for exact behavior.

NULL Handling in SUM

SUM's NULL handling follows a critical rule: NULL values are ignored during aggregation. This is consistent with COUNT(column) behavior but carries unique implications for totaling.

The NULL-ignoring behavior:

NULL values don't contribute to the sum (they're not treated as zero)
If all values are NULL, SUM returns NULL (not zero)
Rows with NULL in the summed column are silently excluded

SUM NULL Behavior Examples
Sample Data	Query	Result	Explanation
100, 200, 300	SUM(val)	600	All non-NULL, simple addition
100, NULL, 300	SUM(val)	400	NULL ignored, 100+300
NULL, NULL, NULL	SUM(val)	NULL	All NULL returns NULL
(empty table)	SUM(val)	NULL	No rows returns NULL

sum_null_handling.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
-- Sample table for examples
-- bonuses: employee_id, bonus_amount (some NULL)
 
-- Basic SUM ignores NULLs
SELECT SUM(bonus_amount) AS total_bonuses
FROM bonuses;
-- If some bonuses are NULL, they don't reduce the total
 
-- Check for completely NULL data
SELECT 
    SUM(bonus_amount) AS raw_sum,           -- NULL if all NULL
    COALESCE(SUM(bonus_amount), 0) AS safe_sum  -- 0 if all NULL
FROM bonuses;
 
-- Understanding the difference: NULL vs zero
-- Scenario: Employees with no bonus record vs $0 bonus
SELECT 
    e.name,
    b.bonus_amount,
    COALESCE(b.bonus_amount, 0) AS effective_bonus
FROM employees e
LEFT JOIN bonuses b ON e.id = b.employee_id;
 
-- SUM with COALESCE inside (treats NULL as 0)
SELECT SUM(COALESCE(bonus_amount, 0)) AS total_with_nulls_as_zero
FROM bonuses;

The Empty Set Trap

When to treat NULL as zero:

The decision to use COALESCE(SUM(...), 0) or SUM(COALESCE(..., 0)) depends on your domain:

Use COALESCE outside SUM when NULL legitimately means "no data yet" and you want zero for display/calculation purposes
Use COALESCE inside SUM when NULL conceptually means "zero" in your domain (e.g., missing optional discounts)
Keep NULL as NULL when unknown values should propagate and indicate data quality issues

The right choice depends on whether NULL represents "unknown" or "not applicable/zero" in your specific context.

SUM(DISTINCT): Summing Unique Values

SUM(DISTINCT expression) adds only unique values, ignoring duplicates. Like COUNT(DISTINCT), it first deduplicates, then aggregates.

sum_distinct.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
-- Sample data: Values are 10, 20, 20, 30
SELECT 
    SUM(amount) AS regular_sum,              -- 80 (10+20+20+30)
    SUM(DISTINCT amount) AS distinct_sum     -- 60 (10+20+30)
FROM sample_values;
 
-- Practical use: Avoiding double-counting in joins
-- Problem: Order joined to multiple line items
SELECT 
    SUM(o.shipping_cost) AS wrong_total  -- Counted once per line item!
FROM orders o
JOIN order_items oi ON o.id = oi.order_id;
 
-- Solution 1: SUM(DISTINCT) if shipping costs are unique per order
SELECT 
    SUM(DISTINCT o.shipping_cost) AS shipping_total
FROM orders o
JOIN order_items oi ON o.id = oi.order_id;
 
-- Solution 2 (more reliable): Aggregate separately
SELECT 
    (SELECT SUM(shipping_cost) FROM orders) AS shipping_total,
    (SELECT SUM(line_total) FROM order_items) AS items_total;
 
-- Solution 3: Use a subquery or CTE
WITH order_totals AS (
    SELECT id, shipping_cost FROM orders
)
SELECT SUM(shipping_cost) FROM order_totals;

Prefer Avoiding Duplicates to SUM(DISTINCT)

When SUM(DISTINCT) is legitimately useful:

Distinct values in analysis: Summing unique discount tiers to see total possible discount range
Quick fixes for known duplicate issues: When the data model creates unavoidable duplicates
Set-based calculations: When you need the sum of all distinct values in a mathematical sense

However, in production code, prefer restructuring queries over relying on DISTINCT to mask duplicate issues.

Conditional Summing with CASE and FILTER

conditional_sum.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
-- Conditional sum using CASE
SELECT 
    SUM(CASE WHEN region = 'North' THEN revenue ELSE 0 END) AS north_revenue,
    SUM(CASE WHEN region = 'South' THEN revenue ELSE 0 END) AS south_revenue,
    SUM(CASE WHEN region = 'East' THEN revenue ELSE 0 END) AS east_revenue,
    SUM(CASE WHEN region = 'West' THEN revenue ELSE 0 END) AS west_revenue
FROM sales;
 
-- Alternative: Use NULL instead of 0 (SUM ignores NULLs anyway)
SELECT 
    SUM(CASE WHEN region = 'North' THEN revenue END) AS north_revenue,
    SUM(CASE WHEN region = 'South' THEN revenue END) AS south_revenue
FROM sales;
-- Same result, slightly cleaner syntax
 
-- Calculate percentages of total
SELECT 
    SUM(revenue) AS total,
    SUM(CASE WHEN product_type = 'Electronics' THEN revenue END) AS electronics,
    SUM(CASE WHEN product_type = 'Electronics' THEN revenue END) * 100.0 
        / NULLIF(SUM(revenue), 0) AS electronics_pct
FROM sales;

PostgreSQL's FILTER clause:

PostgreSQL (and SQL Server 2022+) offers a cleaner syntax for conditional aggregation with the FILTER clause, which is part of the SQL:2003 standard.

sum_filter_clause.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
-- PostgreSQL: FILTER clause for conditional aggregation
SELECT 
    SUM(revenue) AS total_revenue,
    SUM(revenue) FILTER (WHERE region = 'North') AS north_revenue,
    SUM(revenue) FILTER (WHERE region = 'South') AS south_revenue,
    SUM(revenue) FILTER (WHERE transaction_date >= '2024-01-01') AS ytd_revenue
FROM sales;
 
-- Complex conditions with FILTER
SELECT 
    customer_id,
    SUM(amount) AS total_spent,
    SUM(amount) FILTER (WHERE status = 'completed') AS completed_orders,
    SUM(amount) FILTER (WHERE status = 'refunded') AS refunded_amount,
    SUM(amount) FILTER (WHERE category = 'Premium') AS premium_purchases
FROM orders
GROUP BY customer_id;
 
-- FILTER vs CASE performance
-- Generally equivalent, but FILTER can be more readable
-- Some query planners may optimize FILTER differently

CASE vs FILTER

conditional_sum_advanced.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
-- Running totals with conditional sums
SELECT 
    month,
    revenue,
    SUM(CASE WHEN revenue > 0 THEN revenue ELSE 0 END) 
        OVER (ORDER BY month) AS cumulative_positive,
    SUM(CASE WHEN revenue < 0 THEN revenue ELSE 0 END) 
        OVER (ORDER BY month) AS cumulative_losses
FROM monthly_performance;
 
-- Pivot-style reporting
SELECT 
    product_category,
    SUM(CASE WHEN EXTRACT(QUARTER FROM sale_date) = 1 THEN amount END) AS q1,
    SUM(CASE WHEN EXTRACT(QUARTER FROM sale_date) = 2 THEN amount END) AS q2,
    SUM(CASE WHEN EXTRACT(QUARTER FROM sale_date) = 3 THEN amount END) AS q3,
    SUM(CASE WHEN EXTRACT(QUARTER FROM sale_date) = 4 THEN amount END) AS q4,
    SUM(amount) AS annual_total
FROM sales
WHERE EXTRACT(YEAR FROM sale_date) = 2024
GROUP BY product_category;

Precision, Overflow, and Numeric Safety

Overflow risks:

Every numeric type has a maximum value. When SUM exceeds this maximum, the result is either:

An error (safer, found in PostgreSQL with integer overflow)
Silent wraparound or truncation (dangerous, found in some systems)
Automatic promotion to a larger type (common for SUM)

Most databases widen the return type of SUM to reduce overflow risk, but you should never assume this without verification.

Integer Type Ranges and SUM Behavior
Type	Max Value	Typical SUM Return Type
SMALLINT	32,767	BIGINT or DECIMAL
INTEGER	2,147,483,647	BIGINT or DECIMAL
BIGINT	9.2 × 10^18	DECIMAL or NUMERIC
DECIMAL(10,2)	99,999,999.99	Higher precision DECIMAL

sum_precision.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
-- Potential overflow scenario
-- If salary is INTEGER and sum exceeds 2 billion:
SELECT SUM(salary) FROM huge_employee_table;
-- Modern databases typically return BIGINT, avoiding overflow
 
-- Explicit casting for safety
SELECT SUM(salary::BIGINT) FROM employees;  -- PostgreSQL
SELECT SUM(CAST(salary AS BIGINT)) FROM employees;  -- Standard SQL
 
-- Financial precision: Use DECIMAL, not FLOAT
-- WRONG: Float precision loss
SELECT SUM(price * 0.1) FROM items;  -- Floating point errors possible
 
-- CORRECT: DECIMAL for exact arithmetic
SELECT SUM(CAST(price AS DECIMAL(15,4)) * 0.1) FROM items;
 
-- Check actual types returned
-- PostgreSQL:
SELECT pg_typeof(SUM(int_column)) FROM sample;  -- Usually returns 'bigint'
SELECT pg_typeof(SUM(numeric_column)) FROM sample;  -- Returns 'numeric'

Financial Calculations Require DECIMAL

Precision best practices:

Use DECIMAL for currency: DECIMAL(15,2) for values up to trillions with cent precision
Cast before summing if concerned: SUM(CAST(col AS DECIMAL(20,4)))
Be aware of intermediate calculations: SUM(price * quantity) may have different precision than SUM(price) * SUM(quantity)
Test with extreme values: Sum the maximum value of your type millions of times in development
Consider using monetary types: PostgreSQL's money type, or application-layer decimal libraries

SUM with GROUP BY: Segmented Totals

Like COUNT, SUM becomes especially powerful when combined with GROUP BY for segmented analysis. You can calculate totals per category, per time period, per region, or any other dimensional breakdown.

sum_group_by.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
-- Total revenue per product category
SELECT 
    category,
    SUM(revenue) AS category_revenue
FROM sales
GROUP BY category
ORDER BY category_revenue DESC;
 
-- Multiple levels of grouping
SELECT 
    region,
    product_line,
    SUM(units_sold) AS total_units,
    SUM(revenue) AS total_revenue,
    SUM(revenue) / NULLIF(SUM(units_sold), 0) AS avg_price
FROM sales
GROUP BY region, product_line
ORDER BY region, total_revenue DESC;
 
-- Time-series aggregation
SELECT 
    DATE_TRUNC('month', order_date) AS month,
    SUM(order_total) AS monthly_revenue,
    SUM(SUM(order_total)) OVER (ORDER BY DATE_TRUNC('month', order_date)) AS cumulative_revenue
FROM orders
GROUP BY DATE_TRUNC('month', order_date)
ORDER BY month;

Advanced grouping with ROLLUP and CUBE:

For summary reports that need subtotals and grand totals, SQL provides ROLLUP and CUBE extensions to GROUP BY.

sum_rollup_cube.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
-- ROLLUP: Hierarchical subtotals
SELECT 
    COALESCE(region, 'ALL REGIONS') AS region,
    COALESCE(product_category, 'ALL CATEGORIES') AS category,
    SUM(revenue) AS total_revenue
FROM sales
GROUP BY ROLLUP(region, product_category)
ORDER BY region NULLS FIRST, product_category NULLS FIRST;
-- Results include: each region/category combo, region subtotals, grand total
 
-- CUBE: All possible subtotal combinations
SELECT 
    COALESCE(region, 'ALL') AS region,
    COALESCE(channel, 'ALL') AS channel,
    SUM(revenue) AS total_revenue
FROM sales
GROUP BY CUBE(region, channel);
-- Results include: each combo, region totals, channel totals, grand total
 
-- GROUPING function to identify subtotal rows
SELECT 
    region,
    product_category,
    SUM(revenue) AS total_revenue,
    GROUPING(region) AS is_region_total,
    GROUPING(product_category) AS is_category_total
FROM sales
GROUP BY ROLLUP(region, product_category);

GROUPING() Function

Common Patterns and Best Practices

Let's examine production-ready patterns that leverage SUM effectively while avoiding common pitfalls.

sum_patterns.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
-- Pattern: Running total (cumulative sum)
SELECT 
    transaction_date,
    amount,
    SUM(amount) OVER (ORDER BY transaction_date) AS running_total
FROM transactions;
 
-- Pattern: Running total with partition reset
SELECT 
    account_id,
    transaction_date,
    amount,
    SUM(amount) OVER (
        PARTITION BY account_id 
        ORDER BY transaction_date
    ) AS account_running_total
FROM transactions;
 
-- Pattern: Percentage of total
SELECT 
    category,
    SUM(revenue) AS category_revenue,
    SUM(revenue) * 100.0 / SUM(SUM(revenue)) OVER () AS pct_of_total
FROM sales
GROUP BY category;
 
-- Pattern: Year-over-year comparison
SELECT 
    product_id,
    SUM(CASE WHEN EXTRACT(YEAR FROM sale_date) = 2023 THEN amount END) AS sales_2023,
    SUM(CASE WHEN EXTRACT(YEAR FROM sale_date) = 2024 THEN amount END) AS sales_2024,
    SUM(CASE WHEN EXTRACT(YEAR FROM sale_date) = 2024 THEN amount END) -
    SUM(CASE WHEN EXTRACT(YEAR FROM sale_date) = 2023 THEN amount END) AS yoy_change
FROM sales
GROUP BY product_id;

SUM Best Practices

•Use COALESCE for safe defaults: COALESCE(SUM(amount), 0) prevents NULL results from empty sets
•Prefer DECIMAL for financial data: Avoid floating-point precision issues in money calculations
•Watch for join fan-out: If a join duplicates rows, SUM will count values multiple times
•Use window functions for running totals: More efficient and readable than self-joins
•Consider materialized views for expensive sums: Pre-aggregate frequently queried totals
•Test with NULL data: Ensure your queries handle missing values correctly

Common Mistakes to Avoid

•Summing after a fan-out join: Values counted multiple times give inflated totals
•Expecting zero from empty sets: SUM returns NULL, not 0, when no rows match
•Using FLOAT for currency: Leads to subtle rounding errors over time
•Ignoring NULL propagation: SUM(a + b) is NULL if either a or b is NULL in any row
•Mixing aggregated and non-aggregated columns: Causes errors or undefined behavior

Summary: Mastering SUM

The SUM function is fundamental to data analysis and reporting. Let's consolidate the key concepts:

Key Takeaways

•SUM adds all non-NULL values — NULL values are silently ignored. An all-NULL or empty set returns NULL, not zero.
•Use COALESCE for safe defaults — Wrap SUM in COALESCE when you need zero instead of NULL for empty results.
•SUM(DISTINCT) deduplicates first — Useful for specific scenarios but often indicates join issues that should be fixed structurally.
•Conditional summing with CASE — Enables pivot-style queries and segmented totals in a single pass.
•Use DECIMAL for financial data — FLOAT precision issues compound over time; DECIMAL ensures exact arithmetic.
•Combine with GROUP BY for segmented analysis — Use ROLLUP/CUBE for hierarchical subtotals in reports.

What's next:

Page Complete

2 / 5