Aggregate Functions - Learning Module

Loading content...

0/252

Handling NULLs: Mastering Missing Data in Aggregation

The NULL Challenge in Aggregation

NULL is SQL's representation of missing, unknown, or inapplicable data. In aggregation, NULL doesn't behave like a normal value—it's neither zero nor empty string nor any other concrete value. This creates both challenges and opportunities.

Across the aggregate functions we've studied, NULL handling follows consistent patterns, yet the implications differ based on context. Understanding these patterns—and knowing when to override them—is essential for producing accurate aggregation results.

What You Will Learn

By the end of this page, you will have a consolidated understanding of NULL handling across all standard aggregates, master techniques for controlling NULL behavior, understand the three-valued logic implications for aggregation, and know when to treat NULL as zero versus when to exclude it.

The Unified NULL Exclusion Rule

All standard SQL aggregate functions (except COUNT(*)) follow the same fundamental rule:

NULL values are excluded from aggregation.

This means:

NULLs don't contribute to sums
NULLs don't increment counts (in COUNT(column))
NULLs don't affect averages
NULLs aren't considered when finding MIN or MAX

The only exception is COUNT(*), which counts rows regardless of what values they contain.

NULL Handling Across Aggregate Functions
Function	NULL Handling	Empty/All-NULL Result	Key Implication
COUNT(*)	Counts all rows including NULLs	0	Only aggregate that never returns NULL
COUNT(column)	Excludes NULL values	0	COUNT(*) − COUNT(col) = NULL count
COUNT(DISTINCT col)	Excludes NULL values	0	NULLs don't count as distinct
SUM(column)	Excludes NULL values	NULL	NULL ≠ 0 in sums
AVG(column)	Excludes NULLs from both sum and count	NULL	Can differ from SUM/COUNT(*)
MIN(column)	Excludes NULL values	NULL	NULL isn't 'smallest'
MAX(column)	Excludes NULL values	NULL	NULL isn't 'largest'

null_unified_demo.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
-- Sample data for demonstrations
-- values: 10, NULL, 30, NULL, 50
 
-- All aggregates in action
SELECT 
    COUNT(*) AS all_rows,           -- 5 (counts all rows)
    COUNT(value) AS non_null,       -- 3 (excludes NULLs)
    SUM(value) AS total,            -- 90 (10+30+50)
    AVG(value) AS average,          -- 30 (90/3, not 90/5)
    MIN(value) AS minimum,          -- 10
    MAX(value) AS maximum           -- 50
FROM sample_values;
 
-- Demonstrating COUNT(*)  - COUNT(column) = NULL count
SELECT 
    COUNT(*) - COUNT(value) AS null_count
FROM sample_values;  -- Result: 2

Why This Behavior?

NULL means 'unknown.' Adding an unknown value to 90 gives an unknown total—not 90. The smallest of (10, unknown, 30) is still 10 because we know 10 exists; we just don't know what the unknown value is. SQL's NULL-exclusion preserves what we know rather than guessing at what we don't.

Empty Set and All-NULL Behavior

A critical distinction exists between an empty result set and a result set containing only NULL values. Understanding both behaviors is essential for robust queries.

Empty result set (no rows):

When no rows match the WHERE clause, aggregates face the mathematical question: what is the sum/average/min/max of nothing?

empty_set_behavior.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
-- No matching rows
SELECT 
    COUNT(*) AS cnt,     -- 0 (zero rows counted)
    COUNT(value) AS non_null,  -- 0
    SUM(value) AS total,       -- NULL (no values to sum)
    AVG(value) AS average,     -- NULL (undefined)
    MIN(value) AS minimum,     -- NULL (no values exist)
    MAX(value) AS maximum      -- NULL (no values exist)
FROM sample_values
WHERE 1 = 0;  -- False condition = empty set
 
-- All-NULL data (rows exist, but all values are NULL)
SELECT 
    COUNT(*) AS cnt,           -- 5 (if 5 NULL rows)
    COUNT(value) AS non_null,  -- 0 (none are non-NULL)
    SUM(value) AS total,       -- NULL
    AVG(value) AS average,     -- NULL (0 non-NULL values)
    MIN(value) AS minimum,     -- NULL
    MAX(value) AS maximum      -- NULL
FROM (SELECT NULL AS value FROM generate_series(1,5)) AS nulls;

COUNT(*) is Different

COUNT() returns 0 for empty sets but returns the actual row count for all-NULL data. This is the key differentiator: COUNT() counts rows, while all other aggregates process values. Empty set = 0 rows. All-NULL set = N rows, all with NULL values.

Practical handling:

Use COALESCE to provide default values when aggregates might return NULL:

null_coalesce_handling.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
-- Safe defaults for empty/all-NULL scenarios
SELECT 
    COUNT(*) AS row_count,  -- Already returns 0, not NULL
    COALESCE(SUM(amount), 0) AS total,
    COALESCE(AVG(amount), 0) AS average,
    COALESCE(MIN(amount), 0) AS minimum,
    COALESCE(MAX(amount), 0) AS maximum
FROM orders
WHERE customer_id = 99999;  -- Non-existent customer
 
-- In reports, handle NULL explicitly
SELECT 
    department_id,
    CASE 
        WHEN COUNT(salary) = 0 THEN 'No salary data'
        ELSE CAST(AVG(salary) AS VARCHAR)
    END AS avg_salary_display
FROM employees
GROUP BY department_id;

Controlling NULL Behavior: COALESCE Strategies

While default NULL exclusion is often correct, sometimes you need different behavior. The primary tool for controlling NULL in aggregation is COALESCE—and where you place it matters greatly.

COALESCE inside the aggregate:

When COALESCE is inside the aggregate function, you're converting NULL values before aggregation. This changes which values participate in the calculation.

coalesce_inside.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
-- Data: bonus values are 100, NULL, 200, NULL, 300
 
-- Standard AVG (excludes NULLs)
SELECT AVG(bonus) AS avg_bonus FROM employees;
-- Result: (100+200+300)/3 = 200
 
-- COALESCE inside: Treats NULL as 0
SELECT AVG(COALESCE(bonus, 0)) AS avg_bonus_with_zeros FROM employees;
-- Result: (100+0+200+0+300)/5 = 120
 
-- For SUM, COALESCE inside has no effect (NULL + 0 = same sum)
SELECT SUM(bonus) AS regular_sum FROM employees;              -- 600
SELECT SUM(COALESCE(bonus, 0)) AS sum_with_coalesce FROM employees;  -- 600
 
-- For COUNT, COALESCE allows counting NULLs as values
SELECT COUNT(bonus) AS count_regular FROM employees;          -- 3
SELECT COUNT(COALESCE(bonus, 0)) AS count_with_coalesce FROM employees;  -- 5
-- (Equivalent to COUNT(*) if all NULLs convert to non-NULL values)

COALESCE outside the aggregate:

When COALESCE wraps the entire aggregate, you're providing a default for when the aggregate result itself is NULL (empty set or all-NULL data).

coalesce_outside.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
-- Empty result set scenario
SELECT SUM(bonus) AS raw_sum FROM employees WHERE 1 = 0;
-- Result: NULL (no rows to sum)
 
SELECT COALESCE(SUM(bonus), 0) AS safe_sum FROM employees WHERE 1 = 0;
-- Result: 0 (default when aggregate is NULL)
 
-- All-NULL data scenario
SELECT AVG(bonus) AS raw_avg 
FROM employees 
WHERE department = 'New Hires';  -- All have NULL bonuses
-- Result: NULL
 
SELECT COALESCE(AVG(bonus), 0) AS safe_avg 
FROM employees 
WHERE department = 'New Hires';
-- Result: 0 (default when no non-NULL values)

COALESCE Placement Effects
Placement	Effect	Use When
Inside: AGG(COALESCE(col, default))	Changes which values participate	NULL semantically means the default value
Outside: COALESCE(AGG(col), default)	Provides default for NULL result	Empty/all-NULL should return a specific value
Both: COALESCE(AGG(COALESCE(...)),default)	Both effects combined	Need both behaviors simultaneously

Semantic Decision

Ask yourself: Does NULL mean 'zero' or 'unknown'? If an employee without a recorded bonus truly received no bonus (=$0), use COALESCE inside. If NULL means 'bonus not yet determined,' exclude it from averages—don't treat it as zero.

NULL Propagation in Aggregate Expressions

When aggregating expressions rather than single columns, NULL propagation follows standard SQL rules: any operation involving NULL yields NULL. This can unexpectedly exclude rows from aggregation.

null_propagation.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
-- Sample data:
-- employee | salary | bonus
-- Alice    | 50000  | 5000
-- Bob      | 60000  | NULL
-- Carol    | 55000  | 3000
 
-- Aggregating an expression with potential NULL
SELECT SUM(salary + bonus) AS total_compensation
FROM employees;
-- Result: (50000+5000) + (55000+3000) = 113000
-- Bob excluded! salary + NULL = NULL, and SUM ignores NULL
 
-- What you probably wanted:
SELECT SUM(salary + COALESCE(bonus, 0)) AS total_compensation
FROM employees;
-- Result: 55000 + 60000 + 58000 = 173000
-- Bob now included with bonus = 0
 
-- Understanding the difference
SELECT 
    SUM(salary) AS total_salary,                    -- 165000 (all included)
    SUM(bonus) AS total_bonus,                      -- 8000 (Bob excluded)
    SUM(salary) + SUM(bonus) AS sum_of_sums,        -- 173000
    SUM(salary + bonus) AS sum_of_expression        -- 113000 (Bob excluded!)
FROM employees;

Critical Insight

SUM(A) + SUM(B) is NOT always equal to SUM(A + B) when NULLs exist! The former sums A and B separately (potentially different rows), while the latter excludes any row where A + B is NULL (i.e., where either A or B is NULL).

null_expression_patterns.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
-- Pattern: Safely aggregate expressions with multiple columns
SELECT 
    SUM(COALESCE(price, 0) * COALESCE(quantity, 0)) AS safe_total
FROM order_items;
 
-- Pattern: Conditional aggregation with NULL-generating CASE
SELECT 
    SUM(CASE WHEN status = 'completed' THEN amount END) AS completed_sum
FROM orders;
-- CASE returns NULL when condition is false, excluded from SUM
-- This is the intended behavior for conditional sums!
 
-- Pattern: Watch for NULL in WHERE vs in aggregation
SELECT AVG(salary) FROM employees WHERE bonus IS NOT NULL;
-- Different from:
SELECT AVG(salary) FROM employees;
-- First filters rows first; second filters only for AVG of bonus
 
-- Pattern: NULLIF to force NULL (inverse of COALESCE)
SELECT AVG(NULLIF(score, 0)) AS avg_nonzero_score
FROM test_results;
-- Treats 0 as 'no data' by converting it to NULL before aggregation

NULL Behavior in GROUP BY

When NULL appears in a GROUP BY column, it behaves differently than you might expect: all NULL values form a single group. This is an exception to the usual rule that NULL ≠ NULL.

null_group_by.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
-- Sample data:
-- employee    | department_id
-- Alice       | 10
-- Bob         | 10
-- Carol       | 20
-- David       | NULL
-- Eve         | NULL
 
SELECT 
    department_id,
    COUNT(*) AS employee_count
FROM employees
GROUP BY department_id;
 
-- Results:
-- department_id | employee_count
-- 10            | 2
-- 20            | 1
-- NULL          | 2  <-- David and Eve grouped together!

NULL Grouping Exception

In equality comparisons, NULL = NULL is UNKNOWN (not TRUE). But in GROUP BY, all NULLs are placed in the same group—as if they're equal. This is a practical necessity; otherwise, each NULL would form its own group.

null_group_by_handling.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
-- Handling NULL in grouping column for display
SELECT 
    COALESCE(department_id::text, 'Unassigned') AS department,
    COUNT(*) AS employee_count,
    AVG(salary) AS avg_salary
FROM employees
GROUP BY department_id
ORDER BY department_id NULLS LAST;
 
-- Excluding NULL group entirely
SELECT 
    department_id,
    COUNT(*) AS employee_count
FROM employees
WHERE department_id IS NOT NULL
GROUP BY department_id;
 
-- Treating NULL as a specific value for grouping
SELECT 
    COALESCE(department_id, -1) AS dept_group,
    COUNT(*) AS employee_count
FROM employees
GROUP BY COALESCE(department_id, -1);
 
-- Multiple NULL columns in GROUP BY
SELECT 
    region,
    product_category,
    SUM(sales)
FROM sales_data
GROUP BY region, product_category;
-- If region=NULL, category='Electronics': forms one group
-- If region=NULL, category=NULL: forms another group
-- All NULL-NULL combinations group together

NULL and ORDER BY NULLS FIRST/LAST:

When ordering grouped results, you can control where NULL groups appear:

null_ordering.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
-- NULL groups at the end
SELECT department_id, COUNT(*)
FROM employees
GROUP BY department_id
ORDER BY department_id NULLS LAST;
 
-- NULL groups at the beginning
SELECT department_id, COUNT(*)
FROM employees
GROUP BY department_id
ORDER BY department_id NULLS FIRST;
 
-- Database-specific defaults:
-- PostgreSQL: NULLS LAST for ASC, NULLS FIRST for DESC
-- SQL Server: NULLs treated as smallest value (first in ASC)
-- Oracle: NULLS LAST for ASC by default

Database-Specific NULL Features

While standard aggregate NULL handling is consistent across databases, some systems offer additional NULL-related features.

null_postgresql.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
-- FILTER clause excludes NULLs naturally via condition
SELECT 
    COUNT(*) FILTER (WHERE bonus IS NOT NULL) AS with_bonus,
    AVG(salary) FILTER (WHERE department = 'Engineering') AS eng_avg
FROM employees;
 
-- array_agg with NULL handling
SELECT 
    array_agg(name) AS all_names,  -- Includes NULLs
    array_agg(name) FILTER (WHERE name IS NOT NULL) AS non_null_names
FROM employees;
 
-- DISTINCT with NULL in arrays
SELECT array_agg(DISTINCT department_id) FROM employees;
-- NULLs may appear once in the array
 
-- IS DISTINCT FROM for NULL-safe comparison
SELECT * FROM employees 
WHERE department_id IS DISTINCT FROM 10;
-- Returns rows where dept ≠ 10 OR dept IS NULL

Best Practices and Decision Framework

Let's consolidate best practices for handling NULL in aggregation into a decision framework.

NULL Decision Framework

•What does NULL mean in this context? — 'Unknown,' 'Not applicable,' 'Zero,' or 'Not yet set'? This determines handling strategy.
•Should NULLs be excluded or converted? — If NULL means 'unknown,' exclude (default). If NULL means 'zero,' use COALESCE inside aggregate.
•What should happen with empty results? — If empty set should return a value (not NULL), use COALESCE outside aggregate.
•Are expressions involved? — Remember NULL propagates: A + NULL = NULL. Use COALESCE on individual terms if needed.
•Is GROUP BY involved? — Remember NULLs form one group. Decide if you want to exclude, display, or relabel this group.

null_patterns_summary.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
-- Pattern: Count NULL values
SELECT COUNT(*) - COUNT(column) AS null_count FROM table;
 
-- Pattern: Percentage of non-NULL
SELECT 
    COUNT(column) * 100.0 / NULLIF(COUNT(*), 0) AS fill_rate_pct
FROM table;
 
-- Pattern: Safe aggregation for reports
SELECT 
    COALESCE(SUM(amount), 0) AS total,
    COALESCE(AVG(amount), 0) AS average,
    COALESCE(MIN(amount), 0) AS minimum,    
    COALESCE(MAX(amount), 0) AS maximum,
    COUNT(*) AS row_count,
    COUNT(amount) AS non_null_count
FROM orders
WHERE customer_id = :customer_id;
 
-- Pattern: Conditional aggregation handling NULL outcomes
SELECT 
    department_id,
    AVG(salary) AS avg_all,
    AVG(CASE WHEN performance = 'high' THEN salary END) AS avg_high_perf,
    -- avg_high_perf will be NULL for depts with no high performers
    COALESCE(
        AVG(CASE WHEN performance = 'high' THEN salary END),
        AVG(salary)
    ) AS avg_high_or_overall
FROM employees
GROUP BY department_id;

Key Best Practices

•Be explicit about NULL intent: Always document whether NULL exclusion is intentional or if COALESCE should be used.
•Test with NULL data: Include NULL cases in your test data to verify correct behavior.
•Use COUNT(*) − COUNT(col) for NULL counts: More efficient than COUNT(CASE WHEN col IS NULL ...).
•Prefer COALESCE over ISNULL/NVL: COALESCE is SQL standard and supports multiple arguments.
•Handle empty sets explicitly: COALESCE(AGG(...), default) prevents NULL surprises in reports.
•Document GROUP BY NULL behavior: Note that NULL forms its own group in comments for maintainability.

Summary: Mastering NULL in Aggregation

NULL handling in aggregation follows consistent patterns but requires careful thought. Let's consolidate the key concepts:

Key Takeaways

•All aggregates except COUNT(*) exclude NULLs — This is consistent SQL behavior across all database systems.
•Empty sets return NULL for all aggregates except COUNT(*) — Use COALESCE for safe defaults.
•COALESCE placement matters — Inside converts values before aggregation; outside handles NULL results.
•NULL propagates in expressions — SUM(A + B) excludes rows where either A or B is NULL.
•GROUP BY treats all NULLs as one group — An exception to 'NULL ≠ NULL' for practical grouping.
•Semantic meaning determines handling — NULL as 'unknown' should be excluded; NULL as 'zero' should be converted.

Module Complete:

You've now mastered all five standard aggregate functions—COUNT, SUM, AVG, MIN, and MAX—along with their comprehensive NULL handling behaviors. This foundation prepares you for the next module: GROUP BY, where we'll explore how aggregation works across segmented data.

Module Complete: Aggregate Functions

You now have complete mastery of SQL aggregate functions: understanding how each function processes data, handles NULLs, and integrates with grouping operations. These skills form the foundation for all SQL data analysis and reporting.