SQL Aggregate Functions: Summarizing Data with Precision

Level: Intermediate

Duration: 75 mins

Topic: Aggregate Functions

AVG: Computing Means with Semantic Precision

The Art of Finding the Middle

The average—or arithmetic mean—is perhaps the most intuitive statistical measure. We use averages daily: average temperature, average salary, average order value. The SQL AVG() function seems straightforward, yet it conceals subtle behaviors that can produce unexpectedly wrong results if not understood.

At its core, AVG() computes SUM(values) / COUNT(values). Simple, right? But what happens when values are NULL? When you compare AVG(column) to SUM(column) / COUNT(*)? When precision is critical? Understanding these nuances transforms AVG from a black box into a precision tool.

What You Will Learn

By the end of this page, you will understand AVG's precise NULL handling semantics, why AVG(column) and SUM(column)/COUNT(*) can give completely different results, how to compute weighted averages, and how to handle edge cases like empty sets and division by zero safely.

AVG Fundamentals

The AVG() function computes the arithmetic mean of a set of numeric values. It operates by summing all non-NULL values and dividing by the count of non-NULL values.

Basic syntax:

AVG(expression)
AVG(DISTINCT expression)
AVG(ALL expression)  -- Default behavior

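The expression must yield a numeric value. In most databases, attempting to average strings or other non-numeric types produces an error, although some engines (MySQL and SQLite, for example) implicitly convert strings to numbers instead, which can hide bad data.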

avg_basics.sql
-- Average salary across all employees
SELECT AVG(salary) AS average_salary
FROM employees;
 
-- Average with WHERE clause
SELECT AVG(salary) AS avg_engineering_salary
FROM employees
WHERE department = 'Engineering';
 
-- Multiple averages in one query
SELECT 
    AVG(salary) AS avg_salary,
    AVG(bonus) AS avg_bonus,
    AVG(salary + COALESCE(bonus, 0)) AS avg_total_comp
FROM employees;
 
-- Average of an expression
SELECT AVG(quantity * unit_price) AS avg_order_value
FROM order_items;

The mathematical definition:

AVG(column) = SUM(column) / COUNT(column)

Note carefully: it's COUNT(column), not COUNT(*). This distinction is crucial for understanding NULL handling, which we'll explore in depth.

Return Type

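AVG usually returns a decimal or floating-point type even when averaging integers, because the average of 1 and 2 is 1.5, not 1. This is not universal: SQL Server returns an integer (truncating the fraction) when the input column is an integer type, so cast the column to a decimal type first if you need the fractional part. Check your database's documentation for the exact return type.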
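
To check the return type on a real engine, here is a minimal sketch that runs the comparison through Python's built-in sqlite3 module. The table t and column n are made up for illustration, and the results shown are SQLite's behavior only; other databases may return different types.

```python
import sqlite3

# In-memory SQLite database; table t and column n are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (n INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(1,), (2,)])

avg_n, int_div, real_div = conn.execute(
    """
    SELECT AVG(n),                  -- SQLite's AVG returns a float
           SUM(n) / COUNT(n),       -- integer / integer truncates in SQLite
           SUM(n) * 1.0 / COUNT(n)  -- a real operand keeps the fraction
    FROM t
    """
).fetchone()

print(avg_n, int_div, real_div)  # 1.5 1 1.5
```

The middle column shows why a hand-written SUM/COUNT can silently truncate even when AVG itself does not.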

NULL Handling: The Critical Distinction

AVG's NULL handling is where most misunderstandings occur. Like SUM and COUNT(column), AVG ignores NULL values completely. But the implications are more significant for averaging than for summing.

Consider what "ignoring NULLs" means for AVG:

•NULL values don't contribute to the sum
•NULL values don't increment the count
•The average is computed over non-NULL values only

This behavior has profound implications when comparing AVG(column) to manual calculation.

Sample Data: Employee Salaries with NULLs
Employee	Salary
Alice	100
Bob	200
Carol	NULL
David	300
Eve	NULL

avg_null_comparison.sql
-- Using the sample data above:
 
-- AVG ignores NULLs:
SELECT AVG(salary) AS avg_salary
FROM employees;
-- Result: (100 + 200 + 300) / 3 = 200
-- NULLs excluded from both sum and count
 
-- Manual calculation with COUNT(*) gives different result:
SELECT SUM(salary) / COUNT(*) AS manual_avg
FROM employees;
-- Result: (100 + 200 + 300) / 5 = 120
-- Divides by ALL rows, including those with NULL salary
 
-- Manual calculation with COUNT(column) matches AVG:
SELECT SUM(salary) / COUNT(salary) AS manual_avg_correct
FROM employees;
-- Result: (100 + 200 + 300) / 3 = 200
-- COUNT(salary) excludes NULLs, matching AVG behavior

The Difference Can Be Dramatic

In the example, AVG gives 200 while SUM(salary)/COUNT(*) gives 120: the first is 67% larger than the second. This isn't a bug; the two expressions answer different questions. AVG asks "what's the average among those with salaries?" while SUM(salary)/COUNT(*) asks "what's the average across all employees?" (treating missing data as zero). Know which question you're answering.
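
The three calculations can be reproduced end to end with the sample data. This is a sketch using Python's built-in sqlite3 module with an in-memory database (an assumption made so the example runs without a server); the `* 1.0` is added only to avoid integer division in SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [("Alice", 100), ("Bob", 200), ("Carol", None), ("David", 300), ("Eve", None)],
)

avg_salary, per_row, per_value = conn.execute(
    """
    SELECT AVG(salary),                       -- ignores NULL salaries
           SUM(salary) * 1.0 / COUNT(*),      -- divides by all 5 rows
           SUM(salary) * 1.0 / COUNT(salary)  -- divides by the 3 non-NULL salaries
    FROM employees
    """
).fetchone()

print(avg_salary, per_row, per_value)  # 200.0 120.0 200.0
```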

When to treat NULL as zero:

If NULL conceptually means "zero" in your domain (e.g., no bonus received = $0 bonus), convert NULLs before averaging:

avg_coalesce.sql
-- Treat NULL as zero for bonus calculation
SELECT AVG(COALESCE(bonus, 0)) AS avg_bonus_with_zeros
FROM employees;
-- Now employees with NULL bonus contribute 0 to the average
 
-- Compare the two approaches:
SELECT 
    AVG(bonus) AS avg_excluding_null,           -- Ignores NULL
    AVG(COALESCE(bonus, 0)) AS avg_with_zeros,  -- Treats NULL as 0
    COUNT(*) AS total_employees,
    COUNT(bonus) AS employees_with_bonus
FROM employees;
 
-- The results reveal the difference:
-- If 10 employees, 5 have bonuses averaging $2000:
-- avg_excluding_null = 2000 (average of those receiving bonuses)
-- avg_with_zeros = 1000 (accounting for 5 employees with $0)

Empty set behavior:

Like SUM, MIN, and MAX (but unlike COUNT, which returns 0), AVG returns NULL when applied to an empty set or when all values are NULL. This makes mathematical sense: averaging over nothing is undefined, not zero.

avg_empty_set.sql
-- Empty result set
SELECT AVG(salary) FROM employees WHERE 1 = 0;
-- Result: NULL (no rows to average)
 
-- All NULL values
SELECT AVG(salary) FROM employees WHERE salary IS NULL;
-- Result: NULL (no non-NULL values to average)
 
-- Safe handling with COALESCE
SELECT COALESCE(AVG(salary), 0) AS safe_avg
FROM employees
WHERE department = 'NonExistent';
-- Result: 0 (instead of NULL)
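
The empty-set behavior is easy to confirm. The sketch below uses Python's built-in sqlite3 module (an assumption for runnability); SQL NULL comes back as Python's None, and the single inserted row is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (department TEXT, salary INTEGER)")
conn.execute("INSERT INTO employees VALUES ('Engineering', 100)")

# No row matches the filter, so every aggregate sees an empty set.
avg_empty, count_empty, safe_avg = conn.execute(
    """
    SELECT AVG(salary), COUNT(salary), COALESCE(AVG(salary), 0)
    FROM employees
    WHERE department = 'NonExistent'
    """
).fetchone()

print(avg_empty, count_empty, safe_avg)  # None 0 0
```

Note that the query still returns one row: an aggregate without GROUP BY always produces a result row, even when nothing matched.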

AVG(DISTINCT): Averaging Unique Values

AVG(DISTINCT expression) computes the average of unique values only. If the same value appears multiple times, it's counted once in both the sum and the count.

This is less commonly used than AVG(ALL) but has specific applications in data analysis.

avg_distinct.sql
-- Sample values: 10, 20, 20, 30
SELECT 
    AVG(value) AS regular_avg,           -- (10+20+20+30)/4 = 20
    AVG(DISTINCT value) AS distinct_avg  -- (10+20+30)/3 = 20
FROM sample;
-- In this case, they happen to match, but that's coincidental
 
-- Another example: 10, 10, 10, 40
SELECT 
    AVG(value) AS regular_avg,           -- (10+10+10+40)/4 = 17.5
    AVG(DISTINCT value) AS distinct_avg  -- (10+40)/2 = 25
FROM sample;
-- Now they differ significantly
 
-- Practical use: Average of distinct price points
SELECT AVG(DISTINCT unit_price) AS avg_distinct_price
FROM products;
-- Answers: "What's the average of our unique price points?"
-- vs AVG(unit_price): "What's the average price weighted by product count?"

When to Use AVG(DISTINCT)

Use AVG(DISTINCT) when you want each unique value to contribute equally regardless of frequency. This is useful for analyzing unique price tiers, score levels, or rating values where repeated values shouldn't skew the mean.

AVG(DISTINCT) vs Weighted Averages:

AVG(DISTINCT) is essentially an unweighted average of unique values. Regular AVG is naturally weighted by frequency. Neither is inherently "correct"—they answer different questions:

•AVG(salary): What's the average salary, weighted by employee count?
•AVG(DISTINCT salary): What's the average of distinct salary levels?
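
As a quick check of the frequency-weighting point, this sketch runs both forms over the values 10, 10, 10, 40 using Python's built-in sqlite3 module (chosen here only so the example is self-contained).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sample (value INTEGER)")
conn.executemany("INSERT INTO sample VALUES (?)", [(10,), (10,), (10,), (40,)])

regular_avg, distinct_avg = conn.execute(
    "SELECT AVG(value), AVG(DISTINCT value) FROM sample"
).fetchone()

# Regular AVG: (10+10+10+40)/4; DISTINCT: (10+40)/2
print(regular_avg, distinct_avg)  # 17.5 25.0
```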

Weighted Averages: Beyond Simple Means

Real-world analysis often requires weighted averages rather than simple means. A weighted average assigns different importance to different values based on some weight factor.

Weighted average formula:

Weighted AVG = SUM(value × weight) / SUM(weight)

SQL doesn't have a built-in WEIGHTED_AVG function, but we can construct it using SUM.

weighted_avg.sql
-- Simple average of unit prices
SELECT AVG(unit_price) AS simple_avg_price
FROM products;
-- Each product type weighted equally
 
-- Weighted average by quantity sold
SELECT 
    SUM(unit_price * quantity_sold) / NULLIF(SUM(quantity_sold), 0) 
        AS weighted_avg_price
FROM products;
-- Higher-selling products contribute more to the average
 
-- Weighted average with explicit weights
-- Example: GPA calculation with credit hours as weights
SELECT 
    SUM(grade_points * credit_hours) / NULLIF(SUM(credit_hours), 0) 
        AS weighted_gpa
FROM student_grades
WHERE student_id = 12345;

Practical weighted average examples:

weighted_avg_examples.sql
-- Weighted average cost (inventory costing)
SELECT 
    product_id,
    SUM(quantity * unit_cost) / NULLIF(SUM(quantity), 0) AS wac
FROM inventory_transactions
GROUP BY product_id;
 
-- Portfolio-weighted return
SELECT 
    portfolio_id,
    SUM(position_value * return_pct) / NULLIF(SUM(position_value), 0) 
        AS portfolio_return
FROM positions
GROUP BY portfolio_id;
 
-- Survey weighted average (demographic weighting)
SELECT 
    question_id,
    SUM(response_value * demographic_weight) / NULLIF(SUM(demographic_weight), 0) 
        AS weighted_response
FROM survey_responses
GROUP BY question_id;
 
-- Time-weighted average (values sampled at different intervals)
SELECT 
    sensor_id,
    SUM(reading * duration_seconds) / NULLIF(SUM(duration_seconds), 0) 
        AS time_weighted_avg
FROM sensor_readings
GROUP BY sensor_id;

Division by Zero Protection

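Always use NULLIF(SUM(weight), 0) or equivalent protection in weighted average calculations. If all weights are zero, the denominator is zero, which raises a division-by-zero error in many databases (PostgreSQL, SQL Server, and Oracle, for example). NULLIF turns that zero into NULL, so the whole expression evaluates to NULL instead. If every weight is NULL, SUM(weight) is already NULL and the result is NULL without an error.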
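
The following sketch works through a small weighted average and the all-zero-weights case, using Python's built-in sqlite3 module. The two product rows are invented for illustration. One caveat: SQLite itself returns NULL on division by zero rather than raising an error, so the guard matters more on engines that do raise.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (unit_price REAL, quantity_sold INTEGER)")
conn.executemany("INSERT INTO products VALUES (?, ?)", [(10.0, 90), (100.0, 10)])

weighted_sql = """
    SELECT AVG(unit_price),
           SUM(unit_price * quantity_sold) / NULLIF(SUM(quantity_sold), 0)
    FROM products
"""
simple_avg, weighted_avg = conn.execute(weighted_sql).fetchone()
# Simple: (10 + 100) / 2; weighted: (10*90 + 100*10) / 100
print(simple_avg, weighted_avg)  # 55.0 19.0

# Zero out every weight: the guarded denominator becomes NULL, so the result is NULL.
conn.execute("UPDATE products SET quantity_sold = 0")
_, zero_weight_avg = conn.execute(weighted_sql).fetchone()
print(zero_weight_avg)  # None
```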

NULL handling in weighted averages:

When computing weighted averages, NULLs can appear in values, weights, or both. SUM() skips NULL inputs, but the numerator and the denominator can skip different rows, which may not be what you want:

weighted_avg_nulls.sql
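-- If weight is NULL, the row drops out of both sums
-- (NULL * anything = NULL, and SUM ignores NULL).
-- If only value is NULL, the row drops out of the numerator but its weight
-- still counts in SUM(weight), so the row is effectively averaged in as 0.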
 
-- Explicit NULL handling
SELECT 
    SUM(COALESCE(value, 0) * COALESCE(weight, 1)) / 
    NULLIF(SUM(COALESCE(weight, 1)), 0) AS weighted_avg_with_defaults
FROM data;
-- Treats NULL value as 0, NULL weight as 1
 
-- Or exclude rows with any NULL explicitly
SELECT 
    SUM(value * weight) / NULLIF(SUM(weight), 0) AS weighted_avg
FROM data
WHERE value IS NOT NULL AND weight IS NOT NULL;
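
One subtlety is worth verifying: a row with a NULL value but a non-NULL weight is not fully excluded from the unfiltered formula, because its weight still lands in the denominator. The sketch below, using Python's built-in sqlite3 module and three made-up rows, compares the unfiltered and filtered queries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (value REAL, weight REAL)")
conn.executemany(
    "INSERT INTO data VALUES (?, ?)",
    [(10.0, 1.0), (None, 3.0), (30.0, 1.0)],  # middle row: NULL value, weight 3
)

unfiltered = conn.execute(
    "SELECT SUM(value * weight) / NULLIF(SUM(weight), 0) FROM data"
).fetchone()[0]

filtered = conn.execute(
    """
    SELECT SUM(value * weight) / NULLIF(SUM(weight), 0)
    FROM data
    WHERE value IS NOT NULL AND weight IS NOT NULL
    """
).fetchone()[0]

# Unfiltered: 40 / (1 + 3 + 1); filtered: 40 / (1 + 1)
print(unfiltered, filtered)  # 8.0 20.0
```

The unfiltered result behaves as if the missing value were 0, which is rarely the intent; the explicit WHERE clause makes the exclusion consistent.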

AVG with GROUP BY: Segmented Averages

Combining AVG with GROUP BY enables powerful segmented analysis. You can compute averages by category, time period, demographic, or any other dimension.

avg_group_by.sql
-- Average salary by department
SELECT 
    department,
    AVG(salary) AS avg_salary,
    COUNT(*) AS employee_count
FROM employees
GROUP BY department
ORDER BY avg_salary DESC;
 
-- Average order value by customer segment
SELECT 
    customer_segment,
    AVG(order_total) AS avg_order_value,
    SUM(order_total) AS total_revenue,
    COUNT(*) AS order_count
FROM orders o
JOIN customers c ON o.customer_id = c.id
GROUP BY customer_segment;
 
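-- Monthly average with trend analysis (DATE_TRUNC is PostgreSQL syntax)
-- Note: the moving average gives each month equal weight, regardless of order count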
SELECT 
    DATE_TRUNC('month', order_date) AS month,
    AVG(order_total) AS avg_order_value,
    AVG(AVG(order_total)) OVER (
        ORDER BY DATE_TRUNC('month', order_date)
        ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
    ) AS moving_avg_3mo
FROM orders
GROUP BY DATE_TRUNC('month', order_date)
ORDER BY month;
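
A detail of the moving-average frame that is easy to miss: the first rows have fewer than two preceding rows, so they are averaged over a shorter window. The sketch below shows this with Python's built-in sqlite3 module (window functions need SQLite 3.25 or later). The monthly table and its values are hypothetical stand-ins for the grouped result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE monthly (month INTEGER, avg_order_value REAL)")
conn.executemany(
    "INSERT INTO monthly VALUES (?, ?)",
    [(1, 10.0), (2, 20.0), (3, 30.0), (4, 40.0)],
)

moving = [
    row[0]
    for row in conn.execute(
        """
        SELECT AVG(avg_order_value) OVER (
                   ORDER BY month
                   ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
               )
        FROM monthly
        ORDER BY month
        """
    )
]

# Month 1 averages 1 row, month 2 averages 2 rows, later months average 3.
print(moving)  # [10.0, 15.0, 20.0, 30.0]
```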

Combining AVG with HAVING:

Use HAVING to filter groups based on their aggregate values:

avg_having.sql
-- Departments with high average salaries
SELECT 
    department,
    AVG(salary) AS avg_salary,
    COUNT(*) AS employee_count
FROM employees
GROUP BY department
HAVING AVG(salary) > 75000
ORDER BY avg_salary DESC;
 
-- Products with low average rating
SELECT 
    product_id,
    AVG(rating) AS avg_rating,
    COUNT(*) AS review_count
FROM reviews
GROUP BY product_id
HAVING AVG(rating) < 3.0 AND COUNT(*) >= 10  -- Minimum reviews threshold
ORDER BY avg_rating;
 
-- Categories significantly above overall average
SELECT 
    category,
    AVG(price) AS category_avg
FROM products
GROUP BY category
HAVING AVG(price) > (SELECT AVG(price) * 1.2 FROM products);  -- 20% above overall

Minimum Sample Size

When comparing averages across groups, include a minimum count threshold. An average based on 2 data points is far less reliable than one based on 1000. Use HAVING COUNT(*) >= n to exclude statistically unreliable groups.
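
To illustrate the effect of the threshold, this sketch (Python's built-in sqlite3 module, invented review data) runs the low-rating query with and without the minimum count. Product 1 has only two reviews; product 2 has ten.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reviews (product_id INTEGER, rating INTEGER)")
rows = [(1, 1), (1, 1)] + [(2, 2)] * 5 + [(2, 3)] * 5
conn.executemany("INSERT INTO reviews VALUES (?, ?)", rows)

base = "SELECT product_id, AVG(rating), COUNT(*) FROM reviews GROUP BY product_id "
without_threshold = conn.execute(
    base + "HAVING AVG(rating) < 3.0 ORDER BY product_id"
).fetchall()
with_threshold = conn.execute(
    base + "HAVING AVG(rating) < 3.0 AND COUNT(*) >= 10 ORDER BY product_id"
).fetchall()

print(without_threshold)  # [(1, 1.0, 2), (2, 2.5, 10)]
print(with_threshold)     # [(2, 2.5, 10)]
```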

Precision, Rounding, and Edge Cases

AVG involves division, which introduces precision considerations that SUM and COUNT don't face directly. Understanding these nuances is essential for financial calculations and high-precision analysis.

avg_precision.sql
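-- Integer averaging trap (some databases)
-- If salary is INTEGER, SQL Server returns a truncated integer average
SELECT AVG(salary) FROM employees;  -- DECIMAL/FLOAT in PostgreSQL, MySQL, SQLite
-- Casting the input first keeps the fraction everywhere
SELECT AVG(CAST(salary AS DECIMAL(15,2))) FROM employees;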
 
-- Explicit precision control
SELECT ROUND(AVG(salary), 2) AS avg_salary_rounded
FROM employees;
 
-- Cast for specific precision
SELECT CAST(AVG(salary) AS DECIMAL(15,2)) AS avg_salary_precise
FROM employees;
 
-- Compare different precision approaches
SELECT 
    AVG(price) AS default_avg,
    ROUND(AVG(price), 4) AS rounded_4,
    TRUNC(AVG(price), 2) AS truncated_2,  -- TRUNC: PostgreSQL/Oracle; MySQL spells it TRUNCATE
    CAST(AVG(price) AS DECIMAL(10,2)) AS cast_decimal
FROM products;

Floating-point precision issues:

When averaging floating-point numbers, small precision errors can accumulate:

•Averaging the floating-point values 0.1 and 0.2 might return 0.15000000000000002 instead of exactly 0.15
•These errors are usually negligible but can cause issues in equality comparisons

For financial data, use DECIMAL/NUMERIC types and round appropriately.

avg_edge_cases.sql
-- Edge case: Single value
SELECT AVG(salary) FROM employees WHERE id = 1;
-- Returns that one salary value (average of one = itself)
 
-- Edge case: All same values
SELECT AVG(salary) FROM employees WHERE salary = 50000;
-- Returns 50000 (average of identical values = that value)
 
-- Edge case: Empty set
SELECT AVG(salary) FROM employees WHERE 1 = 0;
-- Returns NULL
 
-- Edge case: Division precision (results vary by database)
SELECT 
    1/3 AS integer_division,           -- 0 in PostgreSQL, SQL Server, SQLite; 0.3333 in MySQL
    1.0/3 AS float_division,           -- 0.333...
    CAST(1 AS DECIMAL(10,6))/3 AS decimal_division  -- 0.333333 or more digits, depending on the engine
;
 
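-- Avoiding comparison issues with floating point averages
-- (window functions are not allowed in WHERE, so use a scalar subquery)
-- DON'T:
SELECT * FROM products WHERE (SELECT AVG(price) FROM products) = 19.99;
-- DO:
SELECT * FROM products WHERE ABS((SELECT AVG(price) FROM products) - 19.99) < 0.001;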

Never Compare Floats for Equality

Due to floating-point representation, comparing AVG results directly to expected values may fail. Use range comparisons (ABS(val - expected) < epsilon) or ROUND to a fixed precision before comparison.
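
The tolerance comparison can be tried directly. This sketch uses Python's built-in sqlite3 module with two made-up prices stored as REAL; whether the exact-equality check succeeds depends on the engine's summation and is deliberately left unasserted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (price REAL)")
conn.executemany("INSERT INTO products VALUES (?)", [(0.1,), (0.2,)])

avg_price, within_tolerance = conn.execute(
    "SELECT AVG(price), ABS(AVG(price) - 0.15) < 0.001 FROM products"
).fetchone()

print(repr(avg_price))         # may show extra digits beyond 0.15
print(avg_price == 0.15)       # not guaranteed to be True
print(bool(within_tolerance))  # True
```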

Common Patterns and Best Practices

Let's consolidate the most useful AVG patterns for production use.

avg_patterns.sql
-- Pattern: Moving average (rolling average)
SELECT 
    date,
    value,
    AVG(value) OVER (
        ORDER BY date
        ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
    ) AS moving_avg_7day
FROM daily_metrics;
 
-- Pattern: Conditional average
SELECT 
    AVG(CASE WHEN status = 'completed' THEN order_total END) AS avg_completed,
    AVG(CASE WHEN status = 'cancelled' THEN order_total END) AS avg_cancelled
FROM orders;
 
-- FILTER syntax (standard SQL; supported by PostgreSQL and SQLite 3.30+, cleaner than CASE)
SELECT 
    AVG(order_total) FILTER (WHERE status = 'completed') AS avg_completed,
    AVG(order_total) FILTER (WHERE status = 'cancelled') AS avg_cancelled
FROM orders;
 
-- Pattern: Average with NULL handling decision
SELECT 
    AVG(rating) AS avg_rated_only,           -- Excludes unrated (NULL)
    AVG(COALESCE(rating, 0)) AS avg_all,     -- Treats unrated as 0
    AVG(COALESCE(rating, 3)) AS avg_default  -- Treats unrated as neutral (3)
FROM products;
 
-- Pattern: Difference from the company average (absolute and percent)
SELECT 
    employee_id,
    salary,
    AVG(salary) OVER () AS company_avg,
    salary - AVG(salary) OVER () AS diff_from_avg,
    (salary - AVG(salary) OVER ()) / AVG(salary) OVER () * 100 AS pct_diff
FROM employees;
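
The conditional-average pattern relies on CASE returning NULL when no branch matches, so non-matching rows are skipped. Adding ELSE 0 silently changes the answer. The sketch below (Python's built-in sqlite3 module, three invented orders) shows the difference.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (status TEXT, order_total REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("completed", 100.0), ("completed", 200.0), ("cancelled", 50.0)],
)

avg_completed, avg_with_else_zero = conn.execute(
    """
    SELECT AVG(CASE WHEN status = 'completed' THEN order_total END),
           AVG(CASE WHEN status = 'completed' THEN order_total ELSE 0 END)
    FROM orders
    """
).fetchone()

# Without ELSE: (100 + 200) / 2; with ELSE 0: (100 + 200 + 0) / 3
print(avg_completed, avg_with_else_zero)  # 150.0 100.0
```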

AVG Best Practices

•Understand NULL semantics: AVG excludes NULLs. If you want NULL treated as zero, use COALESCE inside AVG.
•Use COALESCE for empty sets: COALESCE(AVG(col), 0) returns 0 instead of NULL for empty results.
•Set minimum sample sizes: Filter groups with HAVING COUNT(*) >= n to ensure statistical validity.
•Round for display: Use ROUND(AVG(...), n) for clean output in reports.
•Use DECIMAL for money: AVG with floating-point introduces precision errors in financial calculations.
•Protect weighted averages from division by zero: Use NULLIF(SUM(weight), 0) in the denominator.

Common Mistakes to Avoid

•Confusing AVG(col) with SUM(col)/COUNT(*): They differ when NULLs exist!
•Averaging averages incorrectly: AVG(group_average) ≠ overall average unless groups are equal size. Use weighted average.
•Ignoring sample size: An average of 2 reviews is meaningless compared to 2000 reviews.
•Using AVG for non-numeric intent: Averaging categorical codes or IDs is usually meaningless.
•Expecting zero from empty sets: AVG returns NULL, not 0, when no rows match.

The Averaging Averages Trap

If Department A has 10 employees averaging $50K and Department B has 100 employees averaging $80K, the company average is NOT ($50K + $80K) / 2 = $65K. It's (10×$50K + 100×$80K) / 110 = $77.3K. Always use weighted averages when combining group averages!
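
The department example can be reproduced as follows. This is a sketch using Python's built-in sqlite3 module; the dept CTE and its column names (avg_salary, n) are chosen here for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (department TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [("A", 50000)] * 10 + [("B", 80000)] * 100,
)

overall_avg = conn.execute("SELECT AVG(salary) FROM employees").fetchone()[0]

avg_of_avgs, weighted_avg = conn.execute(
    """
    WITH dept AS (
        SELECT department, AVG(salary) AS avg_salary, COUNT(salary) AS n
        FROM employees
        GROUP BY department
    )
    SELECT AVG(avg_salary),              -- treats both departments equally
           SUM(avg_salary * n) / SUM(n)  -- weights each average by headcount
    FROM dept
    """
).fetchone()

print(round(overall_avg, 1), avg_of_avgs, round(weighted_avg, 1))
# 77272.7 65000.0 77272.7
```

Carrying the group count alongside each group average is what makes the overall figure recoverable later.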

Summary: Mastering AVG

The AVG function appears simple but carries important subtleties. Let's consolidate the key concepts:

Key Takeaways

•AVG = SUM(column) / COUNT(column) — Both sum and count exclude NULLs, which can differ from SUM(column)/COUNT(*).
•NULL handling is critical — AVG ignores NULLs completely. Use COALESCE inside AVG if NULLs should count as zero.
•AVG(DISTINCT) averages unique values — Each distinct value counts once, regardless of how many times it appears.
•Weighted averages require manual construction — Use SUM(value × weight) / NULLIF(SUM(weight), 0) for weighted calculations.
•Empty sets return NULL — Use COALESCE(AVG(...), default) when you need a numeric result for empty data.
•Don't average averages directly — Use weighted averages when combining group-level averages.

What's next:

We've covered counting, summing, and averaging. Now we'll explore MIN and MAX—functions that find extreme values in datasets. You'll learn about their behavior with NULL, ordering semantics, and use cases including range analysis and boundary detection.

You now understand AVG with precision—its NULL semantics, relationship to SUM/COUNT, weighted average construction, and common pitfalls. These skills enable accurate statistical analysis and reporting in SQL.
