Loading learning content...
The average—or arithmetic mean—is perhaps the most intuitive statistical measure. We use averages daily: average temperature, average salary, average order value. The SQL AVG() function seems straightforward, yet it conceals subtle behaviors that can produce unexpectedly wrong results if not understood.
At its core, AVG() computes SUM(values) / COUNT(values). Simple, right? But what happens when values are NULL? When you compare AVG(column) to SUM(column) / COUNT(*)? When precision is critical? Understanding these nuances transforms AVG from a black box into a precision tool.
By the end of this page, you will understand AVG's precise NULL handling semantics, why AVG(column) and SUM(column)/COUNT(*) can give completely different results, how to compute weighted averages, and how to handle edge cases like empty sets and division by zero safely.
The AVG() function computes the arithmetic mean of a set of numeric values. It operates by summing all non-NULL values and dividing by the count of non-NULL values.
Basic syntax:
AVG(expression)
AVG(DISTINCT expression)
AVG(ALL expression) -- Default behavior
The expression must yield a numeric value. Attempting to average strings or other non-numeric types produces an error.
12345678910111213141516171819
-- Average salary across all employeesSELECT AVG(salary) AS average_salaryFROM employees; -- Average with WHERE clauseSELECT AVG(salary) AS avg_engineering_salaryFROM employeesWHERE department = 'Engineering'; -- Multiple averages in one querySELECT AVG(salary) AS avg_salary, AVG(bonus) AS avg_bonus, AVG(salary + COALESCE(bonus, 0)) AS avg_total_compFROM employees; -- Average of an expressionSELECT AVG(quantity * unit_price) AS avg_order_valueFROM order_items;The mathematical definition:
AVG(column) = SUM(column) / COUNT(column)
Note carefully: it's COUNT(column), not COUNT(*). This distinction is crucial for understanding NULL handling, which we'll explore in depth.
AVG typically returns a floating-point or decimal type, even when averaging integers. This is mathematically necessary—the average of 1 and 2 is 1.5, not 1. Check your database's documentation for exact return type behavior.
AVG's NULL handling is where most misunderstandings occur. Like SUM and COUNT(column), AVG ignores NULL values completely. But the implications are more significant for averaging than for summing.
Consider what "ignoring NULLs" means for AVG:
This behavior has profound implications when comparing AVG(column) to manual calculation.
| Employee | Salary |
|---|---|
| Alice | 100 |
| Bob | 200 |
| Carol | NULL |
| David | 300 |
| Eve | NULL |
12345678910111213141516171819
-- Using the sample data above: -- AVG ignores NULLs:SELECT AVG(salary) AS avg_salaryFROM employees;-- Result: (100 + 200 + 300) / 3 = 200-- NULLs excluded from both sum and count -- Manual calculation with COUNT(*) gives different result:SELECT SUM(salary) / COUNT(*) AS manual_avgFROM employees;-- Result: (100 + 200 + 300) / 5 = 120-- Divides by ALL rows, including those with NULL salary -- Manual calculation with COUNT(column) matches AVG:SELECT SUM(salary) / COUNT(salary) AS manual_avg_correctFROM employees;-- Result: (100 + 200 + 300) / 3 = 200-- COUNT(salary) excludes NULLs, matching AVG behaviorIn the example, AVG gives 200 while SUM/COUNT() gives 120—a 67% difference! This isn't a bug; it reflects different questions. AVG asks "what's the average among those with salaries?" while SUM/COUNT() asks "what's the average across all employees?" (treating missing data as zero). Know which question you're answering.
When to treat NULL as zero:
If NULL conceptually means "zero" in your domain (e.g., no bonus received = $0 bonus), convert NULLs before averaging:
1234567891011121314151617
-- Treat NULL as zero for bonus calculationSELECT AVG(COALESCE(bonus, 0)) AS avg_bonus_with_zerosFROM employees;-- Now employees with NULL bonus contribute 0 to the average -- Compare the two approaches:SELECT AVG(bonus) AS avg_excluding_null, -- Ignores NULL AVG(COALESCE(bonus, 0)) AS avg_with_zeros, -- Treats NULL as 0 COUNT(*) AS total_employees, COUNT(bonus) AS employees_with_bonusFROM employees; -- The results reveal the difference:-- If 10 employees, 5 have bonuses averaging $2000:-- avg_excluding_null = 2000 (average of those receiving bonuses)-- avg_with_zeros = 1000 (accounting for 5 employees with $0)Empty set behavior:
Like other aggregates, AVG returns NULL when applied to an empty set or when all values are NULL. This makes mathematical sense—averaging over nothing is undefined, not zero.
12345678910111213
-- Empty result setSELECT AVG(salary) FROM employees WHERE 1 = 0;-- Result: NULL (no rows to average) -- All NULL valuesSELECT AVG(salary) FROM employees WHERE salary IS NULL;-- Result: NULL (no non-NULL values to average) -- Safe handling with COALESCESELECT COALESCE(AVG(salary), 0) AS safe_avgFROM employeesWHERE department = 'NonExistent';-- Result: 0 (instead of NULL)AVG(DISTINCT expression) computes the average of unique values only. If the same value appears multiple times, it's counted once in both the sum and the count.
This is less commonly used than AVG(ALL) but has specific applications in data analysis.
12345678910111213141516171819
-- Sample values: 10, 20, 20, 30SELECT AVG(value) AS regular_avg, -- (10+20+20+30)/4 = 20 AVG(DISTINCT value) AS distinct_avg -- (10+20+30)/3 = 20FROM sample;-- In this case, they happen to match, but that's coincidental -- Another example: 10, 10, 10, 40SELECT AVG(value) AS regular_avg, -- (10+10+10+40)/4 = 17.5 AVG(DISTINCT value) AS distinct_avg -- (10+40)/2 = 25FROM sample;-- Now they differ significantly -- Practical use: Average of distinct price pointsSELECT AVG(DISTINCT unit_price) AS avg_distinct_priceFROM products;-- Answers: "What's the average of our unique price points?"-- vs AVG(unit_price): "What's the average price weighted by product count?"Use AVG(DISTINCT) when you want each unique value to contribute equally regardless of frequency. This is useful for analyzing unique price tiers, score levels, or rating values where repeated values shouldn't skew the mean.
AVG(DISTINCT) vs Weighted Averages:
AVG(DISTINCT) is essentially an unweighted average of unique values. Regular AVG is naturally weighted by frequency. Neither is inherently "correct"—they answer different questions:
AVG(salary): What's the average salary, weighted by employee count?AVG(DISTINCT salary): What's the average of distinct salary levels?Real-world analysis often requires weighted averages rather than simple means. A weighted average assigns different importance to different values based on some weight factor.
Weighted average formula:
Weighted AVG = SUM(value × weight) / SUM(weight)
SQL doesn't have a built-in WEIGHTED_AVG function, but we can construct it using SUM.
12345678910111213141516171819
-- Simple average of unit pricesSELECT AVG(unit_price) AS simple_avg_priceFROM products;-- Each product type weighted equally -- Weighted average by quantity soldSELECT SUM(unit_price * quantity_sold) / NULLIF(SUM(quantity_sold), 0) AS weighted_avg_priceFROM products;-- Higher-selling products contribute more to the average -- Weighted average with explicit weights-- Example: GPA calculation with credit hours as weightsSELECT SUM(grade_points * credit_hours) / NULLIF(SUM(credit_hours), 0) AS weighted_gpaFROM student_gradesWHERE student_id = 12345;Practical weighted average examples:
123456789101112131415161718192021222324252627282930
-- Weighted average cost (inventory costing)SELECT product_id, SUM(quantity * unit_cost) / NULLIF(SUM(quantity), 0) AS wacFROM inventory_transactionsGROUP BY product_id; -- Portfolio-weighted returnSELECT portfolio_id, SUM(position_value * return_pct) / NULLIF(SUM(position_value), 0) AS portfolio_returnFROM positionsGROUP BY portfolio_id; -- Survey weighted average (demographic weighting)SELECT question_id, SUM(response_value * demographic_weight) / NULLIF(SUM(demographic_weight), 0) AS weighted_responseFROM survey_responsesGROUP BY question_id; -- Time-weighted average (values sampled at different intervals)SELECT sensor_id, SUM(reading * duration_seconds) / NULLIF(SUM(duration_seconds), 0) AS time_weighted_avgFROM sensor_readingsGROUP BY sensor_id;Always use NULLIF(SUM(weight), 0) or equivalent protection in weighted average calculations. If all weights are zero or all records have NULL weights, you'd divide by zero, causing an error in most databases.
NULL handling in weighted averages:
When computing weighted averages, NULLs can appear in values, weights, or both. The standard SUM() behavior excludes NULLs, which may not be desired:
123456789101112131415
-- If either value or weight is NULL, that row is excluded-- (because NULL * anything = NULL, and SUM ignores NULL) -- Explicit NULL handlingSELECT SUM(COALESCE(value, 0) * COALESCE(weight, 1)) / NULLIF(SUM(COALESCE(weight, 1)), 0) AS weighted_avg_with_defaultsFROM data;-- Treats NULL value as 0, NULL weight as 1 -- Or exclude rows with any NULL explicitlySELECT SUM(value * weight) / NULLIF(SUM(weight), 0) AS weighted_avgFROM dataWHERE value IS NOT NULL AND weight IS NOT NULL;Combining AVG with GROUP BY enables powerful segmented analysis. You can compute averages by category, time period, demographic, or any other dimension.
123456789101112131415161718192021222324252627282930
-- Average salary by departmentSELECT department, AVG(salary) AS avg_salary, COUNT(*) AS employee_countFROM employeesGROUP BY departmentORDER BY avg_salary DESC; -- Average order value by customer segmentSELECT customer_segment, AVG(order_total) AS avg_order_value, SUM(order_total) AS total_revenue, COUNT(*) AS order_countFROM orders oJOIN customers c ON o.customer_id = c.idGROUP BY customer_segment; -- Monthly average with trend analysisSELECT DATE_TRUNC('month', order_date) AS month, AVG(order_total) AS avg_order_value, AVG(AVG(order_total)) OVER ( ORDER BY DATE_TRUNC('month', order_date) ROWS BETWEEN 2 PRECEDING AND CURRENT ROW ) AS moving_avg_3moFROM ordersGROUP BY DATE_TRUNC('month', order_date)ORDER BY month;Combining AVG with HAVING:
Use HAVING to filter groups based on their aggregate values:
123456789101112131415161718192021222324252627
-- Departments with high average salariesSELECT department, AVG(salary) AS avg_salary, COUNT(*) AS employee_countFROM employeesGROUP BY departmentHAVING AVG(salary) > 75000ORDER BY avg_salary DESC; -- Products with low average ratingSELECT product_id, AVG(rating) AS avg_rating, COUNT(*) AS review_countFROM reviewsGROUP BY product_idHAVING AVG(rating) < 3.0 AND COUNT(*) >= 10 -- Minimum reviews thresholdORDER BY avg_rating; -- Categories significantly above overall averageSELECT category, AVG(price) AS category_avgFROM productsGROUP BY categoryHAVING AVG(price) > (SELECT AVG(price) * 1.2 FROM products); -- 20% above overallWhen comparing averages across groups, include a minimum count threshold. An average based on 2 data points is far less reliable than one based on 1000. Use HAVING COUNT(*) >= n to exclude statistically unreliable groups.
AVG involves division, which introduces precision considerations that SUM and COUNT don't face directly. Understanding these nuances is essential for financial calculations and high-precision analysis.
12345678910111213141516171819
-- Integer division trap (some databases)-- If salary is INTEGER, division might truncateSELECT AVG(salary) FROM employees; -- Usually returns DECIMAL/FLOAT -- Explicit precision controlSELECT ROUND(AVG(salary), 2) AS avg_salary_roundedFROM employees; -- Cast for specific precisionSELECT CAST(AVG(salary) AS DECIMAL(15,2)) AS avg_salary_preciseFROM employees; -- Compare different precision approachesSELECT AVG(price) AS default_avg, ROUND(AVG(price), 4) AS rounded_4, TRUNC(AVG(price), 2) AS truncated_2, CAST(AVG(price) AS DECIMAL(10,2)) AS cast_decimalFROM products;Floating-point precision issues:
When averaging floating-point numbers, small precision errors can accumulate:
AVG(0.1, 0.2) might return 0.15000000000000002 instead of exactly 0.15For financial data, use DECIMAL/NUMERIC types and round appropriately.
123456789101112131415161718192021222324
-- Edge case: Single valueSELECT AVG(salary) FROM employees WHERE id = 1;-- Returns that one salary value (average of one = itself) -- Edge case: All same valuesSELECT AVG(salary) FROM employees WHERE salary = 50000;-- Returns 50000 (average of identical values = that value) -- Edge case: Empty setSELECT AVG(salary) FROM employees WHERE 1 = 0;-- Returns NULL -- Edge case: Division precisionSELECT 1/3 AS integer_division, -- 0 (integer division) 1.0/3 AS float_division, -- 0.333... CAST(1 AS DECIMAL(10,6))/3 AS decimal_division -- 0.333333; -- Avoiding comparison issues with floating point averages-- DON'T:SELECT * FROM products WHERE AVG(price) OVER () = 19.99;-- DO:SELECT * FROM products WHERE ABS(AVG(price) OVER () - 19.99) < 0.001;Due to floating-point representation, comparing AVG results directly to expected values may fail. Use range comparisons (ABS(val - expected) < epsilon) or ROUND to a fixed precision before comparison.
Let's consolidate the most useful AVG patterns for production use.
12345678910111213141516171819202122232425262728293031323334353637
-- Pattern: Moving average (rolling average)SELECT date, value, AVG(value) OVER ( ORDER BY date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW ) AS moving_avg_7dayFROM daily_metrics; -- Pattern: Conditional averageSELECT AVG(CASE WHEN status = 'completed' THEN order_total END) AS avg_completed, AVG(CASE WHEN status = 'cancelled' THEN order_total END) AS avg_cancelledFROM orders; -- PostgreSQL FILTER syntax (cleaner)SELECT AVG(order_total) FILTER (WHERE status = 'completed') AS avg_completed, AVG(order_total) FILTER (WHERE status = 'cancelled') AS avg_cancelledFROM orders; -- Pattern: Average with NULL handling decisionSELECT AVG(rating) AS avg_rated_only, -- Excludes unrated (NULL) AVG(COALESCE(rating, 0)) AS avg_all, -- Treats unrated as 0 AVG(COALESCE(rating, 3)) AS avg_default -- Treats unrated as neutral (3)FROM products; -- Pattern: Percentile comparisonSELECT employee_id, salary, AVG(salary) OVER () AS company_avg, salary - AVG(salary) OVER () AS diff_from_avg, (salary - AVG(salary) OVER ()) / AVG(salary) OVER () * 100 AS pct_diffFROM employees;COALESCE(AVG(col), 0) returns 0 instead of NULL for empty results.HAVING COUNT(*) >= n to ensure statistical validity.If Department A has 10 employees averaging $50K and Department B has 100 employees averaging $80K, the company average is NOT ($50K + $80K) / 2 = $65K. It's (10×$50K + 100×$80K) / 110 = $77.3K. Always use weighted averages when combining group averages!
The AVG function appears simple but carries important subtleties. Let's consolidate the key concepts:
What's next:
We've covered counting, summing, and averaging. Now we'll explore MIN and MAX—functions that find extreme values in datasets. You'll learn about their behavior with NULL, ordering semantics, and use cases including range analysis and boundary detection.
You now understand AVG with precision—its NULL semantics, relationship to SUM/COUNT, weighted average construction, and common pitfalls. These skills enable accurate statistical analysis and reporting in SQL.