Database Management System: HAVING Clause

HAVING Clause: Filtering Grouped Data

Level: Intermediate

Duration: 60 mins

Topic: HAVING Clause


HAVING vs WHERE

Two Filters, Two Purposes

SQL provides two filtering mechanisms: WHERE and HAVING. On the surface they look alike, since both remove data from query results. Underneath, they run at different stages of query processing and operate on different kinds of data, and every database professional needs to understand that difference.

Confusing WHERE and HAVING leads to three categories of problems:

Syntax errors — Using WHERE where HAVING is required (and vice versa)
Incorrect results — Filters applied at the wrong stage producing wrong data
Performance problems — Filters placed inefficiently, processing more data than necessary

This page examines both clauses from four angles: execution timing, what each clause can reference, performance implications, and typical use cases.

What You Will Learn

By the end of this page, you will have crystal-clear mental models for WHERE and HAVING. You'll know exactly when to use each, understand performance implications of placement, and be able to craft complex queries that leverage both clauses optimally.

The Core Distinction: Rows vs Groups

The fundamental difference between WHERE and HAVING is what they filter:

WHERE filters individual rows before any grouping occurs
HAVING filters entire groups after grouping and aggregation

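This isn't a minor implementation detail: the two clauses act on different data structures at different points in the query. Conceptually, a grouped query flows like this:

source rows → WHERE (discards rows) → GROUP BY (forms groups) → aggregate functions (one summary per group) → HAVING (discards groups) → result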

Key observations from this flow:

WHERE executes early — Before grouping happens. It determines which rows get grouped at all.
HAVING executes late — After groups form and aggregates compute. It determines which groups appear in output.
Discarded data differs — WHERE discards rows. HAVING discards groups (which may represent many underlying rows).
Different granularities — At the WHERE stage, data is row-level (individual records). At the HAVING stage, data is group-level (summary records).

The Granularity Shift

Between WHERE and HAVING, a fundamental transformation occurs: the data changes from many rows to few groups. After GROUP BY, each 'row' in your conceptual data stream represents an entire group's summary. HAVING filters at this coarser granularity.
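A quick way to see the granularity shift is to count the units each filter would examine. This minimal sketch uses Python's built-in sqlite3 module with a hypothetical six-row employees table (the table, its columns, and the values are invented for illustration): a WHERE condition is evaluated once per row, while a HAVING condition is evaluated once per department group.

```python
import sqlite3

# Hypothetical employees table; the data is invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, department TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [
        ("Ana", "Engineering", 95000),
        ("Ben", "Engineering", 88000),
        ("Cai", "Engineering", 70000),
        ("Dee", "Sales", 52000),
        ("Eli", "Sales", 61000),
        ("Fay", "Support", 45000),
    ],
)

# Row level: the stream a WHERE condition is evaluated against (one entry per row).
rows_seen_by_where = conn.execute("SELECT COUNT(*) FROM employees").fetchone()[0]

# Group level: the stream a HAVING condition is evaluated against
# (one summary record per department).
groups = conn.execute(
    """
    SELECT department, COUNT(*) AS headcount, AVG(salary) AS avg_salary
    FROM employees
    GROUP BY department
    ORDER BY department
    """
).fetchall()

print(f"{rows_seen_by_where} rows collapse into {len(groups)} groups")
for department, headcount, avg_salary in groups:
    print(department, headcount, round(avg_salary, 2))
```

Six row-level records become three group-level records; anything HAVING checks must be expressible in terms of those three summaries.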

Execution Timing: When Each Clause Runs

Understanding SQL's logical execution order is essential for mastering WHERE vs HAVING. The clauses of a SQL statement are not evaluated in the order you write them; instead, they follow a defined logical sequence:

SQL Logical Execution Order
Order	Clause	Purpose	Filter Type Available
1	FROM (+ JOINs)	Identify source tables and combine them	—
2	WHERE	Filter individual rows from source	Row-level conditions
3	GROUP BY	Organize remaining rows into groups	—
4	Aggregate Functions	Compute summaries per group	—
5	HAVING	Filter groups based on aggregates	Aggregate conditions
6	SELECT	Choose and compute output columns	—
7	DISTINCT	Remove duplicate result rows	—
8	ORDER BY	Sort final result set	—
9	LIMIT/OFFSET	Restrict number of rows returned	—

Critical implications of this order:

Execution Order Implications

•WHERE cannot reference aggregates — Aggregates don't exist yet at step 2. The groups haven't formed, so there's nothing to COUNT() or SUM().
•HAVING cannot filter individual rows: at step 5, rows have already been collapsed into groups, so HAVING sees only group-level values (grouping columns and aggregates). 'Ungrouping' to filter individual rows would break the model.
•WHERE affects what gets aggregated — Rows eliminated by WHERE don't contribute to any group's aggregate. This can dramatically change aggregate values.
•HAVING affects what gets displayed — Groups eliminated by HAVING don't appear in output. Their aggregates were computed but discarded.
•SELECT aliases aren't available to HAVING (in standard SQL) — SELECT runs at step 6; HAVING at step 5. Some databases allow this as an extension.
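One way to see the logical order at work is to spell HAVING out as two stages. The sketch below is a minimal illustration using Python's sqlite3 module and a hypothetical employees table (table, columns, and data are assumptions, not from this page). It runs a HAVING query, then an equivalent query that first builds the grouped rows in a derived table (steps 1 to 4) and filters them with an ordinary WHERE (the role HAVING plays at step 5).

```python
import sqlite3

# Hypothetical employees table; the data is invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, department TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [
        ("Ana", "Engineering", 95000),
        ("Ben", "Engineering", 88000),
        ("Cai", "Engineering", 70000),
        ("Dee", "Sales", 52000),
        ("Eli", "Sales", 61000),
        ("Fay", "Support", 45000),
    ],
)

# Single query: group, aggregate, then filter groups with HAVING.
having_version = conn.execute(
    """
    SELECT department, AVG(salary) AS avg_salary
    FROM employees
    GROUP BY department
    HAVING AVG(salary) > 50000
    ORDER BY department
    """
).fetchall()

# Two explicit stages: the inner query produces one row per group,
# and the outer WHERE filters those group rows.
two_stage_version = conn.execute(
    """
    SELECT department, avg_salary
    FROM (
        SELECT department, AVG(salary) AS avg_salary
        FROM employees
        GROUP BY department
    ) AS grouped
    WHERE avg_salary > 50000
    ORDER BY department
    """
).fetchall()

print(having_version)
print(two_stage_version)
```

The derived-table form shows why the rule is about timing rather than syntax: once an inner query has computed the aggregate, the outer WHERE can filter on it like any other column.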

Physical vs Logical Order

This is the logical execution order—how results must behave conceptually. The actual physical execution may differ (query optimizers reorder operations for efficiency). But results must be equivalent to this logical order. Understanding the logical model tells you what's valid SQL.

What Each Clause Can Reference

The timing difference translates directly into what expressions are valid in each clause:

WHERE Can Reference

•Any column from source tables — Raw row data is available
•Constants and literals — WHERE status = 'active'
•Functions of columns — WHERE UPPER(name) = 'JOHN'
•Arithmetic expressions — WHERE price * quantity > 100
•Subqueries — WHERE id IN (SELECT ...), or a scalar subquery that aggregates internally, such as WHERE salary > (SELECT AVG(salary) FROM employees)
•Pattern matching — WHERE name LIKE 'A%'

WHERE Cannot Reference

•Aggregate functions (COUNT, SUM, AVG, MIN, MAX, etc.): they haven't been computed yet
•Column aliases from SELECT: SELECT hasn't run yet

HAVING Can Reference

•Aggregate functions — HAVING COUNT(*) > 5
•Columns in GROUP BY — These have single values per group
•Constants and literals — HAVING SUM(x) > 1000
•Expressions combining the above — HAVING AVG(price) > MIN(price) * 2
•Subqueries returning aggregates — HAVING SUM(x) > (SELECT AVG(y) FROM ...)

HAVING Cannot Reference

•Non-grouped, non-aggregated columns: ambiguous, because they can have multiple values per group
•SELECT aliases (in standard SQL): SELECT hasn't run yet
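Subqueries are where the two reference lists meet. This minimal sketch (Python's sqlite3 module, hypothetical employees table and data) uses the company-wide average salary, computed by a scalar subquery, in both places: HAVING compares each department's average against it, and WHERE compares each employee's salary against it. WHERE may contain that aggregate because the inner query computes it, not the outer query's groups.

```python
import sqlite3

# Hypothetical employees table; the data is invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, department TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [
        ("Ana", "Engineering", 95000),
        ("Ben", "Engineering", 88000),
        ("Cai", "Engineering", 70000),
        ("Dee", "Sales", 52000),
        ("Eli", "Sales", 61000),
        ("Fay", "Support", 45000),
    ],
)

# HAVING: compare each group's aggregate with a scalar subquery.
above_avg_departments = conn.execute(
    """
    SELECT department, AVG(salary) AS avg_salary
    FROM employees
    GROUP BY department
    HAVING AVG(salary) > (SELECT AVG(salary) FROM employees)
    ORDER BY department
    """
).fetchall()

# WHERE: the scalar subquery aggregates on its own, so the outer
# WHERE only compares a row value against a single number.
above_avg_employees = conn.execute(
    """
    SELECT name
    FROM employees
    WHERE salary > (SELECT AVG(salary) FROM employees)
    ORDER BY name
    """
).fetchall()

print(above_avg_departments)
print(above_avg_employees)
```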

The grey area in HAVING:

Interestingly, HAVING can technically filter on GROUP BY columns without aggregates:

SELECT department, COUNT(*)
FROM employees
GROUP BY department
HAVING department = 'Engineering';  -- Valid but unusual

This works because department has a single value per group. However, this is almost always better written with WHERE:

SELECT department, COUNT(*)
FROM employees
WHERE department = 'Engineering'  -- Better: filters earlier
GROUP BY department;

The WHERE version is usually more efficient: it eliminates non-Engineering rows before grouping, instead of building every department's group and then discarding all but one.
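The two forms return the same rows; only the amount of work differs. This minimal sketch checks that with Python's sqlite3 module and a hypothetical employees table (data invented for illustration).

```python
import sqlite3

# Hypothetical employees table; the data is invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, department TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [
        ("Ana", "Engineering", 95000),
        ("Ben", "Engineering", 88000),
        ("Cai", "Engineering", 70000),
        ("Dee", "Sales", 52000),
        ("Eli", "Sales", 61000),
        ("Fay", "Support", 45000),
    ],
)

# Filter on the grouping column after grouping (valid, but late).
having_filter = conn.execute(
    "SELECT department, COUNT(*) FROM employees "
    "GROUP BY department HAVING department = 'Engineering'"
).fetchall()

# Filter the same column before grouping (the preferred form).
where_filter = conn.execute(
    "SELECT department, COUNT(*) FROM employees "
    "WHERE department = 'Engineering' GROUP BY department"
).fetchall()

print(having_filter, where_filter)
```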

Side-by-Side Comparison

Let's consolidate the key differences in a comprehensive comparison table:

WHERE vs HAVING: Complete Comparison
Aspect	WHERE	HAVING
Filters what?	Individual rows	Groups (after aggregation)
Execution phase	Step 2 (before GROUP BY)	Step 5 (after aggregates)
Can use aggregates?	❌ No — don't exist yet	✅ Yes — primary purpose
Can use raw columns?	✅ Yes — all columns available	⚠️ Only if in GROUP BY
Affects aggregates?	✅ Yes — excluded rows don't contribute	❌ No — aggregates already computed
Works without GROUP BY?	✅ Yes — standard row filtering	⚠️ Yes, but treats table as one group
Performance impact	Filters early = less data to process	Filters late = all groups computed first
Use case	Row-level conditions on source data	Aggregate thresholds on grouped data

The Decision Heuristic

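Ask yourself: 'Does my condition involve an aggregate function?' If yes → HAVING. If no → WHERE. This simple rule covers the vast majority of cases; the main exception is an aggregate inside a scalar subquery, which WHERE can use because the subquery computes it independently.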

How WHERE Affects Aggregate Results

A critical but often overlooked point: WHERE changes what gets aggregated. Because WHERE filters rows before grouping, excluded rows don't contribute to any group's aggregate.

Demonstration:

Consider this sales table:

sale_id	product	amount	year
1	Widget	100	2023
2	Widget	150	2023
3	Widget	200	2024
4	Gadget	300	2023
5	Gadget	250	2024

no_where_filter.sql

-- Aggregate all sales by product
SELECT product, SUM(amount) AS total_sales
FROM sales
GROUP BY product;

Result:

product	total_sales
Widget	450
Gadget	550

All 5 rows contribute to their respective groups.

Key insight:

Now add WHERE year = 2023 to the same query. The filter doesn't just remove rows from the output; it changes the aggregate values themselves. Widget's total drops from 450 to 250 (100 + 150) because its 2024 sale (200) is excluded before SUM() is computed, and Gadget's drops from 550 to 300.

This is fundamentally different from HAVING, which would filter the group after SUM() was calculated.

Understand the Difference

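Filtering with WHERE year = 2023 answers '2023 sales by product.' In standard SQL you can't write HAVING year = 2023, because year is neither grouped nor aggregated. The closest group-level condition, such as HAVING SUM(CASE WHEN year = 2023 THEN 1 ELSE 0 END) > 0, answers a different question: 'all-time sales for products that had at least one sale in 2023.' Where you place a filter changes what the query means.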
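The sketch below reproduces the sales example using Python's sqlite3 module as a stand-in engine (the table definition and column types are assumptions; the five rows are the ones shown above). It compares three queries: no filter, a row-level WHERE year = 2023, and a group-level HAVING that keeps products with at least one 2023 sale.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (sale_id INTEGER, product TEXT, amount INTEGER, year INTEGER)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        (1, "Widget", 100, 2023),
        (2, "Widget", 150, 2023),
        (3, "Widget", 200, 2024),
        (4, "Gadget", 300, 2023),
        (5, "Gadget", 250, 2024),
    ],
)

def totals(sql):
    """Run a product/total query and return {product: total}."""
    return dict(conn.execute(sql).fetchall())

# Every row contributes to its product's total.
no_filter = totals("SELECT product, SUM(amount) FROM sales GROUP BY product")

# Row-level filter: 2024 rows never reach SUM().
where_2023 = totals(
    "SELECT product, SUM(amount) FROM sales WHERE year = 2023 GROUP BY product"
)

# Group-level filter: all rows are summed, then products without
# any 2023 sale are dropped.
active_in_2023 = totals(
    "SELECT product, SUM(amount) FROM sales GROUP BY product "
    "HAVING SUM(CASE WHEN year = 2023 THEN 1 ELSE 0 END) > 0"
)

print(no_filter)
print(where_2023)
print(active_in_2023)
```

With this data both products sold in 2023, so the HAVING version keeps their all-time totals, while the WHERE version reports only 2023 revenue.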

Performance Implications: Filter Early When Possible

The placement of filters has significant performance implications. The general principle: filter as early as possible.

Why WHERE is often more efficient:

Performance Considerations

•Less data to group — WHERE reduces the row count before GROUP BY. Fewer rows = smaller groups = faster aggregation.
•Index usage — WHERE conditions often use indexes for efficient filtering. HAVING runs after aggregation, where indexes on source data don't help.
•Memory efficiency — Grouping and aggregating fewer rows requires less working memory for intermediate results.
•I/O reduction — If WHERE eliminates rows early, those rows never need to be read from disk (with proper indexing).

Anti-pattern to avoid:

inefficient_having.sql
-- ❌ INEFFICIENT: Using HAVING for row-level filter
SELECT region, SUM(sales) AS total_sales
FROM sales_data
GROUP BY region
HAVING region = 'North';  -- This could be in WHERE!
 
-- ✓ EFFICIENT: Using WHERE for row-level filter
SELECT region, SUM(sales) AS total_sales
FROM sales_data
WHERE region = 'North'    -- Filters early, reduces grouping work
GROUP BY region;

Both queries produce identical results, but the WHERE version:

Can use an index on region to avoid scanning the entire table
Groups only 'North' rows instead of all rows
Allocates memory for one group instead of many

On a table with millions of rows and hundreds of regions, the difference can be substantial, especially when an index lets the engine skip most of the table.

The Optimizer May Help—But Don't Rely on It

Modern query optimizers sometimes recognize that a HAVING condition on a GROUP BY column can be pushed down to WHERE. But this optimization isn't guaranteed. Writing correct, efficient SQL from the start is better than hoping the optimizer fixes your mistakes.
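Whether a given engine pushes a HAVING predicate on a grouping column down into the scan is something you can check rather than assume. This sketch uses SQLite through Python's sqlite3 module, with a hypothetical sales_data table and an index named idx_sales_region (both assumptions for illustration), and prints the EXPLAIN QUERY PLAN output for the WHERE and HAVING versions. The WHERE version is expected to search via the index; what the HAVING version does depends on the SQLite version, so inspect its plan rather than rely on it.

```python
import sqlite3

# Hypothetical table and index, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_data (region TEXT, sales INTEGER)")
conn.execute("CREATE INDEX idx_sales_region ON sales_data (region)")
conn.executemany(
    "INSERT INTO sales_data VALUES (?, ?)",
    [("North", 10), ("North", 20), ("South", 5), ("East", 7)],
)

def plan(sql):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

where_sql = (
    "SELECT region, SUM(sales) FROM sales_data "
    "WHERE region = 'North' GROUP BY region"
)
having_sql = (
    "SELECT region, SUM(sales) FROM sales_data "
    "GROUP BY region HAVING region = 'North'"
)

where_plan = plan(where_sql)
having_plan = plan(having_sql)
print("WHERE :", where_plan)
print("HAVING:", having_plan)

# The results are identical either way; only the plan may differ.
where_rows = conn.execute(where_sql).fetchall()
having_rows = conn.execute(having_sql).fetchall()
print(where_rows, having_rows)
```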

WHERE and HAVING Working Together

Many analytical queries need both WHERE and HAVING, each serving its proper role. WHERE handles row-level business rules; HAVING enforces aggregate thresholds.

Example: Customer Segmentation Analysis

Find East Coast customers who completed at least 5 orders in the last year with an average order value over $200:

combined_where_having.sql
SELECT 
    customer_id,
    COUNT(*) AS order_count,
    SUM(order_total) AS total_spent,
    AVG(order_total) AS avg_order_value
FROM orders
-- WHERE: Row-level filters (applied before grouping)
WHERE 
    order_date >= CURRENT_DATE - INTERVAL '1 year'  -- Only recent orders (PostgreSQL-style date arithmetic)
    AND status = 'completed'                         -- Only successful orders
    AND store_region = 'East Coast'                  -- Specific region
-- GROUP BY: Form customer groups
GROUP BY customer_id
-- HAVING: Aggregate thresholds (applied after grouping)
HAVING 
    COUNT(*) >= 5                   -- At least 5 orders
    AND AVG(order_total) > 200      -- Average order > $200
-- ORDER BY: Sort for presentation
ORDER BY total_spent DESC
LIMIT 100;

Execution walkthrough:

FROM: Start with the orders table
WHERE: Keep only rows that are:
- From the last year
- Completed (not cancelled or pending)
- From the East Coast region
GROUP BY: Group remaining rows by customer_id
Aggregates: For each customer group, compute:
- COUNT(*) of their orders
- SUM(order_total) of their spending
- AVG(order_total) of their order values
HAVING: Keep only groups where:
- The customer placed 5+ orders
- Their average order exceeds $200
SELECT: Output the columns we specified
ORDER BY: Sort by total spending (descending)
LIMIT: Return top 100 customers
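To see each filter do its job, here is a runnable sketch of the same query in SQLite via Python's sqlite3 module. Assumptions: a hypothetical orders table with made-up rows, and a fixed cutoff date (2024-01-01) standing in for CURRENT_DATE - INTERVAL '1 year', whose syntax varies across engines. Each customer is built to pass or fail a different condition. A second run drops the status filter, so customer 2's cancelled order leaks into that customer's aggregates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (customer_id INTEGER, order_date TEXT, "
    "status TEXT, store_region TEXT, order_total REAL)"
)

rows = []
# Customer 1: five completed East Coast orders of 250, so it qualifies.
rows += [(1, "2024-03-01", "completed", "East Coast", 250.0)] * 5
# Customer 2: five completed orders of 150 plus one cancelled order of 2000.
rows += [(2, "2024-04-01", "completed", "East Coast", 150.0)] * 5
rows += [(2, "2024-04-02", "cancelled", "East Coast", 2000.0)]
# Customer 3: only four orders, even though each one is large.
rows += [(3, "2024-05-01", "completed", "East Coast", 500.0)] * 4
# Customer 4: five large orders, but in another region.
rows += [(4, "2024-06-01", "completed", "West Coast", 400.0)] * 5
# Customer 5: five orders, but two fall before the cutoff date.
rows += [(5, "2024-07-01", "completed", "East Coast", 300.0)] * 3
rows += [(5, "2023-06-01", "completed", "East Coast", 300.0)] * 2
conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?, ?)", rows)

# Fixed cutoff standing in for CURRENT_DATE - INTERVAL '1 year'.
CUTOFF = "2024-01-01"

QUERY = """
    SELECT customer_id,
           COUNT(*)         AS order_count,
           SUM(order_total) AS total_spent,
           AVG(order_total) AS avg_order_value
    FROM orders
    WHERE order_date >= :cutoff
      AND store_region = 'East Coast'
      {status_filter}
    GROUP BY customer_id
    HAVING COUNT(*) >= 5 AND AVG(order_total) > 200
    ORDER BY total_spent DESC
    LIMIT 100
"""

# Full query: all three row-level filters apply before grouping.
result = conn.execute(
    QUERY.format(status_filter="AND status = 'completed'"), {"cutoff": CUTOFF}
).fetchall()

# Same query without the status filter: the cancelled order now counts.
without_status = conn.execute(
    QUERY.format(status_filter=""), {"cutoff": CUTOFF}
).fetchall()

print(result)
print(without_status)
```

With the status filter, only customer 1 qualifies. Without it, customer 2 has 6 orders averaging about 458, so a cancelled order changes who passes the HAVING threshold.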

Each Filter Serves Its Purpose

None of the WHERE conditions could simply move to HAVING: order_date, status, and store_region are neither grouped nor aggregated, so standard SQL rejects them there. More fundamentally, the date, status, and region filters define which orders count toward the aggregates, while the HAVING conditions define which customers qualify based on their aggregate behavior. Both are essential; neither can substitute for the other.

Common Errors and How to Fix Them

The most frequent mistake is putting an aggregate condition in WHERE. Here is why it fails and how to fix it:

error_aggregate_in_where.sql
-- ❌ ERROR: Invalid use of aggregate function in WHERE
SELECT department, AVG(salary)
FROM employees
WHERE AVG(salary) > 50000  -- Cannot use aggregate here!
GROUP BY department;
 
-- PostgreSQL reports: ERROR: aggregate functions are not allowed in WHERE (wording varies by engine)

Why it fails: WHERE executes before GROUP BY. No groups exist yet, so AVG(salary) can't be computed.

Fix: Move the condition to HAVING:

fix_1.sql
-- ✓ CORRECT: Use HAVING for aggregate conditions
SELECT department, AVG(salary) AS avg_salary
FROM employees
GROUP BY department
HAVING AVG(salary) > 50000;
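You can reproduce the failure and the fix locally. This minimal sketch uses Python's sqlite3 module with a hypothetical employees table (data invented for illustration). SQLite rejects the aggregate in WHERE with its own message, worded differently from PostgreSQL's, and then runs the HAVING version successfully.

```python
import sqlite3

# Hypothetical employees table; the data is invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, department TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [
        ("Ana", "Engineering", 95000),
        ("Ben", "Engineering", 88000),
        ("Cai", "Engineering", 70000),
        ("Dee", "Sales", 52000),
        ("Eli", "Sales", 61000),
        ("Fay", "Support", 45000),
    ],
)

bad_sql = """
    SELECT department, AVG(salary)
    FROM employees
    WHERE AVG(salary) > 50000
    GROUP BY department
"""

try:
    conn.execute(bad_sql)
    error_message = None
except sqlite3.OperationalError as exc:
    # SQLite refuses to prepare the statement: no groups exist at WHERE time.
    error_message = str(exc)

# The fix: evaluate the aggregate condition after grouping.
fixed = conn.execute(
    """
    SELECT department, AVG(salary) AS avg_salary
    FROM employees
    GROUP BY department
    HAVING AVG(salary) > 50000
    ORDER BY department
    """
).fetchall()

print("error:", error_message)
print("fixed:", fixed)
```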

Summary: WHERE vs HAVING

The WHERE and HAVING clauses serve distinct, complementary purposes in SQL. Mastering their differences is essential for writing correct, efficient queries.

Key Takeaways

•WHERE filters rows before grouping — Use for row-level conditions
•HAVING filters groups after aggregation — Use for aggregate conditions
•WHERE cannot use aggregates — They don't exist at WHERE's execution stage
•HAVING should only use group-level data — Aggregates or GROUP BY columns
•WHERE affects aggregate values — Excluded rows don't contribute to any group
•HAVING affects which groups appear — Aggregates are computed, then groups are filtered
•Filter early when possible — WHERE is generally more efficient than HAVING for equivalent conditions
•Both clauses work together — Complex queries use WHERE for row conditions and HAVING for aggregate thresholds

What's next:

With a solid understanding of HAVING vs WHERE, we'll explore the types of conditions you can express in HAVING—from simple comparisons to complex multi-aggregate expressions that enable sophisticated data analysis.


You now have a clear mental model distinguishing WHERE from HAVING. You understand their execution timing, valid references, performance implications, and how they cooperate in complex queries. The next page explores the rich variety of aggregate conditions possible with HAVING.
