HAVING Clause: Filtering Grouped Data

Level: Intermediate

Duration: 60 mins

Topic: HAVING Clause

Order of Execution

The Hidden Sequence Behind Every Query

SQL is declarative—you describe what you want, not how to get it. But beneath this abstraction lies a precise logical sequence that determines what's valid, what's accessible, and what behaves correctly.

Understanding this execution order is essential for mastering HAVING. It explains:

Why HAVING can use aggregates but WHERE cannot
Why SELECT aliases aren't available to HAVING (in standard SQL)
Why moving conditions between WHERE and HAVING changes query behavior
How to predict which clauses can reference which columns

This page provides the complete picture of SQL's logical execution order, with special attention to HAVING's position and its consequences.

What You Will Learn

This page covers: the complete SQL logical execution order, what 'scope' each clause has access to, how data transforms at each stage, HAVING's specific position and implications, the difference between logical and physical execution, and practical examples demonstrating execution order effects.

The Complete Logical Execution Order

SQL clauses execute in a specific logical sequence, regardless of how they're written in your query. This order determines what each clause can 'see' and reference.

The standard logical execution order:

SQL Logical Execution Order
Step	Clause	Description	Data State After
1	FROM	Identify source table(s)	All rows from source table(s)
2	JOIN	Combine tables based on join conditions	Combined rows from all joined tables
3	WHERE	Filter individual rows	Subset of rows meeting conditions
4	GROUP BY	Organize rows into groups	Groups of rows, each group shares grouping key
5	Aggregates	Compute summary values per group	Each group has computed aggregate values
6	HAVING	Filter groups based on aggregate conditions	Subset of groups meeting conditions
7	SELECT	Choose/compute output columns and aliases	Result columns defined
8	DISTINCT	Remove duplicate result rows	Unique result rows
9	ORDER BY	Sort result set	Sorted result set
10	LIMIT/OFFSET	Restrict number of rows returned	Final result set

Transformation flow: FROM → JOIN → WHERE → GROUP BY → Aggregates → HAVING → SELECT → DISTINCT → ORDER BY → LIMIT/OFFSET

Written Order vs Execution Order

SQL syntax writes clauses as: SELECT → FROM → WHERE → GROUP BY → HAVING → ORDER BY. But execution order is: FROM → WHERE → GROUP BY → HAVING → SELECT → ORDER BY. This disconnect is a common source of confusion.

What Each Clause Can Access

Each clause can only reference what exists at its execution stage. This 'scope' determines valid expressions:

Scope rules by clause:

Clause Scope Reference
Clause	Can Access	Cannot Access
FROM/JOIN	Table names, ON conditions using table columns	Everything else (WHERE, GROUP BY haven't run)
WHERE	All columns from FROM/JOIN, subqueries	Aggregates, groups (not yet formed), SELECT aliases
GROUP BY	All columns from rows that passed WHERE, expressions	Aggregates (computed after), SELECT aliases (in standard SQL)
HAVING	Aggregates, GROUP BY columns, subqueries	Non-grouped non-aggregated columns, SELECT aliases*
SELECT	Aggregates, GROUP BY columns, expressions	Columns not in GROUP BY (unless aggregated)
ORDER BY	SELECT columns/aliases, GROUP BY columns, aggregates	Non-selected, non-grouped columns in most databases

*Some databases (MySQL, MariaDB, SQLite) allow SELECT aliases in HAVING as an extension. SQL Server, PostgreSQL, and Oracle do not.

Why these rules exist:

The rules aren't arbitrary—they follow from the data state at each stage:

WHERE can't use aggregates: At step 3, rows haven't been grouped yet. There's nothing to aggregate.
HAVING can't use non-grouped columns: At step 6, data is group-level. Individual row values are gone (except those preserved in GROUP BY).
SELECT aliases aren't in HAVING: At step 6, SELECT (step 7) hasn't executed. Aliases don't exist yet.

scope_examples.sql
-- ❌ WHERE can't use aggregate (doesn't exist at step 3)
SELECT department, AVG(salary)
FROM employees
WHERE AVG(salary) > 50000  -- ERROR!
GROUP BY department;
 
-- ❌ HAVING can't use non-grouped column (ambiguous at step 6)
SELECT department, MAX(salary)
FROM employees
GROUP BY department
HAVING employee_name = 'John';  -- ERROR! employee_name not in GROUP BY
 
-- ❌ SELECT can't include non-grouped, non-aggregated column
SELECT department, employee_name, AVG(salary)  -- ERROR! employee_name
FROM employees
GROUP BY department;
 
-- ⚠️ HAVING using SELECT alias (non-standard, works in some databases)
SELECT department, AVG(salary) AS avg_sal
FROM employees
GROUP BY department
HAVING avg_sal > 50000;  -- May work in MySQL, errors in PostgreSQL
 
-- ✓ ORDER BY can use SELECT aliases (exception: runs after SELECT)
SELECT department, AVG(salary) AS avg_sal
FROM employees
GROUP BY department
ORDER BY avg_sal DESC;  -- Works because ORDER BY runs after SELECT
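
Each of the failing queries above has a valid counterpart. The sketch below is one way to write them, not the only one. The `employees` table definition and its four sample rows are assumptions made up for this illustration, and the multi-row INSERT syntax is assumed to be supported by your database.

```sql
-- Hypothetical schema and data, assumed for illustration only
CREATE TABLE employees (
    employee_name VARCHAR(50),
    department    VARCHAR(50),
    salary        DECIMAL(10, 2)
);

INSERT INTO employees (employee_name, department, salary) VALUES
    ('John',  'Engineering', 90000),
    ('Maria', 'Engineering', 70000),
    ('Priya', 'Sales',       45000),
    ('Tom',   'Sales',       40000);

-- Aggregate condition: move it from WHERE to HAVING
SELECT department, AVG(salary) AS avg_sal
FROM employees
GROUP BY department
HAVING AVG(salary) > 50000;
-- Engineering averages 80000 and is kept; Sales averages 42500 and is dropped

-- Row-level condition: move it from HAVING to WHERE
SELECT department, MAX(salary) AS max_salary
FROM employees
WHERE employee_name = 'John'
GROUP BY department;
-- Only John's row reaches GROUP BY, so the result is Engineering with 90000

-- Non-grouped column in SELECT: wrap it in an aggregate
SELECT department, MIN(employee_name) AS first_name, AVG(salary) AS avg_sal
FROM employees
GROUP BY department;
```

Note that moving the name condition into WHERE changes what is aggregated: MAX(salary) is now computed over John's rows only, which may or may not be what the original query intended.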

HAVING's Position: After Groups, Before Output

HAVING occupies a critical position: after GROUP BY and aggregation, but before SELECT. This placement has specific implications:

What exists when HAVING runs:

Available at HAVING Stage

•Groups are formed — Each 'row' in the data stream represents a group
•Aggregates are computed — SUM, COUNT, AVG, etc. have been calculated for each group
•GROUP BY columns have single values per group — Safe to reference
•Individual row data is gone — Collapsed into groups, except GROUP BY keys
•Subqueries can be evaluated — Scalar subqueries return values for comparison

NOT Available at HAVING Stage

•SELECT aliases — SELECT hasn't executed yet (step 7)
•Non-grouped row columns — Individual row values are aggregated away
•Columns from tables not in FROM/JOIN — Never were available
•Results of DISTINCT, ORDER BY, LIMIT — Haven't executed (steps 8-10)

Practical demonstration: Trace through a query

Let's trace execution step by step:

execution_trace.sql
-- The Query
SELECT department, COUNT(*) AS emp_count, AVG(salary) AS avg_salary
FROM employees
WHERE hire_date > '2020-01-01'
GROUP BY department
HAVING COUNT(*) >= 5 AND AVG(salary) > 60000
ORDER BY avg_salary DESC
LIMIT 10;

Step-by-Step Execution Trace
Step	Clause	What Happens	Data Example
1	FROM	Load employees table	1000 rows, all columns
2	WHERE	Keep hire_date > 2020-01-01	400 rows remain
3	GROUP BY	Group by department	15 groups (e.g., Engineering: 50 rows, Sales: 30 rows...)
4	Aggregates	Compute COUNT(*), AVG(salary) per group	Each group now has: {dept, count, avg}
5	HAVING	Keep groups where COUNT(*)≥5 AND AVG>60000	8 groups remain
6	SELECT	Output department, emp_count, avg_salary columns	8 rows with 3 columns
7	ORDER BY	Sort by avg_salary DESC	8 sorted rows
8	LIMIT	Return top 10 (only 8 exist)	8 final rows

HAVING Sees Groups, Not Rows

At HAVING's execution stage, the 400 rows that passed WHERE have been collapsed into 15 groups. HAVING doesn't see the original 400 rows—it sees 15 groups, each with computed aggregate values. It then reduces 15 to 8.
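
The trace can be reproduced on a small scale by running the query one stage at a time. The following is a sketch under stated assumptions: the six sample rows are invented, the COUNT threshold is lowered from 5 to 2 so the tiny data set produces a visible result, and LIMIT is left out because not every database supports that keyword.

```sql
-- Hypothetical schema and data, assumed for illustration only
CREATE TABLE employees (
    employee_name VARCHAR(50),
    department    VARCHAR(50),
    salary        DECIMAL(10, 2),
    hire_date     DATE
);

INSERT INTO employees (employee_name, department, salary, hire_date) VALUES
    ('Ana',  'Engineering', 90000, '2021-03-01'),
    ('Ben',  'Engineering', 70000, '2022-06-15'),
    ('Cara', 'Engineering', 60000, '2018-01-10'),
    ('Dev',  'Sales',       50000, '2021-09-01'),
    ('Eli',  'Sales',       40000, '2023-02-20'),
    ('Fay',  'Support',     65000, '2019-11-05');

-- FROM and WHERE: which rows reach GROUP BY?
SELECT employee_name, department, salary
FROM employees
WHERE hire_date > '2020-01-01';
-- 4 rows: Ana, Ben, Dev, Eli (Cara and Fay were hired earlier)

-- GROUP BY and aggregates: what does HAVING get to see?
SELECT department, COUNT(*) AS emp_count, AVG(salary) AS avg_salary
FROM employees
WHERE hire_date > '2020-01-01'
GROUP BY department;
-- 2 groups: Engineering (count 2, average 80000) and Sales (count 2, average 45000)
-- Support forms no group because its only row was removed by WHERE

-- HAVING, SELECT, ORDER BY: the full query
SELECT department, COUNT(*) AS emp_count, AVG(salary) AS avg_salary
FROM employees
WHERE hire_date > '2020-01-01'
GROUP BY department
HAVING COUNT(*) >= 2 AND AVG(salary) > 60000
ORDER BY avg_salary DESC;
-- 1 row: Engineering (count 2, average 80000)
```

Cara's salary never enters the Engineering average, because her row was removed before the group was formed.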

Logical vs Physical Execution Order

An important distinction: the logical execution order defines the results a query must produce. The physical execution order is what the database actually does internally, and it may differ significantly.

Why they can differ:

Query optimizers reorder operations for efficiency while guaranteeing logically equivalent results. Common optimizations:

Common Query Optimizer Transformations

•Predicate pushdown — WHERE conditions pushed into JOINs or even to storage layer
•Join reordering — Tables joined in optimal order regardless of FROM clause order
•Index-only scans — Skip actual table access if index contains needed columns
•Early aggregation — Aggregate before joining when possible
•LIMIT pushdown — Apply LIMIT early to avoid processing unnecessary rows
•Subquery flattening — Convert subqueries to JOINs

Example: HAVING condition pushed down

Some optimizers can push HAVING conditions on GROUP BY columns down to WHERE:

optimization_example.sql
-- Original query
SELECT region, SUM(sales)
FROM orders
GROUP BY region
HAVING region = 'East';
-- Logical execution: Group ALL regions, then filter
 
-- Optimized internal execution (equivalent results):
SELECT region, SUM(sales)
FROM orders
WHERE region = 'East'  -- Pushed down!
GROUP BY region;
-- Physical execution: Filter FIRST, then group only 'East'
 
-- The results are identical, but optimized version:
-- 1. Can use index on 'region'
-- 2. Groups 1 region instead of many
-- 3. Uses less memory for grouping

Don't Rely on Optimizer Fixes

While optimizers may fix inefficient queries, they don't always succeed. Write correct, efficient SQL from the start. Understanding logical execution order helps you place conditions correctly without hoping the optimizer will compensate.
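
To find out whether your own database performs this pushdown, you can compare the plans it produces for the two forms. The sketch below assumes a hypothetical `orders` table and index. It uses the `EXPLAIN` prefix as accepted by PostgreSQL and MySQL; other systems expose plans differently (SQLite, for example, uses `EXPLAIN QUERY PLAN`). The plan output format is database-specific, so none is shown here.

```sql
-- Hypothetical table and index, assumed for illustration only
CREATE TABLE orders (
    region VARCHAR(20),
    sales  DECIMAL(10, 2)
);

CREATE INDEX idx_orders_region ON orders (region);

-- Plan for the HAVING form: look for where the region filter is applied
EXPLAIN
SELECT region, SUM(sales)
FROM orders
GROUP BY region
HAVING region = 'East';

-- Plan for the WHERE form: the filter should appear before aggregation
EXPLAIN
SELECT region, SUM(sales)
FROM orders
WHERE region = 'East'
GROUP BY region;
```

If the two plans match, the optimizer has rewritten the HAVING form for you. If they differ, the WHERE form is the one doing less work.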

Why logical order still matters:

Even though physical execution may differ:

Validity is determined by logical order — What's syntactically allowed in each clause
Results must match logical order — Optimizer can only make equivalent transformations
Query writing uses logical order — You design queries thinking in this sequence
Debugging uses logical order — Understanding how data transforms through stages

Execution Order Effects on HAVING

Let's explore specific consequences of HAVING's position in execution order:

Effect 1: WHERE changes what HAVING sees

Because WHERE filters rows before grouping, it changes the aggregate values that HAVING evaluates.

where_affects_having.sql
-- Query A: No WHERE filter
SELECT department, AVG(salary) AS avg_salary
FROM employees
GROUP BY department
HAVING AVG(salary) > 70000;
-- Returns: departments where overall average > $70,000
 
-- Query B: WITH WHERE filter
SELECT department, AVG(salary) AS avg_salary
FROM employees
WHERE experience_years > 5  -- Only count senior employees!
GROUP BY department
HAVING AVG(salary) > 70000;
-- Returns: departments where senior employees' average > $70,000
-- HAVING sees DIFFERENT aggregate values because WHERE filtered first
 
-- These queries answer DIFFERENT questions!
-- A: "Which departments have high average salary?"
-- B: "Which departments have high average salary among seniors?"
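
A small data set makes the difference concrete. The rows below are invented for this illustration. They are chosen so that one department passes only Query A, one passes only Query B, and one passes both.

```sql
-- Hypothetical schema and data, assumed for illustration only
CREATE TABLE employees (
    department       VARCHAR(50),
    salary           DECIMAL(10, 2),
    experience_years INTEGER
);

INSERT INTO employees (department, salary, experience_years) VALUES
    ('Engineering', 95000, 8),
    ('Engineering', 60000, 2),
    ('Sales',       80000, 10),
    ('Sales',       50000, 1),
    ('Support',     72000, 3),
    ('Support',     74000, 4);

-- Query A: averages over all employees
SELECT department, AVG(salary) AS avg_salary
FROM employees
GROUP BY department
HAVING AVG(salary) > 70000;
-- Engineering (77500) and Support (73000) pass; Sales (65000) does not

-- Query B: averages over senior employees only
SELECT department, AVG(salary) AS avg_salary
FROM employees
WHERE experience_years > 5
GROUP BY department
HAVING AVG(salary) > 70000;
-- Engineering (95000) and Sales (80000) pass
-- Support has no senior employees, so it never forms a group at all
```

Sales fails Query A but passes Query B, and Support does the opposite, even though the HAVING clause is identical in both queries.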

Effect 2: HAVING can't undo WHERE

Once WHERE filters out rows, they're gone. HAVING cannot bring them back.

having_cant_undo_where.sql
-- This pattern is WRONG if you want all employees considered
SELECT department, AVG(salary) AS avg_salary
FROM employees
WHERE department = 'Engineering'  -- Only Engineering rows remain
GROUP BY department
HAVING department IN ('Engineering', 'Sales', 'Marketing');
-- HAVING condition on department is pointless here!
-- Only 'Engineering' exists after WHERE. Sales/Marketing already filtered out.

Effect 3: Aggregates are fixed when HAVING runs

HAVING cannot change aggregate values—only decide whether to keep or discard groups based on those values.

aggregates_fixed.sql
-- HAVING filters groups but doesn't recalculate aggregates
SELECT 
    department,
    COUNT(*) AS emp_count,
    AVG(salary) AS avg_salary
FROM employees
GROUP BY department
HAVING COUNT(*) >= 5;
 
-- The avg_salary shown for each surviving group is the SAME as it would be
-- without HAVING. HAVING just removes groups; it doesn't change aggregates.
-- Groups with COUNT < 5 are removed, but the remaining groups' 
-- COUNT and AVG values are unchanged.

Aggregates Are Computed, Then Filtered

Think of it this way: By the time HAVING runs, every group has its aggregates fully computed. HAVING is simply a gatekeeper that checks these pre-computed values. It's like a bouncer checking IDs at the door—they don't change anyone's age, just decide who gets in.
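
This can be checked by running the same grouped query with and without HAVING. The table and its four rows below are assumptions for this sketch, and the threshold is lowered to 2 to suit the small data set.

```sql
-- Hypothetical schema and data, assumed for illustration only
CREATE TABLE employees (
    department VARCHAR(50),
    salary     DECIMAL(10, 2)
);

INSERT INTO employees (department, salary) VALUES
    ('Engineering', 60000),
    ('Engineering', 70000),
    ('Engineering', 80000),
    ('Sales',       40000);

-- Without HAVING: every group and its aggregates
SELECT department, COUNT(*) AS emp_count, AVG(salary) AS avg_salary
FROM employees
GROUP BY department;
-- Engineering: count 3, average 70000
-- Sales:       count 1, average 40000

-- With HAVING: the Sales group is removed, nothing is recalculated
SELECT department, COUNT(*) AS emp_count, AVG(salary) AS avg_salary
FROM employees
GROUP BY department
HAVING COUNT(*) >= 2;
-- Engineering: count 3, average 70000 (same values as before)
```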

The SELECT Alias Accessibility Issue

One of the most frequently encountered execution order issues: can HAVING reference column aliases defined in SELECT?

Standard SQL answer: No. HAVING executes at step 6; SELECT at step 7. Aliases don't exist yet.

Practical reality: It depends on your database.

SELECT Alias in HAVING: Database Support
Database	Alias in HAVING?	Notes
PostgreSQL	❌ No	Follows the standard for HAVING; output aliases are accepted only in GROUP BY and ORDER BY
MySQL	✅ Yes	Extension: allows SELECT aliases in HAVING
SQL Server	❌ No	Standard behavior; use full expression
Oracle	❌ No	Standard behavior
SQLite	✅ Yes	Permissive; allows aliases
MariaDB	✅ Yes	MySQL-compatible extension

alias_portability.sql
-- ⚠️ NON-PORTABLE: Works in MySQL/SQLite, fails in PostgreSQL/SQL Server
SELECT department, COUNT(*) AS emp_count
FROM employees
GROUP BY department
HAVING emp_count >= 5;  -- Using SELECT alias
 
-- ✓ PORTABLE: Repeat the aggregate expression
SELECT department, COUNT(*) AS emp_count
FROM employees
GROUP BY department
HAVING COUNT(*) >= 5;  -- Using the expression directly
 
-- ✓ PORTABLE: Use a CTE to create intermediate table with aliases
WITH dept_counts AS (
    SELECT department, COUNT(*) AS emp_count
    FROM employees
    GROUP BY department
)
SELECT department, emp_count
FROM dept_counts
WHERE emp_count >= 5;  -- Now aliased column exists!

Best Practice: Repeat Expressions

For maximum portability, repeat the aggregate expression in HAVING rather than relying on aliases. Yes, it's a bit verbose, but it works everywhere. For complex expressions, use CTEs to pre-compute values, then filter with simple WHERE.
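
Where CTEs are unavailable or unwanted, a derived table (a subquery in FROM) gives the same effect. This is a sketch with an invented `employees` table and a threshold of 2 chosen to fit the sample rows. The subquery alias is written without the AS keyword because some databases do not accept AS before a table alias.

```sql
-- Hypothetical schema and data, assumed for illustration only
CREATE TABLE employees (
    employee_name VARCHAR(50),
    department    VARCHAR(50)
);

INSERT INTO employees (employee_name, department) VALUES
    ('Ana', 'Engineering'),
    ('Ben', 'Engineering'),
    ('Dev', 'Sales');

-- The inner query finishes first, so emp_count is an ordinary column
-- by the time the outer WHERE runs
SELECT department, emp_count
FROM (
    SELECT department, COUNT(*) AS emp_count
    FROM employees
    GROUP BY department
) dept_counts
WHERE emp_count >= 2;
-- 1 row: Engineering, 2
```

The outer WHERE here plays the role HAVING would play in a single-level query: it filters rows that are already one per group.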

The ORDER BY Anomaly: The Exception to the Rule

Unlike HAVING, ORDER BY can reference SELECT aliases in most databases. This is because ORDER BY executes after SELECT.

Why this matters:

It creates an asymmetry that confuses developers. The same alias that fails in HAVING works fine in ORDER BY.

order_by_aliases.sql
-- HAVING fails, ORDER BY succeeds (in standard-compliant databases)
SELECT department, AVG(salary) AS avg_salary
FROM employees
GROUP BY department
-- HAVING avg_salary > 50000;  -- ❌ Would fail in PostgreSQL
HAVING AVG(salary) > 50000     -- ✓ Works: repeat expression
ORDER BY avg_salary DESC;       -- ✓ Works: alias available after SELECT

ORDER BY can also access more:

In many databases, ORDER BY in a non-aggregated query can reference source-table columns even if they are not in SELECT (this is implementation-dependent, and is typically not allowed together with DISTINCT):

order_by_flexibility.sql
-- Some databases allow ordering by non-selected columns
-- when not using aggregation
SELECT employee_id, name
FROM employees
ORDER BY hire_date;  -- hire_date not in SELECT but may work
 
-- With GROUP BY, only grouped/aggregated columns available
SELECT department, COUNT(*)
FROM employees
GROUP BY department
ORDER BY department;             -- ✓ GROUP BY column
-- ORDER BY employee_name;        -- ❌ Not grouped, not aggregated
-- ORDER BY MAX(salary);          -- ✓ Aggregate (computed during grouping)

Execution Order Explains the Difference

HAVING runs at step 6, before SELECT (step 7), so aliases don't exist yet. ORDER BY runs at step 9, after SELECT, so aliases have been created. This is why the same alias works in ORDER BY but not HAVING.

Complete Execution Order Reference

Here's a comprehensive reference for SQL execution order, summarizing what each clause can access and what it produces:

Complete Clause Reference
Order	Clause	Input	Output	Can Reference
1	FROM	Table names	All rows from table(s)	Table columns, literals
2	JOIN	Two+ tables	Combined rows	Columns from all joined tables
3	WHERE	Joined rows	Filtered rows	All table columns, subqueries, NO aggregates
4	GROUP BY	Filtered rows	Groups of rows	All columns, expressions, NOT aliases
5	Aggregates	Groups	Groups with aggregates	Columns within each group
6	HAVING	Groups with aggregates	Filtered groups	GROUP BY cols, aggregates, subqueries, NOT aliases
7	SELECT	Filtered groups	Result columns	GROUP BY cols, aggregates, expressions
8	DISTINCT	Result rows	Unique rows	Result columns
9	ORDER BY	Unique rows	Sorted rows	SELECT cols/aliases, GROUP BY cols, aggregates
10	LIMIT	Sorted rows	Limited rows	Integer count, offset

Memorize the Key Points

FROM → WHERE → GROUP BY → HAVING → SELECT → ORDER BY. WHERE filters rows, HAVING filters groups. Aggregates exist between GROUP BY and HAVING. ORDER BY is special—it runs after SELECT so can use aliases.

Practical Implications for Query Writing

Understanding execution order leads to better queries. Here are key practical implications:

Implication 1: Filter early when possible

Since WHERE runs before GROUP BY, filtering early reduces the data that needs to be grouped and aggregated:

filter_early.sql
-- ✓ GOOD: WHERE filters before grouping
SELECT department, AVG(salary)
FROM employees
WHERE hire_date > '2020-01-01'  -- 60% of rows eliminated early
GROUP BY department
HAVING AVG(salary) > 50000;
 
-- ✗ BAD (hypothetical): group every row first, then filter by date.
-- That would aggregate more data than necessary, and hire_date would
-- no longer be available once rows are collapsed into groups.

Implication 2: Use HAVING only for aggregates

If a condition can go in WHERE, put it there. HAVING should be reserved for conditions that require aggregates:

having_for_aggregates.sql
-- ✗ WORKS but INEFFICIENT
SELECT department, COUNT(*)
FROM employees
GROUP BY department
HAVING department IN ('Engineering', 'Sales');  -- Could be WHERE!
 
-- ✓ EFFICIENT: Same result, better performance
SELECT department, COUNT(*)
FROM employees
WHERE department IN ('Engineering', 'Sales')  -- Filters first
GROUP BY department;

Implication 3: Repeat aggregate expressions in HAVING for portability

Don't rely on SELECT aliases in HAVING. Repeat the aggregate or use CTEs:

repeat_aggregates.sql
-- ✓ PORTABLE: Repeat aggregate in HAVING
SELECT 
    department, 
    SUM(CASE WHEN status='active' THEN salary ELSE 0 END) AS active_payroll
FROM employees
GROUP BY department
HAVING SUM(CASE WHEN status='active' THEN salary ELSE 0 END) > 100000;
 
-- ✓ CLEANER: Use CTE for complex expressions
WITH dept_payroll AS (
    SELECT 
        department,
        SUM(CASE WHEN status='active' THEN salary ELSE 0 END) AS active_payroll
    FROM employees
    GROUP BY department
)
SELECT * FROM dept_payroll
WHERE active_payroll > 100000;

Implication 4: Think in stages when debugging

When queries don't behave as expected, trace through the execution stages:

What rows does FROM produce?
Which rows survive WHERE?
How are rows grouped by GROUP BY?
What aggregates are computed?
Which groups survive HAVING?
What does SELECT output?

This systematic approach usually reveals where expectations diverge from reality.

Summary: Order of Execution

SQL's logical execution order underpins correct query writing, and knowing where HAVING sits in that order is essential to using it well. To consolidate:

Key Takeaways

•Logical order: FROM → WHERE → GROUP BY → HAVING → SELECT → ORDER BY → LIMIT
•WHERE filters rows — Executes before grouping, can't use aggregates
•HAVING filters groups — Executes after aggregation, designed for aggregate conditions
•Scope follows execution — Each clause can only access what exists at its stage
•SELECT aliases aren't in HAVING (standard SQL) — SELECT runs after HAVING
•ORDER BY is special — Runs after SELECT, so can use aliases
•Physical execution may differ — Optimizers reorder for efficiency, but results obey logical order
•Filter early: a WHERE condition is usually more efficient than an equivalent HAVING condition

Module Complete: HAVING Clause

Congratulations! You've completed the HAVING Clause module. You now understand: why HAVING exists (filtering groups), how it differs from WHERE, what aggregate conditions you can express, how to build complex multi-condition filters, and where HAVING fits in SQL's execution order. These skills enable sophisticated data analysis with SQL's aggregation and grouping capabilities.

Module recap:

Page 1 (Filtering Groups): Why group-level filtering requires HAVING, not WHERE
Page 2 (HAVING vs WHERE): Comprehensive comparison of the two filtering mechanisms
Page 3 (Aggregate Conditions): The full vocabulary of aggregate expressions in HAVING
Page 4 (Complex Filters): Multi-condition HAVING with logical operators
Page 5 (Order of Execution): HAVING's position and what it can access

With these foundations, you're ready to tackle the next modules on window functions—which provide even more powerful analytical capabilities building on your aggregation knowledge.

5 / 5

Loading learning content...

Database Management SystemHAVING Clause

HAVING Clause: Filtering Grouped Data

LevelIntermediate

Duration60 mins

TopicHAVING Clause

5 / 5

Order of Execution

The Hidden Sequence Behind Every Query

Understanding this execution order is essential for mastering HAVING. It explains:

Why HAVING can use aggregates but WHERE cannot
Why SELECT aliases aren't available to HAVING (in standard SQL)
Why moving conditions between WHERE and HAVING changes query behavior
How to predict which clauses can reference which columns

This page provides the complete picture of SQL's logical execution order, with special attention to HAVING's position and its consequences.

What You Will Learn

The Complete Logical Execution Order

SQL clauses execute in a specific logical sequence, regardless of how they're written in your query. This order determines what each clause can 'see' and reference.

The standard logical execution order:

SQL Logical Execution Order
Step	Clause	Description	Data State After
1	FROM	Identify source table(s)	All rows from source table(s)
2	JOIN	Combine tables based on join conditions	Combined rows from all joined tables
3	WHERE	Filter individual rows	Subset of rows meeting conditions
4	GROUP BY	Organize rows into groups	Groups of rows, each group shares grouping key
5	Aggregates	Compute summary values per group	Each group has computed aggregate values
6	HAVING	Filter groups based on aggregate conditions	Subset of groups meeting conditions
7	SELECT	Choose/compute output columns and aliases	Result columns defined
8	DISTINCT	Remove duplicate result rows	Unique result rows
9	ORDER BY	Sort result set	Sorted result set
10	LIMIT/OFFSET	Restrict number of rows returned	Final result set

Visualizing the transformation flow:

Converting Mermaid diagram...

Written Order vs Execution Order

What Each Clause Can Access

Each clause can only reference what exists at its execution stage. This 'scope' determines valid expressions:

Scope rules by clause:

Clause Scope Reference
Clause	Can Access	Cannot Access
FROM/JOIN	Table names, ON conditions using table columns	Everything else (WHERE, GROUP BY haven't run)
WHERE	All columns from FROM/JOIN, subqueries	Aggregates, GROUP BY columns (not yet formed), SELECT aliases
GROUP BY	All columns from non-filtered rows, expressions	Aggregates (computed after), SELECT aliases
HAVING	Aggregates, GROUP BY columns, subqueries	Non-grouped non-aggregated columns, SELECT aliases*
SELECT	Aggregates, GROUP BY columns, expressions	Columns not in GROUP BY (unless aggregated)
ORDER BY	SELECT columns/aliases, GROUP BY columns, aggregates	Non-selected, non-grouped columns in most databases

*Some databases (MySQL, SQL Server) allow SELECT aliases in HAVING as an extension.

Why these rules exist:

The rules aren't arbitrary—they follow from the data state at each stage:

WHERE can't use aggregates: At step 3, rows haven't been grouped yet. There's nothing to aggregate.
HAVING can't use non-grouped columns: At step 6, data is group-level. Individual row values are gone (except those preserved in GROUP BY).
SELECT aliases aren't in HAVING: At step 6, SELECT (step 7) hasn't executed. Aliases don't exist yet.

scope_examples.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
-- ❌ WHERE can't use aggregate (doesn't exist at step 3)
SELECT department, AVG(salary)
FROM employees
WHERE AVG(salary) > 50000  -- ERROR!
GROUP BY department;
 
-- ❌ HAVING can't use non-grouped column (ambiguous at step 6)
SELECT department, MAX(salary)
FROM employees
GROUP BY department
HAVING employee_name = 'John';  -- ERROR! employee_name not in GROUP BY
 
-- ❌ SELECT can't include non-grouped, non-aggregated column
SELECT department, employee_name, AVG(salary)  -- ERROR! employee_name
FROM employees
GROUP BY department;
 
-- ⚠️ HAVING using SELECT alias (non-standard, works in some databases)
SELECT department, AVG(salary) AS avg_sal
FROM employees
GROUP BY department
HAVING avg_sal > 50000;  -- May work in MySQL, errors in PostgreSQL
 
-- ✓ ORDER BY can use SELECT aliases (exception: runs after SELECT)
SELECT department, AVG(salary) AS avg_sal
FROM employees
GROUP BY department
ORDER BY avg_sal DESC;  -- Works because ORDER BY runs after SELECT

HAVING's Position: After Groups, Before Output

HAVING occupies a critical position: after GROUP BY and aggregation, but before SELECT. This placement has specific implications:

What exists when HAVING runs:

Available at HAVING Stage

•Groups are formed — Each 'row' in the data stream represents a group
•Aggregates are computed — SUM, COUNT, AVG, etc. have been calculated for each group
•GROUP BY columns have single values per group — Safe to reference
•Individual row data is gone — Collapsed into groups, except GROUP BY keys
•Subqueries can be evaluated — Scalar subqueries return values for comparison

NOT Available at HAVING Stage

•SELECT aliases — SELECT hasn't executed yet (step 7)
•Non-grouped row columns — Individual row values are aggregated away
•Columns from tables not in FROM/JOIN — Never were available
•Results of DISTINCT, ORDER BY, LIMIT — Haven't executed (steps 8-10)

Practical demonstration: Trace through a query

Let's trace execution step by step:

execution_trace.sql
1
2
3
4
5
6
7
8
-- The Query
SELECT department, COUNT(*) AS emp_count, AVG(salary) AS avg_salary
FROM employees
WHERE hire_date > '2020-01-01'
GROUP BY department
HAVING COUNT(*) >= 5 AND AVG(salary) > 60000
ORDER BY avg_salary DESC
LIMIT 10;

Step-by-Step Execution Trace
Step	Clause	What Happens	Data Example
1	FROM	Load employees table	1000 rows, all columns
2	WHERE	Keep hire_date > 2020-01-01	400 rows remain
3	GROUP BY	Group by department	15 groups (e.g., Engineering: 50 rows, Sales: 30 rows...)
4	Aggregates	Compute COUNT(*), AVG(salary) per group	Each group now has: {dept, count, avg}
5	HAVING	Keep groups where COUNT(*)≥5 AND AVG>60000	8 groups remain
6	SELECT	Output department, emp_count, avg_salary columns	8 rows with 3 columns
7	ORDER BY	Sort by avg_salary DESC	8 sorted rows
8	LIMIT	Return top 10 (only 8 exist)	8 final rows

HAVING Sees Groups, Not Rows

Logical vs Physical Execution Order

Why they can differ:

Query optimizers reorder operations for efficiency while guaranteeing logically equivalent results. Common optimizations:

Common Query Optimizer Transformations

•Predicate pushdown — WHERE conditions pushed into JOINs or even to storage layer
•Join reordering — Tables joined in optimal order regardless of FROM clause order
•Index-only scans — Skip actual table access if index contains needed columns
•Early aggregation — Aggregate before joining when possible
•LIMIT pushdown — Apply LIMIT early to avoid processing unnecessary rows
•Subquery flattening — Convert subqueries to JOINs

Example: HAVING condition pushed down

Some optimizers can push HAVING conditions on GROUP BY columns down to WHERE:

optimization_example.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
-- Original query
SELECT region, SUM(sales)
FROM orders
GROUP BY region
HAVING region = 'East';
-- Logical execution: Group ALL regions, then filter
 
-- Optimized internal execution (equivalent results):
SELECT region, SUM(sales)
FROM orders
WHERE region = 'East'  -- Pushed down!
GROUP BY region;
-- Physical execution: Filter FIRST, then group only 'East'
 
-- The results are identical, but optimized version:
-- 1. Can use index on 'region'
-- 2. Groups 1 region instead of many
-- 3. Uses less memory for grouping

Don't Rely on Optimizer Fixes

Why logical order still matters:

Even though physical execution may differ:

Validity is determined by logical order — What's syntactically allowed in each clause
Results must match logical order — Optimizer can only make equivalent transformations
Query writing uses logical order — You design queries thinking in this sequence
Debugging uses logical order — Understanding how data transforms through stages

Execution Order Effects on HAVING

Let's explore specific consequences of HAVING's position in execution order:

Effect 1: WHERE changes what HAVING sees

Because WHERE filters rows before grouping, it changes the aggregate values that HAVING evaluates.

where_affects_having.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
-- Query A: No WHERE filter
SELECT department, AVG(salary) AS avg_salary
FROM employees
GROUP BY department
HAVING AVG(salary) > 70000;
-- Returns: departments where overall average > $70,000
 
-- Query B: WITH WHERE filter
SELECT department, AVG(salary) AS avg_salary
FROM employees
WHERE experience_years > 5  -- Only count senior employees!
GROUP BY department
HAVING AVG(salary) > 70000;
-- Returns: departments where senior employees' average > $70,000
-- HAVING sees DIFFERENT aggregate values because WHERE filtered first
 
-- These queries answer DIFFERENT questions!
-- A: "Which departments have high average salary?"
-- B: "Which departments have high average salary among seniors?"

Effect 2: HAVING can't undo WHERE

Once WHERE filters out rows, they're gone. HAVING cannot bring them back.

having_cant_undo_where.sql
1
2
3
4
5
6
7
8
-- This pattern is WRONG if you want all employees considered
SELECT department, AVG(salary) AS avg_salary
FROM employees
WHERE department = 'Engineering'  -- Only Engineering rows remain
GROUP BY department
HAVING department IN ('Engineering', 'Sales', 'Marketing');
-- HAVING condition on department is pointless here!
-- Only 'Engineering' exists after WHERE. Sales/Marketing already filtered out.

Effect 3: Aggregates are fixed when HAVING runs

HAVING cannot change aggregate values—only decide whether to keep or discard groups based on those values.

aggregates_fixed.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
-- HAVING filters groups but doesn't recalculate aggregates
SELECT 
    department,
    COUNT(*) AS emp_count,
    AVG(salary) AS avg_salary
FROM employees
GROUP BY department
HAVING COUNT(*) >= 5;
 
-- The avg_salary shown for each surviving group is the SAME as it would be
-- without HAVING. HAVING just removes groups; it doesn't change aggregates.
-- Groups with COUNT < 5 are removed, but the remaining groups' 
-- COUNT and AVG values are unchanged.

Aggregates Are Computed, Then Filtered

The SELECT Alias Accessibility Issue

One of the most frequently encountered execution order issues: can HAVING reference column aliases defined in SELECT?

Standard SQL answer: No. HAVING executes at step 6; SELECT at step 7. Aliases don't exist yet.

Practical reality: It depends on your database.

SELECT Alias in HAVING: Database Support
Database	Alias in HAVING?	Notes
PostgreSQL	❌ No	Strictly follows standard SQL
MySQL	✅ Yes	Extension: allows SELECT aliases in HAVING
SQL Server	❌ No	Standard behavior; use full expression
Oracle	❌ No	Standard behavior
SQLite	✅ Yes	Permissive; allows aliases
MariaDB	✅ Yes	MySQL-compatible extension

alias_portability.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
-- ⚠️ NON-PORTABLE: Works in MySQL/SQLite, fails in PostgreSQL/SQL Server
SELECT department, COUNT(*) AS emp_count
FROM employees
GROUP BY department
HAVING emp_count >= 5;  -- Using SELECT alias
 
-- ✓ PORTABLE: Repeat the aggregate expression
SELECT department, COUNT(*) AS emp_count
FROM employees
GROUP BY department
HAVING COUNT(*) >= 5;  -- Using the expression directly
 
-- ✓ PORTABLE: Use a CTE so the alias becomes a real column of an intermediate result
WITH dept_counts AS (
    SELECT department, COUNT(*) AS emp_count
    FROM employees
    GROUP BY department
)
SELECT department, emp_count
FROM dept_counts
WHERE emp_count >= 5;  -- emp_count is now an ordinary column, so WHERE can filter on it
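
The three variants can be compared side by side with a small sketch. It assumes Python's built-in sqlite3 module, and SQLite is one of the permissive engines, so the alias form runs here even though it would be rejected by PostgreSQL or SQL Server. The sample rows and the threshold of 2 are made up for the demonstration.

```python
import sqlite3

# Hypothetical sample data, invented for this comparison.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, department TEXT)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [("Ana", "Engineering"), ("Ben", "Engineering"), ("Cy", "Sales")],
)

queries = {
    # Accepted by SQLite as an extension; not portable.
    "alias (non-portable)": """
        SELECT department, COUNT(*) AS emp_count
        FROM employees GROUP BY department
        HAVING emp_count >= 2""",
    # Portable: the aggregate is written out again.
    "repeated expression": """
        SELECT department, COUNT(*) AS emp_count
        FROM employees GROUP BY department
        HAVING COUNT(*) >= 2""",
    # Portable: the alias is a real column of the CTE.
    "CTE": """
        WITH dept_counts AS (
            SELECT department, COUNT(*) AS emp_count
            FROM employees GROUP BY department
        )
        SELECT department, emp_count FROM dept_counts
        WHERE emp_count >= 2""",
}

results = {label: conn.execute(sql).fetchall() for label, sql in queries.items()}
for label, rows in results.items():
    print(label, rows)
```

All three forms return the same rows on an engine that accepts them, which is why switching to a portable form costs nothing in correctness.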

Best Practice: Repeat Expressions

The ORDER BY Anomaly: The Exception to the Rule

Unlike HAVING, ORDER BY can reference SELECT aliases in most databases. This is because ORDER BY executes after SELECT.

Why this matters:

It creates an asymmetry that confuses developers. The same alias that fails in HAVING works fine in ORDER BY.

order_by_aliases.sql
-- Alias in HAVING fails, alias in ORDER BY succeeds (in standard-compliant databases)
SELECT department, AVG(salary) AS avg_salary
FROM employees
GROUP BY department
-- HAVING avg_salary > 50000;  -- ❌ Would fail in PostgreSQL
HAVING AVG(salary) > 50000     -- ✓ Works: repeat expression
ORDER BY avg_salary DESC;       -- ✓ Works: alias available after SELECT
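
A quick way to check the ORDER BY half of this asymmetry is the sketch below, which assumes Python's built-in sqlite3 module and invented salary figures. It cannot show the HAVING failure, because SQLite accepts aliases there too, so it only confirms that the repeated aggregate in HAVING and the alias in ORDER BY work together.

```python
import sqlite3

# Hypothetical sample data: departments and salaries are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (department TEXT, salary REAL)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [
        ("Engineering", 90000),
        ("Engineering", 70000),
        ("Sales", 60000),
        ("Support", 40000),
    ],
)

ranked = conn.execute(
    """
    SELECT department, AVG(salary) AS avg_salary
    FROM employees
    GROUP BY department
    HAVING AVG(salary) > 50000   -- aggregate repeated: alias not assumed here
    ORDER BY avg_salary DESC     -- alias used: SELECT has already run
    """
).fetchall()

for department, avg_salary in ranked:
    print(department, avg_salary)
```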

ORDER BY can also access more:

In some databases, ORDER BY can reference columns from the source tables even if they're not in SELECT (though this is implementation-dependent):

order_by_flexibility.sql
-- Some databases allow ordering by non-selected columns
-- when not using aggregation
SELECT employee_id, name
FROM employees
ORDER BY hire_date;  -- hire_date not in SELECT but may work
 
-- With GROUP BY, only grouped/aggregated columns available
SELECT department, COUNT(*)
FROM employees
GROUP BY department
ORDER BY department;             -- ✓ GROUP BY column
-- ORDER BY name;                 -- ❌ Not grouped, not aggregated
-- ORDER BY MAX(salary);          -- ✓ Aggregate (computed during grouping)
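
The two permitted cases can be tried in a small sketch, assuming Python's built-in sqlite3 module and made-up rows. SQLite is used only because it ships with Python; it also tolerates ordering by a bare non-grouped column, so the rejected case above is left out rather than misrepresented.

```python
import sqlite3

# Hypothetical sample data; column names follow the queries above.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees ("
    "employee_id INTEGER, name TEXT, department TEXT, salary INTEGER, hire_date TEXT)"
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Ana", "Engineering", 120000, "2021-03-01"),
        (2, "Ben", "Sales", 60000, "2019-07-15"),
        (3, "Cy", "Sales", 150000, "2020-01-10"),
    ],
)

# Ordering by a column that is not in the SELECT list (no aggregation involved).
by_hire = conn.execute(
    "SELECT employee_id, name FROM employees ORDER BY hire_date"
).fetchall()

# Ordering grouped output by an aggregate that is not in the SELECT list.
by_top_salary = conn.execute(
    "SELECT department, COUNT(*) FROM employees "
    "GROUP BY department ORDER BY MAX(salary) DESC"
).fetchall()

print(by_hire)
print(by_top_salary)
```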

Execution Order Explains the Difference

Complete Execution Order Reference

Here's a comprehensive reference for SQL execution order, summarizing what each clause can access and what it produces:

Complete Clause Reference
Order	Clause	Input	Output	Can Reference
1	FROM	Table names	All rows from table(s)	Table columns, literals
2	JOIN	Two+ tables	Combined rows	Columns from all joined tables
3	WHERE	Joined rows	Filtered rows	All table columns, subqueries, NO aggregates
4	GROUP BY	Filtered rows	Groups of rows	All columns, expressions, NOT aliases
5	Aggregates	Groups	Groups with aggregates	Columns within each group
6	HAVING	Groups with aggregates	Filtered groups	GROUP BY cols, aggregates, subqueries, NOT aliases
7	SELECT	Filtered groups	Result columns	GROUP BY cols, aggregates, expressions
8	DISTINCT	Result rows	Unique rows	Result columns
9	ORDER BY	Unique rows	Sorted rows	SELECT cols/aliases, GROUP BY cols, aggregates
10	LIMIT	Sorted rows	Limited rows	Integer count, offset
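
The "NO aggregates" entry for WHERE is easy to verify. The following sketch assumes Python's built-in sqlite3 module and invented rows: it tries the same aggregate condition in WHERE and in HAVING, and records whether the engine accepts each one.

```python
import sqlite3

# Hypothetical sample data, invented for this check.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, department TEXT)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [("Ana", "Engineering"), ("Ben", "Engineering"), ("Cy", "Sales")],
)

# Step 3 (WHERE) runs before groups exist, so an aggregate has nothing to read.
try:
    conn.execute(
        "SELECT department FROM employees WHERE COUNT(*) >= 2 GROUP BY department"
    )
    where_error = None
except sqlite3.OperationalError as exc:
    where_error = str(exc)

# Step 6 (HAVING) runs after aggregation, so the same condition is valid.
having_rows = conn.execute(
    "SELECT department FROM employees GROUP BY department HAVING COUNT(*) >= 2"
).fetchall()

print("WHERE with aggregate rejected:", where_error is not None)
print("HAVING result:", having_rows)
```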

Memorize the Key Points

Practical Implications for Query Writing

Understanding execution order leads to better queries. Here are key practical implications:

Implication 1: Filter early when possible

Since WHERE runs before GROUP BY, filtering early reduces the data that needs to be grouped and aggregated:

filter_early.sql
-- ✓ GOOD: WHERE filters before grouping
SELECT department, AVG(salary)
FROM employees
WHERE hire_date > '2020-01-01'  -- e.g. if 60% of rows fail this test, they never reach GROUP BY
GROUP BY department
HAVING AVG(salary) > 50000;
 
-- ✗ No HAVING equivalent: hire_date is neither grouped nor aggregated, so standard SQL
-- cannot filter on it after GROUP BY. If rows were grouped first and filtered later,
-- the engine would aggregate rows it then discards.

Implication 2: Use HAVING only for aggregates

If a condition can go in WHERE, put it there. HAVING should be reserved for conditions that require aggregates:

having_for_aggregates.sql
-- ✗ WORKS but may be INEFFICIENT (unless the optimizer pushes the filter down)
SELECT department, COUNT(*)
FROM employees
GROUP BY department
HAVING department IN ('Engineering', 'Sales');  -- Could be WHERE!
 
-- ✓ EFFICIENT: Same result, better performance
SELECT department, COUNT(*)
FROM employees
WHERE department IN ('Engineering', 'Sales')  -- Filters first
GROUP BY department;
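
The equivalence of the two forms, and the difference in how many rows logically reach the grouping step, can be sketched as follows. This assumes Python's built-in sqlite3 module and made-up rows; an ORDER BY is added to both queries only so the results can be compared deterministically, and the row counts describe the logical model, not what a particular optimizer does.

```python
import sqlite3

# Hypothetical sample data, invented for this comparison.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, department TEXT)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [
        ("Ana", "Engineering"),
        ("Ben", "Engineering"),
        ("Cy", "Sales"),
        ("Dee", "Marketing"),
    ],
)

in_list = "('Engineering', 'Sales')"

having_rows = conn.execute(
    "SELECT department, COUNT(*) FROM employees "
    f"GROUP BY department HAVING department IN {in_list} ORDER BY department"
).fetchall()

where_rows = conn.execute(
    f"SELECT department, COUNT(*) FROM employees WHERE department IN {in_list} "
    "GROUP BY department ORDER BY department"
).fetchall()

# Rows that logically enter GROUP BY in each version.
rows_grouped_having = conn.execute("SELECT COUNT(*) FROM employees").fetchone()[0]
rows_grouped_where = conn.execute(
    f"SELECT COUNT(*) FROM employees WHERE department IN {in_list}"
).fetchone()[0]

print(having_rows == where_rows, rows_grouped_having, rows_grouped_where)
```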

Implication 3: Repeat aggregate expressions in HAVING for portability

Don't rely on SELECT aliases in HAVING. Repeat the aggregate or use CTEs:

repeat_aggregates.sql
-- ✓ PORTABLE: Repeat aggregate in HAVING
SELECT 
    department, 
    SUM(CASE WHEN status='active' THEN salary ELSE 0 END) AS active_payroll
FROM employees
GROUP BY department
HAVING SUM(CASE WHEN status='active' THEN salary ELSE 0 END) > 100000;
 
-- ✓ CLEANER: Use CTE for complex expressions
WITH dept_payroll AS (
    SELECT 
        department,
        SUM(CASE WHEN status='active' THEN salary ELSE 0 END) AS active_payroll
    FROM employees
    GROUP BY department
)
SELECT * FROM dept_payroll
WHERE active_payroll > 100000;

Implication 4: Think in stages when debugging

When queries don't behave as expected, trace through the execution stages:

What rows does FROM produce?
Which rows survive WHERE?
How are rows grouped by GROUP BY?
What aggregates are computed?
Which groups survive HAVING?
What does SELECT output?

This systematic approach usually reveals where expectations diverge from reality.
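
The staged trace above can be sketched as a series of progressively longer queries. The example assumes Python's built-in sqlite3 module; the rows, the hire-date cutoff, and the salary threshold are invented, and the stage labels are only names chosen for this demo.

```python
import sqlite3

# Hypothetical sample data, invented for this trace.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (name TEXT, department TEXT, salary INTEGER, hire_date TEXT)"
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?, ?)",
    [
        ("Ana", "Engineering", 90000, "2021-05-01"),
        ("Ben", "Engineering", 70000, "2022-02-01"),
        ("Cy", "Engineering", 200000, "2018-01-01"),
        ("Dee", "Sales", 40000, "2021-09-01"),
        ("Eli", "Sales", 45000, "2019-03-01"),
    ],
)

where = "WHERE hire_date > '2020-01-01'"
grouped = (
    "SELECT department, COUNT(*), AVG(salary) "
    f"FROM employees {where} GROUP BY department"
)

# Each stage adds one clause, so a surprise can be pinned to a single step.
stages = [
    ("FROM", "SELECT COUNT(*) FROM employees"),
    ("WHERE", f"SELECT COUNT(*) FROM employees {where}"),
    ("GROUP BY + aggregates", f"{grouped} ORDER BY department"),
    ("HAVING", f"{grouped} HAVING AVG(salary) > 50000 ORDER BY department"),
]

trace = {label: conn.execute(sql).fetchall() for label, sql in stages}
for label, rows in trace.items():
    print(f"{label}: {rows}")
```

Reading the trace top to bottom shows where rows or groups drop out. Here the highest-paid engineer is removed by WHERE before any average is computed, which is the kind of detail that is hard to spot in the final result alone.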

Summary: Order of Execution

SQL's logical execution order underpins correct query writing, and HAVING's position in it explains what the clause can and cannot do. Let's consolidate:

Key Takeaways

• Logical order: FROM → JOIN → WHERE → GROUP BY → aggregates → HAVING → SELECT → DISTINCT → ORDER BY → LIMIT
• WHERE filters rows: it executes before grouping, so it can't use aggregates
• HAVING filters groups: it executes after aggregation and is designed for aggregate conditions
• Scope follows execution: each clause can only access what exists at its stage
• SELECT aliases aren't in HAVING (standard SQL): SELECT runs after HAVING
• ORDER BY is special: it runs after SELECT, so it can use aliases
• Physical execution may differ: optimizers reorder for efficiency, but results obey logical order
• Filter early: a WHERE condition is typically at least as efficient as the equivalent HAVING condition

Module Complete: HAVING Clause

Module recap:

Page 1 (Filtering Groups): Why group-level filtering requires HAVING, not WHERE
Page 2 (HAVING vs WHERE): Comprehensive comparison of the two filtering mechanisms
Page 3 (Aggregate Conditions): The full vocabulary of aggregate expressions in HAVING
Page 4 (Complex Filters): Multi-condition HAVING with logical operators
Page 5 (Order of Execution): HAVING's position and what it can access

With these foundations, you're ready to tackle the next modules on window functions—which provide even more powerful analytical capabilities building on your aggregation knowledge.
