Loading content...
GROUP BY queries operate under strict rules that aren't arbitrary limitations—they're logical requirements for producing deterministic, meaningful results. When you group rows, each output row represents potentially many input rows. The rules ensure that every value in your output can be unambiguously determined.
Understanding these rules deeply helps you:
This page formalizes the rules you've encountered and explores their mathematical and practical foundations.
By the end of this page, you will understand the fundamental rule governing GROUP BY SELECT clauses, know how functional dependencies allow valid exceptions, recognize differences between databases in rule enforcement, and write queries that are correct across all SQL implementations.
The core rule of GROUP BY can be stated simply:
Every expression in the SELECT clause must evaluate to exactly one value per group.
This is the single-value requirement. Since each group becomes one output row, every column in that row must have exactly one value—not multiple, not zero, exactly one.
Two types of expressions naturally satisfy this requirement:
| Expression Type | Why It Works | Example |
|---|---|---|
| Grouping column | By definition, all rows in the group have the same value | salesperson when GROUP BY salesperson |
| Aggregate function | Computes one result from many values | SUM(amount), COUNT(*), AVG(price) |
| Constant/literal | Same value for every row, every group | 'USD', 2024, NULL |
| Expression on grouping column | Derived deterministically from grouping column | UPPER(salesperson), LENGTH(category) |
| Aggregate in expression | Expression using aggregate results | SUM(a) / COUNT(*), MAX(x) - MIN(x) |
1234567891011121314151617181920
-- All expressions satisfy single-value requirement:SELECT salesperson, -- Grouping column ✓ UPPER(salesperson) AS name_upper, -- Expression on grouping column ✓ 'Active' AS status, -- Constant ✓ COUNT(*) AS cnt, -- Aggregate ✓ SUM(amount) AS total, -- Aggregate ✓ SUM(amount) / COUNT(*) AS avg, -- Aggregates in expression ✓ MAX(amount) - MIN(amount) AS range -- Aggregates in expression ✓FROM salesGROUP BY salesperson; -- INVALID: product_category is neither grouped nor aggregatedSELECT salesperson, product_category, -- ❌ Multiple values per group! SUM(amount)FROM salesGROUP BY salesperson;-- Alice sold Electronics and Furniture - which one should appear?This isn't a language quirk—it's mathematically necessary. A function must produce one output for each input. GROUP BY maps groups to rows. If a column could have multiple values in a group, the mapping would be undefined. SQL prevents this ambiguity by requiring single-valued expressions.
Some columns not in GROUP BY can still satisfy the single-value requirement because they're functionally dependent on the grouping columns. A column Y is functionally dependent on column X if each value of X determines exactly one value of Y.
Example: In an employees table where emp_id is the primary key:
emp_id → emp_name (emp_id determines emp_name)
emp_id → department (emp_id determines department)
emp_id → hire_date (emp_id determines hire_date)
If you GROUP BY emp_id, other columns like emp_name are automatically single-valued (one name per employee ID).
123456789101112131415161718192021
-- Table structure:-- employees(emp_id PK, emp_name, department, hire_date)-- orders(order_id PK, emp_id FK, amount, order_date) -- GROUP BY primary key → all columns of that table are functionally dependentSELECT e.emp_id, e.emp_name, -- Functionally dependent on emp_id ✓ e.department, -- Functionally dependent on emp_id ✓ SUM(o.amount) AS total_salesFROM employees eJOIN orders o ON e.emp_id = o.emp_idGROUP BY e.emp_id; -- Only emp_id in GROUP BY! -- This is valid because:-- 1. emp_id is the grouping column-- 2. emp_name and department are functionally dependent on emp_id-- 3. For each emp_id, there's exactly one emp_name and department -- The SQL standard (SQL:1999+) recognizes this pattern-- and allows it when the database can verify functional dependencyHow databases verify functional dependency:
| Dependency Source | What It Allows | Example |
|---|---|---|
| GROUP BY emp_id (PK) | All columns from employees | emp_name, department in SELECT |
| GROUP BY order_id (PK) | All columns from orders | customer_id, order_date in SELECT |
| JOIN ON foreign key | Columns from referenced table | Depends on database support |
Use functional dependency rules when you need to display descriptive columns alongside a primary key grouping. Instead of GROUP BY id, name, department (redundant), just GROUP BY id and let the database infer that name and department are determined by id.
Different databases enforce GROUP BY rules with varying strictness. Understanding these differences is crucial for writing portable SQL and debugging confusing errors.
| Database | Enforcement Level | Functional Dependency Support | Notes |
|---|---|---|---|
| PostgreSQL | Strict | Primary key only | Errors on non-grouped, non-aggregated columns unless PK grouped |
| MySQL 5.7+ | Strict (default) | Primary key only | ONLY_FULL_GROUP_BY mode; can be disabled (not recommended) |
| MySQL < 5.7 | Permissive | None | Silently picks arbitrary values—dangerous! |
| SQL Server | Strict | None | Always requires explicit GROUP BY or aggregate |
| Oracle | Strict | None | Always requires explicit GROUP BY or aggregate |
| SQLite | Permissive | None | No enforcement—picks arbitrary values (MIN value) |
| MariaDB | Configurable | Primary key | Follows MySQL behavior based on SQL mode |
1234567891011121314151617181920212223
-- This query behaves differently across databases:SELECT salesperson, product_category, SUM(amount)FROM salesGROUP BY salesperson; -- PostgreSQL / MySQL (strict) / SQL Server / Oracle:-- ERROR: column "product_category" must appear in GROUP BY-- or be used in an aggregate function -- MySQL (permissive) / SQLite:-- Returns a result, but product_category is ARBITRARY-- You might get 'Electronics' or 'Furniture' for Alice-- The choice is undefined and may change between runs! -- Safe, portable version:SELECT salesperson, SUM(amount) AS totalFROM salesGROUP BY salesperson; -- Works everywhere, no ambiguity -- Or include category in grouping:SELECT salesperson, product_category, SUM(amount) AS totalFROM salesGROUP BY salesperson, product_category; -- Also universally validOlder MySQL versions (and some permissive configurations) allow non-grouped columns. This causes silent data corruption—queries return 'results' that are meaningless. If you encounter a legacy MySQL without ONLY_FULL_GROUP_BY, treat every GROUP BY query with suspicion and enable the mode if possible.
GROUP BY can contain expressions, not just column names. When you use an expression, you're grouping by the computed result of that expression:
GROUP BY EXTRACT(YEAR FROM order_date)
-- Groups by year value (2023, 2024, etc.), not the full date
The matching rule: To include an expression's result in SELECT, you must use the exact same expression—or wrap it in an aggregate.
123456789101112131415161718192021222324252627282930313233
-- Grouping by date expressionSELECT EXTRACT(YEAR FROM sale_date) AS year, -- Same expression ✓ EXTRACT(MONTH FROM sale_date) AS month, -- Different expression ❌ (ERROR!) SUM(amount) AS totalFROM salesGROUP BY EXTRACT(YEAR FROM sale_date); -- Only grouping by year-- ERROR: month expression not in GROUP BY -- Correct: include month in GROUP BYSELECT EXTRACT(YEAR FROM sale_date) AS year, EXTRACT(MONTH FROM sale_date) AS month, SUM(amount) AS totalFROM salesGROUP BY EXTRACT(YEAR FROM sale_date), EXTRACT(MONTH FROM sale_date); -- Using CASE expressions in GROUP BYSELECT CASE WHEN amount >= 1000 THEN 'Large' WHEN amount >= 500 THEN 'Medium' ELSE 'Small' END AS size_bucket, COUNT(*) AS transaction_count, SUM(amount) AS total_revenueFROM salesGROUP BY CASE WHEN amount >= 1000 THEN 'Large' WHEN amount >= 500 THEN 'Medium' ELSE 'Small' END; -- Must repeat the full CASE expressionExpression matching nuances:
The database must recognize that SELECT expression = GROUP BY expression. This matching is:
UPPER(name) ≠ LOWER(name), even if results could be equivalent| Pattern | Standard SQL | PostgreSQL | MySQL (5.7+) |
|---|---|---|---|
| GROUP BY column | ✓ | ✓ | ✓ |
| GROUP BY expression | ✓ | ✓ | ✓ |
| GROUP BY position (1, 2, 3) | ❌ | ✓ | ✓ |
| GROUP BY alias | ❌ | ✓ | ✓ |
Some databases allow GROUP BY year (alias) or GROUP BY 1 (first SELECT column). While convenient, these aren't standard SQL. For maximum portability, repeat the full expression in GROUP BY. For readability in one database, check if your database supports alias references.
Understanding what's invalid is as important as knowing what's valid. Here's a comprehensive list of SELECT expressions that violate GROUP BY rules:
| Invalid Expression | Why It Fails | Solution |
|---|---|---|
| Non-grouped column | Multiple values exist in group | Add to GROUP BY or use aggregate |
Nested aggregate: AVG(SUM(x)) | Can't aggregate aggregates directly | Use subquery or CTE |
| Window function in aggregate | Different semantics | Compute separately |
| Subquery with outer aggregates | Correlation issues | Restructure query |
| Column from ON-joined table not in GROUP BY | Not functionally dependent | Add to GROUP BY or aggregate |
123456789101112131415161718192021222324252627282930
-- INVALID: Non-grouped columnSELECT salesperson, product_category, SUM(amount)FROM salesGROUP BY salesperson;-- Fix: Add product_category to GROUP BY or use MIN/MAX/aggregate -- INVALID: Nested aggregatesSELECT category, AVG(SUM(amount))FROM salesGROUP BY category;-- Fix: Use subquerySELECT AVG(category_total) AS avg_category_totalFROM ( SELECT category, SUM(amount) AS category_total FROM sales GROUP BY category) subq; -- INVALID: Column from joined table not in GROUP BYSELECT e.department, c.customer_name, -- Not in GROUP BY! SUM(o.amount)FROM orders oJOIN employees e ON o.emp_id = e.emp_idJOIN customers c ON o.customer_id = c.customer_idGROUP BY e.department;-- Fix: Either add c.customer_name to GROUP BY-- or use MIN(c.customer_name), MAX(c.customer_name), etc.-- or reorganize the querySome databases (MySQL 8.0+, BigQuery) offer ANY_VALUE(column) which explicitly says 'give me any arbitrary value from the group.' Use this only when you're certain all values in the group are identical, or when any value is acceptable. It's a way to satisfy the rule while acknowledging potential ambiguity.
Aggregate functions change behavior based on GROUP BY presence:
Without GROUP BY: Aggregates operate on the entire result set, producing one row.
With GROUP BY: Aggregates operate on each group independently, producing one value per group.
This behavior applies to all aggregates: COUNT, SUM, AVG, MIN, MAX, and any user-defined aggregates.
1234567891011121314151617181920212223242526272829303132333435
-- COUNT(*) behavior comparison -- Without GROUP BY: counts entire tableSELECT COUNT(*) AS total_rowsFROM sales;-- Result: 8 (one row, total count) -- With GROUP BY: counts per groupSELECT salesperson, COUNT(*) AS rows_in_groupFROM salesGROUP BY salesperson;-- Result: 3 rows (Alice:3, Bob:3, Carol:2) -- SUM behavior comparison -- Without GROUP BY: sums entire tableSELECT SUM(amount) AS total_revenueFROM sales;-- Result: $6,455 (one row) -- With GROUP BY: sums per groupSELECT salesperson, SUM(amount) AS group_revenueFROM salesGROUP BY salesperson;-- Result: 3 rows with per-person totals -- DISTINCT in aggregates operates per groupSELECT salesperson, COUNT(DISTINCT product_category) AS unique_categoriesFROM salesGROUP BY salesperson;-- Alice: 2 (Electronics, Furniture)-- Bob: 2 (Furniture, Electronics) -- Carol: 2 (Electronics, Office)Aggregate scope clarification:
When GROUP BY is present:
This is why GROUP BY fundamentally changes query semantics—aggregates no longer summarize 'everything,' they summarize 'each category.'
Window functions (covered separately) operate on 'windows' of rows independently from GROUP BY. You can have both GROUP BY aggregates and window functions in the same query, but they follow different rules. Window functions compute over groups but don't collapse rows.
GROUP BY queries can produce subtle bugs. Here's a systematic approach to testing and debugging:
1234567891011121314151617181920212223242526272829303132
-- Technique 1: Count groups before and after filteringSELECT COUNT(DISTINCT salesperson) AS expected_groupsFROM sales;-- Compare to actual GROUP BY row count -- Technique 2: Inspect group contents before aggregatingSELECT salesperson, COUNT(*) AS rows_in_groupFROM salesGROUP BY salesperson;-- Verify counts match expectations -- Technique 3: Check for NULL groupsSELECT CASE WHEN salesperson IS NULL THEN 'NULL_GROUP' ELSE 'VALID' END AS group_type, COUNT(*) AS countFROM salesGROUP BY CASE WHEN salesperson IS NULL THEN 'NULL_GROUP' ELSE 'VALID' END; -- Technique 4: Detect row multiplication from JOINs-- Compare: direct SUM vs joined SUMSELECT SUM(amount) FROM sales; -- Baseline SELECT SUM(s.amount) FROM sales sJOIN some_table t ON s.id = t.sales_id;-- If this is higher, the JOIN is creating duplicates -- Technique 5: Use HAVING to find suspicious groupsSELECT salesperson, COUNT(*) AS cntFROM salesGROUP BY salespersonHAVING COUNT(*) > 100; -- Unexpectedly large groups?A common bug: JOINing to a table with multiple matching rows multiplies your data before aggregation. If orders JOIN to order_items, each order row becomes multiple rows. SUM(order.total) then adds the same order total multiple times. Solution: Aggregate before joining, or use DISTINCT carefully.
Understanding GROUP BY rules is essential for writing correct, portable SQL. Let's consolidate the key principles:
UPPER(name) requires exactly UPPER(name) in SELECT. Aliases work in some databases but aren't standard.What's Next:
Even experienced developers make GROUP BY mistakes. The next page covers common errors—the bugs that appear repeatedly in production code, why they happen, and how to avoid them. Learning from common mistakes accelerates your mastery and helps you review others' code effectively.
You now understand the formal rules governing GROUP BY queries, including functional dependencies and database-specific variations. Next, we'll examine common errors and pitfalls to make you a more robust SQL developer.