Loading learning content...
Every meaningful interaction with data begins with a fundamental question: How many? How many customers placed orders last month? How many products are in stock? How many transactions failed? The COUNT function is SQL's answer to this universal need—a deceptively simple construct that, when understood deeply, becomes an indispensable tool in the data professional's arsenal.
Yet beneath this apparent simplicity lies nuance that trips up even experienced developers. The difference between COUNT(*), COUNT(column), and COUNT(DISTINCT column) is not merely syntactic—it reflects fundamentally different semantic operations with distinct performance characteristics and results. Understanding these distinctions is the difference between getting correct answers and getting confident-but-wrong answers.
By the end of this page, you will master the COUNT function in all its variations. You'll understand its precise semantics, NULL handling behavior, performance implications, and when to choose each variant. You'll see how COUNT operates as a foundational aggregate that sets the pattern for understanding all other aggregation functions.
Before diving into COUNT specifically, we must understand what aggregate functions are and how they differ fundamentally from scalar functions.
Scalar functions operate on individual values and return individual values. Functions like UPPER(), ROUND(), or LENGTH() take one input and produce one output per row. If you have 1,000 rows, a scalar function produces 1,000 results.
Aggregate functions operate on sets of values and collapse them into single summary values. They transform many rows into fewer rows—often just one. This collapse is the essence of aggregation: converting detail-level data into summary-level insight.
| Characteristic | Scalar Functions | Aggregate Functions |
|---|---|---|
| Input | Single value | Set of values (potentially many rows) |
| Output | One value per input row | One value per group (or entire result set) |
| Examples | UPPER(), LOWER(), ROUND(), CONCAT() | COUNT(), SUM(), AVG(), MIN(), MAX() |
| Row preservation | Maintains row count | Reduces row count |
| NULL handling | Usually returns NULL if input is NULL | Varies by function; often ignores NULLs |
The aggregation contract:
When you invoke an aggregate function, you're making an implicit contract with the database engine:
GROUP BY or, if absent, the entire result set forms one group)This three-part contract applies to every aggregate function, and understanding it is essential for writing correct aggregation queries.
Think of aggregate functions as "reducers" in functional programming. They take a collection and reduce it to a single value. Each group of rows becomes one row in the output. No GROUP BY means one group (all rows). With GROUP BY, you get as many output rows as there are distinct group combinations.
The COUNT(*) form is the most straightforward variant: count every row in the group, unconditionally. The asterisk doesn't mean "all columns"—it means "the row itself." This distinction is crucial.
COUNT(*) answers: How many rows exist in this set?
It doesn't examine column values. It doesn't care about NULLs. It simply counts the presence of rows.
12345678910111213141516171819202122
-- Total number of rows in the employees tableSELECT COUNT(*) AS total_employeesFROM employees;-- Result: Returns the total row count, regardless of NULL values -- Count rows with a WHERE filterSELECT COUNT(*) AS active_employeesFROM employeesWHERE status = 'active';-- Result: Counts only rows where the condition is true -- Count within groupsSELECT department_id, COUNT(*) AS dept_sizeFROM employeesGROUP BY department_id;-- Result: One row per department, with employee count -- Count with multiple conditionsSELECT COUNT(*) AS senior_engineersFROM employeesWHERE job_title = 'Engineer' AND hire_date < '2020-01-01';Why COUNT(*) and not COUNT(column)?
Beginners often write COUNT(id) or COUNT(some_column) when they mean COUNT(*). While these may produce identical results in many cases, they have different semantics:
COUNT(*) counts rows, including rows where every column is NULLCOUNT(column) counts non-NULL values in that columnIf your intent is to count rows, always use COUNT(*). It's more explicit about intent and often marginally faster because the engine doesn't need to evaluate a specific column's nullability.
In some databases, COUNT() can be optimized using index metadata or specialized counters, especially for InnoDB tables in MySQL or when counting entire tables without WHERE clauses. However, with complex predicates, the engine must still scan or seek through the data. Don't assume COUNT() is always instantaneous for large tables.
| employees table | department_id | name | salary |
|---|---|---|---|
| Row 1 | 10 | Alice | 75000 |
| Row 2 | 10 | Bob | NULL |
| Row 3 | 20 | NULL | 60000 |
| Row 4 | 20 | Diana | NULL |
| Row 5 | NULL | Eve | 55000 |
123456789101112131415161718
-- Given the table above: SELECT COUNT(*) FROM employees; -- Result: 5 (all rows counted) SELECT COUNT(*) FROM employees WHERE department_id = 10; -- Result: 2 (Alice and Bob) SELECT COUNT(*) FROM employees WHERE salary IS NOT NULL; -- Result: 3 (Alice, row 3, Eve) SELECT department_id, COUNT(*) AS cntFROM employeesGROUP BY department_id;-- Result:-- department_id | cnt-- 10 | 2-- 20 | 2-- NULL | 1When you specify a column inside COUNT(), the semantic changes fundamentally: only non-NULL values are counted. This is one of the most important behaviors to understand in SQL aggregation.
COUNT(column) answers: How many non-NULL values exist for this column in the set?
This makes COUNT(column) a null-filtering operation, not just a counting operation.
1234567891011121314151617181920
-- Count employees with non-NULL salariesSELECT COUNT(salary) AS employees_with_salaryFROM employees;-- If salary is NULL for some employees, they're excluded -- Compare COUNT(*) vs COUNT(column)SELECT COUNT(*) AS total_rows, COUNT(salary) AS with_salary, COUNT(bonus) AS with_bonusFROM employees;-- These numbers may all differ! -- Using COUNT(column) to find data completenessSELECT COUNT(*) AS total_records, COUNT(email) AS has_email, COUNT(phone) AS has_phone, COUNT(address) AS has_addressFROM customers;Practical use cases for COUNT(column):
COUNT(email) * 100.0 / COUNT(*)COUNT() evaluates its argument for each row. If the argument expression evaluates to NULL, that row doesn't increment the count. This means COUNT(NULL) always returns 0, and COUNT(column_a + column_b) counts only rows where the addition doesn't produce NULL.
1234567891011121314151617
-- Counting with expressionsSELECT COUNT(salary + bonus) AS with_total_comp, -- NULL if either is NULL COUNT(COALESCE(bonus, 0)) AS all_rows -- COALESCE prevents NULLFROM employees; -- Using CASE for conditional counting (alternative to FILTER)SELECT COUNT(*) AS total, COUNT(CASE WHEN status = 'active' THEN 1 END) AS active_count, COUNT(CASE WHEN salary > 100000 THEN 1 END) AS high_earnersFROM employees;-- CASE returns NULL when condition is false, so COUNT excludes those -- This pattern is so common, some databases have shorthand:-- PostgreSQL: COUNT(*) FILTER (WHERE status = 'active')-- SQL Server: COUNT(CASE WHEN status = 'active' THEN 1 END)| Query | Result (using sample table) | Explanation |
|---|---|---|
| COUNT(*) | 5 | Counts all rows regardless of NULL values |
| COUNT(name) | 4 | Excludes row 3 where name is NULL |
| COUNT(salary) | 3 | Excludes rows 2 and 4 where salary is NULL |
| COUNT(department_id) | 4 | Excludes row 5 where department_id is NULL |
COUNT(DISTINCT expression) combines deduplication with counting: count each unique non-NULL value only once. This is remarkably powerful for understanding cardinality—the number of distinct values in a dataset.
COUNT(DISTINCT column) answers: How many different non-NULL values exist for this column?
123456789101112131415
-- How many distinct departments have employees?SELECT COUNT(DISTINCT department_id) AS num_departmentsFROM employees; -- How many unique customers placed orders this month?SELECT COUNT(DISTINCT customer_id) AS unique_customersFROM ordersWHERE order_date >= DATE_TRUNC('month', CURRENT_DATE); -- Compare total orders vs unique customersSELECT COUNT(*) AS total_orders, COUNT(DISTINCT customer_id) AS unique_customers, COUNT(*) * 1.0 / COUNT(DISTINCT customer_id) AS orders_per_customerFROM orders;The deduplication process:
Internally, COUNT(DISTINCT) must track which values it has already seen. This typically involves:
This explains why COUNT(DISTINCT) is often slower than COUNT(*) or COUNT(column)—it requires additional memory and computational overhead for deduplication.
COUNT(DISTINCT) on high-cardinality columns (many unique values) can be expensive. For millions of rows with mostly unique values, the engine must maintain millions of entries in its deduplication structure. For approximate counts at scale, consider HyperLogLog-based alternatives like PostgreSQL's pg_hll extension or BigQuery's APPROX_COUNT_DISTINCT.
1234567891011121314151617181920
-- Multiple DISTINCT counts in one querySELECT COUNT(DISTINCT department_id) AS num_departments, COUNT(DISTINCT job_title) AS num_job_titles, COUNT(DISTINCT manager_id) AS num_managersFROM employees; -- DISTINCT on expressionsSELECT COUNT(DISTINCT YEAR(hire_date)) AS years_with_hiresFROM employees; -- Counting distinct combinations (standard SQL)-- Note: COUNT(DISTINCT col1, col2) is NOT standard SQL-- Use this pattern instead:SELECT COUNT(DISTINCT CONCAT(department_id, '-', job_title)) AS unique_combinationsFROM employees; -- PostgreSQL alternative: row-based distinctSELECT COUNT(DISTINCT (department_id, job_title)) AS unique_combinationsFROM employees;Common use cases for COUNT(DISTINCT):
| Variant | NULLs Included? | Deduplication? | Performance |
|---|---|---|---|
| COUNT(*) | Yes (counts all rows) | No | Generally fastest |
| COUNT(column) | No (excludes NULLs) | No | Fast (one column check) |
| COUNT(DISTINCT column) | No (excludes NULLs) | Yes | Slower (tracking overhead) |
The true power of aggregate functions emerges when combined with GROUP BY. Instead of producing one count for the entire dataset, you produce one count per group—enabling segmented analysis.
When GROUP BY is present, the aggregation contract changes:
12345678910111213141516171819202122232425
-- Count employees per departmentSELECT department_id, COUNT(*) AS employee_countFROM employeesGROUP BY department_idORDER BY employee_count DESC; -- Count with multiple grouping columnsSELECT department_id, job_title, COUNT(*) AS headcountFROM employeesGROUP BY department_id, job_titleORDER BY department_id, headcount DESC; -- Combining COUNT variants with GROUP BYSELECT department_id, COUNT(*) AS total_employees, COUNT(salary) AS with_salary, COUNT(DISTINCT job_title) AS unique_rolesFROM employeesGROUP BY department_id;Understanding the grouping process:
This is why you cannot select non-aggregated columns that aren't in the GROUP BY clause—the database wouldn't know which value to show when multiple rows collapse into one.
NULL values in grouping columns form their own group. If three employees have NULL department_id, they'll appear as one row with department_id = NULL and count = 3. This is often desired behavior but can surprise developers expecting NULLs to be excluded.
12345678910111213141516171819202122232425
-- Time-based groupingSELECT DATE_TRUNC('month', order_date) AS month, COUNT(*) AS total_orders, COUNT(DISTINCT customer_id) AS unique_customersFROM ordersGROUP BY DATE_TRUNC('month', order_date)ORDER BY month; -- Hierarchical grouping with ROLLUP (extended totals)SELECT COALESCE(region, 'ALL REGIONS') AS region, COALESCE(department, 'ALL DEPTS') AS department, COUNT(*) AS employee_countFROM employeesGROUP BY ROLLUP(region, department); -- Using GROUP BY with HAVING to filter groupsSELECT department_id, COUNT(*) AS employee_countFROM employeesGROUP BY department_idHAVING COUNT(*) >= 5 -- Only departments with 5+ employeesORDER BY employee_count DESC;Understanding COUNT conceptually is one thing; using it correctly in production is another. Let's examine common patterns and pitfalls that distinguish novice from expert usage.
COUNT(*) not COUNT(id). It's clearer in intent and may be optimized differently.COUNT(email) * 100.0 / NULLIF(COUNT(*), 0) prevents division by zero and gives percentage with non-NULL emails.WHERE COUNT(*) > 5 is invalid; use HAVING COUNT(*) > 5 to filter groups.COUNT(column) on an all-NULL column returns 0, not NULL.123456789101112131415161718192021222324252627282930313233
-- Pattern: Data completeness reportSELECT 'employees' AS table_name, COUNT(*) AS total_rows, COUNT(email) AS has_email, COUNT(phone) AS has_phone, COUNT(address) AS has_address, ROUND(COUNT(email) * 100.0 / NULLIF(COUNT(*), 0), 2) AS email_pctFROM employees; -- Pattern: Conditional counting with CASESELECT department_id, COUNT(*) AS total, COUNT(CASE WHEN status = 'active' THEN 1 END) AS active, COUNT(CASE WHEN status = 'inactive' THEN 1 END) AS inactive, COUNT(CASE WHEN status IS NULL THEN 1 END) AS unknownFROM employeesGROUP BY department_id; -- Pattern: Count with existence check (correlation pattern)SELECT d.department_name, (SELECT COUNT(*) FROM employees e WHERE e.department_id = d.id) AS emp_countFROM departments d; -- More efficient join-based alternative:SELECT d.department_name, COUNT(e.id) AS emp_countFROM departments dLEFT JOIN employees e ON e.department_id = d.idGROUP BY d.department_name;While COUNT is defined by the SQL standard, implementations vary across database systems. Understanding these variations is essential for writing portable code or leveraging database-specific optimizations.
12345678910111213141516171819
-- PostgreSQL supports COUNT with FILTER clause (SQL:2003)SELECT COUNT(*) AS total, COUNT(*) FILTER (WHERE status = 'active') AS active, COUNT(*) FILTER (WHERE status = 'inactive') AS inactiveFROM employees; -- COUNT(DISTINCT) on multiple columns using row comparisonSELECT COUNT(DISTINCT (department_id, job_title)) FROM employees; -- Approximate count with pg_class (instant but approximate)SELECT reltuples::bigint AS estimateFROM pg_classWHERE relname = 'employees'; -- For large tables, consider pg_stat_user_tablesSELECT n_live_tup FROM pg_stat_user_tables WHERE relname = 'employees';The COUNT function is deceptively simple on the surface but rich in nuance. Let's consolidate what we've learned:
What's next:
Now that we've mastered counting, we'll explore SUM—the aggregate function for total value calculations. You'll see how SUM shares NULL-handling behavior with COUNT but introduces new considerations around numeric types and overflow.
You now understand COUNT in all its forms. You can count rows unconditionally, count non-NULL values, count unique values, and combine these with GROUP BY for segmented analysis. This foundational knowledge prepares you for all other aggregate functions.