Select Basics - Learning Module

Loading content...

0/241

All Columns (*): The Asterisk Operator

The Most Convenient—and Dangerous—Shortcut

The asterisk (*) in SQL is simultaneously the most convenient and the most potentially dangerous feature of the SELECT statement. With a single character, you can retrieve every column from every table in your query—no typing, no remembering column names, no updates needed when the schema changes.

This convenience comes at a cost. **SELECT *** is the source of countless production bugs, performance disasters, and maintenance nightmares. Yet when used appropriately, it remains a valuable tool for exploration, debugging, and specific technical patterns.

This page provides a comprehensive examination of the asterisk operator—its exact behavior, appropriate use cases, serious pitfalls, and the nuanced judgment required to use it wisely.

What You Will Learn

By the end of this page, you will understand every aspect of SELECT *—how it expands, how it behaves in joins and subqueries, when it's appropriate to use, and why experienced practitioners treat it with caution in production code.

Asterisk Fundamentals

The asterisk (*) is a special symbol in the SELECT clause that means "all columns from the table(s) in the FROM clause." At query execution time, the asterisk is expanded into a complete list of column names before the query runs.

Expansion behavior:

When you write SELECT * FROM employees, the database internally expands this to something like:

SELECT employee_id, first_name, last_name, email, phone, department_id, ...
FROM employees

This expansion happens based on the table's current schema definition. The order of columns in the expansion follows the table's column definition order.

asterisk_fundamentals.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
-- Basic asterisk usage
SELECT * FROM employees;
-- Expands to all columns in employees table
-- Column order matches table definition order
 
-- Equivalent to (in this example):
SELECT employee_id, first_name, last_name, email, phone, 
       hire_date, salary, commission_pct, manager_id, department_id
FROM employees;
 
-- Asterisk with other columns (mixing is allowed)
SELECT *, 'Active' AS status
FROM employees;
-- Returns: all columns from employees PLUS a literal column
 
SELECT employee_id, *, salary * 12 AS annual_salary
FROM employees;
-- Returns: employee_id, then all columns, then annual_salary
-- Note: employee_id appears twice (once explicit, once from *)
 
-- Qualified asterisk: table.* syntax
SELECT employees.*
FROM employees;
-- Explicitly: all columns from employees (same as SELECT * here)
 
-- Qualified asterisk becomes essential in joins
SELECT employees.*, departments.*
FROM employees
JOIN departments ON employees.department_id = departments.department_id;
-- Returns: all employee columns, then all department columns

Column Order Is Defined

The order of columns returned by SELECT * is not random—it follows the column definition order in the table schema (typically the order columns were created via CREATE TABLE or ALTER TABLE ADD). However, relying on this order in application code is fragile, as it can change with schema modifications.

Asterisk Expansion Behavior
Query Pattern	Expansion	Column Order
`SELECT * FROM t`	All columns from t	Table definition order
`SELECT t.* FROM t`	All columns from t (qualified)	Table definition order
`SELECT *, col FROM t`	All cols + col (duplicate if col in t)	All cols, then col
`SELECT col, * FROM t`	col + all cols (duplicate if col in t)	col, then all cols
`SELECT * FROM t1 JOIN t2`	All from t1, then all from t2	t1 cols, then t2 cols

Asterisk in Joins

When a query involves multiple tables, the asterisk's behavior becomes more complex—and more potentially problematic. Understanding how * expands across joins is critical for avoiding bugs and confusion.

Multi-table expansion:

In a join query, SELECT * expands to include columns from all tables in the FROM clause, in the order they appear. This can result in:

Duplicate column names (common columns like 'id', 'name', 'created_at')
Redundant data (join keys appear multiple times)
Very wide result sets (all columns from all tables)

asterisk_in_joins.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
-- Two-table join with asterisk
SELECT *
FROM employees
JOIN departments ON employees.department_id = departments.department_id;
 
-- Expands to (approximately):
-- employees.employee_id, employees.first_name, employees.department_id, ...
-- departments.department_id, departments.department_name, departments.location_id, ...
-- Note: department_id appears TWICE (from both tables)
 
-- Qualified asterisk for specific table's columns
SELECT employees.*, departments.department_name
FROM employees
JOIN departments ON employees.department_id = departments.department_id;
-- Returns: all employee columns + department_name only
 
-- Mixing qualified asterisks
SELECT employees.*, departments.*, locations.city
FROM employees
JOIN departments ON employees.department_id = departments.department_id
JOIN locations ON departments.location_id = locations.location_id;
 
-- NATURAL JOIN asterisk behavior: Removes duplicate join columns
SELECT *
FROM employees
NATURAL JOIN departments;
-- Returns: shared columns once, then remaining from employees, then remaining from departments
-- department_id appears only ONCE (at the beginning)
 
-- USING clause asterisk behavior: Similar to NATURAL JOIN
SELECT *
FROM employees
JOIN departments USING (department_id);
-- department_id appears only once
 
-- Outer join with asterisk shows NULLs
SELECT *
FROM employees
LEFT JOIN departments ON employees.department_id = departments.department_id;
-- Employees without departments: department columns are all NULL

Duplicate Column Names in Results

Many applications cannot handle duplicate column names in a result set. If employees.id and departments.id both appear in SELECT * with a join, the application may only see one 'id' column (usually the last one). This is a common source of subtle bugs that work in development but fail in production.

Join Asterisk Guidelines

•Avoid naked * in production joins — Explicit columns prevent duplicate name issues.
•Use qualified t. when appropriate* — Clearly states which table's columns you want.
•Consider USING vs ON — USING deduplicates join columns; ON includes both copies.
•Test with your data access layer — Different ORMs and database clients handle duplicates differently.
•Document expected columns — If using *, comment what columns you expect to receive.

Asterisk in Subqueries

The asterisk appears in several subquery patterns with different meanings and implications. Understanding these contexts helps you use * appropriately and recognize its effects.

*Key subquery contexts for :

EXISTS subqueries: * is conventional but not actually evaluated
Derived tables: * passes all columns to outer query
IN subqueries: Typically wrong (must return single column)
Scalar subqueries: Always wrong (must return single value)

asterisk_subqueries.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
-- EXISTS: * is conventional and correct
SELECT d.department_name
FROM departments d
WHERE EXISTS (
    SELECT *  -- Convention: just checking existence, not column values
    FROM employees e
    WHERE e.department_id = d.department_id
);
-- Semantically equivalent to:
SELECT d.department_name
FROM departments d
WHERE EXISTS (SELECT 1 FROM employees e WHERE e.department_id = d.department_id);
 
-- Derived table: * passes all columns through
SELECT dt.employee_id, dt.full_name
FROM (
    SELECT employee_id, 
           CONCAT(first_name, ' ', last_name) AS full_name,
           *  -- Includes all original columns too
    FROM employees
) AS dt
WHERE dt.salary > 50000;
 
-- Better: Only include needed columns in derived table
SELECT dt.employee_id, dt.full_name
FROM (
    SELECT employee_id, 
           CONCAT(first_name, ' ', last_name) AS full_name,
           salary
    FROM employees
) AS dt
WHERE dt.salary > 50000;
 
-- IN subquery: * is WRONG (returns multiple columns)
-- INCORRECT:
SELECT department_name
FROM departments
WHERE department_id IN (SELECT * FROM employees);  -- ERROR!
 
-- CORRECT: Return single column
SELECT department_name
FROM departments
WHERE department_id IN (SELECT department_id FROM employees);
 
-- Scalar subquery: * is WRONG (must return single value)
-- INCORRECT:
SELECT employee_id,
       (SELECT * FROM departments WHERE department_id = e.department_id)
FROM employees e;  -- ERROR!
 
-- CORRECT: Return single column
SELECT employee_id,
       (SELECT department_name FROM departments d 
        WHERE d.department_id = e.department_id) AS dept_name
FROM employees e;
 
-- CTE with asterisk: Passes through all columns
WITH emp_data AS (
    SELECT * FROM employees WHERE department_id = 5
)
SELECT * FROM emp_data;  -- All employee columns available

Asterisk Usage in Subquery Contexts
Subquery Type	Asterisk Usage	Correct?	Notes
EXISTS	`WHERE EXISTS (SELECT * ...)`	Yes (conventional)	Value never examined; 1 is equivalent
NOT EXISTS	`WHERE NOT EXISTS (SELECT * ...)`	Yes (conventional)	Same as EXISTS
Derived table	`FROM (SELECT * FROM t) AS dt`	Sometimes	Passes all columns; often too many
CTE body	`WITH cte AS (SELECT * FROM t)`	Sometimes	Same considerations as derived tables
IN (single col)	`WHERE x IN (SELECT * FROM t)`	No	ERROR: IN requires single column
Scalar subquery	`SELECT (SELECT * FROM t WHERE ...)`	No	ERROR: Must return single value

EXISTS with SELECT 1 vs SELECT *

While SELECT * and SELECT 1 are semantically identical in EXISTS (the database only checks if rows exist, not their values), many style guides prefer SELECT 1 to signal intent clearly. Some databases optimize identically; others may marginally prefer SELECT 1. The choice is largely stylistic.

SELECT * as a Production Anti-Pattern

In production application code, SELECT * is widely considered an anti-pattern—a common but harmful practice that should generally be avoided. Understanding why helps you make informed decisions about when to break this rule.

Why SELECT * Is Problematic in Production

•Schema coupling — Application code becomes dependent on column order and presence. Adding a column can break deserialization. Removing a column crashes the application.
•Performance degradation — Retrieves columns you don't need. Large TEXT/BLOB columns are read even if unused. Prevents covering index optimizations.
•Network overhead — More data transferred between database and application. At scale, this adds latency and costs.
•Memory consumption — Application loads entire rows into memory. For large tables, this can cause OOM errors.
•Unclear intent — Readers cannot tell which columns are actually used. Makes refactoring and optimization difficult.
•Duplicate column issues — In joins, same-named columns overwrite each other in application data structures.
•Security exposure — May expose sensitive columns (SSN, salary, internal IDs) that shouldn't leave the database.

anti_pattern_examples.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
-- SCENARIO: Application breaks when schema changes
 
-- Original table
CREATE TABLE users (
    id INT PRIMARY KEY,
    username VARCHAR(50),
    email VARCHAR(100),
    created_at TIMESTAMP
);
 
-- Application code expecting: id, username, email, created_at
-- Using SELECT * works initially
 
-- DBA adds confidential column
ALTER TABLE users ADD ssn VARCHAR(11);  -- Social Security Number
 
-- Now SELECT * returns SSN to the application
-- Security breach: SSN exposed to frontend/logs
 
-- Another scenario: Column removed
ALTER TABLE users DROP COLUMN email;
 
-- Application expecting email column crashes:
-- "Column 'email' not found in result set"
 
-- PROPER APPROACH: Explicit columns
 
-- Application query (immune to column changes)
SELECT id, username, email, created_at
FROM users
WHERE id = ?;
 
-- If email is removed:
-- - Query fails at database level with clear error
-- - No silent bugs or security issues
-- - Can be fixed by updating the query
 
-- SCENARIO: Performance disaster
 
-- Table with large bio TEXT column
CREATE TABLE profiles (
    user_id INT PRIMARY KEY,
    display_name VARCHAR(100),
    bio TEXT,  -- Average 10KB per row
    avatar BLOB  -- Average 500KB per row
);
 
-- Query needing only name and ID
-- POOR: SELECT * FROM profiles WHERE active = 1;
-- Retrieves: ~510KB per row (bio + avatar)
 
-- BETTER: SELECT user_id, display_name FROM profiles WHERE active = 1;
-- Retrieves: ~104 bytes per row
 
-- For 10,000 rows:
-- SELECT *: 5.1 GB of data transfer
-- Explicit: 1 MB of data transfer
-- 5000x difference!

Real-World Incident

A major e-commerce site experienced a 60-minute outage when a DBA added a 'metadata' JSON column to their products table. SELECT * queries suddenly retrieved megabytes of JSON per product, overwhelming application servers. The fix was changing SELECT * to explicit columns—a 5-minute code change that required an hour of downtime.

When Asterisk Is Appropriate

Despite its pitfalls, SELECT * has legitimate uses. The key is understanding the difference between exploratory work and production code.

*Appropriate contexts for SELECT :

Valid SELECT * Use Cases

•Ad-hoc exploration — Discovering table structure and sample data interactively. No production impact.
•Debugging investigations — Examining complete row contents when troubleshooting. Temporary usage.
•EXISTS subqueries — Standard idiom where column values are never accessed.
•Backup and migration — CREATE TABLE AS SELECT * or INSERT INTO...SELECT * for schema cloning.
•Views that expose entire tables — Encapsulating access control without column filtering.
•Teaching and documentation — Showing query patterns where specific columns aren't relevant.
•Quick prototypes — Early development exploring requirements. Convert before production.

appropriate_asterisk_usage.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
-- 1. Ad-hoc exploration in SQL client
SELECT * FROM unknown_table LIMIT 10;
-- Perfectly fine for discovery
 
-- 2. Debugging a specific row
SELECT * FROM orders WHERE order_id = 'ORD-12345';
-- Examining all columns to find the problem
 
-- 3. EXISTS pattern (conventional)
SELECT customer_id, customer_name
FROM customers c
WHERE EXISTS (
    SELECT * FROM orders o WHERE o.customer_id = c.customer_id
);
 
-- 4. Table cloning
CREATE TABLE employees_backup AS SELECT * FROM employees;
 
-- 5. Data migration
INSERT INTO new_employees SELECT * FROM old_employees;
-- Note: Requires identical schemas
 
-- 6. View for access control
CREATE VIEW public_products AS
SELECT * FROM products WHERE is_public = TRUE;
-- All columns, but filtered rows
 
-- 7. CTE for readability when all columns needed
WITH relevant_orders AS (
    SELECT * FROM orders WHERE order_date >= CURRENT_DATE - INTERVAL '30 days'
)
SELECT customer_id, COUNT(*), SUM(total)
FROM relevant_orders
GROUP BY customer_id;
-- Still recommend explicit columns, but * is understandable here
 
-- 8. Quick prototype (mark for refactoring)
-- TODO: Replace with explicit columns before production
SELECT * FROM products WHERE category = 'electronics';

SELECT * Decision Matrix
Context	SELECT * OK?	Recommendation
Interactive SQL session	✓ Yes	Fine for exploration
Application production code	✗ No	Always use explicit columns
EXISTS subquery	✓ Yes	Conventional usage
Stored procedure body	✗ No	Explicit for maintainability
Backup operations	✓ Yes	Appropriate for schema cloning
ORM-generated queries	Depends	Prefer explicit; ORM may handle
Unit test setup	∼ Sometimes	OK for fixtures, not assertions

Development to Production Transition

A practical workflow: Use SELECT * during initial development and exploration. Before code review or production deployment, replace all SELECT * with explicit column lists. Some teams enforce this with linting rules that flag SELECT * in application code.

Qualified Asterisk Patterns

The qualified asterisk (table.* or alias.*) offers more control than naked *, allowing you to select all columns from a specific table while being explicit about other tables.

Syntax:

SELECT table_name.* FROM table_name
SELECT alias.* FROM table_name AS alias

This is particularly useful in joins where you want all columns from one table but only specific columns from others.

qualified_asterisk.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
-- All columns from one table, specific from another
SELECT e.*, d.department_name
FROM employees AS e
JOIN departments AS d ON e.department_id = d.department_id;
-- Returns: all employee columns + department_name only
 
-- Multiple qualified asterisks
SELECT e.*, d.*, l.city, l.country
FROM employees AS e
JOIN departments AS d ON e.department_id = d.department_id
JOIN locations AS l ON d.location_id = l.location_id;
-- Returns: all from employees, all from departments, specific from locations
 
-- Qualified asterisk in subqueries
SELECT outer_query.*
FROM (
    SELECT e.*, d.department_name
    FROM employees AS e
    JOIN departments AS d ON e.department_id = d.department_id
) AS outer_query
WHERE outer_query.salary > 50000;
 
-- EXCEPT pattern (PostgreSQL, SQL Server 2022+)
-- Select all columns except specific ones
SELECT * EXCEPT (ssn, internal_id) FROM employees;  -- Hypothetical syntax
-- Note: Not standard SQL, but supported by some databases
 
-- Workaround for EXCEPT pattern: Use a view
CREATE VIEW employees_public AS
SELECT employee_id, first_name, last_name, email, department_id, hire_date
FROM employees;
-- Excludes: ssn, salary, internal_notes, etc.
 
SELECT * FROM employees_public;  -- Safe to use * on filtered view
 
-- Table-qualified * with schema qualification
SELECT schema1.table1.*, schema2.table2.column1
FROM schema1.table1
JOIN schema2.table2 ON schema1.table1.id = schema2.table2.table1_id;

Qualified Asterisk Use Cases

•Primary entity with lookup — All columns from main table, only key fields from lookups: SELECT orders.*, customers.customer_name
•Audit logging — Capture all columns from data table plus audit metadata: SELECT t.*, CURRENT_TIMESTAMP AS logged_at
•Denormalization queries — Flatten relationships by pulling all from one side: SELECT o.*, oi.* FROM orders o JOIN order_items oi
•Schema exploration in joins — Quickly see what's in joined tables during development

Qualified vs Unqualified

Using table.* in joins is marginally better than naked *—it makes the source of columns explicit and prevents some accidental inclusion of unneeded columns. However, the best practice remains explicit column listing for production code, even with qualified asterisks.

COUNT(*) and Other Special Cases

The asterisk has a special meaning in the COUNT(*) aggregate function that differs fundamentally from its use in SELECT. Understanding this distinction is important for query correctness and performance.

COUNT(*) vs COUNT(column):

COUNT(*) counts all rows, including those with NULL values in any column
COUNT(column) counts rows where that specific column is not NULL

This distinction is subtle but critical when data contains NULLs.

count_asterisk.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
-- Sample table with NULLs
-- | id | name  | email         |
-- |----|-------|---------------|
-- | 1  | Alice | alice@ex.com  |
-- | 2  | Bob   | NULL          |
-- | 3  | Carol | carol@ex.com  |
-- | 4  | Dave  | NULL          |
 
-- COUNT(*): Count all rows
SELECT COUNT(*) FROM users;
-- Returns: 4 (all rows, regardless of NULLs)
 
-- COUNT(column): Count non-NULL values
SELECT COUNT(email) FROM users;
-- Returns: 2 (only Alice and Carol have non-NULL emails)
 
-- COUNT(1) vs COUNT(*): Semantically identical
SELECT COUNT(1) FROM users;
SELECT COUNT(*) FROM users;
-- Both return 4; COUNT(1) is just counting a constant
 
-- Performance: Modern databases optimize COUNT(*) and COUNT(1) identically
-- No need to prefer one over the other
 
-- COUNT with GROUP BY
SELECT department_id, COUNT(*) AS total, COUNT(manager_id) AS with_manager
FROM employees
GROUP BY department_id;
-- total: all employees in department
-- with_manager: employees who have a manager assigned
 
-- COUNT(*) in subqueries
SELECT d.department_name,
       (SELECT COUNT(*) FROM employees e WHERE e.department_id = d.department_id) AS emp_count
FROM departments d;
 
-- COUNT(*) vs COUNT(DISTINCT column)
SELECT 
    COUNT(*) AS total_orders,           -- All order rows
    COUNT(customer_id) AS orders_with_customer, -- Orders where customer_id not NULL
    COUNT(DISTINCT customer_id) AS unique_customers  -- Distinct customer count
FROM orders;
 
-- EXISTS with COUNT(*) (inefficient)
-- POOR:
SELECT department_name
FROM departments d
WHERE (SELECT COUNT(*) FROM employees e WHERE e.department_id = d.department_id) > 0;
 
-- BETTER: Use EXISTS (stops at first match)
SELECT department_name
FROM departments d
WHERE EXISTS (SELECT 1 FROM employees e WHERE e.department_id = d.department_id);

COUNT Variations Comparison
Expression	What It Counts	NULL Handling
`COUNT(*)`	All rows in the result	Counts all rows including those with only NULLs
`COUNT(column)`	Rows where column is not NULL	Excludes NULL values
`COUNT(DISTINCT column)`	Distinct non-NULL values	Excludes NULL values
`COUNT(1)` or `COUNT('x')`	All rows (same as COUNT(*))	Counts all rows
`COUNT(expression)`	Rows where expression is not NULL	Excludes NULL results

COUNT(*) Is Optimized

Modern databases recognize COUNT() as 'count all rows' and apply special optimizations. In many cases, COUNT() can be served from an index or table statistics without scanning actual data. Don't try to 'optimize' by changing to COUNT(1) or COUNT(primary_key)—the optimizer handles COUNT(*) efficiently.

Summary: The Asterisk Operator

The asterisk (*) in SELECT is a powerful shortcut with significant implications. Used wisely, it accelerates exploration and fits certain patterns perfectly. Used carelessly in production, it creates fragile, inefficient, and potentially insecure applications.

Let's consolidate the key points:

Key Takeaways

•Asterisk expands at query time — * becomes the current list of all columns in definition order. Schema changes affect expansion.
•In joins, * includes all tables — Creates wide results with potential duplicate column names.
•Qualified t. offers partial control* — Specify which table's columns to include while being explicit about others.
•EXISTS is the exception — SELECT * in EXISTS is conventional and harmless; the values are never examined.
•**Production code should avoid *** — Schema coupling, performance, security, and maintainability all suffer.
•Use * for exploration only — Ad-hoc queries, debugging, prototypes. Replace before production.
•COUNT(*) is special — Counts all rows regardless of NULLs, unlike COUNT(column).

What's next:

We'll explore expressions in SELECT—how to compute values, transform data, apply functions, and create dynamic columns that go beyond simple column retrieval to transform your queries into powerful data processing tools.

Page Complete

You now understand SELECT * comprehensively—its behavior, appropriate uses, serious pitfalls, and the judgment required to use it wisely. This knowledge helps you avoid common mistakes while still leveraging the asterisk's convenience when appropriate.