SELECT Basics: Mastering Data Retrieval

Level: Beginner

Duration: 60 mins

Topic: SELECT Basics

Column Selection: Precise Data Retrieval

Precision in Data Retrieval

The power of SQL lies in its ability to retrieve exactly the data you need—no more, no less. Column selection is where this precision is realized. While beginners often reach for SELECT * as a convenient shortcut, expert practitioners understand that thoughtful column selection is a cornerstone of efficient, maintainable, and performant database applications.

Column selection determines:

What data crosses the network from database to application
How much memory is needed to process query results
Whether indexes can serve the query without touching table data
How readable and maintainable your queries are for future developers
Whether schema changes will break your application code

This page explores column selection comprehensively, from basic syntax through qualified references, computed columns, and best practices that distinguish professional SQL from amateur attempts.

What You Will Learn

By the end of this page, you will master column selection techniques—from simple column listing through qualified names, ordinal positions, computed columns, and the critical distinctions between development and production query practices.

Basic Column References

At its simplest, column selection involves listing the column names you want to retrieve, separated by commas. This projection operation selects specific attributes from the source relation.

Fundamental rules:

Column names are case-insensitive in most databases when unquoted
Column order in SELECT determines column order in results
Duplicate column references are allowed (same column can appear multiple times)
Column names must exist in the source table(s) or be valid alias references

basic_column_selection.sql
-- Selecting specific columns
SELECT first_name, last_name, email
FROM employees;
 
-- Column order affects result order
SELECT last_name, first_name, email   -- Different from above
FROM employees;
 
-- Case insensitivity (these are equivalent in most DBs when identifiers are unquoted)
SELECT First_Name, Last_Name, Email    -- Mixed case
FROM Employees;
 
SELECT FIRST_NAME, LAST_NAME, EMAIL    -- UPPERCASE
FROM EMPLOYEES;
 
SELECT first_name, last_name, email    -- lowercase
FROM employees;
 
-- Duplicate columns are allowed
SELECT first_name, last_name, first_name AS name_again
FROM employees;
-- Returns: first_name, last_name, name_again (same data, different names)
 
-- Column reference error
SELECT first_name, last_name, middle_name  -- Error if middle_name doesn't exist
FROM employees;                             -- "Unknown column 'middle_name'"
 
-- Selecting from joined tables requires awareness of available columns
SELECT employee_id, first_name, department_name  -- Columns from both tables
FROM employees
JOIN departments ON employees.department_id = departments.department_id;
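
The fundamental rules above can be checked with a small runnable sketch. It assumes SQLite driven through Python's standard `sqlite3` module; the in-memory `employees` table and its single sample row are hypothetical and exist only for this illustration. Error wording differs between databases, so the message captured here is SQLite's.

```python
import sqlite3

# Hypothetical in-memory table, used only for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (first_name TEXT, last_name TEXT, email TEXT)")
conn.execute("INSERT INTO employees VALUES ('Ada', 'Lovelace', 'ada@example.com')")

# Column order in SELECT controls the order of values in each result row.
name_first = conn.execute(
    "SELECT first_name, last_name, email FROM employees"
).fetchone()
surname_first = conn.execute(
    "SELECT last_name, first_name, email FROM employees"
).fetchone()

# The same column may appear more than once; an alias keeps the names distinct.
duplicated = conn.execute(
    "SELECT first_name, last_name, first_name AS name_again FROM employees"
).fetchone()

# Referencing a column that does not exist is rejected before any row is read.
try:
    conn.execute("SELECT first_name, middle_name FROM employees")
    missing_column_error = None
except sqlite3.OperationalError as exc:
    missing_column_error = str(exc)

print(name_first)
print(surname_first)
print(duplicated)
print(missing_column_error)
```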

Case Sensitivity Nuances

While column names are typically case-insensitive, quoted identifiers preserve exact case. PostgreSQL folds unquoted names to lowercase and preserves quoted names. Oracle folds to uppercase. SQL Server is configuration-dependent. For portability, use consistent lowercase with underscores.

Column Name Case Handling by Database
Database	Unquoted Handling	Quoted Handling	Recommended Style
PostgreSQL	Folded to lowercase	Exact case preserved	`snake_case`
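MySQL	Column names case-insensitive on all platforms	Backtick-quoted; column names stay case-insensitive (table names can be case-sensitive on Linux)	`snake_case`
SQL Server	Follows database collation	Case preserved; matching follows collation	`PascalCase` or `snake_case`
Oracle	Folded to UPPERCASE	Exact case preserved	`UPPERCASE` or `snake_case`
SQLite	Case-insensitive	Case preserved, but matching stays case-insensitive	`snake_case`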

Qualified Column Names

When a query involves multiple tables (through joins or subqueries), column names may be ambiguous if the same name exists in multiple tables. Qualified column names resolve this ambiguity by prefixing the column with its table name or alias.

Syntax:

table_name.column_name
--or--
alias.column_name

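When qualification is required:

Column name exists in more than one table in the FROM clause
Self-joins, where the same table appears twice under different aliases

When qualification is optional but recommended:

Column name is unique across all tables in the query (qualifying still improves clarity and guards against later schema changes)
Team or organizational coding standards call for it

When qualification is unnecessary:

Single-table queries where ambiguity is impossible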

qualified_column_names.sql
-- Ambiguous column reference (ERROR)
SELECT id, name  -- Both tables likely have 'id' and 'name'
FROM employees
JOIN departments ON employees.department_id = departments.id;
-- ERROR: Column 'id' is ambiguous
 
-- Qualified column names resolve ambiguity
SELECT 
    employees.id AS employee_id,
    employees.name AS employee_name,
    departments.id AS department_id,
    departments.name AS department_name
FROM employees
JOIN departments ON employees.department_id = departments.id;
 
-- Using table aliases for brevity
SELECT 
    e.id AS employee_id,
    e.name AS employee_name,
    d.id AS department_id,
    d.name AS department_name
FROM employees AS e
JOIN departments AS d ON e.department_id = d.id;
 
-- Self-join requires qualification
SELECT 
    emp.id AS employee_id,
    emp.name AS employee_name,
    mgr.id AS manager_id,
    mgr.name AS manager_name
FROM employees AS emp
LEFT JOIN employees AS mgr ON emp.manager_id = mgr.id;
 
-- Best practice: Always qualify in multi-table queries
SELECT 
    e.id,
    e.first_name,
    e.last_name,
    d.department_name,
    l.city,
    l.country
FROM employees AS e
JOIN departments AS d ON e.department_id = d.id
JOIN locations AS l ON d.location_id = l.id;
 
-- Schema qualification for cross-schema queries
SELECT 
    hr.employees.employee_id,
    hr.employees.first_name,
    sales.orders.order_date,
    sales.orders.total_amount
FROM hr.employees
JOIN sales.orders ON hr.employees.employee_id = sales.orders.salesperson_id;

Column Qualification Best Practices

•Always qualify in multi-table queries — Even if not strictly required, qualification prevents future ambiguity if schemas change.
•Use short, meaningful aliases — e for employees, d for departments makes code concise yet readable.
•Be consistent — If you qualify some columns, qualify all columns for visual consistency.
•Include table name in column alias — e.name AS employee_name prevents confusion in result sets.
•Consider readability vs. brevity — For complex queries, meaningful aliases like emp and mgr beat single letters.

Aliasing Hides Original Names

When you alias a table (FROM employees AS e), you MUST use the alias in column qualification. Writing employees.first_name after aliasing to 'e' will cause an error in most databases. The alias replaces the original name within that query's scope.
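
Both failure modes described in this section, the ambiguous reference and the table name hidden by an alias, can be reproduced in a short sketch. It assumes SQLite via Python's `sqlite3` module; the two tiny tables and the `try_query` helper are hypothetical and only serve the demonstration. Other databases report the same problems with different error text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE departments (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE employees (
        id INTEGER PRIMARY KEY, name TEXT, department_id INTEGER
    );
    INSERT INTO departments VALUES (10, 'Engineering');
    INSERT INTO employees VALUES (1, 'Ada', 10);
""")


def try_query(sql):
    """Hypothetical helper: return ('ok', rows) or ('error', message)."""
    try:
        return "ok", conn.execute(sql).fetchall()
    except sqlite3.OperationalError as exc:
        return "error", str(exc)


# 1. Unqualified 'id' exists in both tables, so the reference is ambiguous.
ambiguous = try_query(
    "SELECT id FROM employees JOIN departments "
    "ON employees.department_id = departments.id"
)

# 2. After aliasing, the original table name is no longer a valid qualifier.
hidden = try_query("SELECT employees.name FROM employees AS e")

# 3. Qualifying every column with its alias works.
qualified = try_query(
    "SELECT e.id, e.name, d.name AS department_name "
    "FROM employees AS e JOIN departments AS d ON e.department_id = d.id"
)

print(ambiguous)
print(hidden)
print(qualified)
```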

Column Ordinals and Positional References

SQL supports positional references—referring to columns by their position number (ordinal) in the SELECT list rather than by name. This feature is primarily used in ORDER BY and GROUP BY clauses.

Syntax:

ORDER BY 1, 2 DESC    -- Sort by first column, then second descending
GROUP BY 1, 2         -- Group by first and second columns

Important caveats:

Ordinals are 1-indexed (first column is 1, not 0)
Ordinals cannot be used in SELECT itself to reference other columns
They work in ORDER BY and, in many databases, GROUP BY; they never work in WHERE or HAVING
Considered poor practice in production code due to fragility

column_ordinals.sql
-- Using ordinals in ORDER BY
SELECT first_name, last_name, salary
FROM employees
ORDER BY 3 DESC;  -- Equivalent to: ORDER BY salary DESC
 
-- Multiple ordinals
SELECT department_id, job_title, COUNT(*) AS emp_count
FROM employees
GROUP BY 1, 2      -- GROUP BY department_id, job_title
ORDER BY 1, 3 DESC; -- ORDER BY department_id, emp_count DESC
 
-- Mixing ordinals and names (valid but inconsistent)
SELECT first_name, last_name, hire_date, salary
FROM employees
ORDER BY 4 DESC, first_name ASC;  -- Works but confusing
 
-- Why ordinals are fragile
-- Original query:
SELECT first_name, last_name, salary
FROM employees
ORDER BY 3 DESC;  -- Orders by salary
 
-- After adding a column:
SELECT first_name, last_name, email, salary
FROM employees
ORDER BY 3 DESC;  -- Now orders by email! (silent bug)
 
-- Safe approach: Always use column names
SELECT first_name, last_name, email, salary
FROM employees
ORDER BY salary DESC;  -- Clear and refactor-safe
 
-- Ordinals can be useful for complex expressions
SELECT 
    department_id,
    CASE 
        WHEN AVG(salary) > 100000 THEN 'High'
        WHEN AVG(salary) > 50000 THEN 'Medium'
        ELSE 'Low'
    END AS salary_tier
FROM employees
GROUP BY 1
ORDER BY 2;  -- Easier than repeating the CASE expression
 
-- Better: Use alias in ORDER BY
SELECT 
    department_id,
    CASE 
        WHEN AVG(salary) > 100000 THEN 'High'
        WHEN AVG(salary) > 50000 THEN 'Medium'
        ELSE 'Low'
    END AS salary_tier
FROM employees
GROUP BY department_id
ORDER BY salary_tier;  -- Uses alias (works in most DBs)

Ordinal Support by Clause
Clause	Ordinals Allowed?	Notes
SELECT	No	Cannot reference other columns by position
WHERE	No	Must use column names or expressions
GROUP BY	Varies	Supported in MySQL, PostgreSQL, and SQLite; not in SQL Server
HAVING	No	Must use column names or aggregates
ORDER BY	Yes	Widely supported, but discouraged

Ordinals Are an Anti-Pattern

Using column ordinals in production code is widely considered an anti-pattern. It creates fragile queries that break silently when columns are added, removed, or reordered. Always prefer explicit column names or aliases for maintainability and clarity.
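
The silent failure is easy to reproduce. The sketch below assumes SQLite via Python's `sqlite3` module and a hypothetical two-row `employees` table whose data is chosen so that sorting by email and sorting by salary give opposite orders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees "
    "(first_name TEXT, last_name TEXT, email TEXT, salary INTEGER)"
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?, ?)",
    [
        ("Ada", "Lovelace", "zeta@example.com", 90000),
        ("Grace", "Hopper", "alpha@example.com", 120000),
    ],
)

# Original query: position 3 is salary, so the highest earner comes first.
by_salary = conn.execute(
    "SELECT first_name, last_name, salary FROM employees ORDER BY 3 DESC"
).fetchall()

# email is added before salary: position 3 now means email, with no error.
by_email = conn.execute(
    "SELECT first_name, last_name, email, salary FROM employees ORDER BY 3 DESC"
).fetchall()

# Naming the column keeps the intended sort regardless of the SELECT list.
by_name = conn.execute(
    "SELECT first_name, last_name, email, salary "
    "FROM employees ORDER BY salary DESC"
).fetchall()

print(by_salary[0][0], by_email[0][0], by_name[0][0])
```

The second query still runs and still returns two rows; only the order changes, which is why this kind of regression tends to go unnoticed.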

Computed Columns and Expressions

Beyond simple column references, SELECT can include expressions—computed values derived from columns, literals, operators, and functions. This capability transforms SELECT from a simple projection into a powerful transformation tool.

Types of expressions in SELECT:

Arithmetic expressions: Mathematical operations on numeric columns
String expressions: Concatenation, substring extraction, formatting
Date/time expressions: Date arithmetic, extraction, formatting
Conditional expressions: CASE expressions, COALESCE, NULLIF
Function calls: Built-in and user-defined functions
Subqueries: Scalar subqueries returning single values

computed_columns.sql
-- Arithmetic expressions
SELECT 
    product_name,
    price,
    quantity,
    price * quantity AS line_total,
    price * quantity * 0.08 AS tax_amount,
    price * quantity * 1.08 AS total_with_tax
FROM order_items;
 
-- String expressions
SELECT 
    first_name,
    last_name,
    CONCAT(first_name, ' ', last_name) AS full_name,
    CONCAT(UPPER(LEFT(first_name, 1)), '.', UPPER(LEFT(last_name, 1)), '.') AS initials,
    LENGTH(first_name) + LENGTH(last_name) AS name_length
FROM employees;
 
-- Date/time expressions
SELECT 
    order_id,
    order_date,
    ship_date,
    DATEDIFF(ship_date, order_date) AS days_to_ship,  -- MySQL syntax
    DATE_ADD(order_date, INTERVAL 30 DAY) AS payment_due_date,
    YEAR(order_date) AS order_year,
    MONTH(order_date) AS order_month
FROM orders;
 
-- PostgreSQL date arithmetic
SELECT 
    order_id,
    order_date,
    ship_date,
    ship_date - order_date AS days_to_ship,
    order_date + INTERVAL '30 days' AS payment_due_date,
    EXTRACT(YEAR FROM order_date) AS order_year
FROM orders;
 
-- Conditional expressions with CASE
SELECT 
    employee_id,
    first_name,
    salary,
    CASE 
        WHEN salary >= 100000 THEN 'Executive'
        WHEN salary >= 70000 THEN 'Senior'
        WHEN salary >= 40000 THEN 'Mid-Level'
        ELSE 'Junior'
    END AS salary_band,
    CASE department_id
        WHEN 1 THEN 'Engineering'
        WHEN 2 THEN 'Sales'
        WHEN 3 THEN 'Marketing'
        ELSE 'Other'
    END AS department_name
FROM employees;
 
-- NULL handling expressions
SELECT 
    employee_id,
    first_name,
    commission_pct,
    COALESCE(commission_pct, 0) AS commission_or_zero,
    NULLIF(commission_pct, 0) AS null_if_zero,
    IFNULL(manager_id, 'No Manager') AS manager_display  -- MySQL
FROM employees;
 
-- Scalar subquery (returns single value per row)
SELECT 
    e.employee_id,
    e.first_name,
    e.salary,
    e.department_id,
    (SELECT AVG(salary) FROM employees e2 
     WHERE e2.department_id = e.department_id) AS dept_avg_salary,
    e.salary - (SELECT AVG(salary) FROM employees e2 
                WHERE e2.department_id = e.department_id) AS salary_vs_avg
FROM employees e;

Always Alias Expressions

Without an alias, computed columns get auto-generated names that vary by database (e.g., 'expr1', 'column1', or the entire expression). Always provide meaningful aliases like salary * 12 AS annual_salary for readable results and reliable application code.
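
A minimal sketch of why this matters to application code, assuming SQLite via Python's `sqlite3` module and a hypothetical one-row `employees` table. Client libraries typically expose result column names (here through `cursor.description`), and only the aliased name is something the application can rely on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (first_name TEXT, salary INTEGER)")
conn.execute("INSERT INTO employees VALUES ('Ada', 5000)")

# Without an alias the engine picks the column name; treat it as unspecified.
unaliased = conn.execute("SELECT salary * 12 FROM employees")
unaliased_name = unaliased.description[0][0]

# With an alias the result column name is exactly what the query states.
aliased = conn.execute("SELECT salary * 12 AS annual_salary FROM employees")
aliased_name = aliased.description[0][0]
record = {aliased_name: aliased.fetchone()[0]}

print("engine-chosen name:", unaliased_name)
print("aliased name:", aliased_name)
print(record)
```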

Expression Performance Considerations

•Simple arithmetic is cheap — Basic math operations add negligible overhead.
•String operations vary — Concatenation is fast; complex regex is slow.
•Scalar subqueries can be expensive — May execute once per row; prefer JOINs when possible.
•Function calls add overhead — User-defined functions often slower than built-in equivalents.
•CASE is efficiently optimized — Modern optimizers handle CASE expressions well.
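
The advice to prefer JOINs over scalar subqueries can be sketched as follows. The example assumes SQLite via Python's `sqlite3` module and a hypothetical three-row `employees` table; it only shows that the two formulations return the same rows, not which is faster, since that depends on the optimizer and the data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees "
    "(employee_id INTEGER, department_id INTEGER, salary INTEGER)"
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [(1, 10, 100), (2, 10, 300), (3, 20, 500)],
)

# Correlated scalar subquery: logically evaluated once per outer row.
correlated = conn.execute("""
    SELECT e.employee_id,
           (SELECT AVG(e2.salary)
            FROM employees AS e2
            WHERE e2.department_id = e.department_id) AS dept_avg_salary
    FROM employees AS e
    ORDER BY e.employee_id
""").fetchall()

# Join rewrite: compute each department average once, then join it back.
joined = conn.execute("""
    SELECT e.employee_id, a.dept_avg_salary
    FROM employees AS e
    JOIN (SELECT department_id, AVG(salary) AS dept_avg_salary
          FROM employees
          GROUP BY department_id) AS a
      ON a.department_id = e.department_id
    ORDER BY e.employee_id
""").fetchall()

print(correlated)
print(joined)
```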

Column Aliases Deep Dive

Column aliases rename columns in the result set. While syntactically simple, aliases have important scope rules and database-specific behaviors that affect where and how they can be used.

Alias syntax variations:

column_expression AS alias      -- Standard, explicit
column_expression alias          -- Shorthand (no AS)
column_expression AS "Alias"    -- Quoted for special chars/spaces
column_expression AS [Alias]     -- SQL Server bracket notation

column_aliases.sql
-- Standard alias syntax
SELECT 
    first_name AS given_name,
    last_name AS family_name,
    salary AS annual_compensation
FROM employees;
 
-- Without AS keyword (works but less explicit)
SELECT 
    first_name given_name,
    last_name family_name,
    salary annual_compensation
FROM employees;
 
-- Quoted aliases for spaces and special characters
SELECT 
    first_name AS "First Name",
    last_name AS "Last Name",
    salary AS "Annual Salary ($)",
    hire_date AS "Start Date (mm/dd/yyyy)"
FROM employees;
 
-- SQL Server bracket syntax
SELECT 
    first_name AS [First Name],
    salary AS [Annual Salary]
FROM employees;
 
-- Alias scope: Available in ORDER BY
SELECT 
    first_name,
    last_name,
    salary * 12 AS annual_salary
FROM employees
ORDER BY annual_salary DESC;  -- Works: ORDER BY is processed after SELECT
 
-- Alias scope: NOT available in WHERE (processed before SELECT)
SELECT 
    first_name,
    salary * 12 AS annual_salary
FROM employees
WHERE annual_salary > 60000;  -- ERROR! Alias doesn't exist yet
 
-- Workaround 1: Repeat the expression
SELECT 
    first_name,
    salary * 12 AS annual_salary
FROM employees
WHERE salary * 12 > 60000;
 
-- Workaround 2: Use a subquery/CTE
SELECT * FROM (
    SELECT 
        first_name,
        salary * 12 AS annual_salary
    FROM employees
) AS subq
WHERE annual_salary > 60000;
 
-- Alias in GROUP BY (non-standard): accepted by MySQL, PostgreSQL, and SQLite
SELECT 
    YEAR(hire_date) AS hire_year,
    COUNT(*) AS emp_count
FROM employees
GROUP BY hire_year;  -- Works in MySQL; fails in SQL Server
 
-- Standard-compliant version
SELECT 
    YEAR(hire_date) AS hire_year,
    COUNT(*) AS emp_count
FROM employees
GROUP BY YEAR(hire_date);

Column Alias Availability by Clause
Clause	Can Use Alias?	Processing Order	Notes
FROM	No	1st	Aliases not defined yet
WHERE	No	2nd	Processed before SELECT
GROUP BY	Varies*	3rd	*MySQL, PostgreSQL, and SQLite allow it; SQL Server does not
HAVING	Usually No	4th	Use aggregate expressions; MySQL accepts aliases as an extension
SELECT	Defined here	5th	Cannot reference other SELECT aliases
ORDER BY	Yes	6th	Processed after SELECT
LIMIT/OFFSET	N/A	7th	Works on final result

Alias Naming Conventions

Use snake_case for unquoted aliases (annual_salary), PascalCase or spaces in quoted aliases for user-facing output. Avoid SQL reserved words as aliases. Keep aliases short but meaningful—full_name is better than fn, but employee_full_name_formatted is too verbose.
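
The two portable ways to filter on a computed value can be compared in a short sketch. It assumes SQLite via Python's `sqlite3` module and a hypothetical two-row `employees` table. One caveat: SQLite is lenient about alias references, so this sketch demonstrates the workarounds rather than the WHERE-clause error itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (first_name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [("Ada", 4000), ("Grace", 6000)],
)

# Workaround 1: repeat the expression in WHERE.
repeated = conn.execute("""
    SELECT first_name, salary * 12 AS annual_salary
    FROM employees
    WHERE salary * 12 > 60000
""").fetchall()

# Workaround 2: define the alias in a CTE, then filter on it in the outer
# query, where the alias is an ordinary column name.
with_cte = conn.execute("""
    WITH annual AS (
        SELECT first_name, salary * 12 AS annual_salary
        FROM employees
    )
    SELECT first_name, annual_salary
    FROM annual
    WHERE annual_salary > 60000
""").fetchall()

print(repeated)
print(with_cte)
```

The CTE form states the expression once, which helps when the computation is long or used in several clauses.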

Selecting from Multiple Tables

When your FROM clause includes multiple tables (through joins), you can select columns from any of them. This is where qualified column names become essential, and where understanding join semantics affects what data is available.

Key considerations:

Columns from all joined tables are available in SELECT
Ambiguous columns require qualification with table name or alias
NULL values may appear for outer join non-matching rows
Join type (inner vs outer) determines which rows are returned and where NULLs are filled in; the columns available to SELECT are the same either way

multi_table_selection.sql
-- Inner join: Columns from both tables
SELECT 
    e.employee_id,
    e.first_name,
    e.last_name,
    d.department_name,
    d.location_id
FROM employees AS e
INNER JOIN departments AS d ON e.department_id = d.department_id;
 
-- Left outer join: All employees, NULLs for unmatched departments
SELECT 
    e.employee_id,
    e.first_name,
    e.last_name,
    d.department_name,    -- NULL if employee has no department
    d.location_id         -- NULL if employee has no department
FROM employees AS e
LEFT JOIN departments AS d ON e.department_id = d.department_id;
 
-- Multiple joins: Columns from all tables
SELECT 
    e.employee_id,
    e.first_name,
    e.last_name,
    d.department_name,
    l.city,
    l.country,
    m.first_name AS manager_first_name,
    m.last_name AS manager_last_name
FROM employees AS e
JOIN departments AS d ON e.department_id = d.department_id
JOIN locations AS l ON d.location_id = l.location_id
LEFT JOIN employees AS m ON e.manager_id = m.employee_id;
 
-- Handling NULL from outer joins with COALESCE
-- (|| is standard SQL concatenation; MySQL needs CONCAT() unless PIPES_AS_CONCAT is enabled)
SELECT 
    e.employee_id,
    e.first_name,
    COALESCE(d.department_name, 'Unassigned') AS department_name,
    COALESCE(m.first_name || ' ' || m.last_name, 'No Manager') AS manager_name
FROM employees AS e
LEFT JOIN departments AS d ON e.department_id = d.department_id
LEFT JOIN employees AS m ON e.manager_id = m.employee_id;
 
-- Cross join: Every combination (N * M rows)
SELECT 
    e.first_name,
    p.project_name,
    'Possible Assignment' AS assignment_type
FROM employees AS e
CROSS JOIN projects AS p;
 
-- Natural join: Automatically matches same-named columns
SELECT 
    employee_id,      -- No qualification needed (only one source)
    first_name,
    department_name
FROM employees
NATURAL JOIN departments;
-- Warning: NATURAL JOIN is fragile if schemas change

NATURAL JOIN Pitfalls

NATURAL JOIN automatically joins on all columns with matching names. This is convenient but dangerous: adding a column with a common name (like 'id' or 'created_at') to any table can silently change join behavior. Prefer explicit JOIN ON conditions in production code.
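
A small sketch of that failure mode, assuming SQLite via Python's `sqlite3` module. The two tables, their rows, and the `created_at` values are hypothetical. The query text never changes; only the schema does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE departments (department_id INTEGER, department_name TEXT);
    CREATE TABLE employees (
        employee_id INTEGER, first_name TEXT, department_id INTEGER
    );
    INSERT INTO departments VALUES (10, 'Engineering');
    INSERT INTO employees VALUES (1, 'Ada', 10);
""")

natural_sql = (
    "SELECT first_name, department_name "
    "FROM employees NATURAL JOIN departments"
)

# Only department_id is shared, so the join behaves as intended.
before = conn.execute(natural_sql).fetchall()

# Both tables gain a same-named audit column holding different values.
conn.executescript("""
    ALTER TABLE employees ADD COLUMN created_at TEXT;
    ALTER TABLE departments ADD COLUMN created_at TEXT;
    UPDATE employees SET created_at = '2024-01-15';
    UPDATE departments SET created_at = '2020-06-01';
""")

# The join now also requires created_at to match, so the row disappears.
after = conn.execute(natural_sql).fetchall()

# An explicit ON condition is unaffected by the new column.
explicit = conn.execute("""
    SELECT e.first_name, d.department_name
    FROM employees AS e
    JOIN departments AS d ON e.department_id = d.department_id
""").fetchall()

print(before, after, explicit)
```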

Multi-Table Selection Guidelines

•Always use table aliases — Even for two-table joins, aliases improve readability.
•Qualify all columns — Makes origin clear and prevents future ambiguity.
•Handle outer join NULLs — Use COALESCE for display-friendly defaults.
•Consider result size — Cross joins and many-to-many relationships can explode row counts.
•Use meaningful aliases in results — After joining, customer.name and vendor.name both become just 'name' without aliases.
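
The outer-join guideline can be illustrated with a minimal sketch, assuming SQLite via Python's `sqlite3` module. The tables and rows are hypothetical; one employee deliberately has no department.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE departments (department_id INTEGER, department_name TEXT);
    CREATE TABLE employees (
        employee_id INTEGER, first_name TEXT, department_id INTEGER
    );
    INSERT INTO departments VALUES (10, 'Engineering');
    INSERT INTO employees VALUES (1, 'Ada', 10), (2, 'Grace', NULL);
""")

# INNER JOIN drops the employee without a matching department.
inner_rows = conn.execute("""
    SELECT e.first_name, d.department_name
    FROM employees AS e
    INNER JOIN departments AS d ON e.department_id = d.department_id
    ORDER BY e.employee_id
""").fetchall()

# LEFT JOIN keeps her; the department column comes back as NULL (None).
left_rows = conn.execute("""
    SELECT e.first_name, d.department_name
    FROM employees AS e
    LEFT JOIN departments AS d ON e.department_id = d.department_id
    ORDER BY e.employee_id
""").fetchall()

# COALESCE substitutes a display-friendly default for the NULL.
display_rows = conn.execute("""
    SELECT e.first_name,
           COALESCE(d.department_name, 'Unassigned') AS department_name
    FROM employees AS e
    LEFT JOIN departments AS d ON e.department_id = d.department_id
    ORDER BY e.employee_id
""").fetchall()

print(inner_rows)
print(left_rows)
print(display_rows)
```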

Performance Impact of Column Selection

Column selection has real performance implications that compound at scale. Understanding these effects helps you write queries that not only return correct data but do so efficiently.

Performance factors:

How Column Selection Affects Performance

•I/O reduction — Selecting fewer columns means reading less data from disk, especially for wide tables.
•Network transfer — Result set size affects time to transfer data from database to application.
•Memory usage — Smaller result sets consume less memory on both server and client.
•Covering indexes — If all selected columns exist in an index, the database can serve the query without accessing the table (index-only scan).
•LOB avoidance — Skipping BLOB/CLOB/TEXT columns avoids expensive large object retrieval.

column_selection_performance.sql
-- Table: employees (many columns, including a large 'bio' TEXT column)
-- Index: idx_emp_dept (department_id, employee_id, first_name, last_name)
 
-- POOR: Selecting all columns forces table access
SELECT *
FROM employees
WHERE department_id = 5;
-- Must read: all columns including large 'bio' TEXT
-- Cannot use covering index
 
-- BETTER: Select only needed columns
SELECT employee_id, first_name, last_name
FROM employees
WHERE department_id = 5;
-- Can potentially use covering index (index-only scan)
-- Avoids reading 'bio' and other unnecessary columns
 
-- MEASURABLE DIFFERENCE at scale:
-- employees table: 1 million rows, 50 columns, average row size 2KB
-- SELECT * retrieves: ~2GB of data
-- SELECT employee_id, first_name: ~50MB of data
 
-- Example: Covering index optimization
-- Index exists: CREATE INDEX idx_covering ON orders(customer_id, order_date, total);
 
-- Uses covering index (very fast)
SELECT order_date, total
FROM orders
WHERE customer_id = 12345;
 
-- Cannot use covering index (must access table)
SELECT order_date, total, shipping_address
FROM orders
WHERE customer_id = 12345;
 
-- Large object column impact
-- Table: documents (id, title, file_content BLOB)
 
-- SLOW: Forces BLOB loading
SELECT * FROM documents WHERE id = 100;
 
-- FAST: Avoids BLOB
SELECT id, title FROM documents WHERE id = 100;
 
-- When you need the BLOB, get just that row
SELECT file_content FROM documents WHERE id = 100;

Production Query Audit

Periodically audit production queries for SELECT * usage. Replace with explicit column lists. This simple change can significantly reduce database load, network traffic, and application memory usage—especially for frequently-executed queries.

SELECT * vs Explicit Columns Impact
Factor	SELECT *	Explicit Columns	Improvement
Data transfer	All columns (often KB/row)	Needed columns only	Often 50-90% reduction
Memory usage	High (entire rows cached)	Low (subset cached)	Proportional to column count
Index usage	Rarely covering	Can be covering	Avoids table lookups when an index-only scan applies
Schema coupling	Breaks on column add	Stable	Fewer production incidents
Readability	Unclear intent	Self-documenting	Easier maintenance
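
Whether a query is served by a covering index can be checked from the query plan. The sketch below assumes SQLite via Python's `sqlite3` module and its `EXPLAIN QUERY PLAN` statement; the `orders` table is hypothetical and empty, and the `plan` helper is introduced only for this illustration. Other databases expose the same information through their own EXPLAIN output, with different wording.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        customer_id INTEGER,
        order_date TEXT,
        total REAL,
        shipping_address TEXT
    );
    CREATE INDEX idx_covering ON orders (customer_id, order_date, total);
""")


def plan(sql):
    """Hypothetical helper: return the plan's detail text for one statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)  # 4th field is the description


# Every referenced column lives in the index: no table lookup is needed.
covered_plan = plan(
    "SELECT order_date, total FROM orders WHERE customer_id = 12345"
)

# shipping_address is not in the index, so each match needs a table lookup.
uncovered_plan = plan(
    "SELECT order_date, total, shipping_address "
    "FROM orders WHERE customer_id = 12345"
)

print(covered_plan)
print(uncovered_plan)
```

In SQLite the first plan mentions a covering index while the second uses the same index without that label, which is the difference between an index-only scan and an index scan followed by row fetches.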

Summary: Column Selection

Column selection is a fundamental skill that directly impacts query correctness, performance, and maintainability. The precision with which you specify columns distinguishes professional SQL from amateur attempts.

Let's consolidate the key points:

Key Takeaways

•Explicit is better than implicit — List specific columns rather than using SELECT *; document your intent and prevent schema coupling.
•Qualify multi-table references — Use table.column or alias.column to prevent ambiguity and improve readability.
•Avoid column ordinals — Use column names in ORDER BY and GROUP BY for maintainable, refactor-safe queries.
•Leverage expressions — SELECT can compute values, not just retrieve them; use CASE, functions, and arithmetic.
•Alias strategically — Provide clear names for expressions and disambiguate same-named columns from different tables.
•Understand alias scope — Aliases from SELECT are available in ORDER BY but not in WHERE or GROUP BY (in standard SQL).
•Consider performance — Column selection affects I/O, network transfer, memory, and index usage at scale.

What's next:

We'll examine the asterisk (*) operator for selecting all columns—when it's appropriate, when to avoid it, and the nuances of its behavior in different contexts including joins and subqueries.

Page Complete

You now understand column selection comprehensively—from basic references through qualification, ordinals, expressions, aliases, and performance implications. This knowledge enables you to write precise, efficient, and maintainable SELECT statements.

3 / 5

Loading learning content...

Database Management SystemsSELECT Basics

SELECT Basics: Mastering Data Retrieval

LevelBeginner

Duration60 mins

TopicSELECT Basics

3 / 5

Column Selection: Precise Data Retrieval

Precision in Data Retrieval

Column selection determines:

What data crosses the network from database to application
How much memory is needed to process query results
Whether indexes can serve the query without touching table data
How readable and maintainable your queries are for future developers
Whether schema changes will break your application code

This page explores column selection comprehensively, from basic syntax through qualified references, computed columns, and best practices that distinguish professional SQL from amateur attempts.

What You Will Learn

Basic Column References

At its simplest, column selection involves listing the column names you want to retrieve, separated by commas. This projection operation selects specific attributes from the source relation.

Fundamental rules:

Column names are case-insensitive in most databases when unquoted
Column order in SELECT determines column order in results
Duplicate column references are allowed (same column can appear multiple times)
Column names must exist in the source table(s) or be valid alias references

basic_column_selection.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
-- Selecting specific columns
SELECT first_name, last_name, email
FROM employees;
 
-- Column order affects result order
SELECT last_name, first_name, email   -- Different from above
FROM employees;
 
-- Case insensitivity (these are equivalent in most DBs)
SELECT FirstName, LastName, Email    -- PascalCase
FROM Employees;
 
SELECT FIRSTNAME, LASTNAME, EMAIL    -- UPPERCASE
FROM EMPLOYEES;
 
SELECT firstname, lastname, email    -- lowercase
FROM employees;
 
-- Duplicate columns are allowed
SELECT first_name, last_name, first_name AS name_again
FROM employees;
-- Returns: first_name, last_name, name_again (same data, different names)
 
-- Column reference error
SELECT first_name, last_name, middle_name  -- Error if middle_name doesn't exist
FROM employees;                             -- "Unknown column 'middle_name'"
 
-- Selecting from joined tables requires awareness of available columns
SELECT employee_id, first_name, department_name  -- Columns from both tables
FROM employees
JOIN departments ON employees.department_id = departments.department_id;

Case Sensitivity Nuances

Column Name Case Handling by Database
Database	Unquoted Handling	Quoted Handling	Recommended Style
PostgreSQL	Folded to lowercase	Exact case preserved	`snake_case`
MySQL	Case-insensitive (usually)	Case-sensitive on Linux	`snake_case`
SQL Server	Configuration-dependent	Exact case preserved	`PascalCase` or `snake_case`
Oracle	Folded to UPPERCASE	Exact case preserved	`UPPERCASE` or `snake_case`
SQLite	Case-insensitive	Exact case preserved	`snake_case`

Qualified Column Names

Syntax:

table_name.column_name
--or--
alias.column_name

When qualification is required:

Column name exists in multiple tables in the FROM clause
Improving code clarity and maintainability
Following team or organizational coding standards

When qualification is optional:

Column name is unique across all tables in the query
Single-table queries where ambiguity is impossible

qualified_column_names.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
-- Ambiguous column reference (ERROR)
SELECT id, name  -- Both tables likely have 'id' and 'name'
FROM employees
JOIN departments ON employees.department_id = departments.id;
-- ERROR: Column 'id' is ambiguous
 
-- Qualified column names resolve ambiguity
SELECT 
    employees.id AS employee_id,
    employees.name AS employee_name,
    departments.id AS department_id,
    departments.name AS department_name
FROM employees
JOIN departments ON employees.department_id = departments.id;
 
-- Using table aliases for brevity
SELECT 
    e.id AS employee_id,
    e.name AS employee_name,
    d.id AS department_id,
    d.name AS department_name
FROM employees AS e
JOIN departments AS d ON e.department_id = d.id;
 
-- Self-join requires qualification
SELECT 
    emp.id AS employee_id,
    emp.name AS employee_name,
    mgr.id AS manager_id,
    mgr.name AS manager_name
FROM employees AS emp
LEFT JOIN employees AS mgr ON emp.manager_id = mgr.id;
 
-- Best practice: Always qualify in multi-table queries
SELECT 
    e.id,
    e.first_name,
    e.last_name,
    d.department_name,
    l.city,
    l.country
FROM employees AS e
JOIN departments AS d ON e.department_id = d.id
JOIN locations AS l ON d.location_id = l.id;
 
-- Schema qualification for cross-schema queries
SELECT 
    hr.employees.employee_id,
    hr.employees.first_name,
    sales.orders.order_date,
    sales.orders.total_amount
FROM hr.employees
JOIN sales.orders ON hr.employees.employee_id = sales.orders.salesperson_id;

Column Qualification Best Practices

•Always qualify in multi-table queries — Even if not strictly required, qualification prevents future ambiguity if schemas change.
•Use short, meaningful aliases — e for employees, d for departments makes code concise yet readable.
•Be consistent — If you qualify some columns, qualify all columns for visual consistency.
•Include table name in column alias — e.name AS employee_name prevents confusion in result sets.
•Consider readability vs. brevity — For complex queries, meaningful aliases like emp and mgr beat single letters.

Aliasing Hides Original Names

Column Ordinals and Positional References

Syntax:

ORDER BY 1, 2 DESC    -- Sort by first column, then second descending
GROUP BY 1, 2         -- Group by first and second columns

Important caveats:

Ordinals are 1-indexed (first column is 1, not 0)
Ordinals cannot be used in SELECT itself to reference other columns
They work in ORDER BY and GROUP BY, not in WHERE or HAVING
Considered poor practice in production code due to fragility

column_ordinals.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
-- Using ordinals in ORDER BY
SELECT first_name, last_name, salary
FROM employees
ORDER BY 3 DESC;  -- Equivalent to: ORDER BY salary DESC
 
-- Multiple ordinals
SELECT department_id, job_title, COUNT(*) AS emp_count
FROM employees
GROUP BY 1, 2      -- GROUP BY department_id, job_title
ORDER BY 1, 3 DESC; -- ORDER BY department_id, emp_count DESC
 
-- Mixing ordinals and names (valid but inconsistent)
SELECT first_name, last_name, hire_date, salary
FROM employees
ORDER BY 4 DESC, first_name ASC;  -- Works but confusing
 
-- Why ordinals are fragile
-- Original query:
SELECT first_name, last_name, salary
FROM employees
ORDER BY 3 DESC;  -- Orders by salary
 
-- After adding a column:
SELECT first_name, last_name, email, salary
FROM employees
ORDER BY 3 DESC;  -- Now orders by email! (silent bug)
 
-- Safe approach: Always use column names
SELECT first_name, last_name, email, salary
FROM employees
ORDER BY salary DESC;  -- Clear and refactor-safe
 
-- Ordinals can be useful for complex expressions
SELECT 
    department_id,
    CASE 
        WHEN AVG(salary) > 100000 THEN 'High'
        WHEN AVG(salary) > 50000 THEN 'Medium'
        ELSE 'Low'
    END AS salary_tier
FROM employees
GROUP BY 1
ORDER BY 2;  -- Easier than repeating the CASE expression
 
-- Better: Use alias in ORDER BY
SELECT 
    department_id,
    CASE 
        WHEN AVG(salary) > 100000 THEN 'High'
        WHEN AVG(salary) > 50000 THEN 'Medium'
        ELSE 'Low'
    END AS salary_tier
FROM employees
GROUP BY department_id
ORDER BY salary_tier;  -- Uses alias (works in most DBs)

Ordinal Support by Clause
Clause	Ordinals Allowed?	Notes
SELECT	No	Cannot reference other columns by position
WHERE	No	Must use column names or expressions
GROUP BY	Varies	Accepted by MySQL, PostgreSQL, and SQLite; rejected by SQL Server
HAVING	No	Must use column names or aggregates
ORDER BY	Yes	Widely supported, but discouraged
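
The fragility of ordinals can be checked directly. This is a small sketch using Python's sqlite3 module; the employees rows are made-up sample data, chosen so that sorting by email and sorting by salary give visibly different orders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (first_name TEXT, last_name TEXT, email TEXT, salary INTEGER);
    INSERT INTO employees VALUES
        ('Ada',   'Lovelace', 'zeta@example.com',  90000),
        ('Grace', 'Hopper',   'alpha@example.com', 120000),
        ('Edgar', 'Codd',     'mid@example.com',   100000);
""")

def first_names(sql):
    """Run a query and return the first column of every row, in result order."""
    return [row[0] for row in conn.execute(sql)]

# Original query: position 3 is salary.
by_salary = first_names(
    "SELECT first_name, last_name, salary FROM employees ORDER BY 3 DESC")

# Someone inserts email into the SELECT list: position 3 is now email.
after_edit = first_names(
    "SELECT first_name, last_name, email, salary FROM employees ORDER BY 3 DESC")

# Naming the column keeps the sort tied to salary regardless of list order.
by_name = first_names(
    "SELECT first_name, last_name, email, salary FROM employees ORDER BY salary DESC")

print(by_salary, after_edit, by_name)
```

No error is raised at any point; the second query simply returns rows in a different order, which is why this kind of bug tends to go unnoticed.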

Ordinals Are an Anti-Pattern

Computed Columns and Expressions

Types of expressions in SELECT:

Arithmetic expressions: Mathematical operations on numeric columns
String expressions: Concatenation, substring extraction, formatting
Date/time expressions: Date arithmetic, extraction, formatting
Conditional expressions: CASE expressions, COALESCE, NULLIF
Function calls: Built-in and user-defined functions
Subqueries: Scalar subqueries returning single values
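
As a quick illustration of several of these expression types in one SELECT list, here is a sketch run through Python's sqlite3 module. The order_items table, its discount column, and the 100 threshold are assumptions made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE order_items (product_name TEXT, price REAL, quantity INTEGER, discount REAL);
    INSERT INTO order_items VALUES
        ('Widget', 10.0, 3, NULL),
        ('Gadget', 250.0, 1, 25.0);
""")

rows = conn.execute("""
    SELECT
        product_name,
        price * quantity      AS line_total,        -- arithmetic expression
        COALESCE(discount, 0) AS discount_or_zero,  -- NULL handling
        CASE WHEN price * quantity >= 100 THEN 'Large'
             ELSE 'Small'
        END                   AS order_size,        -- conditional expression
        UPPER(product_name)   AS label              -- built-in function call
    FROM order_items
    ORDER BY product_name
""").fetchall()

for row in rows:
    print(row)
```

Every computed column carries an alias, so the application can refer to line_total or order_size instead of an engine-generated name.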

computed_columns.sql
-- Arithmetic expressions
SELECT 
    product_name,
    price,
    quantity,
    price * quantity AS line_total,
    price * quantity * 0.08 AS tax_amount,
    price * quantity * 1.08 AS total_with_tax
FROM order_items;
 
-- String expressions
SELECT 
    first_name,
    last_name,
    CONCAT(first_name, ' ', last_name) AS full_name,
    CONCAT(UPPER(LEFT(first_name, 1)), '.', UPPER(LEFT(last_name, 1)), '.') AS initials,
    LENGTH(first_name) + LENGTH(last_name) AS name_length
FROM employees;
 
-- Date/time expressions
SELECT 
    order_id,
    order_date,
    ship_date,
    DATEDIFF(ship_date, order_date) AS days_to_ship,  -- MySQL syntax
    DATE_ADD(order_date, INTERVAL 30 DAY) AS payment_due_date,
    YEAR(order_date) AS order_year,
    MONTH(order_date) AS order_month
FROM orders;
 
-- PostgreSQL date arithmetic
SELECT 
    order_id,
    order_date,
    ship_date,
    ship_date - order_date AS days_to_ship,
    order_date + INTERVAL '30 days' AS payment_due_date,
    EXTRACT(YEAR FROM order_date) AS order_year
FROM orders;
 
-- Conditional expressions with CASE
SELECT 
    employee_id,
    first_name,
    salary,
    CASE 
        WHEN salary >= 100000 THEN 'Executive'
        WHEN salary >= 70000 THEN 'Senior'
        WHEN salary >= 40000 THEN 'Mid-Level'
        ELSE 'Junior'
    END AS salary_band,
    CASE department_id
        WHEN 1 THEN 'Engineering'
        WHEN 2 THEN 'Sales'
        WHEN 3 THEN 'Marketing'
        ELSE 'Other'
    END AS department_name
FROM employees;
 
-- NULL handling expressions
SELECT 
    employee_id,
    first_name,
    commission_pct,
    COALESCE(commission_pct, 0) AS commission_or_zero,
    NULLIF(commission_pct, 0) AS null_if_zero,
    IFNULL(manager_id, 'No Manager') AS manager_display  -- MySQL-specific; stricter databases need COALESCE(CAST(manager_id AS VARCHAR(20)), 'No Manager')
FROM employees;
 
-- Scalar subquery (returns single value per row)
SELECT 
    e.employee_id,
    e.first_name,
    e.salary,
    e.department_id,
    (SELECT AVG(salary) FROM employees e2 
     WHERE e2.department_id = e.department_id) AS dept_avg_salary,
    e.salary - (SELECT AVG(salary) FROM employees e2 
                WHERE e2.department_id = e.department_id) AS salary_vs_avg
FROM employees e;

Always Alias Expressions

Expression Performance Considerations

•Simple arithmetic is cheap — Basic math operations add negligible overhead.
•String operations vary — Concatenation is fast; complex regex is slow.
•Scalar subqueries can be expensive — May execute once per row; prefer JOINs when possible.
•Function calls add overhead — User-defined functions often slower than built-in equivalents.
•CASE is efficiently optimized — Modern optimizers handle CASE expressions well.
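
The advice to prefer JOINs over scalar subqueries can be sketched as follows, using Python's sqlite3 module and a tiny hypothetical employees table. The example only shows that the two forms return the same rows; whether the join is actually faster depends on the database and its optimizer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (employee_id INTEGER PRIMARY KEY, department_id INTEGER, salary INTEGER);
    INSERT INTO employees VALUES (1, 10, 100), (2, 10, 200), (3, 20, 300);
""")

# Correlated scalar subquery: logically evaluated once per outer row.
correlated = conn.execute("""
    SELECT e.employee_id,
           e.salary - (SELECT AVG(e2.salary) FROM employees AS e2
                       WHERE e2.department_id = e.department_id) AS salary_vs_avg
    FROM employees AS e
    ORDER BY e.employee_id
""").fetchall()

# Join against a derived table: each department average is computed once.
joined = conn.execute("""
    SELECT e.employee_id,
           e.salary - a.dept_avg AS salary_vs_avg
    FROM employees AS e
    JOIN (SELECT department_id, AVG(salary) AS dept_avg
          FROM employees
          GROUP BY department_id) AS a
      ON a.department_id = e.department_id
    ORDER BY e.employee_id
""").fetchall()

print(correlated)
print(joined)
```

One difference to keep in mind: the inner join drops employees whose department_id is NULL, while the subquery form keeps them with a NULL result. A LEFT JOIN preserves those rows.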

Column Aliases Deep Dive

Column aliases rename columns in the result set. While syntactically simple, aliases have important scope rules and database-specific behaviors that affect where and how they can be used.

Alias syntax variations:

column_expression AS alias      -- Standard, explicit
column_expression alias          -- Shorthand (no AS)
column_expression AS "Alias"    -- Quoted for special chars/spaces
column_expression AS [Alias]     -- SQL Server bracket notation

column_aliases.sql
-- Standard alias syntax
SELECT 
    first_name AS given_name,
    last_name AS family_name,
    salary AS annual_compensation
FROM employees;
 
-- Without AS keyword (works but less explicit)
SELECT 
    first_name given_name,
    last_name family_name,
    salary annual_compensation
FROM employees;
 
-- Quoted aliases for spaces and special characters
SELECT 
    first_name AS "First Name",
    last_name AS "Last Name",
    salary AS "Annual Salary ($)",
    hire_date AS "Start Date (mm/dd/yyyy)"
FROM employees;
 
-- SQL Server bracket syntax
SELECT 
    first_name AS [First Name],
    salary AS [Annual Salary]
FROM employees;
 
-- Alias scope: Available in ORDER BY
SELECT 
    first_name,
    last_name,
    salary * 12 AS annual_salary
FROM employees
ORDER BY annual_salary DESC;  -- Works: ORDER BY is processed after SELECT
 
-- Alias scope: NOT available in WHERE (processed before SELECT)
SELECT 
    first_name,
    salary * 12 AS annual_salary
FROM employees
WHERE annual_salary > 60000;  -- ERROR! Alias doesn't exist yet
 
-- Workaround 1: Repeat the expression
SELECT 
    first_name,
    salary * 12 AS annual_salary
FROM employees
WHERE salary * 12 > 60000;
 
-- Workaround 2: Use a subquery/CTE
SELECT * FROM (
    SELECT 
        first_name,
        salary * 12 AS annual_salary
    FROM employees
) AS subq
WHERE annual_salary > 60000;
 
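-- Alias in GROUP BY: a non-standard extension (YEAR() itself is MySQL/SQL Server syntax)
SELECT 
    YEAR(hire_date) AS hire_year,
    COUNT(*) AS emp_count
FROM employees
GROUP BY hire_year;  -- Works in MySQL (PostgreSQL and SQLite also accept aliases here); fails in SQL Server
 
-- Standard SQL version: repeat the expression (SQL Server uses YEAR() or DATEPART() instead of EXTRACT)
SELECT 
    EXTRACT(YEAR FROM hire_date) AS hire_year,
    COUNT(*) AS emp_count
FROM employees
GROUP BY EXTRACT(YEAR FROM hire_date);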

Column Alias Availability by Clause
Clause	Can Use Alias?	Processing Order	Notes
FROM	No	1st	Aliases not defined yet
WHERE	No	2nd	Processed before SELECT
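GROUP BY	Varies*	3rd	*Not in standard SQL; MySQL, PostgreSQL, and SQLite allow it, SQL Server does not
HAVING	Usually No	4th	Use aggregate expressions; MySQL accepts aliases here as an extension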
SELECT	Defined here	5th	Cannot reference other SELECT aliases
ORDER BY	Yes	6th	Processed after SELECT
LIMIT/OFFSET	N/A	7th	Works on final result
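
Both portable ways of filtering on a computed value can be compared with a short sketch in Python's sqlite3 module. The employees rows are hypothetical. SQLite itself is lenient and accepts aliases in WHERE, so the example sticks to the two forms that work everywhere rather than trying to reproduce the error.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (first_name TEXT, salary INTEGER);
    INSERT INTO employees VALUES ('Ada', 4000), ('Grace', 6000), ('Edgar', 5500);
""")

# Option 1: repeat the expression in WHERE; the alias is only used in ORDER BY.
repeated = conn.execute("""
    SELECT first_name, salary * 12 AS annual_salary
    FROM employees
    WHERE salary * 12 > 60000
    ORDER BY annual_salary DESC
""").fetchall()

# Option 2: define the alias in a CTE, then filter on it in the outer query.
with_cte = conn.execute("""
    WITH annualized AS (
        SELECT first_name, salary * 12 AS annual_salary
        FROM employees
    )
    SELECT first_name, annual_salary
    FROM annualized
    WHERE annual_salary > 60000
    ORDER BY annual_salary DESC
""").fetchall()

print(repeated)
print(with_cte)
```

The CTE form states the calculation once, which helps when the expression is long or used in several clauses.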

Alias Naming Conventions

Selecting from Multiple Tables

Key considerations:

Columns from all joined tables are available in SELECT
Ambiguous columns require qualification with table name or alias
NULL values may appear for outer join non-matching rows
Column availability depends on join type (inner vs outer)
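
The effect of join type on the selected columns can be sketched with Python's sqlite3 module. The tables and rows are hypothetical; Grace is deliberately left without a department.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE departments (department_id INTEGER PRIMARY KEY, department_name TEXT);
    CREATE TABLE employees (employee_id INTEGER PRIMARY KEY, first_name TEXT, department_id INTEGER);
    INSERT INTO departments VALUES (1, 'Engineering');
    INSERT INTO employees VALUES (10, 'Ada', 1), (11, 'Grace', NULL);
""")

# Inner join: only employees with a matching department appear.
inner_rows = conn.execute("""
    SELECT e.first_name, d.department_name
    FROM employees AS e
    INNER JOIN departments AS d ON e.department_id = d.department_id
    ORDER BY e.first_name
""").fetchall()

# Left join: every employee appears; department columns are NULL when unmatched.
left_rows = conn.execute("""
    SELECT e.first_name,
           d.department_name,
           COALESCE(d.department_name, 'Unassigned') AS department_display
    FROM employees AS e
    LEFT JOIN departments AS d ON e.department_id = d.department_id
    ORDER BY e.first_name
""").fetchall()

print(inner_rows)
print(left_rows)
```

Python's sqlite3 module returns SQL NULL as None, so the raw department_name for Grace is None while the COALESCE column shows the display default.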

multi_table_selection.sql
-- Inner join: Columns from both tables
SELECT 
    e.employee_id,
    e.first_name,
    e.last_name,
    d.department_name,
    d.location_id
FROM employees AS e
INNER JOIN departments AS d ON e.department_id = d.department_id;
 
-- Left outer join: All employees, NULLs for unmatched departments
SELECT 
    e.employee_id,
    e.first_name,
    e.last_name,
    d.department_name,    -- NULL if employee has no department
    d.location_id         -- NULL if employee has no department
FROM employees AS e
LEFT JOIN departments AS d ON e.department_id = d.department_id;
 
-- Multiple joins: Columns from all tables
SELECT 
    e.employee_id,
    e.first_name,
    e.last_name,
    d.department_name,
    l.city,
    l.country,
    m.first_name AS manager_first_name,
    m.last_name AS manager_last_name
FROM employees AS e
JOIN departments AS d ON e.department_id = d.department_id
JOIN locations AS l ON d.location_id = l.location_id
LEFT JOIN employees AS m ON e.manager_id = m.employee_id;
 
-- Handling NULL from outer joins with COALESCE (|| is the standard concatenation operator; MySQL needs CONCAT() by default)
SELECT 
    e.employee_id,
    e.first_name,
    COALESCE(d.department_name, 'Unassigned') AS department_name,
    COALESCE(m.first_name || ' ' || m.last_name, 'No Manager') AS manager_name
FROM employees AS e
LEFT JOIN departments AS d ON e.department_id = d.department_id
LEFT JOIN employees AS m ON e.manager_id = m.employee_id;
 
-- Cross join: Every combination (N * M rows)
SELECT 
    e.first_name,
    p.project_name,
    'Possible Assignment' AS assignment_type
FROM employees AS e
CROSS JOIN projects AS p;
 
-- Natural join: Automatically matches same-named columns
SELECT 
    employee_id,      -- No qualification needed (only one source)
    first_name,
    department_name
FROM employees
NATURAL JOIN departments;
-- Warning: NATURAL JOIN is fragile if schemas change

NATURAL JOIN Pitfalls
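
One way a schema change can silently break a NATURAL JOIN is sketched below with Python's sqlite3 module. The tables, the added manager_id columns, and the data are all hypothetical; the point is only that NATURAL JOIN matches on every same-named column, including ones added later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE departments (department_id INTEGER PRIMARY KEY, department_name TEXT);
    CREATE TABLE employees (employee_id INTEGER PRIMARY KEY, first_name TEXT, department_id INTEGER);
    INSERT INTO departments VALUES (1, 'Engineering'), (2, 'Sales');
    INSERT INTO employees VALUES (10, 'Ada', 1), (11, 'Grace', 2);
""")

natural_sql = """
    SELECT first_name, department_name
    FROM employees NATURAL JOIN departments
    ORDER BY first_name
"""
explicit_sql = """
    SELECT e.first_name, d.department_name
    FROM employees AS e
    JOIN departments AS d ON e.department_id = d.department_id
    ORDER BY e.first_name
"""

before = conn.execute(natural_sql).fetchall()

# Later schema change: both tables gain a manager_id column with different meanings.
conn.executescript("""
    ALTER TABLE employees ADD COLUMN manager_id INTEGER;
    ALTER TABLE departments ADD COLUMN manager_id INTEGER;
    UPDATE employees SET manager_id = 10 WHERE employee_id = 11;
    UPDATE departments SET manager_id = 10 WHERE department_id = 1;
    UPDATE departments SET manager_id = 11 WHERE department_id = 2;
""")

# The same query text now joins on department_id AND manager_id.
after = conn.execute(natural_sql).fetchall()
explicit = conn.execute(explicit_sql).fetchall()

print(before, after, explicit)
```

The query text never changed, yet the NATURAL JOIN result goes from two rows to none, while the explicit ON condition keeps returning both employees.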

Multi-Table Selection Guidelines

•Always use table aliases — Even for two-table joins, aliases improve readability.
•Qualify all columns — Makes origin clear and prevents future ambiguity.
•Handle outer join NULLs — Use COALESCE for display-friendly defaults.
•Consider result size — Cross joins and many-to-many relationships can explode row counts.
•Use meaningful aliases in results — After joining, customer.name and vendor.name both become just 'name' without aliases.
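
The last guideline can be observed from the application side. This sketch uses Python's sqlite3 module and inspects the result column names through cursor.description; the customers, vendors, and purchases tables are hypothetical, and the unaliased names shown are what SQLite produces with its default settings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE vendors (vendor_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE purchases (customer_id INTEGER, vendor_id INTEGER);
    INSERT INTO customers VALUES (1, 'Acme Retail');
    INSERT INTO vendors VALUES (7, 'Bolt Supply');
    INSERT INTO purchases VALUES (1, 7);
""")

def column_names(sql):
    """Return the result-set column names the driver reports for a query."""
    cursor = conn.execute(sql)
    return [col[0] for col in cursor.description]

unaliased = column_names("""
    SELECT c.name, v.name
    FROM purchases AS p
    JOIN customers AS c ON c.customer_id = p.customer_id
    JOIN vendors AS v ON v.vendor_id = p.vendor_id
""")

aliased = column_names("""
    SELECT c.name AS customer_name, v.name AS vendor_name
    FROM purchases AS p
    JOIN customers AS c ON c.customer_id = p.customer_id
    JOIN vendors AS v ON v.vendor_id = p.vendor_id
""")

print(unaliased)
print(aliased)
```

Code that reads results by column name, such as a dictionary-style row, cannot tell the two unaliased columns apart; the aliased version gives each one a distinct key.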

Performance Impact of Column Selection

Column selection has real performance implications that compound at scale. Understanding these effects helps you write queries that not only return correct data but do so efficiently.

Performance factors:

How Column Selection Affects Performance

•I/O reduction — Selecting fewer columns means reading less data from disk, especially for wide tables.
•Network transfer — Result set size affects time to transfer data from database to application.
•Memory usage — Smaller result sets consume less memory on both server and client.
•Covering indexes — If all selected columns exist in an index, the database can serve the query without accessing the table (index-only scan).
•LOB avoidance — Skipping BLOB/CLOB/TEXT columns avoids expensive large object retrieval.
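
The covering-index effect can be inspected with a query plan. The sketch below uses Python's sqlite3 module and SQLite's EXPLAIN QUERY PLAN; the orders table is a hypothetical schema, and the plan wording is specific to SQLite (other databases expose the same idea as an "index-only scan" or similar).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        customer_id INTEGER,
        order_date TEXT,
        total REAL,
        shipping_address TEXT
    );
    CREATE INDEX idx_covering ON orders (customer_id, order_date, total);
""")

def plan(sql):
    """Return SQLite's plan description; the last column of each row is the detail text."""
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

# Every referenced column is in the index, so the table itself need not be read.
narrow_plan = plan(
    "SELECT order_date, total FROM orders WHERE customer_id = 12345")

# shipping_address is not in the index, so each match requires a table lookup.
wide_plan = plan(
    "SELECT order_date, total, shipping_address FROM orders WHERE customer_id = 12345")

print(narrow_plan)
print(wide_plan)
```

Both queries use the index to find matching rows, but only the narrow one is reported as using a covering index.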

column_selection_performance.sql
-- Table: employees (many columns, including a large 'bio' TEXT column)
-- Index: idx_emp_dept (department_id, employee_id, first_name, last_name)
 
-- POOR: Selecting all columns forces table access
SELECT *
FROM employees
WHERE department_id = 5;
-- Must read: all columns including large 'bio' TEXT
-- Cannot use covering index
 
-- BETTER: Select only needed columns
SELECT employee_id, first_name, last_name
FROM employees
WHERE department_id = 5;
-- Can potentially use covering index (index-only scan)
-- Avoids reading 'bio' and other unnecessary columns
 
-- MEASURABLE DIFFERENCE at scale (illustrative figures):
-- employees table: 1 million rows, 50 columns, average row size 2KB
-- SELECT * retrieves: ~2GB of data
-- SELECT employee_id, first_name: ~50MB of data
 
-- Example: Covering index optimization
-- Index exists: CREATE INDEX idx_covering ON orders(customer_id, order_date, total);
 
-- Uses covering index (very fast)
SELECT order_date, total
FROM orders
WHERE customer_id = 12345;
 
-- Cannot use covering index (must access table)
SELECT order_date, total, shipping_address
FROM orders
WHERE customer_id = 12345;
 
-- Large object column impact
-- Table: documents (id, title, file_content BLOB)
 
-- SLOW: Forces BLOB loading
SELECT * FROM documents WHERE id = 100;
 
-- FAST: Avoids BLOB
SELECT id, title FROM documents WHERE id = 100;
 
-- When you need the BLOB, get just that row
SELECT file_content FROM documents WHERE id = 100;

Production Query Audit

SELECT * vs Explicit Columns Impact
Factor	SELECT *	Explicit Columns	Improvement
Data transfer	All columns (often KB/row)	Needed columns only	Often 50-90% reduction
Memory usage	High (entire rows cached)	Low (subset cached)	Proportional to column count
Index usage	Rarely covering	Can be covering	Potentially much faster when an index-only scan applies
Schema coupling	Can break when columns are added, dropped, or reordered	Stable	Fewer production incidents
Readability	Unclear intent	Self-documenting	Easier maintenance

Summary: Column Selection

Let's consolidate the key points:

Key Takeaways

•Explicit is better than implicit — List specific columns rather than using SELECT *; this documents your intent and reduces schema coupling.
•Qualify multi-table references — Use table.column or alias.column to prevent ambiguity and improve readability.
•Avoid column ordinals — Use column names in ORDER BY and GROUP BY for maintainable, refactor-safe queries.
•Leverage expressions — SELECT can compute values, not just retrieve them; use CASE, functions, and arithmetic.
•Alias strategically — Provide clear names for expressions and disambiguate same-named columns from different tables.
•Understand alias scope — Aliases from SELECT are available in ORDER BY but not in WHERE, GROUP BY, or HAVING in standard SQL; some databases relax this for GROUP BY and HAVING.
•Consider performance — Column selection affects I/O, network transfer, memory, and index usage at scale.

What's next:

We'll examine the asterisk (*) operator for selecting all columns—when it's appropriate, when to avoid it, and the nuances of its behavior in different contexts including joins and subqueries.
