Database Management SystemsSELECT Basics

SELECT Basics: Mastering Data Retrieval

LevelBeginner

Duration60 mins

TopicSELECT Basics

1 / 5

The SELECT Clause: Foundation of Data Retrieval

The Most Important Keyword in SQL

If you had to learn only one SQL keyword, it would be SELECT. This seemingly simple statement is the gateway to every piece of data stored in relational databases worldwide. From the moment you type SELECT until you press enter, you are engaging with a sophisticated data retrieval system that powers virtually every modern application.

The SELECT statement accounts for approximately 80-90% of all SQL statements executed in production database systems. Whether you're building a search feature, generating reports, populating dashboards, or debugging data issues, SELECT is your primary tool. Understanding it deeply—not just superficially—separates proficient database practitioners from true experts.

This page provides an exhaustive examination of the SELECT clause itself: its syntax, semantics, execution model, and the subtle nuances that affect query behavior and performance.

What You Will Learn

By the end of this page, you will understand the SELECT clause at a fundamental level—its purpose, syntax variations, processing semantics, and how it interacts with the rest of the SQL statement. You'll gain insight into what happens internally when a SELECT statement executes and how to think about query construction like a database expert.

Understanding SELECT Fundamentally

At its core, SELECT is a declarative instruction that tells the database management system (DBMS) what data you want, not how to get it. This declarative nature is fundamental to understanding SQL and distinguishes it from procedural programming languages where you specify step-by-step instructions.

The philosophical foundation:

When you write a SELECT statement, you are expressing a question in a specialized language designed for data retrieval. The database engine's optimizer then determines the most efficient way to answer that question. This separation of what from how is one of SQL's greatest strengths—it allows the database to optimize execution based on data statistics, available indexes, and system resources.

The SELECT clause specifically:

Within a complete SELECT statement (which includes FROM, WHERE, GROUP BY, HAVING, ORDER BY, and other clauses), the SELECT clause defines the projection—which columns or expressions appear in the result set. In relational algebra terms, projection selects a subset of attributes from a relation.

Declarative vs. Procedural

In procedural code, you might write: 'Open file, read each line, check if name equals John, add to results.' In SQL, you simply write: 'SELECT * FROM employees WHERE name = "John"'. The database figures out how to execute this efficiently. This abstraction enables the same query to perform optimally across different data sizes and hardware configurations.

select_basic_anatomy.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
-- The simplest possible SELECT statement
SELECT 1;
-- Returns: 1 (a single row with a single column containing the value 1)
 
-- SELECT with a constant expression
SELECT 'Hello, SQL World!';
-- Returns: Hello, SQL World!
 
-- The canonical form: SELECT from a table
SELECT column_name FROM table_name;
 
-- A complete example with real semantics
SELECT 
    employee_id,      -- Retrieve the employee_id column
    first_name,       -- Retrieve the first_name column  
    last_name,        -- Retrieve the last_name column
    hire_date         -- Retrieve the hire_date column
FROM 
    employees;        -- From the employees table

Key observations from the examples above:

SELECT can exist without FROM: SELECT 1 is valid SQL that returns a constant. This is useful for testing database connectivity, computing expressions, and certain advanced query patterns.
Column names form the projection list: When you list column names after SELECT, you're defining exactly which attributes from the source table(s) appear in your result.
Order matters: The order in which you list columns in the SELECT clause determines the order of columns in the result set.
Readability is crucial: While SQL is whitespace-insensitive, formatting queries across multiple lines with proper indentation dramatically improves readability and maintainability.

SELECT Clause Syntax Deep Dive

The SELECT clause follows a precise syntactic structure that, once mastered, allows you to express virtually any data retrieval pattern. Let's examine the complete syntax specification and understand each component.

Formal syntax:

SELECT [ALL | DISTINCT]
    select_expression [, select_expression ...]
    [INTO variable [, variable ...]]

Each select_expression can be:

A column name (possibly qualified with a table name or alias)
A literal value (string, number, date, etc.)
An expression involving operators and functions
A subquery returning a single value (scalar subquery)
The asterisk (*) representing all columns

SELECT Clause Components
Component	Required	Purpose	Example
`SELECT`	Yes	Keyword initiating the clause	`SELECT`
`ALL`	No (default)	Include all rows including duplicates	`SELECT ALL name`
`DISTINCT`	No	Remove duplicate rows from result	`SELECT DISTINCT city`
select_expression	Yes (at least one)	Define what data to retrieve	`first_name, last_name`
Alias (`AS`)	No	Rename column in result set	`salary AS annual_pay`
`INTO`	No	Store result in variables (procedural SQL)	`INTO @var1, @var2`

select_syntax_examples.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
-- Basic column selection
SELECT employee_id, first_name, last_name
FROM employees;
 
-- Using DISTINCT to remove duplicates
SELECT DISTINCT department_id
FROM employees;
 
-- Column aliasing with AS keyword
SELECT 
    first_name AS given_name,
    last_name AS family_name,
    salary AS annual_compensation
FROM employees;
 
-- Column aliasing without AS (valid but less readable)
SELECT 
    first_name given_name,
    last_name family_name
FROM employees;
 
-- Aliasing with quoted identifiers for special characters/spaces
SELECT 
    first_name AS "First Name",
    last_name AS "Family Name",
    salary AS "Annual Salary (USD)"
FROM employees;
 
-- Table-qualified column names (essential for joins)
SELECT 
    employees.employee_id,
    employees.first_name,
    departments.department_name
FROM employees, departments
WHERE employees.department_id = departments.department_id;
 
-- Using table aliases for brevity
SELECT 
    e.employee_id,
    e.first_name,
    d.department_name
FROM employees e, departments d
WHERE e.department_id = d.department_id;

Best Practice: Always Use Explicit Column Names

While SELECT * is convenient for ad-hoc exploration, production code should explicitly name columns. This prevents issues when table schemas change, makes queries self-documenting, and can improve performance by avoiding retrieval of unnecessary data.

The SELECT Processing Model

Understanding when the SELECT clause is processed relative to other clauses is crucial for writing correct and efficient queries. Despite appearing first in the written query, the SELECT clause is processed near the end of query execution.

Logical query processing order:

SQL statements have a defined logical order of evaluation that differs from their written syntax. This order determines what aliases are visible where, what expressions are valid in which clauses, and how the final result set is constructed.

SQL Logical Processing Order
Step	Clause	Operation	Available Aliases
1	`FROM`	Identify source tables and apply joins	Table aliases only
2	`WHERE`	Filter rows based on conditions	Table aliases, column names
3	`GROUP BY`	Group rows by specified columns	Table aliases, column names
4	`HAVING`	Filter groups based on aggregate conditions	Column names, aggregates
5	`SELECT`	Evaluate expressions, apply aliases	All previous + column aliases
6	`DISTINCT`	Remove duplicate rows	All aliases available
7	`ORDER BY`	Sort the result set	Column aliases, column positions
8	`LIMIT/OFFSET`	Restrict number of rows returned	N/A

processing_order_example.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
-- This query demonstrates why processing order matters
 
SELECT 
    department_id,
    COUNT(*) AS employee_count,       -- Alias defined in step 5
    AVG(salary) AS average_salary     -- Alias defined in step 5
FROM employees
WHERE hire_date > '2020-01-01'        -- Step 2: Cannot use employee_count here!
GROUP BY department_id                 -- Step 3: Groups are formed
HAVING COUNT(*) > 5                    -- Step 4: Must use COUNT(*), not alias
ORDER BY average_salary DESC;          -- Step 7: CAN use alias (some databases)
 
-- INCORRECT: Trying to use SELECT alias in WHERE
-- SELECT salary * 12 AS annual_salary
-- FROM employees
-- WHERE annual_salary > 50000;  -- ERROR! annual_salary doesn't exist yet
 
-- CORRECT: Repeat the expression in WHERE
SELECT salary * 12 AS annual_salary
FROM employees
WHERE salary * 12 > 50000;  -- Use the original expression
 
-- OR use a subquery/CTE
SELECT * FROM (
    SELECT employee_id, salary * 12 AS annual_salary
    FROM employees
) AS subq
WHERE annual_salary > 50000;  -- Now it works!

Common Pitfall: Alias Availability

One of the most common SQL errors is trying to use a column alias defined in SELECT within the WHERE or GROUP BY clause. Remember: SELECT is processed after WHERE and GROUP BY, so those aliases don't exist yet during earlier processing stages.

Practical implications of processing order:

Filtering efficiency: Conditions in WHERE are evaluated before grouping and projection, so filtering early reduces the data volume for subsequent operations.
Aggregate awareness: You cannot use aggregate functions in WHERE because aggregates are computed during/after GROUP BY, which comes after WHERE.
Expression recomputation: If you use the same expression in SELECT and WHERE, the optimizer typically computes it once, but the logical model suggests separate evaluations.
ORDER BY flexibility: ORDER BY is processed last (before LIMIT), so it can reference column aliases defined in SELECT.

What SELECT Actually Returns

The result of a SELECT statement is always a relation (in formal terms) or a result set (in practical terms). Understanding the properties of this result is fundamental to working with SQL effectively.

Properties of a SELECT result:

Tabular structure: The result is always rectangular—every row has the same columns in the same order.
Column metadata: Each column has a name (derived from the source or from an alias) and a data type (determined by the expression).
Row sequence: Without ORDER BY, the row order is undefined and non-deterministic. The same query may return rows in different orders across different executions.
Duplicate handling: By default (SELECT ALL), duplicates are preserved. SELECT DISTINCT removes duplicate rows.

result_set_properties.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
-- Example table: products
-- | product_id | name        | category    | price  |
-- |------------|-------------|-------------|--------|
-- | 1          | Widget A    | Electronics | 29.99  |
-- | 2          | Widget B    | Electronics | 49.99  |
-- | 3          | Gadget X    | Electronics | 29.99  |
-- | 4          | Tool Y      | Hardware    | 15.00  |
-- | 5          | Tool Z      | Hardware    | 15.00  |
 
-- Query 1: Returns all columns with original names
SELECT product_id, name, category, price
FROM products;
-- Result columns: product_id, name, category, price
 
-- Query 2: Column order matches SELECT list ORDER
SELECT price, name, product_id
FROM products;
-- Result columns: price, name, product_id  (different order!)
 
-- Query 3: Expressions derive their types
SELECT 
    name,
    price,
    price * 1.1 AS price_with_tax,     -- Numeric type (DECIMAL)
    CONCAT(name, ' - ', category),      -- String type (VARCHAR)
    price > 20 AS is_expensive          -- Boolean type (in supporting DBs)
FROM products;
 
-- Query 4: Demonstrating duplicate handling
SELECT category FROM products;
-- Returns: Electronics, Electronics, Electronics, Hardware, Hardware
-- (5 rows with duplicates)
 
SELECT DISTINCT category FROM products;
-- Returns: Electronics, Hardware
-- (2 rows, duplicates removed)

Result Set Characteristics

•Row count: Can be zero (empty result), one, or many. The result is valid SQL even when empty.
•Column count: At least one column required. Can be hundreds of columns in extreme cases.
•NULL values: Any column that can contain NULL in source data may produce NULL in results.
•Data types: Result column types are inferred from source columns and expressions.
•Materialization: The result may or may not be fully materialized in memory depending on how it's consumed.

Results Are Transient

A SELECT result set exists only for the duration of query execution and result transmission. Unlike tables, result sets aren't stored permanently. To persist results, you must explicitly use INSERT INTO...SELECT, CREATE TABLE AS SELECT, or application-level storage.

Column Naming Rules and Conflict Resolution

When constructing SELECT queries, especially those involving multiple tables or complex expressions, column naming becomes a critical consideration. Understanding how databases handle column names prevents ambiguity errors and ensures query maintainability.

Column name derivation rules:

Result set column names are determined by the following priority:

Explicit alias: If you provide an alias with AS, that becomes the column name
Source column name: If selecting a simple column, its name is preserved
Generated name: For expressions without aliases, databases generate names (which vary by DBMS)

column_naming.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
-- Source column name preservation
SELECT first_name FROM employees;
-- Result column name: first_name
 
-- Explicit aliasing overrides source name
SELECT first_name AS given_name FROM employees;
-- Result column name: given_name
 
-- Expression without alias: generated name (DBMS-dependent)
SELECT UPPER(first_name) FROM employees;
-- MySQL:      UPPER(first_name)
-- PostgreSQL: upper
-- SQL Server: (No column name)
-- Oracle:     UPPER(FIRST_NAME)
 
-- Always alias expressions for consistent, readable results
SELECT UPPER(first_name) AS upper_first_name FROM employees;
-- Result column name: upper_first_name (consistent across all DBMS)
 
-- Handling column name conflicts in joins
SELECT 
    employees.name,      -- Qualified: employees.name
    departments.name     -- Qualified: departments.name
FROM employees
JOIN departments ON employees.department_id = departments.id;
-- PROBLEM: Both columns named "name" in result - ambiguous!
 
-- Solution: Use aliases to disambiguate
SELECT 
    employees.name AS employee_name,
    departments.name AS department_name
FROM employees
JOIN departments ON employees.department_id = departments.id;
-- Clear result: employee_name, department_name

Column Alias Rules by Database
Feature	MySQL	PostgreSQL	SQL Server	Oracle
AS keyword	Optional	Optional	Optional	Optional
Quoted identifiers	Backticks ` or "	Double quotes "	Square brackets [] or "	Double quotes "
Case sensitivity	Case-insensitive	Lowercase unless quoted	Case-insensitive	Uppercase unless quoted
Reserved words as aliases	Requires quoting	Requires quoting	Requires quoting	Requires quoting
Spaces in aliases	Requires quoting	Requires quoting	Requires quoting	Requires quoting

Defensive Aliasing Strategy

For maximum portability and clarity: (1) Always use AS explicitly, (2) Use snake_case for alias names without quotes, (3) Avoid reserved words, (4) Alias ALL expressions, not just complex ones. This prevents surprises when queries move between databases.

SELECT and NULL Semantics

NULL handling is one of the most nuanced aspects of SQL, and the SELECT clause is where NULL values become visible in query results. Understanding how NULL propagates through expressions and how it affects result sets is essential for writing correct queries.

What NULL means:

In SQL, NULL represents the absence of a value. It is not zero, not an empty string, not false—it is unknown or missing data. This has profound implications for how expressions involving NULL are evaluated.

null_semantics.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
-- NULL in source data propagates to results
-- Table: employees
-- | id | name  | manager_id | commission |
-- |----|-------|------------|------------|
-- | 1  | Alice | NULL       | 1000       |
-- | 2  | Bob   | 1          | NULL       |
-- | 3  | Carol | 1          | 500        |
 
SELECT name, manager_id, commission
FROM employees;
-- Alice has NULL manager_id (she's top-level)
-- Bob has NULL commission (no commission data)
 
-- NULL propagation in expressions
SELECT 
    name,
    commission,
    commission * 2 AS double_commission
FROM employees;
-- For Bob: commission * 2 = NULL * 2 = NULL
-- Any arithmetic with NULL produces NULL
 
-- NULL in string concatenation
SELECT 
    name,
    CONCAT('Manager: ', manager_id) AS manager_info  -- Varies by DBMS
FROM employees;
-- In standard SQL: NULL concatenation = NULL
-- MySQL CONCAT: NULL becomes empty string (non-standard!)
-- Use COALESCE for safety:
SELECT 
    name,
    CONCAT('Manager: ', COALESCE(CAST(manager_id AS CHAR), 'None')) AS manager_info
FROM employees;
 
-- NULL affects DISTINCT processing
SELECT DISTINCT manager_id FROM employees;
-- Returns: NULL, 1
-- Multiple NULLs are considered equal for DISTINCT purposes
 
-- Counting NULL vs non-NULL
SELECT 
    COUNT(*) AS total_rows,              -- Counts all rows: 3
    COUNT(commission) AS with_commission, -- Counts non-NULL: 2
    COUNT(manager_id) AS with_manager     -- Counts non-NULL: 2
FROM employees;

NULL Propagation Rule

Almost any operation involving NULL returns NULL. This includes arithmetic (+, -, *, /), most string functions, and comparisons (NULL = NULL is NULL, not TRUE). The exceptions are functions specifically designed to handle NULL like COALESCE, NULLIF, and IS NULL predicates.

NULL Handling Best Practices in SELECT

•Use COALESCE for defaults: COALESCE(commission, 0) returns 0 if commission is NULL
•Use NULLIF for conditional NULL: NULLIF(value, 0) returns NULL if value equals 0
•Consider IFNULL/NVL/ISNULL: Database-specific alternatives to COALESCE for two arguments
•Test with IS NULL/IS NOT NULL: Never use = NULL for comparisons; use IS NULL
•Document NULL expectations: Make clear in aliases or comments when NULL is possible

Performance Considerations for SELECT

While the SELECT clause itself is not typically the performance bottleneck in query execution, certain practices in writing SELECT clauses can significantly impact overall query performance. Understanding these considerations helps you write efficient queries from the start.

Key performance factors:

SELECT Clause Performance Factors

•Column selection: Retrieving fewer columns means less data transfer from disk and over the network. SELECT * retrieves all columns, even those you don't need.
•Expression complexity: Complex expressions in SELECT (subqueries, function chains) are evaluated for every row in the result set.
•DISTINCT overhead: DISTINCT requires sorting or hashing to identify duplicates—avoid when not needed.
•Covering indexes: If SELECT retrieves only columns present in an index, the database can serve the query without touching the table data (index-only scan).
•Large object columns: Selecting BLOB/CLOB/TEXT columns reads potentially megabytes of data per row.

select_performance.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
-- POOR: Selects all columns when only id and name needed
SELECT * FROM employees WHERE department_id = 5;
-- Reads: id, name, email, phone, address, hire_date, salary, bio, photo...
 
-- BETTER: Select only needed columns
SELECT employee_id, first_name, last_name 
FROM employees 
WHERE department_id = 5;
-- Reads: only 3 columns, potentially from covering index
 
-- POOR: Unnecessary DISTINCT with primary key
SELECT DISTINCT employee_id, first_name FROM employees;
-- employee_id is unique, so DISTINCT does nothing but add sorting cost
 
-- BETTER: Remove unnecessary DISTINCT
SELECT employee_id, first_name FROM employees;
 
-- POOR: Expensive function call on every row
SELECT 
    employee_id,
    (SELECT department_name FROM departments d 
     WHERE d.id = e.department_id) AS dept_name  -- Scalar subquery per row!
FROM employees e;
 
-- BETTER: Use JOIN instead of scalar subquery
SELECT 
    e.employee_id,
    d.department_name AS dept_name
FROM employees e
JOIN departments d ON e.department_id = d.id;
 
-- CONSIDERATION: Covering index optimization
-- If index exists on (department_id, first_name, last_name):
SELECT first_name, last_name 
FROM employees 
WHERE department_id = 5;
-- Can be served entirely from the index without table access

The SELECT * Anti-Pattern

Avoid SELECT * in production code for three reasons: (1) Performance—retrieves unnecessary data, (2) Fragility—query breaks or changes meaning when columns are added/removed, (3) Clarity—readers can't tell what data is actually needed. Reserve SELECT * for ad-hoc exploration only.

Summary: The SELECT Clause

We've explored the SELECT clause in depth—from its fundamental purpose through syntax, processing model, and performance implications. Let's consolidate the key points:

Key Takeaways

•SELECT defines projection — It specifies which columns and expressions appear in the result set, implementing the relational projection operation.
•Declarative nature — SELECT tells the database what you want, not how to get it. The optimizer determines execution strategy.
•Processing order matters — Despite appearing first syntactically, SELECT is processed late, after FROM, WHERE, GROUP BY, and HAVING.
•Aliases with AS — Always use explicit aliases for expressions and renamed columns. Some databases allow alias reuse in ORDER BY.
•NULL propagates — Operations involving NULL generally yield NULL. Use COALESCE and IS NULL for proper handling.
•Performance awareness — Select only needed columns, avoid unnecessary DISTINCT, prefer JOINs over scalar subqueries.
•Result sets are relations — The output is tabular, with defined columns and types, but undefined row order without ORDER BY.

What's next:

With a solid understanding of the SELECT clause, we'll next examine the FROM clause—which specifies the source tables and is actually processed before SELECT in the logical query order. Understanding FROM is essential for comprehending how data enters the query pipeline.

Page Complete

You now have a comprehensive understanding of the SELECT clause—its syntax, semantics, processing model, and practical considerations. This foundation prepares you for all subsequent SQL learning, as SELECT is the cornerstone of every data retrieval operation.

1 / 5

Loading learning content...

Database Management SystemsSELECT Basics

SELECT Basics: Mastering Data Retrieval

LevelBeginner

Duration60 mins

TopicSELECT Basics

1 / 5

The SELECT Clause: Foundation of Data Retrieval

The Most Important Keyword in SQL

This page provides an exhaustive examination of the SELECT clause itself: its syntax, semantics, execution model, and the subtle nuances that affect query behavior and performance.

What You Will Learn

Understanding SELECT Fundamentally

The philosophical foundation:

The SELECT clause specifically:

Declarative vs. Procedural

select_basic_anatomy.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
-- The simplest possible SELECT statement
SELECT 1;
-- Returns: 1 (a single row with a single column containing the value 1)
 
-- SELECT with a constant expression
SELECT 'Hello, SQL World!';
-- Returns: Hello, SQL World!
 
-- The canonical form: SELECT from a table
SELECT column_name FROM table_name;
 
-- A complete example with real semantics
SELECT 
    employee_id,      -- Retrieve the employee_id column
    first_name,       -- Retrieve the first_name column  
    last_name,        -- Retrieve the last_name column
    hire_date         -- Retrieve the hire_date column
FROM 
    employees;        -- From the employees table

Key observations from the examples above:

SELECT can exist without FROM: SELECT 1 is valid SQL that returns a constant. This is useful for testing database connectivity, computing expressions, and certain advanced query patterns.
Column names form the projection list: When you list column names after SELECT, you're defining exactly which attributes from the source table(s) appear in your result.
Order matters: The order in which you list columns in the SELECT clause determines the order of columns in the result set.
Readability is crucial: While SQL is whitespace-insensitive, formatting queries across multiple lines with proper indentation dramatically improves readability and maintainability.

SELECT Clause Syntax Deep Dive

Formal syntax:

SELECT [ALL | DISTINCT]
    select_expression [, select_expression ...]
    [INTO variable [, variable ...]]

Each select_expression can be:

A column name (possibly qualified with a table name or alias)
A literal value (string, number, date, etc.)
An expression involving operators and functions
A subquery returning a single value (scalar subquery)
The asterisk (*) representing all columns

SELECT Clause Components
Component	Required	Purpose	Example
`SELECT`	Yes	Keyword initiating the clause	`SELECT`
`ALL`	No (default)	Include all rows including duplicates	`SELECT ALL name`
`DISTINCT`	No	Remove duplicate rows from result	`SELECT DISTINCT city`
select_expression	Yes (at least one)	Define what data to retrieve	`first_name, last_name`
Alias (`AS`)	No	Rename column in result set	`salary AS annual_pay`
`INTO`	No	Store result in variables (procedural SQL)	`INTO @var1, @var2`

select_syntax_examples.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
-- Basic column selection
SELECT employee_id, first_name, last_name
FROM employees;
 
-- Using DISTINCT to remove duplicates
SELECT DISTINCT department_id
FROM employees;
 
-- Column aliasing with AS keyword
SELECT 
    first_name AS given_name,
    last_name AS family_name,
    salary AS annual_compensation
FROM employees;
 
-- Column aliasing without AS (valid but less readable)
SELECT 
    first_name given_name,
    last_name family_name
FROM employees;
 
-- Aliasing with quoted identifiers for special characters/spaces
SELECT 
    first_name AS "First Name",
    last_name AS "Family Name",
    salary AS "Annual Salary (USD)"
FROM employees;
 
-- Table-qualified column names (essential for joins)
SELECT 
    employees.employee_id,
    employees.first_name,
    departments.department_name
FROM employees, departments
WHERE employees.department_id = departments.department_id;
 
-- Using table aliases for brevity
SELECT 
    e.employee_id,
    e.first_name,
    d.department_name
FROM employees e, departments d
WHERE e.department_id = d.department_id;

Best Practice: Always Use Explicit Column Names

The SELECT Processing Model

Logical query processing order:

SQL Logical Processing Order
Step	Clause	Operation	Available Aliases
1	`FROM`	Identify source tables and apply joins	Table aliases only
2	`WHERE`	Filter rows based on conditions	Table aliases, column names
3	`GROUP BY`	Group rows by specified columns	Table aliases, column names
4	`HAVING`	Filter groups based on aggregate conditions	Column names, aggregates
5	`SELECT`	Evaluate expressions, apply aliases	All previous + column aliases
6	`DISTINCT`	Remove duplicate rows	All aliases available
7	`ORDER BY`	Sort the result set	Column aliases, column positions
8	`LIMIT/OFFSET`	Restrict number of rows returned	N/A

processing_order_example.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
-- This query demonstrates why processing order matters
 
SELECT 
    department_id,
    COUNT(*) AS employee_count,       -- Alias defined in step 5
    AVG(salary) AS average_salary     -- Alias defined in step 5
FROM employees
WHERE hire_date > '2020-01-01'        -- Step 2: Cannot use employee_count here!
GROUP BY department_id                 -- Step 3: Groups are formed
HAVING COUNT(*) > 5                    -- Step 4: Must use COUNT(*), not alias
ORDER BY average_salary DESC;          -- Step 7: CAN use alias (some databases)
 
-- INCORRECT: Trying to use SELECT alias in WHERE
-- SELECT salary * 12 AS annual_salary
-- FROM employees
-- WHERE annual_salary > 50000;  -- ERROR! annual_salary doesn't exist yet
 
-- CORRECT: Repeat the expression in WHERE
SELECT salary * 12 AS annual_salary
FROM employees
WHERE salary * 12 > 50000;  -- Use the original expression
 
-- OR use a subquery/CTE
SELECT * FROM (
    SELECT employee_id, salary * 12 AS annual_salary
    FROM employees
) AS subq
WHERE annual_salary > 50000;  -- Now it works!

Common Pitfall: Alias Availability

Practical implications of processing order:

Filtering efficiency: Conditions in WHERE are evaluated before grouping and projection, so filtering early reduces the data volume for subsequent operations.
Aggregate awareness: You cannot use aggregate functions in WHERE because aggregates are computed during/after GROUP BY, which comes after WHERE.
Expression recomputation: If you use the same expression in SELECT and WHERE, the optimizer typically computes it once, but the logical model suggests separate evaluations.
ORDER BY flexibility: ORDER BY is processed last (before LIMIT), so it can reference column aliases defined in SELECT.

What SELECT Actually Returns

Properties of a SELECT result:

Tabular structure: The result is always rectangular—every row has the same columns in the same order.
Column metadata: Each column has a name (derived from the source or from an alias) and a data type (determined by the expression).
Row sequence: Without ORDER BY, the row order is undefined and non-deterministic. The same query may return rows in different orders across different executions.
Duplicate handling: By default (SELECT ALL), duplicates are preserved. SELECT DISTINCT removes duplicate rows.

result_set_properties.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
-- Example table: products
-- | product_id | name        | category    | price  |
-- |------------|-------------|-------------|--------|
-- | 1          | Widget A    | Electronics | 29.99  |
-- | 2          | Widget B    | Electronics | 49.99  |
-- | 3          | Gadget X    | Electronics | 29.99  |
-- | 4          | Tool Y      | Hardware    | 15.00  |
-- | 5          | Tool Z      | Hardware    | 15.00  |
 
-- Query 1: Returns all columns with original names
SELECT product_id, name, category, price
FROM products;
-- Result columns: product_id, name, category, price
 
-- Query 2: Column order matches SELECT list ORDER
SELECT price, name, product_id
FROM products;
-- Result columns: price, name, product_id  (different order!)
 
-- Query 3: Expressions derive their types
SELECT 
    name,
    price,
    price * 1.1 AS price_with_tax,     -- Numeric type (DECIMAL)
    CONCAT(name, ' - ', category),      -- String type (VARCHAR)
    price > 20 AS is_expensive          -- Boolean type (in supporting DBs)
FROM products;
 
-- Query 4: Demonstrating duplicate handling
SELECT category FROM products;
-- Returns: Electronics, Electronics, Electronics, Hardware, Hardware
-- (5 rows with duplicates)
 
SELECT DISTINCT category FROM products;
-- Returns: Electronics, Hardware
-- (2 rows, duplicates removed)

Result Set Characteristics

•Row count: Can be zero (empty result), one, or many. The result is valid SQL even when empty.
•Column count: At least one column required. Can be hundreds of columns in extreme cases.
•NULL values: Any column that can contain NULL in source data may produce NULL in results.
•Data types: Result column types are inferred from source columns and expressions.
•Materialization: The result may or may not be fully materialized in memory depending on how it's consumed.

Results Are Transient

Column Naming Rules and Conflict Resolution

Column name derivation rules:

Result set column names are determined by the following priority:

Explicit alias: If you provide an alias with AS, that becomes the column name
Source column name: If selecting a simple column, its name is preserved
Generated name: For expressions without aliases, databases generate names (which vary by DBMS)

column_naming.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
-- Source column name preservation
SELECT first_name FROM employees;
-- Result column name: first_name
 
-- Explicit aliasing overrides source name
SELECT first_name AS given_name FROM employees;
-- Result column name: given_name
 
-- Expression without alias: generated name (DBMS-dependent)
SELECT UPPER(first_name) FROM employees;
-- MySQL:      UPPER(first_name)
-- PostgreSQL: upper
-- SQL Server: (No column name)
-- Oracle:     UPPER(FIRST_NAME)
 
-- Always alias expressions for consistent, readable results
SELECT UPPER(first_name) AS upper_first_name FROM employees;
-- Result column name: upper_first_name (consistent across all DBMS)
 
-- Handling column name conflicts in joins
SELECT 
    employees.name,      -- Qualified: employees.name
    departments.name     -- Qualified: departments.name
FROM employees
JOIN departments ON employees.department_id = departments.id;
-- PROBLEM: Both columns named "name" in result - ambiguous!
 
-- Solution: Use aliases to disambiguate
SELECT 
    employees.name AS employee_name,
    departments.name AS department_name
FROM employees
JOIN departments ON employees.department_id = departments.id;
-- Clear result: employee_name, department_name

Column Alias Rules by Database
Feature	MySQL	PostgreSQL	SQL Server	Oracle
AS keyword	Optional	Optional	Optional	Optional
Quoted identifiers	Backticks ` or "	Double quotes "	Square brackets [] or "	Double quotes "
Case sensitivity	Case-insensitive	Lowercase unless quoted	Case-insensitive	Uppercase unless quoted
Reserved words as aliases	Requires quoting	Requires quoting	Requires quoting	Requires quoting
Spaces in aliases	Requires quoting	Requires quoting	Requires quoting	Requires quoting

Defensive Aliasing Strategy

SELECT and NULL Semantics

What NULL means:

null_semantics.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
-- NULL in source data propagates to results
-- Table: employees
-- | id | name  | manager_id | commission |
-- |----|-------|------------|------------|
-- | 1  | Alice | NULL       | 1000       |
-- | 2  | Bob   | 1          | NULL       |
-- | 3  | Carol | 1          | 500        |
 
SELECT name, manager_id, commission
FROM employees;
-- Alice has NULL manager_id (she's top-level)
-- Bob has NULL commission (no commission data)
 
-- NULL propagation in expressions
SELECT 
    name,
    commission,
    commission * 2 AS double_commission
FROM employees;
-- For Bob: commission * 2 = NULL * 2 = NULL
-- Any arithmetic with NULL produces NULL
 
-- NULL in string concatenation
SELECT 
    name,
    CONCAT('Manager: ', manager_id) AS manager_info  -- Varies by DBMS
FROM employees;
-- In standard SQL: NULL concatenation = NULL
-- MySQL CONCAT: NULL becomes empty string (non-standard!)
-- Use COALESCE for safety:
SELECT 
    name,
    CONCAT('Manager: ', COALESCE(CAST(manager_id AS CHAR), 'None')) AS manager_info
FROM employees;
 
-- NULL affects DISTINCT processing
SELECT DISTINCT manager_id FROM employees;
-- Returns: NULL, 1
-- Multiple NULLs are considered equal for DISTINCT purposes
 
-- Counting NULL vs non-NULL
SELECT 
    COUNT(*) AS total_rows,              -- Counts all rows: 3
    COUNT(commission) AS with_commission, -- Counts non-NULL: 2
    COUNT(manager_id) AS with_manager     -- Counts non-NULL: 2
FROM employees;

NULL Propagation Rule

NULL Handling Best Practices in SELECT

•Use COALESCE for defaults: COALESCE(commission, 0) returns 0 if commission is NULL
•Use NULLIF for conditional NULL: NULLIF(value, 0) returns NULL if value equals 0
•Consider IFNULL/NVL/ISNULL: Database-specific alternatives to COALESCE for two arguments
•Test with IS NULL/IS NOT NULL: Never use = NULL for comparisons; use IS NULL
•Document NULL expectations: Make clear in aliases or comments when NULL is possible

Performance Considerations for SELECT

Key performance factors:

SELECT Clause Performance Factors

•Column selection: Retrieving fewer columns means less data transfer from disk and over the network. SELECT * retrieves all columns, even those you don't need.
•Expression complexity: Complex expressions in SELECT (subqueries, function chains) are evaluated for every row in the result set.
•DISTINCT overhead: DISTINCT requires sorting or hashing to identify duplicates—avoid when not needed.
•Covering indexes: If SELECT retrieves only columns present in an index, the database can serve the query without touching the table data (index-only scan).
•Large object columns: Selecting BLOB/CLOB/TEXT columns reads potentially megabytes of data per row.

select_performance.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
-- POOR: Selects all columns when only id and name needed
SELECT * FROM employees WHERE department_id = 5;
-- Reads: id, name, email, phone, address, hire_date, salary, bio, photo...
 
-- BETTER: Select only needed columns
SELECT employee_id, first_name, last_name 
FROM employees 
WHERE department_id = 5;
-- Reads: only 3 columns, potentially from covering index
 
-- POOR: Unnecessary DISTINCT with primary key
SELECT DISTINCT employee_id, first_name FROM employees;
-- employee_id is unique, so DISTINCT does nothing but add sorting cost
 
-- BETTER: Remove unnecessary DISTINCT
SELECT employee_id, first_name FROM employees;
 
-- POOR: Expensive function call on every row
SELECT 
    employee_id,
    (SELECT department_name FROM departments d 
     WHERE d.id = e.department_id) AS dept_name  -- Scalar subquery per row!
FROM employees e;
 
-- BETTER: Use JOIN instead of scalar subquery
SELECT 
    e.employee_id,
    d.department_name AS dept_name
FROM employees e
JOIN departments d ON e.department_id = d.id;
 
-- CONSIDERATION: Covering index optimization
-- If index exists on (department_id, first_name, last_name):
SELECT first_name, last_name 
FROM employees 
WHERE department_id = 5;
-- Can be served entirely from the index without table access

The SELECT * Anti-Pattern

Summary: The SELECT Clause

We've explored the SELECT clause in depth—from its fundamental purpose through syntax, processing model, and performance implications. Let's consolidate the key points:

Key Takeaways

•SELECT defines projection — It specifies which columns and expressions appear in the result set, implementing the relational projection operation.
•Declarative nature — SELECT tells the database what you want, not how to get it. The optimizer determines execution strategy.
•Processing order matters — Despite appearing first syntactically, SELECT is processed late, after FROM, WHERE, GROUP BY, and HAVING.
•Aliases with AS — Always use explicit aliases for expressions and renamed columns. Some databases allow alias reuse in ORDER BY.
•NULL propagates — Operations involving NULL generally yield NULL. Use COALESCE and IS NULL for proper handling.
•Performance awareness — Select only needed columns, avoid unnecessary DISTINCT, prefer JOINs over scalar subqueries.
•Result sets are relations — The output is tabular, with defined columns and types, but undefined row order without ORDER BY.

What's next:

Page Complete

1 / 5