Database Management SystemsQuery Processing

Parsing and Validation

LevelIntermediate

Duration75 mins

TopicQuery Processing

2 / 5

Semantic Analysis

Beyond Syntax: Does Your Query Make Sense?

A query can be syntactically perfect yet completely nonsensical. Consider this statement:

SELECT unicorn_count FROM mythical_creatures WHERE reality_level > 100

From a syntax perspective, this is flawless SQL. The grammar is correct, the structure is valid, and the parser happily produces a parse tree. But unless you have a table called mythical_creatures with columns unicorn_count and reality_level, this query will fail—not during parsing, but during semantic analysis.

Semantic analysis is the phase where the database answers a crucial question: Does this query refer to real objects, use compatible types, and request operations that make sense?

This is where abstract becomes concrete. The parse tree's generic "table reference" nodes become pointers to actual catalog entries. Column names become type-annotated attribute references. The query transforms from a syntactic structure into a semantically validated, executable specification.

What You Will Learn

By the end of this page, you will understand how databases perform semantic analysis: catalog lookups, schema validation, meaning verification, and the semantic constraints that must be satisfied before a query can proceed to optimization. You'll learn why some errors appear instantly while others only surface at execution time.

Understanding Syntax vs. Semantics

Before diving into semantic analysis details, let's clearly distinguish between syntactic and semantic validity. This distinction is fundamental to understanding why query processing has multiple validation phases.

Syntax concerns form—the structural arrangement of tokens according to grammar rules. A sentence like "Dog cat the jumped over" is syntactically invalid in English because the words don't follow English grammar, regardless of meaning.

Semantics concerns meaning—whether a syntactically valid statement makes sense in context. The sentence "Colorless green ideas sleep furiously" is syntactically valid English but semantically nonsensical.

SQL exhibits this same distinction:

Syntax Errors vs. Semantic Errors
Query	Error Type	Why It Fails	Detection Phase
`SELECT * FORM employees`	Syntax	`FORM` is not a valid keyword	Parsing
`SELECT * FROM employees,`	Syntax	Trailing comma expects another table	Parsing
`SELECT * FROM nonexistent_table`	Semantic	Table doesn't exist in schema	Semantic Analysis
`SELECT bogus_column FROM employees`	Semantic	Column not found in table	Semantic Analysis
`SELECT name + salary FROM employees`	Semantic	Cannot add string and number	Semantic Analysis
`SELECT * FROM employees WHERE salary = 'high'`	Semantic	Type mismatch: comparing number to string	Semantic Analysis

The Semantic Analysis Pipeline:

Semantic analysis typically involves several sub-phases, though different databases may organize these differently:

Catalog Binding — Resolve table/view names to catalog entries
Column Resolution — Match column names to table attributes
Type Checking — Verify type compatibility of expressions
Privilege Checking — Verify user has required permissions
Constraint Validation — Check semantic constraints (unique column in GROUP BY, etc.)
View Expansion — Replace view references with underlying definitions

The output is a validated parse tree (or annotated AST) with all references resolved and type information attached. This annotated tree is ready for query optimization.

Catalog and Schema Resolution

The first major task of semantic analysis is catalog binding—connecting the names in your query to actual database objects. This requires consulting the system catalog (also called the data dictionary or metadata repository).

What is the System Catalog?

The system catalog is a set of internal tables that store metadata about all objects in the database:

Tables and Views — Names, schemas, owners, storage parameters
Columns — Names, data types, nullability, defaults, constraints
Indexes — Indexed columns, index type, uniqueness
Constraints — Primary keys, foreign keys, check constraints
Users and Roles — Authentication info, group memberships
Privileges — Who can do what to which objects
Functions and Procedures — Signatures, implementations, volatility
Sequences, Triggers, etc. — All other database objects

catalog-query-examples.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
-- PostgreSQL System Catalog Examples
 
-- View all tables in current schema
SELECT tablename FROM pg_tables WHERE schemaname = 'public';
 
-- View columns of a specific table
SELECT column_name, data_type, is_nullable, column_default
FROM information_schema.columns
WHERE table_name = 'employees';
 
-- View privileges on a table
SELECT grantee, privilege_type
FROM information_schema.table_privileges
WHERE table_name = 'employees';
 
-- Internal catalog tables PostgreSQL uses during semantic analysis:
-- pg_class     - Tables, indexes, sequences, views
-- pg_attribute - Column definitions
-- pg_type      - Data type definitions  
-- pg_namespace - Schema definitions
-- pg_authid    - Users and roles
-- pg_proc      - Functions and procedures

Name Resolution Process:

When the semantic analyzer encounters a table reference like employees, it must determine exactly which table is meant. This involves several steps:

Step 1: Parse the qualified name

-- Fully qualified:    database.schema.table
-- Schema-qualified:   schema.table
-- Unqualified:        table
SELECT * FROM hr.employees;        -- Schema explicitly specified
SELECT * FROM employees;           -- Must search schema path

Step 2: Search the Schema Path

For unqualified names, the database searches a configured schema path:

-- PostgreSQL example
SHOW search_path;   -- Result: "$user", public

-- When you write: SELECT * FROM employees
-- Database checks: sales.employees (if user is 'sales')
-- Then checks:     public.employees
-- Uses first match found

Step 3: Resolve to Catalog Entry

Once located, the analyzer retrieves the full catalog entry:

Object ID (OID in PostgreSQL)
Column definitions with types
Constraints and indexes
Access control lists

This information becomes part of the annotated AST.

Schema Ambiguity Dangers

Relying on schema search paths can create subtle bugs. If two schemas both have an employees table, which one gets used depends on path order—which might differ between environments. Production best practice: always use schema-qualified names in application code to avoid ambiguity.

Column Resolution and Scope

After resolving table references, the semantic analyzer must resolve column references. This is more complex than table resolution because column names can be ambiguous and their meaning depends on query scope.

Column Resolution Challenges:

Consider this query with multiple tables:

column-resolution.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
-- Ambiguous column reference
SELECT employee_id, name, department_name
FROM employees e
JOIN departments d ON e.department_id = d.department_id;
 
-- Question: Which table does 'name' come from?
-- The analyzer must:
-- 1. Check if 'employees' has a 'name' column
-- 2. Check if 'departments' has a 'name' column  
-- 3. If only one has it, resolve there
-- 4. If both have it, raise an ambiguity error
 
-- Explicit qualification removes ambiguity:
SELECT e.employee_id, e.name, d.department_name
FROM employees e
JOIN departments d ON e.department_id = d.department_id;

Column Scope Rules:

Column references follow specific scoping rules:

Immediate Scope — Columns from tables in the current FROM clause
Outer Scope — In subqueries, columns from enclosing queries (correlated subqueries)
Alias Scope — SELECT aliases are available in ORDER BY but not WHERE in standard SQL

Let's examine these scopes:

column-scoping.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
-- Scope Example 1: Basic Resolution
SELECT name FROM employees WHERE salary > 50000;
-- 'name' resolves to employees.name (immediate scope)
-- 'salary' resolves to employees.salary (immediate scope)
 
-- Scope Example 2: Correlated Subquery
SELECT e.name, 
       (SELECT COUNT(*) FROM orders o 
        WHERE o.employee_id = e.employee_id)  -- outer reference
FROM employees e;
-- 'e.employee_id' in subquery references outer query (outer scope)
 
-- Scope Example 3: Alias Visibility
SELECT salary * 12 AS annual_salary
FROM employees
WHERE annual_salary > 100000;   -- ERROR! Alias not visible in WHERE
 
SELECT salary * 12 AS annual_salary
FROM employees
ORDER BY annual_salary;         -- OK in most databases
 
-- Scope Example 4: HAVING vs WHERE
SELECT department_id, AVG(salary) as avg_sal
FROM employees
GROUP BY department_id
HAVING AVG(salary) > 50000;     -- OK: aggregate in HAVING
-- HAVING avg_sal > 50000;      -- May or may not work (DB-dependent)

Column Resolution Algorithm

•Parse column reference — Extract any qualifier (table alias/name) and column name
•If qualified — Look up in specified table only; error if not found
•If unqualified — Search all tables in scope; must find exactly one match
•Check outer scopes — If not found and in subquery, check enclosing query scopes
•Annotate AST — Attach table reference, column position, and type to node
•Handle ambiguity — If found in multiple tables, raise informative error

Qualify Your Columns

Always qualify column names in JOINs and subqueries, even when technically unambiguous. This makes queries self-documenting and prevents future breaks when tables gain new columns that create ambiguity.

Object Existence and Accessibility Verification

Beyond simply finding objects, semantic analysis must verify existence, accessibility, and appropriate usage. This verification extends to various database objects:

Table and View Verification:

Does the table/view exist in the specified (or resolved) schema?
Is it actually a table/view (not a function or sequence)?
For views, is the view valid (underlying tables still exist)?

Function and Operator Verification:

Does a function with this name exist?
Does the function accept these argument types?
If overloaded, which overload matches best?

Constraint and Index Awareness:

Which constraints apply to affected tables?
Which indexes exist (relevant for optimizer, not execution validity)?

object-verification-errors.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
-- Various Semantic Errors from Object Verification
 
-- Error: Table doesn't exist
SELECT * FROM nonexistent_employees;
-- ERROR: relation "nonexistent_employees" does not exist
 
-- Error: Column doesn't exist
SELECT nonexistent_column FROM employees;
-- ERROR: column "nonexistent_column" does not exist
 
-- Error: Function doesn't exist
SELECT custom_function(salary) FROM employees;
-- ERROR: function custom_function(numeric) does not exist
 
-- Error: Function argument type mismatch
SELECT UPPER(salary) FROM employees;
-- ERROR: function upper(numeric) does not exist
-- HINT: No function matches the given name and argument types.
 
-- Error: Schema doesn't exist
SELECT * FROM wrong_schema.employees;
-- ERROR: schema "wrong_schema" does not exist
 
-- Error: Using table as function
SELECT employees(1) FROM dual;
-- ERROR: employees(integer) is not a function
 
-- Error: Insufficient privilege (even if object exists)
SELECT * FROM payroll.salaries;  -- as unprivileged user
-- ERROR: permission denied for table salaries

View Expansion:

When a query references a view, semantic analysis must expand it. Views are essentially named queries, and the analyzer must:

Look up the view definition from the catalog
Replace the view reference with the underlying query
Recursively analyze the expanded query
Check that the user has privileges on the view (not necessarily on underlying tables)

This expansion reveals how views provide security—users need SELECT on the view, not on base tables. It also shows why broken views (referencing dropped tables) fail at analysis time, not creation time.

view-expansion.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
-- Original View Definition
CREATE VIEW active_employees AS
SELECT employee_id, name, email, hire_date
FROM employees
WHERE status = 'active' AND termination_date IS NULL;
 
-- User Query
SELECT name, email FROM active_employees WHERE hire_date > '2020-01-01';
 
-- After View Expansion (conceptually)
SELECT name, email 
FROM (
    SELECT employee_id, name, email, hire_date
    FROM employees
    WHERE status = 'active' AND termination_date IS NULL
) AS active_employees
WHERE hire_date > '2020-01-01';
 
-- The optimizer later merges these into a single query:
SELECT name, email
FROM employees
WHERE status = 'active' 
  AND termination_date IS NULL
  AND hire_date > '2020-01-01';

Privilege and Permission Checking

An important part of semantic analysis is privilege checking—verifying that the current user has permission to perform the requested operations on the referenced objects. The database's authorization system integrates deeply with query processing.

Types of Privileges:

SQL defines standard privileges that can be granted or revoked:

SQL Privileges Checked During Semantic Analysis
Privilege	Applies To	Required For
SELECT	Tables, Views, Columns	Reading data from the object
INSERT	Tables, Views	Adding new rows
UPDATE	Tables, Views, Columns	Modifying existing rows
DELETE	Tables, Views	Removing rows
REFERENCES	Tables, Columns	Creating foreign key constraints
TRIGGER	Tables	Creating triggers on the table
EXECUTE	Functions, Procedures	Calling the function/procedure
USAGE	Schemas, Sequences, Types	Using the object in queries

Privilege Resolution:

Privilege checking follows a specific order:

Ownership — Object owners have all privileges automatically
Direct Grants — Privileges granted directly to the user
Role Membership — Privileges inherited through role hierarchy
PUBLIC Privileges — Privileges granted to the PUBLIC pseudo-role
Superuser Status — Superusers bypass all permission checks

Column-Level Privileges:

Modern databases support column-level granularity:

privilege-examples.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
-- Grant SELECT on specific columns only
GRANT SELECT (employee_id, name, department) 
ON employees TO analyst_role;
 
-- Analyst can do this:
SELECT employee_id, name, department FROM employees;
 
-- But NOT this:
SELECT salary FROM employees;
-- ERROR: permission denied for column salary
 
-- The semantic analyzer checks each column reference
-- against the user's column-level privileges
 
-- Grant-based security model:
GRANT SELECT ON employees TO hr_staff;
GRANT INSERT, UPDATE ON employees TO hr_admin;
GRANT DELETE ON employees TO hr_director;
 
-- Privilege checking happens during semantic analysis,
-- before any query optimization or execution

Early vs. Late Permission Checking

Most databases check permissions during semantic analysis rather than at execution time. This means a permission-denied query fails immediately rather than after expensive optimization. However, some databases defer certain checks (like row-level security) to execution time when the actual row data is needed.

Semantic Constraints and Validation Rules

Beyond type checking and privilege verification, semantic analysis enforces various semantic constraints—rules about what combinations of query elements are logically valid. These constraints catch errors that are syntactically legal but semantically meaningless or problematic.

Key Semantic Constraints:

Semantic Validation Rules

•Aggregate Function Placement — Aggregates are only valid in SELECT, HAVING, and ORDER BY; not in WHERE or plain FROM
•GROUP BY Consistency — Non-aggregated SELECT columns must appear in GROUP BY (standard SQL; some DBs are lenient)
•Subquery Column Count — Scalar subqueries must return exactly one column
•Set Operation Compatibility — UNION/INTERSECT/EXCEPT require matching column counts and compatible types
•ORDER BY in Subqueries — Not allowed except with LIMIT (meaningless without it)
•Window Function Restrictions — Can't nest window functions; can't use in WHERE
•INTO Clause Validity — SELECT INTO target must be writable; column counts must match
•Recursive CTE Rules — Anchor and recursive parts have specific structural requirements

semantic-constraint-errors.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
-- Error: Aggregate in WHERE clause
SELECT department_id FROM employees
WHERE AVG(salary) > 50000;
-- ERROR: aggregate functions are not allowed in WHERE
 
-- Correct approach: Use HAVING
SELECT department_id FROM employees
GROUP BY department_id
HAVING AVG(salary) > 50000;
 
-- Error: Non-grouped column in SELECT
SELECT department_id, employee_name, AVG(salary)
FROM employees
GROUP BY department_id;
-- ERROR: column "employee_name" must appear in GROUP BY 
-- or be used in an aggregate function
 
-- Error: Scalar subquery returns multiple columns
SELECT (SELECT name, salary FROM employees LIMIT 1)
FROM departments;
-- ERROR: subquery must return only one column
 
-- Error: UNION column count mismatch
SELECT id, name FROM employees
UNION
SELECT id, name, department FROM contractors;
-- ERROR: each UNION query must have the same number of columns
 
-- Error: Window function in WHERE
SELECT * FROM employees
WHERE ROW_NUMBER() OVER (ORDER BY salary) = 1;
-- ERROR: window functions are not allowed in WHERE
 
-- Correct approach: Use subquery or CTE
SELECT * FROM (
    SELECT *, ROW_NUMBER() OVER (ORDER BY salary) as rn
    FROM employees
) t WHERE rn = 1;

Constraint Validation Order:

Semantic constraints are typically checked after name and type resolution, because many constraints depend on knowing what types and objects are involved. For example, GROUP BY consistency checking requires knowing which expressions are aggregates.

The semantic analyzer maintains context about the current query level (main query, subquery, HAVING clause, etc.) to apply the appropriate constraints at each point.

Error Messages and Diagnostic Quality

The quality of semantic error messages significantly impacts developer productivity. A good semantic analyzer doesn't just detect errors—it explains them clearly and suggests corrections.

Characteristics of Good Semantic Errors:

Good Error Messages

•Precise location (line, column, context)
•Clear explanation of what's wrong
•What was expected vs. found
•Suggestions for similar valid names
•Single error per distinct problem
•Actionable hints for correction

Poor Error Messages

•Vague location or no position info
•Cryptic error codes without explanation
•Generic "invalid query" messages
•Cascading errors from one mistake
•Internal implementation details exposed
•No suggestions or context

error-quality-comparison.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
-- PostgreSQL: Excellent Semantic Error Messages
 
-- Missing column with suggestion
SELECT employe_name FROM employees;
-- ERROR:  column "employe_name" does not exist
-- LINE 1: SELECT employe_name FROM employees
--                ^
-- HINT:  Perhaps you meant to reference the column "employees.employee_name"
 
-- Type mismatch with explanation  
SELECT * FROM employees WHERE hire_date = 'not-a-date';
-- ERROR:  invalid input syntax for type date: "not-a-date"
-- LINE 1: ...FROM employees WHERE hire_date = 'not-a-date'
--                                              ^
 
-- Function signature mismatch with alternatives
SELECT SUBSTR(salary, 1, 5) FROM employees;
-- ERROR:  function substr(numeric, integer, integer) does not exist
-- LINE 1: SELECT SUBSTR(salary, 1, 5) FROM employees
--                ^
-- HINT:  No function matches the given name and argument types.
--        You might need to add explicit type casts.
 
-- PostgreSQL's error format:
-- ERROR:  [clear description of the problem]
-- LINE n: [the offending line with caret pointing to error]
-- HINT:  [suggestion for how to fix it if possible]
-- DETAIL: [additional context when helpful]

Leveraging Error Hints

PostgreSQL's HINT lines are especially valuable. When you see 'Perhaps you meant...' or 'You might need to...', pay attention. These hints are generated by analyzing the catalog for similar names or compatible type conversions. They're usually correct.

Summary: Semantic Analysis

Semantic analysis transforms a syntactically valid parse tree into a semantically validated, executable specification. Let's consolidate the key concepts:

Key Takeaways

•Semantic analysis verifies that syntactically correct queries also make logical sense—referring to real objects with compatible types.
•Catalog/schema resolution maps table and view names to actual database objects using search paths and qualified names.
•Column resolution matches column references to specific table attributes, handling scope and ambiguity.
•Object verification confirms existence and appropriate usage of tables, functions, operators, and other database objects.
•Privilege checking ensures the user has required permissions before optimization begins.
•Semantic constraints enforce rules like GROUP BY consistency, aggregate placement, and subquery validity.
•Error quality varies significantly between databases—good errors speed debugging substantially.

What's Next:

With semantic analysis complete, every name in the query points to a real object, all types are verified as compatible, and we know the user is authorized. But column names like employees.name and customers.name still exist as abstract references. The next page explores name resolution in greater depth—how databases handle complex aliasing, subquery scopes, and the resolution of ambiguous references across multiple query levels.

Page Complete

You now understand semantic analysis: the phase that transforms abstract syntax into concrete database operations by resolving names, checking types, verifying permissions, and enforcing semantic constraints.

2 / 5

Loading learning content...

Database Management SystemsQuery Processing

Parsing and Validation

LevelIntermediate

Duration75 mins

TopicQuery Processing

2 / 5

Semantic Analysis

Beyond Syntax: Does Your Query Make Sense?

A query can be syntactically perfect yet completely nonsensical. Consider this statement:

SELECT unicorn_count FROM mythical_creatures WHERE reality_level > 100

Semantic analysis is the phase where the database answers a crucial question: Does this query refer to real objects, use compatible types, and request operations that make sense?

What You Will Learn

Understanding Syntax vs. Semantics

SQL exhibits this same distinction:

Syntax Errors vs. Semantic Errors
Query	Error Type	Why It Fails	Detection Phase
`SELECT * FORM employees`	Syntax	`FORM` is not a valid keyword	Parsing
`SELECT * FROM employees,`	Syntax	Trailing comma expects another table	Parsing
`SELECT * FROM nonexistent_table`	Semantic	Table doesn't exist in schema	Semantic Analysis
`SELECT bogus_column FROM employees`	Semantic	Column not found in table	Semantic Analysis
`SELECT name + salary FROM employees`	Semantic	Cannot add string and number	Semantic Analysis
`SELECT * FROM employees WHERE salary = 'high'`	Semantic	Type mismatch: comparing number to string	Semantic Analysis

The Semantic Analysis Pipeline:

Semantic analysis typically involves several sub-phases, though different databases may organize these differently:

Catalog Binding — Resolve table/view names to catalog entries
Column Resolution — Match column names to table attributes
Type Checking — Verify type compatibility of expressions
Privilege Checking — Verify user has required permissions
Constraint Validation — Check semantic constraints (unique column in GROUP BY, etc.)
View Expansion — Replace view references with underlying definitions

The output is a validated parse tree (or annotated AST) with all references resolved and type information attached. This annotated tree is ready for query optimization.

Catalog and Schema Resolution

What is the System Catalog?

The system catalog is a set of internal tables that store metadata about all objects in the database:

Tables and Views — Names, schemas, owners, storage parameters
Columns — Names, data types, nullability, defaults, constraints
Indexes — Indexed columns, index type, uniqueness
Constraints — Primary keys, foreign keys, check constraints
Users and Roles — Authentication info, group memberships
Privileges — Who can do what to which objects
Functions and Procedures — Signatures, implementations, volatility
Sequences, Triggers, etc. — All other database objects

catalog-query-examples.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
-- PostgreSQL System Catalog Examples
 
-- View all tables in current schema
SELECT tablename FROM pg_tables WHERE schemaname = 'public';
 
-- View columns of a specific table
SELECT column_name, data_type, is_nullable, column_default
FROM information_schema.columns
WHERE table_name = 'employees';
 
-- View privileges on a table
SELECT grantee, privilege_type
FROM information_schema.table_privileges
WHERE table_name = 'employees';
 
-- Internal catalog tables PostgreSQL uses during semantic analysis:
-- pg_class     - Tables, indexes, sequences, views
-- pg_attribute - Column definitions
-- pg_type      - Data type definitions  
-- pg_namespace - Schema definitions
-- pg_authid    - Users and roles
-- pg_proc      - Functions and procedures

Name Resolution Process:

When the semantic analyzer encounters a table reference like employees, it must determine exactly which table is meant. This involves several steps:

Step 1: Parse the qualified name

-- Fully qualified:    database.schema.table
-- Schema-qualified:   schema.table
-- Unqualified:        table
SELECT * FROM hr.employees;        -- Schema explicitly specified
SELECT * FROM employees;           -- Must search schema path

Step 2: Search the Schema Path

For unqualified names, the database searches a configured schema path:

-- PostgreSQL example
SHOW search_path;   -- Result: "$user", public

-- When you write: SELECT * FROM employees
-- Database checks: sales.employees (if user is 'sales')
-- Then checks:     public.employees
-- Uses first match found

Step 3: Resolve to Catalog Entry

Once located, the analyzer retrieves the full catalog entry:

Object ID (OID in PostgreSQL)
Column definitions with types
Constraints and indexes
Access control lists

This information becomes part of the annotated AST.

Schema Ambiguity Dangers

Column Resolution and Scope

Column Resolution Challenges:

Consider this query with multiple tables:

column-resolution.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
-- Ambiguous column reference
SELECT employee_id, name, department_name
FROM employees e
JOIN departments d ON e.department_id = d.department_id;
 
-- Question: Which table does 'name' come from?
-- The analyzer must:
-- 1. Check if 'employees' has a 'name' column
-- 2. Check if 'departments' has a 'name' column  
-- 3. If only one has it, resolve there
-- 4. If both have it, raise an ambiguity error
 
-- Explicit qualification removes ambiguity:
SELECT e.employee_id, e.name, d.department_name
FROM employees e
JOIN departments d ON e.department_id = d.department_id;

Column Scope Rules:

Column references follow specific scoping rules:

Immediate Scope — Columns from tables in the current FROM clause
Outer Scope — In subqueries, columns from enclosing queries (correlated subqueries)
Alias Scope — SELECT aliases are available in ORDER BY but not WHERE in standard SQL

Let's examine these scopes:

column-scoping.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
-- Scope Example 1: Basic Resolution
SELECT name FROM employees WHERE salary > 50000;
-- 'name' resolves to employees.name (immediate scope)
-- 'salary' resolves to employees.salary (immediate scope)
 
-- Scope Example 2: Correlated Subquery
SELECT e.name, 
       (SELECT COUNT(*) FROM orders o 
        WHERE o.employee_id = e.employee_id)  -- outer reference
FROM employees e;
-- 'e.employee_id' in subquery references outer query (outer scope)
 
-- Scope Example 3: Alias Visibility
SELECT salary * 12 AS annual_salary
FROM employees
WHERE annual_salary > 100000;   -- ERROR! Alias not visible in WHERE
 
SELECT salary * 12 AS annual_salary
FROM employees
ORDER BY annual_salary;         -- OK in most databases
 
-- Scope Example 4: HAVING vs WHERE
SELECT department_id, AVG(salary) as avg_sal
FROM employees
GROUP BY department_id
HAVING AVG(salary) > 50000;     -- OK: aggregate in HAVING
-- HAVING avg_sal > 50000;      -- May or may not work (DB-dependent)

Column Resolution Algorithm

•Parse column reference — Extract any qualifier (table alias/name) and column name
•If qualified — Look up in specified table only; error if not found
•If unqualified — Search all tables in scope; must find exactly one match
•Check outer scopes — If not found and in subquery, check enclosing query scopes
•Annotate AST — Attach table reference, column position, and type to node
•Handle ambiguity — If found in multiple tables, raise informative error

Qualify Your Columns

Object Existence and Accessibility Verification

Beyond simply finding objects, semantic analysis must verify existence, accessibility, and appropriate usage. This verification extends to various database objects:

Table and View Verification:

Does the table/view exist in the specified (or resolved) schema?
Is it actually a table/view (not a function or sequence)?
For views, is the view valid (underlying tables still exist)?

Function and Operator Verification:

Does a function with this name exist?
Does the function accept these argument types?
If overloaded, which overload matches best?

Constraint and Index Awareness:

Which constraints apply to affected tables?
Which indexes exist (relevant for optimizer, not execution validity)?

object-verification-errors.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
-- Various Semantic Errors from Object Verification
 
-- Error: Table doesn't exist
SELECT * FROM nonexistent_employees;
-- ERROR: relation "nonexistent_employees" does not exist
 
-- Error: Column doesn't exist
SELECT nonexistent_column FROM employees;
-- ERROR: column "nonexistent_column" does not exist
 
-- Error: Function doesn't exist
SELECT custom_function(salary) FROM employees;
-- ERROR: function custom_function(numeric) does not exist
 
-- Error: Function argument type mismatch
SELECT UPPER(salary) FROM employees;
-- ERROR: function upper(numeric) does not exist
-- HINT: No function matches the given name and argument types.
 
-- Error: Schema doesn't exist
SELECT * FROM wrong_schema.employees;
-- ERROR: schema "wrong_schema" does not exist
 
-- Error: Using table as function
SELECT employees(1) FROM dual;
-- ERROR: employees(integer) is not a function
 
-- Error: Insufficient privilege (even if object exists)
SELECT * FROM payroll.salaries;  -- as unprivileged user
-- ERROR: permission denied for table salaries

View Expansion:

When a query references a view, semantic analysis must expand it. Views are essentially named queries, and the analyzer must:

Look up the view definition from the catalog
Replace the view reference with the underlying query
Recursively analyze the expanded query
Check that the user has privileges on the view (not necessarily on underlying tables)

view-expansion.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
-- Original View Definition
CREATE VIEW active_employees AS
SELECT employee_id, name, email, hire_date
FROM employees
WHERE status = 'active' AND termination_date IS NULL;
 
-- User Query
SELECT name, email FROM active_employees WHERE hire_date > '2020-01-01';
 
-- After View Expansion (conceptually)
SELECT name, email 
FROM (
    SELECT employee_id, name, email, hire_date
    FROM employees
    WHERE status = 'active' AND termination_date IS NULL
) AS active_employees
WHERE hire_date > '2020-01-01';
 
-- The optimizer later merges these into a single query:
SELECT name, email
FROM employees
WHERE status = 'active' 
  AND termination_date IS NULL
  AND hire_date > '2020-01-01';

Privilege and Permission Checking

Types of Privileges:

SQL defines standard privileges that can be granted or revoked:

SQL Privileges Checked During Semantic Analysis
Privilege	Applies To	Required For
SELECT	Tables, Views, Columns	Reading data from the object
INSERT	Tables, Views	Adding new rows
UPDATE	Tables, Views, Columns	Modifying existing rows
DELETE	Tables, Views	Removing rows
REFERENCES	Tables, Columns	Creating foreign key constraints
TRIGGER	Tables	Creating triggers on the table
EXECUTE	Functions, Procedures	Calling the function/procedure
USAGE	Schemas, Sequences, Types	Using the object in queries

Privilege Resolution:

Privilege checking follows a specific order:

Ownership — Object owners have all privileges automatically
Direct Grants — Privileges granted directly to the user
Role Membership — Privileges inherited through role hierarchy
PUBLIC Privileges — Privileges granted to the PUBLIC pseudo-role
Superuser Status — Superusers bypass all permission checks

Column-Level Privileges:

Modern databases support column-level granularity:

privilege-examples.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
-- Grant SELECT on specific columns only
GRANT SELECT (employee_id, name, department) 
ON employees TO analyst_role;
 
-- Analyst can do this:
SELECT employee_id, name, department FROM employees;
 
-- But NOT this:
SELECT salary FROM employees;
-- ERROR: permission denied for column salary
 
-- The semantic analyzer checks each column reference
-- against the user's column-level privileges
 
-- Grant-based security model:
GRANT SELECT ON employees TO hr_staff;
GRANT INSERT, UPDATE ON employees TO hr_admin;
GRANT DELETE ON employees TO hr_director;
 
-- Privilege checking happens during semantic analysis,
-- before any query optimization or execution

Early vs. Late Permission Checking

Semantic Constraints and Validation Rules

Key Semantic Constraints:

Semantic Validation Rules

•Aggregate Function Placement — Aggregates are only valid in SELECT, HAVING, and ORDER BY; not in WHERE or plain FROM
•GROUP BY Consistency — Non-aggregated SELECT columns must appear in GROUP BY (standard SQL; some DBs are lenient)
•Subquery Column Count — Scalar subqueries must return exactly one column
•Set Operation Compatibility — UNION/INTERSECT/EXCEPT require matching column counts and compatible types
•ORDER BY in Subqueries — Not allowed except with LIMIT (meaningless without it)
•Window Function Restrictions — Can't nest window functions; can't use in WHERE
•INTO Clause Validity — SELECT INTO target must be writable; column counts must match
•Recursive CTE Rules — Anchor and recursive parts have specific structural requirements

semantic-constraint-errors.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
-- Error: Aggregate in WHERE clause
SELECT department_id FROM employees
WHERE AVG(salary) > 50000;
-- ERROR: aggregate functions are not allowed in WHERE
 
-- Correct approach: Use HAVING
SELECT department_id FROM employees
GROUP BY department_id
HAVING AVG(salary) > 50000;
 
-- Error: Non-grouped column in SELECT
SELECT department_id, employee_name, AVG(salary)
FROM employees
GROUP BY department_id;
-- ERROR: column "employee_name" must appear in GROUP BY 
-- or be used in an aggregate function
 
-- Error: Scalar subquery returns multiple columns
SELECT (SELECT name, salary FROM employees LIMIT 1)
FROM departments;
-- ERROR: subquery must return only one column
 
-- Error: UNION column count mismatch
SELECT id, name FROM employees
UNION
SELECT id, name, department FROM contractors;
-- ERROR: each UNION query must have the same number of columns
 
-- Error: Window function in WHERE
SELECT * FROM employees
WHERE ROW_NUMBER() OVER (ORDER BY salary) = 1;
-- ERROR: window functions are not allowed in WHERE
 
-- Correct approach: Use subquery or CTE
SELECT * FROM (
    SELECT *, ROW_NUMBER() OVER (ORDER BY salary) as rn
    FROM employees
) t WHERE rn = 1;

Constraint Validation Order:

The semantic analyzer maintains context about the current query level (main query, subquery, HAVING clause, etc.) to apply the appropriate constraints at each point.

Error Messages and Diagnostic Quality

The quality of semantic error messages significantly impacts developer productivity. A good semantic analyzer doesn't just detect errors—it explains them clearly and suggests corrections.

Characteristics of Good Semantic Errors:

Good Error Messages

•Precise location (line, column, context)
•Clear explanation of what's wrong
•What was expected vs. found
•Suggestions for similar valid names
•Single error per distinct problem
•Actionable hints for correction

Poor Error Messages

•Vague location or no position info
•Cryptic error codes without explanation
•Generic "invalid query" messages
•Cascading errors from one mistake
•Internal implementation details exposed
•No suggestions or context

error-quality-comparison.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
-- PostgreSQL: Excellent Semantic Error Messages
 
-- Missing column with suggestion
SELECT employe_name FROM employees;
-- ERROR:  column "employe_name" does not exist
-- LINE 1: SELECT employe_name FROM employees
--                ^
-- HINT:  Perhaps you meant to reference the column "employees.employee_name"
 
-- Type mismatch with explanation  
SELECT * FROM employees WHERE hire_date = 'not-a-date';
-- ERROR:  invalid input syntax for type date: "not-a-date"
-- LINE 1: ...FROM employees WHERE hire_date = 'not-a-date'
--                                              ^
 
-- Function signature mismatch with alternatives
SELECT SUBSTR(salary, 1, 5) FROM employees;
-- ERROR:  function substr(numeric, integer, integer) does not exist
-- LINE 1: SELECT SUBSTR(salary, 1, 5) FROM employees
--                ^
-- HINT:  No function matches the given name and argument types.
--        You might need to add explicit type casts.
 
-- PostgreSQL's error format:
-- ERROR:  [clear description of the problem]
-- LINE n: [the offending line with caret pointing to error]
-- HINT:  [suggestion for how to fix it if possible]
-- DETAIL: [additional context when helpful]

Leveraging Error Hints

Summary: Semantic Analysis

Semantic analysis transforms a syntactically valid parse tree into a semantically validated, executable specification. Let's consolidate the key concepts:

Key Takeaways

•Semantic analysis verifies that syntactically correct queries also make logical sense—referring to real objects with compatible types.
•Catalog/schema resolution maps table and view names to actual database objects using search paths and qualified names.
•Column resolution matches column references to specific table attributes, handling scope and ambiguity.
•Object verification confirms existence and appropriate usage of tables, functions, operators, and other database objects.
•Privilege checking ensures the user has required permissions before optimization begins.
•Semantic constraints enforce rules like GROUP BY consistency, aggregate placement, and subquery validity.
•Error quality varies significantly between databases—good errors speed debugging substantially.

What's Next:

Page Complete

2 / 5