Database Management SystemsQuery Processing Overview

Query Processing Steps

LevelIntermediate

Duration60 mins

TopicQuery Processing Overview

3 / 5

Translation - From Parse Tree to Relational Algebra

Bridging Syntax and Semantics

The parse tree tells us the structure of the SQL statement—which keywords appear where, how expressions nest, what identifiers are referenced. But structure alone isn't enough. The database must now understand what this query means and how to compute its result.

This is the job of the translation phase (also called semantic analysis or query translation). It transforms the syntactic parse tree into a logical query plan expressed in relational algebra—the mathematical foundation of relational databases.

Translation is where SQL meets database theory. The declarative "what I want" becomes a precise specification of "what operations produce it."

What You Will Learn

By the end of this page, you will understand semantic analysis (name resolution, type checking, scope analysis), the relational algebra operations that form logical query plans, how SQL constructs map to relational algebra operators, and how the logical plan provides a foundation for optimization.

Semantic Analysis: Beyond Syntax

The parser verifies that SQL is syntactically valid—tokens appear in legal sequences. Semantic analysis verifies that SQL is semantically valid—it makes sense in the context of the database schema.

Syntax vs. Semantics:

-- Syntactically valid, semantically invalid:
SELECT unicorn_column FROM nonexistent_table WHERE 'hello' > 42;

This query parses successfully—the structure is correct. But semantic analysis will reject it:

Table nonexistent_table doesn't exist
Column unicorn_column is unknown
Comparing string to integer is a type error (in strict SQL)

Core Semantic Analysis Tasks:

Semantic Analysis Phases

•Name Resolution — Resolve table names to actual tables, column names to actual columns. Handle schema qualification, search paths, and aliases.
•Type Checking — Verify that operations are applied to compatible types. Check that comparison operands are comparable, function arguments match signatures.
•Access Control — Verify the user has permission to access referenced tables, columns, and execute functions.
•Constraint Validation — For DML statements, verify that the operation won't violate constraints, foreign keys, or triggers.
•Scope Analysis — Track aliases, subquery scope, correlation between outer and inner queries.
•Ambiguity Resolution — When column names appear in multiple tables, determine the correct source or report ambiguity errors.

Converting Mermaid diagram...

Name Resolution and Catalog Lookup

The first semantic task is name resolution: connecting identifiers in the query to actual database objects. This requires consulting the system catalog (also called data dictionary or metadata repository).

The System Catalog:

Every relational database maintains metadata tables describing all objects:

Tables and Views — Name, schema, owner, columns, storage parameters
Columns — Name, data type, nullability, default values, constraints
Indexes — Name, indexed columns, type (B-tree, hash, etc.)
Functions — Name, arguments, return type, implementation
Users and Roles — Permissions, grants, ownership
Constraints — Primary keys, foreign keys, check constraints

Resolution Process:

name_resolution_example.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
-- Query to resolve:
SELECT o.order_id, c.customer_name, o.total
FROM   orders o
       JOIN customers c ON o.customer_id = c.id
WHERE  o.status = 'completed';
 
-- Name resolution must answer:
-- 1. Does table 'orders' exist? In which schema?
-- 2. Does table 'customers' exist? In which schema?
-- 3. Does 'orders' have columns: order_id, customer_id, total, status?
-- 4. Does 'customers' have columns: customer_name, id?
-- 5. Are aliases 'o' and 'c' unique within this query scope?
-- 6. Does current user have SELECT permission on these tables/columns?

Schema Search Path:

When a table name isn't schema-qualified (e.g., orders not sales.orders), the database searches schemas in a configured order:

-- PostgreSQL search path
SHOW search_path;
-- Result: "$user", public

-- Table 'orders' will be resolved by checking:
-- 1. schema = current_user (e.g., 'john')
-- 2. schema = 'public'
-- First match wins; if no match, error

-- Explicitly set search path
SET search_path = sales, inventory, public;

Alias Handling:

Aliases create temporary names within query scope. The resolver maintains a scope stack tracking which aliases are valid at each nesting level.

Ambiguous Column References

When multiple tables have columns with the same name, unqualified references are ambiguous. For example, if both 'orders' and 'order_items' have a column 'created_at', the query SELECT created_at is ambiguous. Most databases reject ambiguous references with an error like "column reference 'created_at' is ambiguous".

Type Checking and Coercion

After resolving names, the analyzer must verify that operations are type-safe. This involves checking that operators and functions receive compatible argument types.

Type Checking Rules:

Comparison Operations — Operands must be of compatible types (both numeric, both strings, etc.)
Arithmetic Operations — Operands must be numeric (integer, float, decimal)
Function Calls — Arguments must match the function's parameter types
Assignment — Values must be compatible with target column types
Aggregate Functions — Input types must match function expectations

Type Coercion:

When types don't exactly match but are compatible, the database applies implicit type coercion (also called type casting or type promotion):

Common Implicit Type Coercions
From Type	To Type	Example	Notes
INTEGER	BIGINT	`INT + BIGINT`	Promote to wider type
INTEGER	FLOAT	`INT * FLOAT`	Promote to float
VARCHAR	TEXT	`VARCHAR = TEXT`	Promote to longer text
DATE	TIMESTAMP	`DATE < TIMESTAMP`	Add midnight time
INTEGER	VARCHAR	`INT \\|\\| VARCHAR`	Concatenation context
NULL	(any)	Comparison with NULL	NULL propagation

type_coercion_examples.sql
SQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
-- Implicit coercion examples
 
-- Integer to float in arithmetic
SELECT 42 + 3.14;               -- Result: 45.14 (INT promoted to FLOAT)
 
-- Integer to string in concatenation
SELECT 'Order #' || 12345;      -- Result: 'Order #12345'
 
-- Date to timestamp in comparison
SELECT * FROM events 
WHERE event_time > '2024-01-15';  -- String coerced to timestamp
 
-- Integer to decimal for precision
SELECT 100 / 3;                  -- Integer division: 33
SELECT 100.0 / 3;                -- Decimal division: 33.333...
SELECT CAST(100 AS DECIMAL) / 3; -- Explicit cast
 
-- Type errors (depending on database strictness)
SELECT 'hello' + 5;             -- Error in strict SQL
SELECT 'hello' < 100;           -- Error: incompatible types
 
-- Explicit casts when implicit coercion isn't available
SELECT CAST(quantity AS VARCHAR) || ' units' FROM inventory;
SELECT order_date::TEXT FROM orders;  -- PostgreSQL-specific syntax

Type Inference:

For expressions and subqueries, the analyzer infers result types by examining the operations:

-- Inferred types for SELECT expressions
SELECT 
    order_id,                    -- Type: from catalog (INTEGER)
    quantity * price AS total,   -- Type: numeric (from * operator)
    UPPER(customer_name),        -- Type: TEXT (from function signature)
    order_date + INTERVAL '7 days' -- Type: TIMESTAMP (date + interval)
FROM orders;

Type information propagates through the query tree, ensuring every expression has a known type before execution planning begins.

Avoiding Type Coercion Surprises

Implicit coercion can cause unexpected results. Comparing VARCHAR to INTEGER may work but not use indexes efficiently. Always use explicit casts when types don't naturally align, and be especially careful with date/time comparisons against string literals.

Relational Algebra Fundamentals

With semantic analysis complete, the parse tree transforms into a logical query plan expressed in relational algebra. Relational algebra is the mathematical foundation of relational databases, providing a formal language for expressing data manipulation operations.

Why Relational Algebra?

Mathematical Precision — Each operation has rigorous semantics
Closure Property — Operations on relations produce relations (enabling composition)
Equivalence Rules — Algebraic laws enable query optimization
Implementation Independence — Logical operators abstract away physical details

Core Relational Algebra Operators:

Fundamental Relational Algebra Operations
Operator	Symbol	Description	SQL Equivalent
Selection	σ (sigma)	Filter rows matching a predicate	`WHERE` clause
Projection	π (pi)	Choose specific columns	`SELECT` column list
Cartesian Product	× (cross)	All combinations of rows	`CROSS JOIN`
Join	⋈ (bowtie)	Combine related rows	`[INNER] JOIN ... ON`
Union	∪	Combine row sets	`UNION`
Difference	−	Rows in first but not second	`EXCEPT`
Intersection	∩	Rows in both sets	`INTERSECT`
Rename	ρ (rho)	Rename relation or attributes	`AS` aliases

Extended Operators for SQL Features:

Standard relational algebra is extended to support SQL's full feature set:

Extended Operator	Purpose	SQL Feature
Aggregation (γ)	Group-by and aggregate functions	`GROUP BY`, `SUM()`, `COUNT()`
Outer Join (⟕⟖⟗)	Preserve unmatched rows	`LEFT/RIGHT/FULL OUTER JOIN`
Duplicate Elimination (δ)	Remove duplicate rows	`DISTINCT`
Sorting (τ)	Order result rows	`ORDER BY`
Limit (λ)	Restrict result size	`LIMIT`, `OFFSET`

Converting Mermaid diagram...

SQL to Relational Algebra Translation

The translator systematically converts SQL constructs into relational algebra operators. Each SQL clause maps to specific algebra operations.

Translation Rules:

translation_examples.sql

SQL → Algebra

-- SQL Query
SELECT DISTINCT c.name, o.total
FROM   customers c
       JOIN orders o ON c.id = o.customer_id
WHERE  o.status = 'completed'
  AND  o.total > 1000
ORDER BY o.total DESC
LIMIT 10;
 
-- Relational Algebra Translation (bottom-up):
 
-- Step 1: Base relations with rename
ρ c: customers (Customers)
ρ o: orders (Orders)
 
-- Step 2: Join operation
⋈ c.id = o.customer_id (c × o)
 
-- Step 3: Selection (WHERE clause)
σ o.status = 'completed' ∧ o.total > 1000 (joined)
 
-- Step 4: Projection (SELECT columns)
π c.name, o.total (selected)
 
-- Step 5: Duplicate elimination (DISTINCT)
δ (projected)
 
-- Step 6: Sorting (ORDER BY)
τ o.total DESC (distinct)
 
-- Step 7: Limit
λ 10 (sorted)
 
-- Complete algebra expression (nested notation):
λ₁₀ (τ_{total DESC} (δ (π_{c.name, o.total} (
    σ_{status='completed' ∧ total>1000} (
        ⋈_{c.id = o.customer_id} (ρ_c(Customers), ρ_o(Orders))
    )
))))

Translation Rules for Common SQL Clauses:

SQL Clause	Algebra Operator	Notes
`FROM table`	Base relation	Starting point
`FROM t1, t2`	t1 × t2 (cross product)	Implicit join
`JOIN ... ON`	⋈ condition	Theta join
`WHERE cond`	σ cond	Selection filter
`SELECT cols`	π cols	Projection
`SELECT DISTINCT`	δ (π cols)	Projection + dedup
`GROUP BY ... HAVING`	γ grouping, agg; σ having	Grouping with filter
`ORDER BY`	τ (sort)	May be optimized away
`UNION/EXCEPT/INTERSECT`	∪/−/∩	Set operations
`Subquery in FROM`	Nested algebra tree	Virtual relation

Order of Operations

SQL's conceptual order of operations defines the translation sequence: FROM (including JOINs) → WHERE → GROUP BY → HAVING → SELECT → DISTINCT → ORDER BY → LIMIT. This order determines how algebra operators nest in the logical plan.

Subquery Translation

Subqueries add complexity to translation. Different subquery types require different handling strategies.

Subquery Types:

Subquery Classifications

•Scalar Subqueries — Return single value; used in SELECT or WHERE expressions. Translated to nested plans returning exactly one row/column.
•Table Subqueries — Return a relation; used in FROM clause. Become nested subtrees in the algebra expression.
•EXISTS Subqueries — Test for existence of rows. Translated to semi-join operators.
•IN/NOT IN Subqueries — Test membership in result set. Translated to semi-join or anti-join operators.
•Correlated Subqueries — Reference outer query columns. Require special handling (often decorrelation).
•Derived Tables/CTEs — Named subqueries in WITH clause. Translated to reusable subtrees.

subquery_translation.sql
SQL Examples
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
-- Scalar subquery (single value)
SELECT order_id, 
       (SELECT name FROM customers WHERE id = orders.customer_id) AS customer_name
FROM   orders;
-- Translation: Nested loop with scalar aggregation enforced
 
-- EXISTS subquery (semi-join)
SELECT *
FROM   customers c
WHERE  EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.id);
-- Translation: Semi-join ⋉ (returns c rows that have matching o rows)
 
-- NOT EXISTS (anti-join)  
SELECT *
FROM   customers c
WHERE  NOT EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.id);
-- Translation: Anti-join ▷ (returns c rows with NO matching o rows)
 
-- IN subquery (also semi-join)
SELECT *
FROM   products
WHERE  category_id IN (SELECT id FROM categories WHERE active = true);
-- Translation: Semi-join on products.category_id = categories.id
 
-- Correlated subquery (references outer query)
SELECT c.name,
       (SELECT COUNT(*) FROM orders o WHERE o.customer_id = c.id) AS order_count
FROM   customers c;
-- Translation: Decoration pattern or lateral join
-- Often "decorrelated" to a more efficient join form during optimization

Subquery Decorrelation:

Correlated subqueries execute once per outer row—potentially very expensive. The optimizer often "decorrelates" them into equivalent join expressions:

-- Correlated (slow: one subquery execution per customer)
SELECT c.name,
       (SELECT SUM(total) FROM orders WHERE customer_id = c.id) AS total
FROM   customers c;

-- Decorrelated (fast: single join operation)
SELECT c.name, COALESCE(o.total_sum, 0) AS total
FROM   customers c
       LEFT JOIN (
           SELECT customer_id, SUM(total) AS total_sum
           FROM orders
           GROUP BY customer_id
       ) o ON c.id = o.customer_id;

Decorrelation is a key optimization performed during or after translation.

View Expansion

When a query references a view, the translator must expand the view definition into the logical plan. Views are essentially named, stored queries that appear as virtual tables.

View Expansion Process:

Detect view reference during name resolution
Retrieve view definition from system catalog
Parse the view's query (if not already cached)
Substitute the view's algebra tree wherever the view is referenced
Merge/simplify the resulting tree during optimization

view_expansion_example.sql
SQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
-- View definition
CREATE VIEW active_orders AS
SELECT o.order_id, o.customer_id, o.total, c.name AS customer_name
FROM   orders o
       JOIN customers c ON o.customer_id = c.id
WHERE  o.status = 'active';
 
-- Query using the view
SELECT customer_name, SUM(total) AS total_value
FROM   active_orders
WHERE  total > 100
GROUP BY customer_name;
 
-- After view expansion, equivalent to:
SELECT customer_name, SUM(total) AS total_value
FROM   (
    SELECT o.order_id, o.customer_id, o.total, c.name AS customer_name
    FROM   orders o
           JOIN customers c ON o.customer_id = c.id
    WHERE  o.status = 'active'
) AS active_orders
WHERE  total > 100
GROUP BY customer_name;
 
-- The optimizer can then merge and simplify:
SELECT c.name AS customer_name, SUM(o.total) AS total_value
FROM   orders o
       JOIN customers c ON o.customer_id = c.id
WHERE  o.status = 'active'
  AND  o.total > 100
GROUP BY c.name;

Materialized View Handling:

Unlike regular views, materialized views store computed results. Translation differs:

View Type	Translation Behavior
Regular View	Expand definition, optimize combined query
Materialized View	Reference stored data directly (like a table)
Materialized View (stale)	May expand or use stale data based on settings

Some databases perform query rewriting to use materialized views even when the original query doesn't reference them—if the materialized view can answer the query more efficiently.

View Performance

Views don't inherently add overhead—after expansion and optimization, the query plan may be identical to writing the query directly. However, complex nested views can obscure optimization opportunities. If performance matters, examine the actual execution plan, not the view nesting depth.

Query Tree / Logical Plan Representation

The translation phase produces a query tree (or logical plan tree) that represents the algebraic expression as a data structure. This tree flows to the optimizer for transformation.

Query Tree Properties:

Nodes represent operators (selection, projection, join, etc.)
Leaves represent base relations (tables, indexes)
Edges represent data flow (tuples flowing from child to parent)
Annotations include schemas, cardinalities, cost estimates

query_tree_structure.c
C (Conceptual)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
/* Logical plan node types */
typedef enum PlanNodeType {
    /* Scan operators */
    T_SeqScan,         /* Sequential table scan */
    T_IndexScan,       /* Index scan */
    T_SubqueryScan,    /* Scan subquery result */
    
    /* Join operators */
    T_Join,            /* Generic join (type specified in node) */
    T_NestLoop,        /* Nested loop join */
    T_MergeJoin,       /* Sort-merge join */
    T_HashJoin,        /* Hash join */
    
    /* Set operators */
    T_SetOp,           /* UNION/INTERSECT/EXCEPT */
    
    /* Transformation operators */
    T_Sort,            /* ORDER BY */
    T_Group,           /* GROUP BY */
    T_Agg,             /* Aggregation */
    T_Unique,          /* DISTINCT elimination */
    T_Limit,           /* LIMIT/OFFSET */
    
    /* Result operators */
    T_Result,          /* Project/compute expressions */
} PlanNodeType;
 
/* Generic plan node */
typedef struct Plan {
    PlanNodeType type;
    
    /* Cost estimates (filled by optimizer) */
    Cost startup_cost;     /* Cost before first tuple */
    Cost total_cost;       /* Cost for all tuples */
    double plan_rows;      /* Estimated row count */
    int plan_width;        /* Estimated row width in bytes */
    
    /* Output schema */
    List *targetlist;      /* Expressions to compute */
    List *qual;            /* Selection predicates */
    
    /* Child nodes */
    struct Plan *lefttree;
    struct Plan *righttree;
} Plan;
 
/* Join-specific node */
typedef struct Join {
    Plan plan;
    JoinType jointype;     /* INNER, LEFT, RIGHT, FULL, SEMI, ANTI */
    List *joinqual;        /* Join conditions */
} Join;
 
/* Aggregation node */
typedef struct Agg {
    Plan plan;
    AggStrategy aggstrategy;  /* PLAIN, SORTED, HASHED */
    int numCols;              /* Number of grouping columns */
    AttrNumber *grpColIdx;    /* Grouping column indices */
    List *aggfunctions;       /* Aggregate function calls */
} Agg;

Viewing Query Trees:

Most databases provide ways to inspect the logical plan:

-- PostgreSQL: View parse tree, rewritten query, plan
SET debug_print_parse = on;
SET debug_print_rewritten = on;
SET debug_print_plan = on;

-- MySQL: Show query tree
EXPLAIN FORMAT=TREE SELECT ...;

-- SQL Server: Show execution plan
SET SHOWPLAN_ALL ON;
GO
SELECT ...;
GO
SET SHOWPLAN_ALL OFF;

Understanding these representations helps debug unexpected query behavior.

Summary: Translation Bridges SQL and Algebra

We've explored the translation phase comprehensively. Let's consolidate the key concepts:

Key Takeaways

•Semantic Analysis — Translation validates query meaning, not just structure. Name resolution, type checking, and access control ensure the query is executable.
•System Catalog — Metadata lookup enables name resolution and type inference. Every identifier must map to a real database object.
•Type Coercion — When types don't match exactly, implicit conversions enable operations on compatible types while maintaining type safety.
•Relational Algebra — The mathematical foundation providing rigorous operator semantics, closure property, and transformation rules.
•Core Operators — Selection (σ), Projection (π), Join (⋈), and set operations form the building blocks of all query plans.
•Subquery Handling — Different subquery types translate to specialized operators; correlated subqueries often undergo decorrelation for efficiency.
•View Expansion — View definitions are substituted inline, then optimized as part of the containing query.
•Query Tree — The logical plan is a tree of operator nodes that flows to the optimizer for cost-based transformation.

What's Next:

With the logical query plan constructed, we enter the most intellectually rich phase: optimization. The next page examines how the query optimizer evaluates countless equivalent execution strategies to find the most efficient plan—using cost models, statistics, and sophisticated search algorithms.

Page Complete

You now understand how SQL statements transform into relational algebra expressions through semantic analysis and translation. This foundation prepares you for understanding query optimization—where the database finds the fastest path to your answer.

3 / 5

Loading learning content...

Database Management SystemsQuery Processing Overview

Query Processing Steps

LevelIntermediate

Duration60 mins

TopicQuery Processing Overview

3 / 5

Translation - From Parse Tree to Relational Algebra

Bridging Syntax and Semantics

Translation is where SQL meets database theory. The declarative "what I want" becomes a precise specification of "what operations produce it."

What You Will Learn

Semantic Analysis: Beyond Syntax

Syntax vs. Semantics:

-- Syntactically valid, semantically invalid:
SELECT unicorn_column FROM nonexistent_table WHERE 'hello' > 42;

This query parses successfully—the structure is correct. But semantic analysis will reject it:

Table nonexistent_table doesn't exist
Column unicorn_column is unknown
Comparing string to integer is a type error (in strict SQL)

Core Semantic Analysis Tasks:

Semantic Analysis Phases

•Name Resolution — Resolve table names to actual tables, column names to actual columns. Handle schema qualification, search paths, and aliases.
•Type Checking — Verify that operations are applied to compatible types. Check that comparison operands are comparable, function arguments match signatures.
•Access Control — Verify the user has permission to access referenced tables, columns, and execute functions.
•Constraint Validation — For DML statements, verify that the operation won't violate constraints, foreign keys, or triggers.
•Scope Analysis — Track aliases, subquery scope, correlation between outer and inner queries.
•Ambiguity Resolution — When column names appear in multiple tables, determine the correct source or report ambiguity errors.

Converting Mermaid diagram...

Name Resolution and Catalog Lookup

The System Catalog:

Every relational database maintains metadata tables describing all objects:

Tables and Views — Name, schema, owner, columns, storage parameters
Columns — Name, data type, nullability, default values, constraints
Indexes — Name, indexed columns, type (B-tree, hash, etc.)
Functions — Name, arguments, return type, implementation
Users and Roles — Permissions, grants, ownership
Constraints — Primary keys, foreign keys, check constraints

Resolution Process:

name_resolution_example.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
-- Query to resolve:
SELECT o.order_id, c.customer_name, o.total
FROM   orders o
       JOIN customers c ON o.customer_id = c.id
WHERE  o.status = 'completed';
 
-- Name resolution must answer:
-- 1. Does table 'orders' exist? In which schema?
-- 2. Does table 'customers' exist? In which schema?
-- 3. Does 'orders' have columns: order_id, customer_id, total, status?
-- 4. Does 'customers' have columns: customer_name, id?
-- 5. Are aliases 'o' and 'c' unique within this query scope?
-- 6. Does current user have SELECT permission on these tables/columns?

Schema Search Path:

When a table name isn't schema-qualified (e.g., orders not sales.orders), the database searches schemas in a configured order:

-- PostgreSQL search path
SHOW search_path;
-- Result: "$user", public

-- Table 'orders' will be resolved by checking:
-- 1. schema = current_user (e.g., 'john')
-- 2. schema = 'public'
-- First match wins; if no match, error

-- Explicitly set search path
SET search_path = sales, inventory, public;

Alias Handling:

Aliases create temporary names within query scope. The resolver maintains a scope stack tracking which aliases are valid at each nesting level.

Ambiguous Column References

Type Checking and Coercion

After resolving names, the analyzer must verify that operations are type-safe. This involves checking that operators and functions receive compatible argument types.

Type Checking Rules:

Comparison Operations — Operands must be of compatible types (both numeric, both strings, etc.)
Arithmetic Operations — Operands must be numeric (integer, float, decimal)
Function Calls — Arguments must match the function's parameter types
Assignment — Values must be compatible with target column types
Aggregate Functions — Input types must match function expectations

Type Coercion:

When types don't exactly match but are compatible, the database applies implicit type coercion (also called type casting or type promotion):

Common Implicit Type Coercions
From Type	To Type	Example	Notes
INTEGER	BIGINT	`INT + BIGINT`	Promote to wider type
INTEGER	FLOAT	`INT * FLOAT`	Promote to float
VARCHAR	TEXT	`VARCHAR = TEXT`	Promote to longer text
DATE	TIMESTAMP	`DATE < TIMESTAMP`	Add midnight time
INTEGER	VARCHAR	`INT \\|\\| VARCHAR`	Concatenation context
NULL	(any)	Comparison with NULL	NULL propagation

type_coercion_examples.sql
SQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
-- Implicit coercion examples
 
-- Integer to float in arithmetic
SELECT 42 + 3.14;               -- Result: 45.14 (INT promoted to FLOAT)
 
-- Integer to string in concatenation
SELECT 'Order #' || 12345;      -- Result: 'Order #12345'
 
-- Date to timestamp in comparison
SELECT * FROM events 
WHERE event_time > '2024-01-15';  -- String coerced to timestamp
 
-- Integer to decimal for precision
SELECT 100 / 3;                  -- Integer division: 33
SELECT 100.0 / 3;                -- Decimal division: 33.333...
SELECT CAST(100 AS DECIMAL) / 3; -- Explicit cast
 
-- Type errors (depending on database strictness)
SELECT 'hello' + 5;             -- Error in strict SQL
SELECT 'hello' < 100;           -- Error: incompatible types
 
-- Explicit casts when implicit coercion isn't available
SELECT CAST(quantity AS VARCHAR) || ' units' FROM inventory;
SELECT order_date::TEXT FROM orders;  -- PostgreSQL-specific syntax

Type Inference:

For expressions and subqueries, the analyzer infers result types by examining the operations:

-- Inferred types for SELECT expressions
SELECT 
    order_id,                    -- Type: from catalog (INTEGER)
    quantity * price AS total,   -- Type: numeric (from * operator)
    UPPER(customer_name),        -- Type: TEXT (from function signature)
    order_date + INTERVAL '7 days' -- Type: TIMESTAMP (date + interval)
FROM orders;

Type information propagates through the query tree, ensuring every expression has a known type before execution planning begins.

Avoiding Type Coercion Surprises

Relational Algebra Fundamentals

Why Relational Algebra?

Mathematical Precision — Each operation has rigorous semantics
Closure Property — Operations on relations produce relations (enabling composition)
Equivalence Rules — Algebraic laws enable query optimization
Implementation Independence — Logical operators abstract away physical details

Core Relational Algebra Operators:

Fundamental Relational Algebra Operations
Operator	Symbol	Description	SQL Equivalent
Selection	σ (sigma)	Filter rows matching a predicate	`WHERE` clause
Projection	π (pi)	Choose specific columns	`SELECT` column list
Cartesian Product	× (cross)	All combinations of rows	`CROSS JOIN`
Join	⋈ (bowtie)	Combine related rows	`[INNER] JOIN ... ON`
Union	∪	Combine row sets	`UNION`
Difference	−	Rows in first but not second	`EXCEPT`
Intersection	∩	Rows in both sets	`INTERSECT`
Rename	ρ (rho)	Rename relation or attributes	`AS` aliases

Extended Operators for SQL Features:

Standard relational algebra is extended to support SQL's full feature set:

Extended Operator	Purpose	SQL Feature
Aggregation (γ)	Group-by and aggregate functions	`GROUP BY`, `SUM()`, `COUNT()`
Outer Join (⟕⟖⟗)	Preserve unmatched rows	`LEFT/RIGHT/FULL OUTER JOIN`
Duplicate Elimination (δ)	Remove duplicate rows	`DISTINCT`
Sorting (τ)	Order result rows	`ORDER BY`
Limit (λ)	Restrict result size	`LIMIT`, `OFFSET`

Converting Mermaid diagram...

SQL to Relational Algebra Translation

The translator systematically converts SQL constructs into relational algebra operators. Each SQL clause maps to specific algebra operations.

Translation Rules:

translation_examples.sql

SQL → Algebra

-- SQL Query
SELECT DISTINCT c.name, o.total
FROM   customers c
       JOIN orders o ON c.id = o.customer_id
WHERE  o.status = 'completed'
  AND  o.total > 1000
ORDER BY o.total DESC
LIMIT 10;
 
-- Relational Algebra Translation (bottom-up):
 
-- Step 1: Base relations with rename
ρ c: customers (Customers)
ρ o: orders (Orders)
 
-- Step 2: Join operation
⋈ c.id = o.customer_id (c × o)
 
-- Step 3: Selection (WHERE clause)
σ o.status = 'completed' ∧ o.total > 1000 (joined)
 
-- Step 4: Projection (SELECT columns)
π c.name, o.total (selected)
 
-- Step 5: Duplicate elimination (DISTINCT)
δ (projected)
 
-- Step 6: Sorting (ORDER BY)
τ o.total DESC (distinct)
 
-- Step 7: Limit
λ 10 (sorted)
 
-- Complete algebra expression (nested notation):
λ₁₀ (τ_{total DESC} (δ (π_{c.name, o.total} (
    σ_{status='completed' ∧ total>1000} (
        ⋈_{c.id = o.customer_id} (ρ_c(Customers), ρ_o(Orders))
    )
))))

Translation Rules for Common SQL Clauses:

SQL Clause	Algebra Operator	Notes
`FROM table`	Base relation	Starting point
`FROM t1, t2`	t1 × t2 (cross product)	Implicit join
`JOIN ... ON`	⋈ condition	Theta join
`WHERE cond`	σ cond	Selection filter
`SELECT cols`	π cols	Projection
`SELECT DISTINCT`	δ (π cols)	Projection + dedup
`GROUP BY ... HAVING`	γ grouping, agg; σ having	Grouping with filter
`ORDER BY`	τ (sort)	May be optimized away
`UNION/EXCEPT/INTERSECT`	∪/−/∩	Set operations
`Subquery in FROM`	Nested algebra tree	Virtual relation

Order of Operations

Subquery Translation

Subqueries add complexity to translation. Different subquery types require different handling strategies.

Subquery Types:

Subquery Classifications

•Scalar Subqueries — Return single value; used in SELECT or WHERE expressions. Translated to nested plans returning exactly one row/column.
•Table Subqueries — Return a relation; used in FROM clause. Become nested subtrees in the algebra expression.
•EXISTS Subqueries — Test for existence of rows. Translated to semi-join operators.
•IN/NOT IN Subqueries — Test membership in result set. Translated to semi-join or anti-join operators.
•Correlated Subqueries — Reference outer query columns. Require special handling (often decorrelation).
•Derived Tables/CTEs — Named subqueries in WITH clause. Translated to reusable subtrees.

subquery_translation.sql
SQL Examples
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
-- Scalar subquery (single value)
SELECT order_id, 
       (SELECT name FROM customers WHERE id = orders.customer_id) AS customer_name
FROM   orders;
-- Translation: Nested loop with scalar aggregation enforced
 
-- EXISTS subquery (semi-join)
SELECT *
FROM   customers c
WHERE  EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.id);
-- Translation: Semi-join ⋉ (returns c rows that have matching o rows)
 
-- NOT EXISTS (anti-join)  
SELECT *
FROM   customers c
WHERE  NOT EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.id);
-- Translation: Anti-join ▷ (returns c rows with NO matching o rows)
 
-- IN subquery (also semi-join)
SELECT *
FROM   products
WHERE  category_id IN (SELECT id FROM categories WHERE active = true);
-- Translation: Semi-join on products.category_id = categories.id
 
-- Correlated subquery (references outer query)
SELECT c.name,
       (SELECT COUNT(*) FROM orders o WHERE o.customer_id = c.id) AS order_count
FROM   customers c;
-- Translation: Decoration pattern or lateral join
-- Often "decorrelated" to a more efficient join form during optimization

Subquery Decorrelation:

Correlated subqueries execute once per outer row—potentially very expensive. The optimizer often "decorrelates" them into equivalent join expressions:

-- Correlated (slow: one subquery execution per customer)
SELECT c.name,
       (SELECT SUM(total) FROM orders WHERE customer_id = c.id) AS total
FROM   customers c;

-- Decorrelated (fast: single join operation)
SELECT c.name, COALESCE(o.total_sum, 0) AS total
FROM   customers c
       LEFT JOIN (
           SELECT customer_id, SUM(total) AS total_sum
           FROM orders
           GROUP BY customer_id
       ) o ON c.id = o.customer_id;

Decorrelation is a key optimization performed during or after translation.

View Expansion

When a query references a view, the translator must expand the view definition into the logical plan. Views are essentially named, stored queries that appear as virtual tables.

View Expansion Process:

Detect view reference during name resolution
Retrieve view definition from system catalog
Parse the view's query (if not already cached)
Substitute the view's algebra tree wherever the view is referenced
Merge/simplify the resulting tree during optimization

view_expansion_example.sql
SQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
-- View definition
CREATE VIEW active_orders AS
SELECT o.order_id, o.customer_id, o.total, c.name AS customer_name
FROM   orders o
       JOIN customers c ON o.customer_id = c.id
WHERE  o.status = 'active';
 
-- Query using the view
SELECT customer_name, SUM(total) AS total_value
FROM   active_orders
WHERE  total > 100
GROUP BY customer_name;
 
-- After view expansion, equivalent to:
SELECT customer_name, SUM(total) AS total_value
FROM   (
    SELECT o.order_id, o.customer_id, o.total, c.name AS customer_name
    FROM   orders o
           JOIN customers c ON o.customer_id = c.id
    WHERE  o.status = 'active'
) AS active_orders
WHERE  total > 100
GROUP BY customer_name;
 
-- The optimizer can then merge and simplify:
SELECT c.name AS customer_name, SUM(o.total) AS total_value
FROM   orders o
       JOIN customers c ON o.customer_id = c.id
WHERE  o.status = 'active'
  AND  o.total > 100
GROUP BY c.name;

Materialized View Handling:

Unlike regular views, materialized views store computed results. Translation differs:

View Type	Translation Behavior
Regular View	Expand definition, optimize combined query
Materialized View	Reference stored data directly (like a table)
Materialized View (stale)	May expand or use stale data based on settings

Some databases perform query rewriting to use materialized views even when the original query doesn't reference them—if the materialized view can answer the query more efficiently.

View Performance

Query Tree / Logical Plan Representation

The translation phase produces a query tree (or logical plan tree) that represents the algebraic expression as a data structure. This tree flows to the optimizer for transformation.

Query Tree Properties:

Nodes represent operators (selection, projection, join, etc.)
Leaves represent base relations (tables, indexes)
Edges represent data flow (tuples flowing from child to parent)
Annotations include schemas, cardinalities, cost estimates

query_tree_structure.c
C (Conceptual)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
/* Logical plan node types */
typedef enum PlanNodeType {
    /* Scan operators */
    T_SeqScan,         /* Sequential table scan */
    T_IndexScan,       /* Index scan */
    T_SubqueryScan,    /* Scan subquery result */
    
    /* Join operators */
    T_Join,            /* Generic join (type specified in node) */
    T_NestLoop,        /* Nested loop join */
    T_MergeJoin,       /* Sort-merge join */
    T_HashJoin,        /* Hash join */
    
    /* Set operators */
    T_SetOp,           /* UNION/INTERSECT/EXCEPT */
    
    /* Transformation operators */
    T_Sort,            /* ORDER BY */
    T_Group,           /* GROUP BY */
    T_Agg,             /* Aggregation */
    T_Unique,          /* DISTINCT elimination */
    T_Limit,           /* LIMIT/OFFSET */
    
    /* Result operators */
    T_Result,          /* Project/compute expressions */
} PlanNodeType;
 
/* Generic plan node */
typedef struct Plan {
    PlanNodeType type;
    
    /* Cost estimates (filled by optimizer) */
    Cost startup_cost;     /* Cost before first tuple */
    Cost total_cost;       /* Cost for all tuples */
    double plan_rows;      /* Estimated row count */
    int plan_width;        /* Estimated row width in bytes */
    
    /* Output schema */
    List *targetlist;      /* Expressions to compute */
    List *qual;            /* Selection predicates */
    
    /* Child nodes */
    struct Plan *lefttree;
    struct Plan *righttree;
} Plan;
 
/* Join-specific node */
typedef struct Join {
    Plan plan;
    JoinType jointype;     /* INNER, LEFT, RIGHT, FULL, SEMI, ANTI */
    List *joinqual;        /* Join conditions */
} Join;
 
/* Aggregation node */
typedef struct Agg {
    Plan plan;
    AggStrategy aggstrategy;  /* PLAIN, SORTED, HASHED */
    int numCols;              /* Number of grouping columns */
    AttrNumber *grpColIdx;    /* Grouping column indices */
    List *aggfunctions;       /* Aggregate function calls */
} Agg;

Viewing Query Trees:

Most databases provide ways to inspect the logical plan:

-- PostgreSQL: View parse tree, rewritten query, plan
SET debug_print_parse = on;
SET debug_print_rewritten = on;
SET debug_print_plan = on;

-- MySQL: Show query tree
EXPLAIN FORMAT=TREE SELECT ...;

-- SQL Server: Show execution plan
SET SHOWPLAN_ALL ON;
GO
SELECT ...;
GO
SET SHOWPLAN_ALL OFF;

Understanding these representations helps debug unexpected query behavior.

Summary: Translation Bridges SQL and Algebra

We've explored the translation phase comprehensively. Let's consolidate the key concepts:

Key Takeaways

•Semantic Analysis — Translation validates query meaning, not just structure. Name resolution, type checking, and access control ensure the query is executable.
•System Catalog — Metadata lookup enables name resolution and type inference. Every identifier must map to a real database object.
•Type Coercion — When types don't match exactly, implicit conversions enable operations on compatible types while maintaining type safety.
•Relational Algebra — The mathematical foundation providing rigorous operator semantics, closure property, and transformation rules.
•Core Operators — Selection (σ), Projection (π), Join (⋈), and set operations form the building blocks of all query plans.
•Subquery Handling — Different subquery types translate to specialized operators; correlated subqueries often undergo decorrelation for efficiency.
•View Expansion — View definitions are substituted inline, then optimized as part of the containing query.
•Query Tree — The logical plan is a tree of operator nodes that flows to the optimizer for cost-based transformation.

What's Next:

Page Complete

3 / 5