Query ProcessingQuery Representation

Query Representation

LevelIntermediate

Duration60 mins

TopicQuery Representation

2 / 5

Query Graphs

Seeing Queries Differently

Relational algebra trees represent queries as hierarchical structures—operations flowing from leaves to root. But this tree-centric view obscures something critical: the relationships between tables.

Consider a query joining five tables. In a tree representation, those tables appear as leaves, and their connections are encoded in join nodes scattered throughout the tree. To understand how tables relate—which tables share join conditions, which predicates apply to which tables—you must mentally reconstruct the relationships by traversing the tree.

Query graphs flip this perspective. Instead of emphasizing the computation (operations flowing through nodes), they emphasize the structure (tables and their connections). This structural view proves invaluable for join ordering, predicate placement, and understanding query complexity at a glance.

What You Will Learn

By the end of this page, you will understand query graphs, their construction from SQL, their role in optimization, and how they complement relational algebra trees. You'll learn to read and reason about query structure in a way that directly informs optimization decisions.

Query Graph Fundamentals

A query graph is a graph-based representation of a SQL query where:

Nodes represent relations (base tables, views, or subqueries)
Edges represent join predicates connecting relations
Additional annotations represent selection predicates, projections, and other conditions

Unlike the hierarchical relational algebra tree, the query graph is a flat structure that shows all tables and their interconnections simultaneously.

Formal Definition:

A query graph G = (V, E, P) where:

V = set of relation nodes (one per table/subquery in FROM clause)
E = set of edges, where each edge connects tables sharing a join predicate
P = set of predicates (both join predicates on edges and selection predicates on nodes)

Query Graph Advantages

•Visual clarity: See all table relationships at once
•Join structure evident: Graph shape reveals query complexity
•Join order flexibility: No implicit ordering—all orders visible
•Cycle detection: Circular joins immediately apparent
•Predicate placement: Clear which conditions apply where

Query Graph Limitations

•Not execution-ready: Doesn't specify operation order
•Aggregation complexity: Hard to represent GROUP BY naturally
•Subquery representation: Nested queries need special handling
•Set operations: UNION, INTERSECT not naturally expressed
•Ordering: ORDER BY separate from graph structure

Complementary Representations:

Query graphs and relational algebra trees are not competitors—they're complementary tools:

Query graphs excel at representing static structure: What tables exist? How are they connected?
Algebra trees excel at representing dynamic computation: What operations in what order?

Optimizers typically maintain both representations, using query graphs for structural analysis (especially join ordering) and algebra trees for transformation and execution planning.

Constructing Query Graphs

Query graph construction follows a systematic process of extracting entities (relations) and relationships (join conditions) from SQL.

Construction Algorithm:

Create a node for each table in the FROM clause (including joined tables and subqueries)
Analyze each predicate in WHERE and JOIN...ON clauses
Classify predicates:
- Join predicate: References columns from 2+ tables → becomes an edge
- Selection predicate: References columns from 1 table → annotates that node
- Complex predicate: May need decomposition
Add edges between tables connected by join predicates
Label edges and nodes with their respective predicates

Example Construction:

query-graph-example.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
-- Original Query
SELECT c.name, o.order_id, p.product_name, oi.quantity
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
JOIN order_items oi ON o.order_id = oi.order_id
JOIN products p ON oi.product_id = p.product_id
WHERE c.region = 'West'
  AND o.order_date > '2024-01-01'
  AND p.category = 'Electronics';
 
-- Step 1: Identify nodes (tables)
-- Nodes: customers (c), orders (o), order_items (oi), products (p)
 
-- Step 2-3: Classify predicates
-- Join predicates:
--   c.customer_id = o.customer_id     → edge: c -- o
--   o.order_id = oi.order_id          → edge: o -- oi
--   oi.product_id = p.product_id      → edge: oi -- p
-- Selection predicates:
--   c.region = 'West'                 → annotates node c
--   o.order_date > '2024-01-01'       → annotates node o
--   p.category = 'Electronics'        → annotates node p
 
-- Step 4-5: Build graph with labeled edges and nodes

Converting Mermaid diagram...

Analysis of the Example Graph:

This query graph immediately reveals several properties:

Chain topology: The tables form a linear chain (c → o → oi → p). This is the most common topology for transactional queries following entity relationships.
No cycles: Each table connects to at most two others, with no circular paths. This simplifies join ordering.
Predicate distribution: Selection predicates appear on three of four tables (c, o, p), indicating opportunity for early filtering.
Critical path: Any join order must respect the chain—you can't directly join customers to products without intermediate tables.
Selectivity opportunities: With predicates on c, o, and p, starting from the most selective table could minimize intermediate result sizes.

Reading Graph Topology

The shape of a query graph tells you about query complexity. Chains are simple. Stars (one central table connected to many) are common in data warehouses. Cycles indicate complex relationships requiring careful handling. Dense graphs with many cross-connections suggest expensive optimizations.

Graph Topologies and Their Implications

Query graph topology—the pattern of connections between tables—has profound implications for optimization difficulty, join algorithm selection, and potential execution strategies.

Common Topology Patterns:

Chain (Linear) Topology:

T1 — T2 — T3 — T4 — ... — Tn

Tables form a single path with no branching. Each table connects to at most two neighbors.

Chain Characteristics

•Join order options: Catalan number growth—C(n) for n tables, but still polynomial compared to general case
•Common occurrence: Following foreign key chains (order → order_items → products)
•Optimization: Dynamic programming finds optimal order efficiently
•Physical execution: Left-deep plans work well; pipelining possible
•Example prevalence: ~60% of real-world OLTP queries

Topology Complexity Comparison
Topology	Join Orders	Optimization Complexity	Common Domain
Chain (n tables)	O(2^n)	Low—DP efficient	OLTP, transactional
Star (n dimensions)	O(n!)	Medium—can exploit structure	OLAP, data warehouse
Snowflake	Varies	Medium-High	Analytics, normalized DW
General/Cyclic	O(n!)	High—NP-hard in general	Complex applications
Clique (all-to-all)	O(n!)	Very High	Rare—usually design issue

Predicates in Query Graphs

Predicates—the boolean conditions in WHERE, HAVING, and JOIN clauses—play a central role in query graphs. Their classification and placement determine filtering opportunities and join optimization.

Predicate Classification

•Local (Selection) Predicates: Reference only one table. Examples: customer.region = 'West', order.amount > 100. Placed as annotations on the corresponding node. Can be evaluated immediately after/during table access.
•Join Predicates: Reference exactly two tables. Examples: order.customer_id = customer.id. Become edges in the query graph. Define the graph structure itself.
•Complex Join Predicates: Reference more than two tables. Example: a.x + b.y > c.z. Rarely occur; require special handling. Often indicate opportunity for query rewriting.
•Post-Join Predicates: Can only be evaluated after certain joins complete. Example: Predicate on computed join result. May float to different levels during optimization.
•Correlated Predicates: Reference outer query in subqueries. Create cross-level dependencies in nested query graphs.

Predicate Decomposition:

Complex predicates often need decomposition before placement. Consider:

WHERE (a.x = b.y AND a.region = 'East')
   OR (c.z > 100)

This predicate cannot be simply placed because:

The OR spans multiple tables (a, b, c)
Converting to CNF (Conjunctive Normal Form) may help but can explode in size
Some conditions can still be extracted (e.g., if a.region is in a subexpression)

Optimizers use predicate extraction rules to identify portions that can be evaluated early.

predicate-analysis.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
-- Predicate Classification Example
 
SELECT *
FROM orders o, customers c, products p, order_items oi
WHERE 
    -- Join predicates (become edges)
    o.customer_id = c.id                    -- Edge: orders — customers
    AND o.order_id = oi.order_id            -- Edge: orders — order_items
    AND oi.product_id = p.id                -- Edge: order_items — products
    
    -- Selection predicates (node annotations)
    AND c.status = 'active'                 -- Node: customers
    AND o.order_date > CURRENT_DATE - 30    -- Node: orders
    AND p.price > 50                        -- Node: products
    
    -- Post-join predicate (evaluated after join)
    AND o.total > c.credit_limit * 0.8      -- Involves both orders AND customers
                                            -- Must be evaluated after o-c join
 
-- The last predicate references both 'o' and 'c', so it cannot be placed
-- on either node independently. It becomes an annotation on the edge,
-- or is evaluated after the join produces combined rows.

Selectivity Analysis

Each predicate has a selectivity—the fraction of rows it eliminates. Highly selective predicates (e.g., filtering to one customer by ID) should be applied as early as possible. Query graphs help identify which predicates can be pushed to base table access and which must wait for joins.

Query Graphs for Join Ordering

The primary use of query graphs in optimization is join ordering—determining the sequence in which tables should be joined. This is one of the most critical optimization decisions, as different orderings can differ in cost by orders of magnitude.

The Join Ordering Problem:

For n tables, there are:

Without restrictions: (2n-2)! / (n-1)! possible orderings (~4^n)
With only left-deep trees: n! orderings
Still exponential, but manageable with dynamic programming

The query graph constrains possibilities: you can only join tables connected by an edge (or their joined results).

Converting Mermaid diagram...

Using the Graph for Ordering:

Given the chain A — B — C — D, possible join orderings include:

Left-to-right: ((A ⋈ B) ⋈ C) ⋈ D
Right-to-left: A ⋈ (B ⋈ (C ⋈ D))
Middle-out: (A ⋈ B) ⋈ (C ⋈ D)
Start from smallest: C ⋈ D, then with B, then with A

The graph structure shows which orderings are valid—you cannot join A directly with D without first involving B and C.

Dynamic Programming Approach:

Optimizers use the graph to enumerate valid subsets:

Start with single-table access costs
For each pair of connected tables, compute join costs
Build up to larger subsets, always respecting connectivity
Track best plan found for each subset of tables
Final answer: best plan for complete table set

Join Order Cost Comparison (Example)
Join Order	Estimated Rows	Estimated Cost	Notes
(((A ⋈ B) ⋈ C) ⋈ D)	500K → 5K → 2K	High	B is large—early join produces large intermediate
(((C ⋈ D) ⋈ B) ⋈ A)	8K → 40K → 35K	Medium	Start small, grow moderately
((A ⋈ B) ⋈ (C ⋈ D))	500K ⋈ 8K	Highest	Two large intermediates for final join
(((C ⋈ B) ⋈ A) ⋈ D)	10K → 80K → 40K	Medium-Low	C as anchor reduces B join size
Start from most selective	Depends on predicates	Often Lowest	Filter early, join filtered results

Cardinality Estimation is Key

Join ordering decisions depend entirely on cardinality estimates. The optimizer needs to predict how many rows each intermediate join produces. Errors in these estimates—from stale statistics or correlated columns—can lead to catastrophically bad plans.

Query Graph Extensions

Basic query graphs represent simple join queries. Real databases extend the model to handle more complex SQL features.

Query Graph Extensions

•Hypergraph Representation: For predicates involving 3+ tables, use hyperedges connecting multiple nodes instead of simple edges. Enables representing complex constraints.
•Outer Join Edges: Distinguish between inner joins (symmetric) and outer joins (directional). Left outer: A ←left B means A's rows preserved even without match in B.
•Subquery Nodes: Treat subqueries as derived relation nodes. The subquery's internal structure may have its own query graph, creating nested representations.
•Aggregate Boundaries: Mark nodes where aggregation must occur. Affects when predicates can be pushed and which optimizations apply.
•Correlation Edges: For correlated subqueries, edges from outer query to inner query represent data dependencies. Affects execution strategy (decorrelation opportunities).
•Anti-Join Edges: For NOT EXISTS / NOT IN subqueries, special edge type indicates anti-join semantics. Different physical operators apply.

complex-query-graph.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
-- Query with outer join and subquery
SELECT c.name, 
       o.order_id,
       (SELECT MAX(oi.price) 
        FROM order_items oi 
        WHERE oi.order_id = o.order_id) as max_item_price
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
WHERE c.region = 'West'
  AND (o.order_id IS NULL OR o.total > 100);
 
-- Extended Query Graph Representation:
-- 
-- Nodes:
--   [customers] with selection: region = 'West'
--   [orders] with selection: order_id IS NULL OR total > 100
--   [order_items] - in scalar subquery
--
-- Edges:
--   customers ←LEFT_OUTER→ orders (customer_id = customer_id)
--   orders ⟿ order_items (correlated: oi.order_id = o.order_id)
--                          (scalar aggregate subquery)
--
-- The LEFT_OUTER edge indicates orders may be NULL
-- The correlated subquery edge shows data dependency
-- Aggregate boundary exists at the subquery level

Outer Join Complexity:

Outer joins complicate query graphs because they are not symmetric:

Inner join: A ⋈ B = B ⋈ A (commutative)
Outer join: A LEFT JOIN B ≠ B LEFT JOIN A (not commutative)

This restricts join reordering. You cannot freely reorder outer joins like inner joins. The query graph must track edge direction and type to ensure valid transformations.

Association patterns for outer joins:

(A LEFT JOIN B) LEFT JOIN C ≠ A LEFT JOIN (B LEFT JOIN C) in general
Special nullability conditions determine when reordering is valid
Optimizers maintain "null-rejecting" predicate information to enable some reorderings

Outer Joins Constrain Optimization

A query with multiple outer joins has far fewer valid join orderings than an equivalent inner-join query. This is a significant optimization constraint. Understanding this helps explain why some queries are harder to optimize and why execution plans may seem suboptimal.

From Query Graph to Algebra Tree

Query graphs and relational algebra trees work together in the optimization process. The query graph provides structural insight for decisions like join ordering, while the algebra tree provides the actual execution structure.

The Transformation Process:

graph-to-tree-transformation.pseudo
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
// Conceptual algorithm: Query Graph → Relational Algebra Tree
 
function graphToTree(queryGraph: QueryGraph): AlgebraTree {
    // 1. Determine join order using query graph
    //    (e.g., via dynamic programming on graph structure)
    const joinOrder = findOptimalJoinOrder(queryGraph);
    
    // 2. Build tree bottom-up from join order
    let tree = null;
    for (const step of joinOrder) {
        if (tree === null) {
            // First step: access base table with local predicates
            tree = createScan(step.table, step.localPredicates);
        } else {
            // Join current tree with next table
            const nextScan = createScan(step.table, step.localPredicates);
            tree = createJoin(tree, nextScan, step.joinPredicate);
        }
    }
    
    // 3. Add remaining operations (aggregation, sorting, projection)
    if (queryGraph.hasGroupBy) {
        tree = createGroupBy(tree, queryGraph.groupingCols, queryGraph.aggregates);
    }
    if (queryGraph.hasHaving) {
        tree = createSelect(tree, queryGraph.havingPredicate);
    }
    if (queryGraph.hasOrderBy) {
        tree = createSort(tree, queryGraph.orderByColumns);
    }
    tree = createProject(tree, queryGraph.selectList);
    
    return tree;
}
 
// The result is a fully-formed relational algebra tree
// ready for physical planning and execution

Converting Mermaid diagram...

Key Transformation Points:

Graph remains undirected for joins: The graph doesn't dictate order; the optimizer chooses based on cost.
Tree becomes strictly ordered: Once transformed, the tree specifies exact operation sequence.
Predicates placed appropriately: Local predicates become selection nodes (often pushed to scans); join predicates become join node conditions.
Topology constrains but doesn't determine: The graph's structure limits valid orderings, but cost estimates determine the actual choice.
Both representations remain useful: The graph for understanding structure; the tree for executing operations.

Summary and Next Steps

Query graphs provide a structural lens for understanding and optimizing queries. While relational algebra trees capture computation, query graphs reveal relationships.

Key Takeaways

•Query graphs represent tables as nodes and join predicates as edges, showing query structure at a glance.
•Topology matters: Chain, star, snowflake, and cyclic graphs have different optimization characteristics and complexity.
•Predicates classify into local (selection), join (edges), and complex (multi-table) categories, each with placement implications.
•Join ordering is the primary application—the graph constrains valid orderings while cost models select among them.
•Extensions handle outer joins, subqueries, and correlations by adding edge types and nesting.
•Graphs complement trees: Use graphs for structural analysis, trees for execution specification.
•Graph → Tree transformation is the bridge from understanding structure to planning execution.

What's Next:

With relational algebra trees and query graphs understood, we turn to logical plans—the comprehensive representation that combines algebraic operations with optimization annotations. Logical plans are the working representation optimizers manipulate, transform, and refine into efficient execution strategies.

Page Complete

You now understand query graphs and their role in representing query structure. This knowledge helps you visualize complex queries, understand join ordering decisions, and reason about optimization possibilities. Next: Logical Plans and their transformation.

2 / 5

Loading learning content...

Query ProcessingQuery Representation

Query Representation

LevelIntermediate

Duration60 mins

TopicQuery Representation

2 / 5

Query Graphs

Seeing Queries Differently

What You Will Learn

Query Graph Fundamentals

A query graph is a graph-based representation of a SQL query where:

Nodes represent relations (base tables, views, or subqueries)
Edges represent join predicates connecting relations
Additional annotations represent selection predicates, projections, and other conditions

Unlike the hierarchical relational algebra tree, the query graph is a flat structure that shows all tables and their interconnections simultaneously.

Formal Definition:

A query graph G = (V, E, P) where:

V = set of relation nodes (one per table/subquery in FROM clause)
E = set of edges, where each edge connects tables sharing a join predicate
P = set of predicates (both join predicates on edges and selection predicates on nodes)

Query Graph Advantages

•Visual clarity: See all table relationships at once
•Join structure evident: Graph shape reveals query complexity
•Join order flexibility: No implicit ordering—all orders visible
•Cycle detection: Circular joins immediately apparent
•Predicate placement: Clear which conditions apply where

Query Graph Limitations

•Not execution-ready: Doesn't specify operation order
•Aggregation complexity: Hard to represent GROUP BY naturally
•Subquery representation: Nested queries need special handling
•Set operations: UNION, INTERSECT not naturally expressed
•Ordering: ORDER BY separate from graph structure

Complementary Representations:

Query graphs and relational algebra trees are not competitors—they're complementary tools:

Query graphs excel at representing static structure: What tables exist? How are they connected?
Algebra trees excel at representing dynamic computation: What operations in what order?

Optimizers typically maintain both representations, using query graphs for structural analysis (especially join ordering) and algebra trees for transformation and execution planning.

Constructing Query Graphs

Query graph construction follows a systematic process of extracting entities (relations) and relationships (join conditions) from SQL.

Construction Algorithm:

Create a node for each table in the FROM clause (including joined tables and subqueries)
Analyze each predicate in WHERE and JOIN...ON clauses
Classify predicates:
- Join predicate: References columns from 2+ tables → becomes an edge
- Selection predicate: References columns from 1 table → annotates that node
- Complex predicate: May need decomposition
Add edges between tables connected by join predicates
Label edges and nodes with their respective predicates

Example Construction:

query-graph-example.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
-- Original Query
SELECT c.name, o.order_id, p.product_name, oi.quantity
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
JOIN order_items oi ON o.order_id = oi.order_id
JOIN products p ON oi.product_id = p.product_id
WHERE c.region = 'West'
  AND o.order_date > '2024-01-01'
  AND p.category = 'Electronics';
 
-- Step 1: Identify nodes (tables)
-- Nodes: customers (c), orders (o), order_items (oi), products (p)
 
-- Step 2-3: Classify predicates
-- Join predicates:
--   c.customer_id = o.customer_id     → edge: c -- o
--   o.order_id = oi.order_id          → edge: o -- oi
--   oi.product_id = p.product_id      → edge: oi -- p
-- Selection predicates:
--   c.region = 'West'                 → annotates node c
--   o.order_date > '2024-01-01'       → annotates node o
--   p.category = 'Electronics'        → annotates node p
 
-- Step 4-5: Build graph with labeled edges and nodes

Converting Mermaid diagram...

Analysis of the Example Graph:

This query graph immediately reveals several properties:

Chain topology: The tables form a linear chain (c → o → oi → p). This is the most common topology for transactional queries following entity relationships.
No cycles: Each table connects to at most two others, with no circular paths. This simplifies join ordering.
Predicate distribution: Selection predicates appear on three of four tables (c, o, p), indicating opportunity for early filtering.
Critical path: Any join order must respect the chain—you can't directly join customers to products without intermediate tables.
Selectivity opportunities: With predicates on c, o, and p, starting from the most selective table could minimize intermediate result sizes.

Reading Graph Topology

Graph Topologies and Their Implications

Query graph topology—the pattern of connections between tables—has profound implications for optimization difficulty, join algorithm selection, and potential execution strategies.

Common Topology Patterns:

Chain (Linear) Topology:

T1 — T2 — T3 — T4 — ... — Tn

Tables form a single path with no branching. Each table connects to at most two neighbors.

Chain Characteristics

•Join order options: Catalan number growth—C(n) for n tables, but still polynomial compared to general case
•Common occurrence: Following foreign key chains (order → order_items → products)
•Optimization: Dynamic programming finds optimal order efficiently
•Physical execution: Left-deep plans work well; pipelining possible
•Example prevalence: ~60% of real-world OLTP queries

Topology Complexity Comparison
Topology	Join Orders	Optimization Complexity	Common Domain
Chain (n tables)	O(2^n)	Low—DP efficient	OLTP, transactional
Star (n dimensions)	O(n!)	Medium—can exploit structure	OLAP, data warehouse
Snowflake	Varies	Medium-High	Analytics, normalized DW
General/Cyclic	O(n!)	High—NP-hard in general	Complex applications
Clique (all-to-all)	O(n!)	Very High	Rare—usually design issue

Predicates in Query Graphs

Predicate Classification

•Local (Selection) Predicates: Reference only one table. Examples: customer.region = 'West', order.amount > 100. Placed as annotations on the corresponding node. Can be evaluated immediately after/during table access.
•Join Predicates: Reference exactly two tables. Examples: order.customer_id = customer.id. Become edges in the query graph. Define the graph structure itself.
•Complex Join Predicates: Reference more than two tables. Example: a.x + b.y > c.z. Rarely occur; require special handling. Often indicate opportunity for query rewriting.
•Post-Join Predicates: Can only be evaluated after certain joins complete. Example: Predicate on computed join result. May float to different levels during optimization.
•Correlated Predicates: Reference outer query in subqueries. Create cross-level dependencies in nested query graphs.

Predicate Decomposition:

Complex predicates often need decomposition before placement. Consider:

WHERE (a.x = b.y AND a.region = 'East')
   OR (c.z > 100)

This predicate cannot be simply placed because:

The OR spans multiple tables (a, b, c)
Converting to CNF (Conjunctive Normal Form) may help but can explode in size
Some conditions can still be extracted (e.g., if a.region is in a subexpression)

Optimizers use predicate extraction rules to identify portions that can be evaluated early.

predicate-analysis.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
-- Predicate Classification Example
 
SELECT *
FROM orders o, customers c, products p, order_items oi
WHERE 
    -- Join predicates (become edges)
    o.customer_id = c.id                    -- Edge: orders — customers
    AND o.order_id = oi.order_id            -- Edge: orders — order_items
    AND oi.product_id = p.id                -- Edge: order_items — products
    
    -- Selection predicates (node annotations)
    AND c.status = 'active'                 -- Node: customers
    AND o.order_date > CURRENT_DATE - 30    -- Node: orders
    AND p.price > 50                        -- Node: products
    
    -- Post-join predicate (evaluated after join)
    AND o.total > c.credit_limit * 0.8      -- Involves both orders AND customers
                                            -- Must be evaluated after o-c join
 
-- The last predicate references both 'o' and 'c', so it cannot be placed
-- on either node independently. It becomes an annotation on the edge,
-- or is evaluated after the join produces combined rows.

Selectivity Analysis

Query Graphs for Join Ordering

The Join Ordering Problem:

For n tables, there are:

Without restrictions: (2n-2)! / (n-1)! possible orderings (~4^n)
With only left-deep trees: n! orderings
Still exponential, but manageable with dynamic programming

The query graph constrains possibilities: you can only join tables connected by an edge (or their joined results).

Converting Mermaid diagram...

Using the Graph for Ordering:

Given the chain A — B — C — D, possible join orderings include:

Left-to-right: ((A ⋈ B) ⋈ C) ⋈ D
Right-to-left: A ⋈ (B ⋈ (C ⋈ D))
Middle-out: (A ⋈ B) ⋈ (C ⋈ D)
Start from smallest: C ⋈ D, then with B, then with A

The graph structure shows which orderings are valid—you cannot join A directly with D without first involving B and C.

Dynamic Programming Approach:

Optimizers use the graph to enumerate valid subsets:

Start with single-table access costs
For each pair of connected tables, compute join costs
Build up to larger subsets, always respecting connectivity
Track best plan found for each subset of tables
Final answer: best plan for complete table set

Join Order Cost Comparison (Example)
Join Order	Estimated Rows	Estimated Cost	Notes
(((A ⋈ B) ⋈ C) ⋈ D)	500K → 5K → 2K	High	B is large—early join produces large intermediate
(((C ⋈ D) ⋈ B) ⋈ A)	8K → 40K → 35K	Medium	Start small, grow moderately
((A ⋈ B) ⋈ (C ⋈ D))	500K ⋈ 8K	Highest	Two large intermediates for final join
(((C ⋈ B) ⋈ A) ⋈ D)	10K → 80K → 40K	Medium-Low	C as anchor reduces B join size
Start from most selective	Depends on predicates	Often Lowest	Filter early, join filtered results

Cardinality Estimation is Key

Query Graph Extensions

Basic query graphs represent simple join queries. Real databases extend the model to handle more complex SQL features.

Query Graph Extensions

•Hypergraph Representation: For predicates involving 3+ tables, use hyperedges connecting multiple nodes instead of simple edges. Enables representing complex constraints.
•Outer Join Edges: Distinguish between inner joins (symmetric) and outer joins (directional). Left outer: A ←left B means A's rows preserved even without match in B.
•Subquery Nodes: Treat subqueries as derived relation nodes. The subquery's internal structure may have its own query graph, creating nested representations.
•Aggregate Boundaries: Mark nodes where aggregation must occur. Affects when predicates can be pushed and which optimizations apply.
•Correlation Edges: For correlated subqueries, edges from outer query to inner query represent data dependencies. Affects execution strategy (decorrelation opportunities).
•Anti-Join Edges: For NOT EXISTS / NOT IN subqueries, special edge type indicates anti-join semantics. Different physical operators apply.

complex-query-graph.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
-- Query with outer join and subquery
SELECT c.name, 
       o.order_id,
       (SELECT MAX(oi.price) 
        FROM order_items oi 
        WHERE oi.order_id = o.order_id) as max_item_price
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
WHERE c.region = 'West'
  AND (o.order_id IS NULL OR o.total > 100);
 
-- Extended Query Graph Representation:
-- 
-- Nodes:
--   [customers] with selection: region = 'West'
--   [orders] with selection: order_id IS NULL OR total > 100
--   [order_items] - in scalar subquery
--
-- Edges:
--   customers ←LEFT_OUTER→ orders (customer_id = customer_id)
--   orders ⟿ order_items (correlated: oi.order_id = o.order_id)
--                          (scalar aggregate subquery)
--
-- The LEFT_OUTER edge indicates orders may be NULL
-- The correlated subquery edge shows data dependency
-- Aggregate boundary exists at the subquery level

Outer Join Complexity:

Outer joins complicate query graphs because they are not symmetric:

Inner join: A ⋈ B = B ⋈ A (commutative)
Outer join: A LEFT JOIN B ≠ B LEFT JOIN A (not commutative)

This restricts join reordering. You cannot freely reorder outer joins like inner joins. The query graph must track edge direction and type to ensure valid transformations.

Association patterns for outer joins:

(A LEFT JOIN B) LEFT JOIN C ≠ A LEFT JOIN (B LEFT JOIN C) in general
Special nullability conditions determine when reordering is valid
Optimizers maintain "null-rejecting" predicate information to enable some reorderings

Outer Joins Constrain Optimization

From Query Graph to Algebra Tree

The Transformation Process:

graph-to-tree-transformation.pseudo
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
// Conceptual algorithm: Query Graph → Relational Algebra Tree
 
function graphToTree(queryGraph: QueryGraph): AlgebraTree {
    // 1. Determine join order using query graph
    //    (e.g., via dynamic programming on graph structure)
    const joinOrder = findOptimalJoinOrder(queryGraph);
    
    // 2. Build tree bottom-up from join order
    let tree = null;
    for (const step of joinOrder) {
        if (tree === null) {
            // First step: access base table with local predicates
            tree = createScan(step.table, step.localPredicates);
        } else {
            // Join current tree with next table
            const nextScan = createScan(step.table, step.localPredicates);
            tree = createJoin(tree, nextScan, step.joinPredicate);
        }
    }
    
    // 3. Add remaining operations (aggregation, sorting, projection)
    if (queryGraph.hasGroupBy) {
        tree = createGroupBy(tree, queryGraph.groupingCols, queryGraph.aggregates);
    }
    if (queryGraph.hasHaving) {
        tree = createSelect(tree, queryGraph.havingPredicate);
    }
    if (queryGraph.hasOrderBy) {
        tree = createSort(tree, queryGraph.orderByColumns);
    }
    tree = createProject(tree, queryGraph.selectList);
    
    return tree;
}
 
// The result is a fully-formed relational algebra tree
// ready for physical planning and execution

Converting Mermaid diagram...

Key Transformation Points:

Graph remains undirected for joins: The graph doesn't dictate order; the optimizer chooses based on cost.
Tree becomes strictly ordered: Once transformed, the tree specifies exact operation sequence.
Predicates placed appropriately: Local predicates become selection nodes (often pushed to scans); join predicates become join node conditions.
Topology constrains but doesn't determine: The graph's structure limits valid orderings, but cost estimates determine the actual choice.
Both representations remain useful: The graph for understanding structure; the tree for executing operations.

Summary and Next Steps

Query graphs provide a structural lens for understanding and optimizing queries. While relational algebra trees capture computation, query graphs reveal relationships.

Key Takeaways

•Query graphs represent tables as nodes and join predicates as edges, showing query structure at a glance.
•Topology matters: Chain, star, snowflake, and cyclic graphs have different optimization characteristics and complexity.
•Predicates classify into local (selection), join (edges), and complex (multi-table) categories, each with placement implications.
•Join ordering is the primary application—the graph constrains valid orderings while cost models select among them.
•Extensions handle outer joins, subqueries, and correlations by adding edge types and nesting.
•Graphs complement trees: Use graphs for structural analysis, trees for execution specification.
•Graph → Tree transformation is the bridge from understanding structure to planning execution.

What's Next:

Page Complete

2 / 5