Relational algebra trees provide a mathematical representation of query semantics. Query graphs reveal structural relationships. But optimizers need more than either alone offers—they need a working representation rich enough to carry metadata annotations, support systematic transformation, and drive cost estimation.
Logical plans are this comprehensive representation. They combine the operational structure of algebra trees with the metadata annotations needed for optimization. A logical plan is not just a representation—it's the optimizer's workspace where queries are analyzed, transformed, and refined into efficient execution strategies.
By the end of this page, you will understand what logical plans are, how they differ from physical plans, the properties and annotations they carry, how optimizers transform them, and how they serve as the bridge between query understanding and query execution.
Before diving into logical plans, it's essential to understand the distinction between logical and physical plans. This separation is fundamental to query processing architecture.
Logical Plans: describe what the query computes, in terms of relational operations and their semantics, independent of any particular algorithm or access path.
Physical Plans: describe how the result is computed, naming a concrete algorithm, access method, and execution strategy for each operation.
| Logical Operation | Possible Physical Implementations |
|---|---|
| Join (⋈) | Nested Loop Join, Hash Join, Sort-Merge Join, Index Nested Loop |
| Selection (σ) | Filter, Index Scan with predicate, Bitmap Index Scan |
| Table Access | Sequential Scan, Index Scan, Index-Only Scan, Bitmap Heap Scan |
| Aggregation (γ) | Stream Aggregate (if sorted), Hash Aggregate, Partial + Final |
| Sorting (τ) | In-memory Quicksort, External Merge Sort, Index Scan (presorted) |
| Distinct (δ) | Sort + Deduplicate, Hash Deduplicate, Streaming (if sorted) |
| Union (∪) | Append + Deduplicate, or Merge (if sorted inputs) |
Why Separate?
This separation provides critical benefits:
Modularity: Logical optimization ("push selection before join") and physical optimization ("choose hash join over nested loop") are independent concerns.
Search space management: Logical transformations are typically correct regardless of data characteristics; physical choices depend on statistics.
Extensibility: New physical operators can be added without changing logical representation.
Portability: The same logical plan can generate different physical plans for different storage backends or execution environments.
Most optimizers work in phases: (1) Parse SQL to parse tree, (2) Convert to initial logical plan, (3) Apply logical transformations, (4) Generate physical plan alternatives, (5) Cost and select best physical plan. Logical plans are the output of step 2 and the workspace for step 3.
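These five phases can be sketched as a simple pipeline. This is a conceptual sketch only: every type and function name here is illustrative, not taken from any particular engine, and the costs are invented.

```typescript
// Conceptual optimizer pipeline; all types, names, and costs are illustrative.
type ParseTree = { kind: "parse"; sql: string };
type LogicalPlan = { kind: "logical"; op: string; children: LogicalPlan[] };
type PhysicalPlan = { kind: "physical"; op: string; cost: number };

function parse(sql: string): ParseTree {
  return { kind: "parse", sql };                            // (1) SQL -> parse tree
}
function toLogical(tree: ParseTree): LogicalPlan {
  return { kind: "logical", op: "Project", children: [] };  // (2) initial logical plan
}
function applyRules(plan: LogicalPlan): LogicalPlan {
  return plan;                                              // (3) logical transformations
}
function enumeratePhysical(plan: LogicalPlan): PhysicalPlan[] {
  return [                                                  // (4) physical alternatives
    { kind: "physical", op: "HashJoin", cost: 120 },
    { kind: "physical", op: "MergeJoin", cost: 95 },
  ];
}
function pickBest(alts: PhysicalPlan[]): PhysicalPlan {
  return alts.reduce((a, b) => (a.cost <= b.cost ? a : b)); // (5) cost & select
}

const best = pickBest(enumeratePhysical(applyRules(toLogical(parse("SELECT ...")))));
// best.op === "MergeJoin" in this toy example (cost 95 < 120)
```

The point of the sketch is the separation of concerns: steps 2 and 3 operate entirely on logical plans, and only steps 4 and 5 ever see physical operators or costs.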
A logical plan is typically represented as a tree (or DAG in advanced systems) of logical operators, each encapsulating an operation and its parameters.
```typescript
// Conceptual structure of a logical plan node
interface LogicalOperator {
  // Identity
  operatorType: LogicalOpType;   // JOIN, SELECT, PROJECT, AGGREGATE, etc.
  operatorId: string;            // Unique identifier in plan

  // Tree structure
  children: LogicalOperator[];   // Input operators (0 for leaves, 1+ for others)

  // Operation parameters (vary by type)
  // For SELECT:
  predicate?: Expression;
  // For PROJECT:
  projectList?: ColumnExpression[];
  // For JOIN:
  joinType?: 'INNER' | 'LEFT' | 'RIGHT' | 'FULL' | 'SEMI' | 'ANTI';
  joinCondition?: Expression;
  // For AGGREGATE:
  groupingKeys?: Column[];
  aggregateFunctions?: AggregateCall[];
  // For SCAN:
  tableName?: string;
  tableAlias?: string;

  // Schema information
  outputSchema: Schema;          // Columns produced

  // Statistical properties (derived)
  estimatedRowCount: number;
  estimatedDistinctValues: Map<Column, number>;
  nullableCols: Set<Column>;

  // Logical properties (derived)
  uniqueKeys: ColumnSet[];             // Column combinations guaranteed unique
  functionalDependencies: FDSet;       // A → B relationships
  sortOrdering: SortOrder[];           // If known to be sorted

  // Cost estimate (computed during optimization)
  estimatedCost?: Cost;
}

// Example: Logical plan for SELECT name FROM emp WHERE salary > 50000
const examplePlan: LogicalOperator = {
  operatorType: 'PROJECT',
  operatorId: 'proj-1',
  children: [{
    operatorType: 'SELECT',
    operatorId: 'sel-1',
    predicate: { type: 'comparison', left: 'salary', op: '>', right: 50000 },
    children: [{
      operatorType: 'SCAN',
      operatorId: 'scan-1',
      tableName: 'employees',
      children: [],
      outputSchema: ['id', 'name', 'salary', 'dept_id'],
      estimatedRowCount: 100000,
      // ...
    }],
    outputSchema: ['id', 'name', 'salary', 'dept_id'],
    estimatedRowCount: 15000,  // After selection
    // ...
  }],
  projectList: [{ column: 'name' }],
  outputSchema: ['name'],
  estimatedRowCount: 15000,
  // ...
};
```

Key Components Explained:
Operator Type: The logical operation being performed. This determines the operator's semantics regardless of physical implementation.
Children: Input operators whose output this operator consumes. Leaves (scans) have no children; binary operators (joins) have two; most others have one.
Operation Parameters: Type-specific details—predicates for selection, column lists for projection, join conditions and types for joins.
Output Schema: The columns this operator produces. Essential for type checking and propagating attributes.
Statistical Properties: Cardinality estimates, NDV (number of distinct values), null frequencies. These drive cost estimation.
Logical Properties: Invariants like unique keys, sort order, functional dependencies. Enable certain optimizations.
Some properties are specified (operation type, parameters). Others are derived by propagating through the plan—cardinality estimates propagate from leaves upward, transforming at each node based on selectivity estimates. This derivation is a core optimizer activity.
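As a concrete sketch of this bottom-up derivation, here is how a cardinality estimate might propagate through a SELECT node. The selectivity model is deliberately minimal and the node shape is illustrative, not any engine's actual representation:

```typescript
// Illustrative bottom-up cardinality derivation for SCAN and SELECT nodes.
interface PlanNode {
  opType: "SCAN" | "SELECT";
  children: PlanNode[];
  selectivity?: number;   // fraction of rows the predicate keeps (SELECT only)
  baseRows?: number;      // catalog row count (SCAN only)
}

function estimateRows(node: PlanNode): number {
  if (node.opType === "SCAN") {
    return node.baseRows ?? 0;                      // leaves: from catalog statistics
  }
  // non-leaves: derive from the child's estimate, scaled by selectivity
  const inputRows = estimateRows(node.children[0]);
  return Math.round(inputRows * (node.selectivity ?? 1));
}

// Same shape and numbers as the example plan above:
// 100000 base rows, 15% pass the salary predicate.
const plan: PlanNode = {
  opType: "SELECT",
  selectivity: 0.15,
  children: [{ opType: "SCAN", baseRows: 100000, children: [] }],
};
console.log(estimateRows(plan)); // 15000
```

Real optimizers derive selectivity from histograms and NDV statistics rather than taking it as an input, but the direction of information flow is the same: leaves are seeded from the catalog, and every interior node computes its estimate from its children.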
Databases define a set of logical operators corresponding to relational algebra operations and SQL constructs. While implementations vary, a core set appears universally.
Leaf operators also include inline row constructors such as VALUES (1, 'a'), (2, 'b') and table functions such as generate_series() or custom UDTFs.

Operator Relationships:
Logical operators form a well-defined algebra. Understanding their relationships enables transformations:
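A few of the standard identities, written in relational-algebra notation (here $p$ and $q$ are predicates, $R$, $S$, $T$ relations, and $L$ a projection list):

```latex
% Cascading and commuting selections
\sigma_{p \wedge q}(R) = \sigma_p(\sigma_q(R)) = \sigma_q(\sigma_p(R))

% Selection pushdown through join, when p references only attributes of R
\sigma_p(R \bowtie S) = \sigma_p(R) \bowtie S

% Join commutativity and associativity (inner joins)
R \bowtie S = S \bowtie R, \qquad (R \bowtie S) \bowtie T = R \bowtie (S \bowtie T)

% Projection commutes with selection when the predicate's columns survive
\pi_L(\sigma_p(R)) = \sigma_p(\pi_L(R)) \quad \text{when } \mathrm{cols}(p) \subseteq L
```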
These identities form the foundation of logical transformation rules.
Each logical operator node carries logical properties—attributes that describe invariants of its output regardless of physical implementation. These properties enable optimizations and are propagated/computed throughout the plan.
```typescript
// Property propagation examples

// Unique key propagation through JOIN
function propagateUniquenessJoin(
  leftKeys: ColumnSet[],
  rightKeys: ColumnSet[],
  joinType: JoinType,
  joinCondition: Expression
): ColumnSet[] {
  // Inner join: both sides' unique keys are preserved
  // (assuming equi-join on key columns)
  if (joinType === 'INNER') {
    return [...leftKeys, ...rightKeys];
  }

  // Left outer: only left's uniqueness is preserved
  // (right side may have nulls = duplicates from left's perspective)
  if (joinType === 'LEFT') {
    return leftKeys;
  }

  // Full outer: neither side's uniqueness guaranteed
  return [];
}

// Sort order propagation through FILTER
function propagateSortFilter(
  inputSortOrder: SortSpec[],
  filterPredicate: Expression
): SortSpec[] {
  // Filtering preserves sort order (removes rows but keeps order)
  return inputSortOrder;
}

// Sort order propagation through PROJECT
function propagateSortProject(
  inputSortOrder: SortSpec[],
  projectList: Column[]
): SortSpec[] {
  // Only preserve if all sort columns are in output
  return inputSortOrder.filter(spec =>
    projectList.some(col => col.equals(spec.column))
  );
}
```

Property Computation Order:
Properties are computed in a bottom-up pass:
Leaf nodes (scans) get properties from catalog metadata: schema from table definition, cardinality from statistics, unique keys from primary/unique constraints.
Each operator computes its properties from its children's properties using operator-specific rules.
Root node has the final query output properties.
Some optimizations require top-down propagation as well—knowing what properties the parent needs can influence child decisions (e.g., requiring a specific sort order).
If you know a sort operator's input is already sorted on the required columns (via sort order property), you can eliminate the sort. If you know a column has a unique key, you can remove redundant DISTINCT. Properties turn potential optimizations into applicable optimizations.
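Both checks mentioned here can be sketched directly against the property representation. The types below are illustrative simplifications (real optimizers track sort direction, null ordering, and equivalence classes as well):

```typescript
// Illustrative property-driven simplifications.
type Column = string;
interface Props {
  sortOrdering: Column[];   // columns the output is known to be sorted on
  uniqueKeys: Column[][];   // column sets guaranteed unique
}

// Sort elimination: the required order is a prefix of the input's known order.
function sortIsRedundant(required: Column[], input: Props): boolean {
  return required.every((col, i) => input.sortOrdering[i] === col);
}

// DISTINCT removal: some unique key is fully contained in the output columns,
// so every output row is already distinct.
function distinctIsRedundant(outputCols: Column[], input: Props): boolean {
  return input.uniqueKeys.some(key => key.every(col => outputCols.includes(col)));
}

const props: Props = { sortOrdering: ["dept_id", "salary"], uniqueKeys: [["id"]] };
console.log(sortIsRedundant(["dept_id"], props));        // true: prefix of input order
console.log(distinctIsRedundant(["id", "name"], props)); // true: key 'id' is in output
console.log(distinctIsRedundant(["name"], props));       // false: no key survives
```

This is exactly what "properties turn potential optimizations into applicable optimizations" means: the rewrite rule is trivial; the work is in deriving and trusting the properties.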
The power of logical plans lies in transformation rules—systematic ways to rewrite plans while preserving semantics. Optimizers apply these rules to explore the space of equivalent plans, seeking lower-cost alternatives.
```sql
-- Original Plan (canonical translation)
--
-- PROJECT (c.name, o.total)
--   |
-- FILTER (c.region = 'West' AND o.status = 'completed')
--   |
-- JOIN (c.id = o.customer_id)
--  /       \
-- SCAN(c)  SCAN(o)

-- After Selection Pushdown
--
-- PROJECT (c.name, o.total)
--   |
-- JOIN (c.id = o.customer_id)
--  /         \
-- FILTER     FILTER
-- (region    (status =
--  = 'West')  'completed')
--  |           |
-- SCAN(c)    SCAN(o)

-- Why is this better?
-- Before: Join full tables, then filter result
--   - customers: 1M rows, orders: 5M rows
--   - Join produces 5M rows (assume 5 orders/customer)
--   - Filter reduces to 200K rows

-- After: Filter first, then join filtered results
--   - Filter customers: 1M → 200K ('West' region)
--   - Filter orders: 5M → 1M ('completed' status)
--   - Join filtered: 200K * 1M → ~200K (filtered customers have fewer orders)
-- Massive reduction in intermediate data!
```

Not all transformations are always valid. Selection pushdown through outer joins requires null-rejecting predicates. Join reordering doesn't apply to outer joins freely. Aggregate pushdown has complex validity requirements. Optimizers must verify conditions before applying rules.
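The null-rejecting condition can be sketched as a simple check. This is a heavy simplification (real optimizers reason over full expression trees and three-valued logic), and the predicate shapes here are illustrative:

```typescript
// Simplified validity check: may a predicate be pushed below an outer join
// to the null-producing side? Only if it is null-rejecting, i.e. a NULL
// input makes it false/unknown; otherwise pushdown changes the result.
type Pred =
  | { kind: "isNull"; column: string }
  | { kind: "compare"; column: string; op: string; value: number };

function isNullRejecting(p: Pred): boolean {
  // Comparisons evaluate to UNKNOWN on NULL, so NULL rows are filtered out.
  // IS NULL is *satisfied* by NULL, so it is NOT null-rejecting.
  return p.kind === "compare";
}

function canPushToNullableSide(p: Pred, joinType: "INNER" | "LEFT"): boolean {
  if (joinType === "INNER") return true;   // inner joins: pushdown always safe
  return isNullRejecting(p);               // outer joins: must reject NULLs
}

console.log(canPushToNullableSide(
  { kind: "compare", column: "status", op: "=", value: 1 }, "LEFT")); // true
console.log(canPushToNullableSide(
  { kind: "isNull", column: "status" }, "LEFT"));                     // false
```

The intuition: a null-rejecting predicate on the nullable side discards exactly the padding rows an outer join would have produced, so filtering early and filtering late agree; an IS NULL predicate does not, so it must stay above the join.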
Modern optimizers don't generate plans one at a time. Instead, they use sophisticated data structures to represent all equivalent plans simultaneously. The key abstraction is the equivalence class or group.
Core Concepts:
For example, (A ⋈ B) ⋈ C and A ⋈ (B ⋈ C) belong to the same group if they produce identical output.
```typescript
// Simplified memo structure concept

interface Group {
  groupId: number;
  logicalProps: LogicalProperties;   // Shared by all expressions
  expressions: GroupExpression[];    // All equivalent ways to compute
  bestPlan?: Winner;                 // Best physical plan found
}

interface GroupExpression {
  operator: LogicalOperator;
  childGroups: Group[];              // Children are groups, not expressions!
}

// Example: For query joining A, B, C
// The memo might contain:

// Group 0: Base table A
//   - Expression: TableScan('A')

// Group 1: Base table B
//   - Expression: TableScan('B')

// Group 2: Base table C
//   - Expression: TableScan('C')

// Group 3: Join result of A and B
//   - Expression 1: Join(Group0, Group1, A.id = B.a_id)
//   - Expression 2: Join(Group1, Group0, A.id = B.a_id)  // Commuted

// Group 4: Final result: join of Group3 and C
//   - Expression 1: Join(Group3, Group2, ...)  // (A⋈B)⋈C
//   - Expression 2: ...

// Meanwhile, alternative:
// Group 5: Join result of B and C
//   - Expression: Join(Group1, Group2, B.id = C.b_id)

// Group 4 can also include:
//   - Expression 3: Join(Group0, Group5, ...)  // A⋈(B⋈C)

// The memo compactly represents ALL valid join orderings!
```

Why Memo Structures Matter:
Compact representation: Exponentially many plans in polynomial space. Subplans are shared.
Avoid redundant work: Once a group is explored, its expressions are available to all parents.
Top-down search: Can prune entire groups if they can't beat current best cost.
Memoization: Store intermediate results (best costs for groups)—avoid recomputation.
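Memoization and cost-bound pruning can be shown together in a minimal sketch. The costs are toy numbers, and real Cascades-style optimizers additionally track required physical properties per search request:

```typescript
// Toy top-down optimization over a memo: each group caches its best cost,
// and exploration of a branch stops once it cannot beat the cost bound.
interface Expr { localCost: number; childGroups: Grp[] }
interface Grp { id: number; expressions: Expr[]; best?: number }

function optimize(group: Grp, costLimit: number): number | undefined {
  if (group.best !== undefined) return group.best;      // memoization
  let best: number | undefined;
  for (const expr of group.expressions) {
    let total = expr.localCost;
    for (const child of expr.childGroups) {
      if (total >= costLimit) { total = Infinity; break; } // prune this branch
      const childCost = optimize(child, costLimit - total);
      if (childCost === undefined) { total = Infinity; break; }
      total += childCost;
    }
    if (total < (best ?? costLimit)) best = total;
  }
  if (best !== undefined) group.best = best;            // cache the winner
  return best;
}

const a: Grp = { id: 0, expressions: [{ localCost: 10, childGroups: [] }] };
const b: Grp = { id: 1, expressions: [{ localCost: 5, childGroups: [] }] };
const join: Grp = {
  id: 2,
  expressions: [
    { localCost: 100, childGroups: [a, b] },  // e.g. one join flavor/order
    { localCost: 40, childGroups: [b, a] },   // e.g. a commuted alternative
  ],
};
console.log(optimize(join, Infinity)); // 55 (40 + 5 + 10)
```

Note that the second expression reuses the already-optimized groups for `a` and `b` rather than re-exploring them; that sharing is what makes the memo's exponential plan space tractable.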
This approach, pioneered in the Cascades optimizer framework, is used by SQL Server, CockroachDB, Apache Calcite, and many others.
The Cascades optimization framework (evolved from Volcano/EXODUS) introduced memo-based optimization with rules transforming groups. Most modern commercial and open-source databases use variants of this approach. Understanding memos helps you understand optimizer behavior and debug plan issues.
The culmination of logical planning is physical plan generation—selecting concrete algorithms for each logical operator. This transition considers:
| Logical Op | Physical Choice | Key Decision Factors |
|---|---|---|
| JOIN | Nested Loop | Small inner side, index available, low cardinality |
| JOIN | Hash Join | No useful sort, one side fits in memory, equality condition |
| JOIN | Merge Join | Both inputs sorted (or index), many rows, equality condition |
| AGGREGATE | Stream Agg | Input already sorted on grouping columns |
| AGGREGATE | Hash Agg | Unsorted input, enough memory for hash table |
| SORT | In-memory | Small data, fits in sort buffer |
| SORT | External | Large data, must spill to disk |
| SCAN | Seq Scan | No useful index, large fraction of table needed |
| SCAN | Index Scan | Selective predicate matching index, good clustering |
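The decision factors in this table can be sketched as a toy join chooser. Every threshold and rule here is invented for illustration; real planners compare full cost estimates rather than applying fixed cutoffs:

```typescript
// Toy physical-join chooser driven by the factors in the table above.
interface JoinInput {
  leftRows: number;
  rightRows: number;
  leftSorted: boolean;          // sorted on the join key
  rightSorted: boolean;
  equiJoin: boolean;            // equality join condition
  memoryRows: number;           // rows that fit in an in-memory hash table
  innerIndexAvailable: boolean; // usable index on the inner side
}

function chooseJoin(j: JoinInput): string {
  const smaller = Math.min(j.leftRows, j.rightRows);
  if (j.innerIndexAvailable && smaller < 1000) return "IndexNestedLoop";
  if (j.equiJoin && j.leftSorted && j.rightSorted) return "MergeJoin";
  if (j.equiJoin && smaller <= j.memoryRows) return "HashJoin";
  return "NestedLoop";          // fallback when nothing better applies
}

console.log(chooseJoin({
  leftRows: 1_000_000, rightRows: 50_000,
  leftSorted: false, rightSorted: false,
  equiJoin: true, memoryRows: 100_000, innerIndexAvailable: false,
})); // "HashJoin": equality condition, smaller side fits in memory
```

A real optimizer would cost each admissible alternative and keep the cheapest, but the inputs to that costing are exactly these: cardinalities, sort orders, memory, indexes, and the join condition's shape.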
```sql
-- Logical Plan:
--
-- PROJECT (name, total)
--   |
-- AGGREGATE (GROUP BY cust_id; SUM(amount))
--   |
-- JOIN (orders.cust_id = customers.id)
--  /         \
-- SCAN       SCAN
-- (orders)   (customers)

-- Possible Physical Plans:

-- Option A: Hash-based
--   SequentialScan(orders) -> HashAgg(cust_id, SUM)
--     -> HashJoin with SeqScan(customers) -> Project

-- Option B: Sort-based (if index on orders.cust_id)
--   IndexScan(orders, cust_id_idx) -> StreamAgg(cust_id, SUM)
--     -> MergeJoin with IndexScan(customers, pk_idx) -> Project

-- Option C: Nested Loop (if customers is tiny)
--   SeqScan(customers) -> NestedLoop with
--     [for each customer: IndexScan(orders) -> filtered agg] -> Project

-- The optimizer costs each option and picks lowest total cost.
-- Option depends on table sizes, indexes, memory, statistics.
```

Property Enforcement:
Physical planning may need to enforce properties required by parent operators: an explicit Sort inserted so a merge join receives ordered input, or an exchange operator inserted so a parallel plan receives the required data distribution.
The optimizer considers both: operators that deliver a required property as a natural by-product (an index scan produces sorted output) and enforcer operators added solely to establish it (an explicit Sort node).
Cost includes both the operator and any enforcement needed.
Some optimizers track 'interesting orders'—sort orders potentially useful later in the plan. Generating a sorted intermediate result may cost more now but avoids a sort later. This interplay of properties across plan levels requires global optimization, not just local greedy choices.
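The interesting-orders tradeoff is ultimately a global cost comparison. A toy numeric sketch, with all costs invented for illustration:

```typescript
// Toy illustration of 'interesting orders': a locally more expensive sorted
// access path can win globally because it lets a later merge join skip an
// explicit sort. All cost numbers below are made up.
const seqScan = 100;    // cheap, but output is unsorted
const indexScan = 140;  // pricier, but output arrives sorted on the join key
const sortCost = 80;    // explicit sort needed before a merge join otherwise
const mergeJoin = 50;

const planUnsorted = seqScan + sortCost + mergeJoin; // greedy local choice
const planSorted = indexScan + mergeJoin;            // globally cheaper
console.log(planUnsorted, planSorted); // 230 190
```

A purely greedy, node-by-node optimizer would pick the sequential scan (100 < 140) and never discover the cheaper overall plan; tracking interesting orders keeps the sorted alternative alive until the sort saving becomes visible.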
Logical plans are the heart of query optimization—the representation where queries are understood, transformed, and prepared for execution. They bridge the gap between declarative SQL and imperative physical execution.
What's Next:
With logical plans understood, we dive deeper into operator nodes—the individual building blocks of plans. We'll examine each operator's semantics, properties, and behavior in detail, building a comprehensive understanding of how queries decompose into executable operations.
You now understand logical plans—their structure, properties, transformations, and role in optimization. This knowledge is essential for understanding EXPLAIN output, diagnosing query performance, and appreciating how optimizers find efficient execution strategies. Next: Deep dive into Operator Nodes.