Heuristic optimization is powerful—selection pushdown, early projection, and join ordering rules dramatically improve most queries. But these rules encode general wisdom, not specific knowledge. They work well when their assumptions hold, but fail—sometimes catastrophically—when reality deviates from expectations.
Understanding where heuristics break down is essential for database practitioners. This knowledge helps you recognize when to trust automatic optimization, when to provide hints, and when to restructure queries or schemas to help the optimizer succeed.
By the end of this page, you will understand the fundamental limitations of rule-based optimization, specific scenarios where common heuristics fail, how cost-based optimization addresses these limitations, and practical strategies for working around heuristic limitations.
The core limitation of heuristic optimization is simple: rules don't know your data. Heuristics encode patterns that are usually beneficial, but "usually" is not "always."
Consider the heuristic "push selections down." It assumes filtering reduces data. But what if the filter matches everything?
```sql
-- Table: transactions (100 million rows)
-- Column: is_valid (99.9% are 'Y', 0.1% are 'N')

-- Query 1: Select invalid transactions
SELECT * FROM transactions WHERE is_valid = 'N';
-- Selectivity: 0.1% → 100,000 rows
-- Index on is_valid: EXCELLENT choice
-- Heuristic: Push down, use index ✓

-- Query 2: Select valid transactions
SELECT * FROM transactions WHERE is_valid = 'Y';
-- Selectivity: 99.9% → 99,900,000 rows
-- Index on is_valid: TERRIBLE choice (worse than full scan!)
-- Heuristic: Push down, use index ✗

-- The heuristic "push selection and use index" is applied equally,
-- but one query benefits and one suffers.
-- Without knowing selectivity, the optimizer can't choose correctly.
```

Join ordering heuristics like "start with the smallest table" fail when filtered cardinality differs dramatically from table size:
```sql
-- Scenario A: Heuristic succeeds
-- customers: 100,000 rows
-- orders: 10,000,000 rows
-- Filter: none

-- Heuristic: Start with customers (smaller)
-- Build hash on 100K customers
-- Probe with 10M orders
-- Result: 100K hash table entries

-- This is optimal!
```
```sql
-- Scenario B: Heuristic fails
-- customers: 100,000 rows
-- orders: 10,000,000 rows
-- Filter: orders.date = TODAY (100 rows!)

-- Heuristic: Still starts with customers
-- Build hash on 100K customers
-- Probe with 100 filtered orders
-- Result: 100K hash entries for 100 probes!

-- Better: Start with filtered orders
-- Build hash on 100 orders
-- Probe with customers (semi-join optimization)
-- Result: 100 hash entries, 100K probes
```

The heuristic "start with the smallest table" doesn't account for filters that dramatically change effective cardinality. A table might have 10 million rows but filter down to 100, making it effectively smaller than a "small" table with 100,000 rows.
When heuristics fail, they don't fail gracefully. A wrong join order can be 1000× slower than the right one. A poorly chosen index scan can transform a millisecond query into a minute-long operation. The cost of heuristic failure is often catastrophic, not incremental.
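A first diagnostic step is comparing the optimizer's row estimates against reality. Below is a minimal sketch, assuming PostgreSQL (other systems have analogous tooling) and the hypothetical transactions table from the selectivity example above:

```sql
-- Run the query with EXPLAIN ANALYZE and compare the planner's estimate
-- ("rows=") against the measured result ("actual ... rows=") in the output.
-- Order-of-magnitude gaps between the two are the usual signature of a
-- heuristic or statistics failure.
EXPLAIN ANALYZE
SELECT * FROM transactions WHERE is_valid = 'Y';  -- hypothetical table from above
```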
Heuristics implicitly assume certain data distributions. When real data violates these assumptions, optimization decisions become suboptimal.
Many heuristics assume uniform data distribution—that any value is equally likely:
```sql
-- Table: sales (10 million rows)
-- Column: product_id (1000 distinct products)
-- Uniform assumption: Each product has ~10,000 sales

-- Reality: Pareto distribution (80-20 rule)
-- Top 10 products: 5 million sales (500K each)
-- Bottom 900 products: 1 million sales (~1,100 each)

-- Query for popular product:
SELECT * FROM sales WHERE product_id = 1;
-- Expected (uniform): 10,000 rows
-- Actual: 500,000 rows (50× more!)
-- Wrong plan chosen based on wrong estimate

-- Query for unpopular product:
SELECT * FROM sales WHERE product_id = 999;
-- Expected (uniform): 10,000 rows
-- Actual: 1,100 rows (9× fewer!)
-- Index would be great, but optimizer might not realize

-- Consequence: Same heuristic applied to both queries,
-- but optimal plans are completely different!
```

Heuristics treat predicates independently, but real data often has correlated columns:
```sql
-- Table: employees
-- department: 10 distinct values (10% each under uniformity)
-- job_title: 100 distinct values (1% each under uniformity)

-- Independent assumption:
-- P(department='Engineering') = 10%
-- P(job_title='Software Engineer') = 1%
-- P(both) = 10% × 1% = 0.1%

-- Reality: Columns are correlated!
-- All 'Software Engineer' titles are in 'Engineering'
-- P(department='Engineering' AND job_title='Software Engineer') = 1%
-- That's 10× higher than the independent estimate!

SELECT * FROM employees
WHERE department = 'Engineering'
  AND job_title = 'Software Engineer';

-- Optimizer estimates: 0.1% of table (100 rows from 100K)
-- Actual result: 1% of table (1000 rows)
-- 10× underestimate leads to wrong plan selection
```

Heuristics often ignore or mishandle NULL values, which have special semantics in SQL:
| Scenario | Heuristic Assumption | Reality | Impact |
|---|---|---|---|
| NOT IN with NULLs | NOT IN behaves like anti-join | A NULL in the list makes NOT IN return UNKNOWN (treated as FALSE) | Wrong results or missed optimization |
| Index on nullable column | Index useful for equality | IS NULL may not use index efficiently | Full scan when index expected |
| Aggregate on nullable column | COUNT(col) = COUNT(*) | COUNT(col) excludes NULLs | Wrong cardinality estimates |
| Outer join predicates | Predicate pushdown is safe | Pushdown may eliminate NULL-extended rows | Wrong results |
Cost-based optimizers address distribution issues by maintaining histograms (recording actual value frequencies), correlation statistics (for column pairs), and NULL frequency metadata. This data-awareness enables informed decisions where heuristics must guess.
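As a concrete illustration, here is a minimal sketch of inspecting that metadata, assuming PostgreSQL and the hypothetical sales table from the skew example above; other systems expose similar catalog views (e.g., Oracle's DBA_TAB_COL_STATISTICS).

```sql
-- PostgreSQL exposes per-column statistics through the pg_stats view.
SELECT attname,            -- column name
       null_frac,          -- fraction of NULLs
       n_distinct,         -- estimated number of distinct values
       most_common_vals,   -- most frequent values (captures skew)
       most_common_freqs,  -- their frequencies
       histogram_bounds    -- equi-depth histogram for the remaining values
FROM pg_stats
WHERE tablename = 'sales' AND attname = 'product_id';
```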
Heuristics operate on logical query structure, ignoring physical resources like memory, I/O bandwidth, and CPU capacity. This blindness leads to suboptimal decisions in resource-constrained scenarios.
The heuristic "use hash join for equi-joins" ignores available memory:
```sql
-- Hash join assumption: Build table fits in memory
-- Available memory: 1GB (work_mem or equivalent)

-- Scenario A: Small build side
SELECT * FROM small_table s   -- 100MB build side
JOIN large_table l ON s.id = l.small_id;
-- Hash join: Build 100MB hash table, probe in memory
-- Excellent performance!

-- Scenario B: Large build side
SELECT * FROM medium_table m  -- 10GB build side
JOIN huge_table h ON m.id = h.medium_id;
-- Hash join: Build 10GB hash table...
-- But only 1GB memory available!
-- Spills to disk repeatedly
-- Performance: 10-100× worse than expected

-- Better for Scenario B: Sort-merge join
-- If data is already sorted or indexes exist
-- Avoids memory spills, streams through data

-- Heuristic blindly chose hash join for both,
-- but sort-merge would be faster for Scenario B
```

Heuristics also don't consider whether data is in cache, on SSD, or on spinning disk, even though the cost of a random read differs by orders of magnitude across these tiers.
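A minimal sketch of how a cost-based planner is told about the storage tier, assuming PostgreSQL; the parameter values are illustrative, not recommendations:

```sql
-- Planner cost constants describe the relative price of I/O patterns.
SHOW seq_page_cost;       -- default 1.0
SHOW random_page_cost;    -- default 4.0, tuned for spinning disk

-- On SSDs or well-cached data, random and sequential reads cost about the
-- same, which makes index scans attractive far more often:
SET random_page_cost = 1.1;
SET effective_cache_size = '16GB';  -- how much data the OS cache likely holds
```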
Heuristics rarely incorporate parallelism considerations:
```sql
-- Consider two equivalent plans for aggregation:

-- Plan A: Sort-based aggregation
-- Requires full sort before aggregate
-- Sort cannot parallelize past reduce step
-- Must merge sorted chunks sequentially

-- Plan B: Hash-based aggregation
-- Hash partitions independently
-- Each partition aggregates in parallel
-- Combines at end

-- On single-core: Plan A might be faster (lower overhead)
-- On 64-core: Plan B is likely 10× faster (parallelizable)

-- Heuristic might prefer sort-based (simpler algorithm)
-- without considering available parallelism

-- Similarly for joins:
-- Nested loop: Limited parallelism
-- Hash join: Highly parallelizable
-- Sort-merge: Moderate parallelism

-- The "best" algorithm depends on available cores,
-- something heuristics don't consider
```

Modern cost-based optimizers include resource models: memory budgets for sorts/hashes, I/O cost parameters for different storage types, and parallelism factors. This enables intelligent algorithm selection that heuristics cannot provide.
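As a sketch of that resource awareness, assuming PostgreSQL and the hypothetical medium_table/huge_table join from the memory example above, changing the declared budgets can change which plan the planner picks:

```sql
-- With a small memory budget, a 10GB build side would spill, so the planner
-- may prefer a merge join or a heavily batched hash join:
SET work_mem = '64MB';
EXPLAIN SELECT * FROM medium_table m JOIN huge_table h ON m.id = h.medium_id;

-- With a larger budget and more workers, a parallel hash join becomes viable:
SET work_mem = '2GB';
SET max_parallel_workers_per_gather = 8;
EXPLAIN SELECT * FROM medium_table m JOIN huge_table h ON m.id = h.medium_id;
```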
Certain query patterns are intrinsically difficult for heuristic optimization. These patterns require sophisticated analysis that simple rules cannot provide.
With more than a few tables, join ordering becomes combinatorially complex:
```sql
-- Query joining 8 tables
SELECT *
FROM orders o
JOIN customers c ON o.customer_id = c.id
JOIN products p ON o.product_id = p.id
JOIN categories cat ON p.category_id = cat.id
JOIN suppliers s ON p.supplier_id = s.id
JOIN regions r ON c.region_id = r.id
JOIN warehouses w ON o.warehouse_id = w.id
JOIN shipping sh ON o.shipping_method = sh.id
WHERE cat.name = 'Electronics'
  AND r.name = 'North America';

-- Possible join orderings: (2n-2)! / (n-1)!
-- For n=8: 17,297,280 possible orderings!

-- Heuristic approach: Apply rules greedily
-- "Start with smallest filtered table" → categories (filter applied)
-- Then join products, then... which next?

-- Problem: Greedy choices early on may lock in bad decisions
-- Optimal might require starting with customers (for region filter)
-- then joining orders, then products, then categories

-- Heuristics can't explore the full search space
-- May miss orderings that are 100× faster
```

Heuristics may fail to decorrelate subqueries optimally:
```sql
-- Complex correlated subquery
SELECT o.id, o.total,
  (SELECT AVG(o2.total)
   FROM orders o2
   WHERE o2.customer_id = o.customer_id
     AND o2.order_date < o.order_date) as prior_avg
FROM orders o
WHERE o.total > 1000;

-- Heuristic: Execute subquery for each outer row
-- For 1M orders: 1M subquery executions!

-- Optimal: Decorrelate to a window function.
-- The window must see all of the customer's orders, so it is computed
-- in an inner query and the outer filter is applied afterward
-- (WHERE runs before window functions):
SELECT id, total, prior_avg
FROM (
  SELECT id, total,
    AVG(total) OVER (
      PARTITION BY customer_id
      ORDER BY order_date
      ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
    ) as prior_avg
  FROM orders
) windowed
WHERE total > 1000;

-- But automatic decorrelation is complex!
-- Requires recognizing the pattern
-- Verifying semantic equivalence
-- Complex subqueries may not decorrelate at all
```

Recursive patterns pose particular challenges for heuristic optimization:
```sql
-- Recursive CTE for tree traversal
WITH RECURSIVE descendants AS (
  -- Base case
  SELECT id, name, parent_id, 0 as depth
  FROM categories
  WHERE id = 1

  UNION ALL

  -- Recursive case
  SELECT c.id, c.name, c.parent_id, d.depth + 1
  FROM categories c
  JOIN descendants d ON c.parent_id = d.id
)
SELECT * FROM descendants WHERE depth <= 3;

-- Challenges for heuristics:
-- 1. Unknown iteration count (depends on tree depth)
-- 2. Unknown intermediate result size (depends on branching)
-- 3. Termination condition affects resource allocation
-- 4. Should results be materialized? Depends on reuse

-- Broader issue: Recursive semantics are fundamentally
-- different from single-pass relational operations
-- Heuristics designed for single-pass don't apply

-- Some optimizations possible:
-- - Push depth filter into recursion (WHERE depth <= 3)
-- - Recognize linear vs. exponential growth patterns
-- But requires specialized analysis
```

Optimizers have a 'horizon': the complexity limit beyond which they give up on exhaustive analysis and fall back to heuristics. Complex queries with many joins, subqueries, or recursive elements often exceed this horizon, receiving suboptimal plans. Breaking queries into simpler parts can help.
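PostgreSQL, for one, makes this horizon explicit and configurable; a brief sketch (parameter names and defaults below are PostgreSQL's, and other systems have analogous limits):

```sql
SHOW join_collapse_limit;  -- default 8: beyond this many joined items, the
                           -- planner keeps the query's written join order
SHOW geqo_threshold;       -- default 12: beyond this, a genetic (randomized)
                           -- search replaces exhaustive join enumeration

-- Raising the limits buys better plans at the cost of planning time:
SET join_collapse_limit = 12;
SET from_collapse_limit = 12;
```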
Some heuristic transformations that appear safe are actually unsafe in edge cases. These semantic subtleties cause rare but serious bugs.
The classic pitfall of SQL semantics:
```sql
-- Heuristic: Convert NOT IN to anti-join
-- Looks equivalent, but...

-- Original query
SELECT * FROM orders
WHERE customer_id NOT IN (
  SELECT id FROM blocked_customers
);

-- Heuristic transformation to anti-join:
SELECT o.* FROM orders o
WHERE NOT EXISTS (
  SELECT 1 FROM blocked_customers b
  WHERE b.id = o.customer_id
);

-- PROBLEM: These are NOT equivalent if blocked_customers.id can be NULL!

-- NOT IN semantics:
-- If blocked_customers contains [1, 2, NULL]:
-- customer_id = 3: NOT IN returns UNKNOWN (due to NULL comparison)
-- UNKNOWN in WHERE = FALSE → row excluded!
-- Result: NO rows returned (or very few)

-- Anti-join semantics:
-- Checks if customer_id matches any blocked id
-- NULL in blocked list doesn't match anything
-- customer_id = 3: No match found → row included
-- Result: Normal filtering behavior

-- The heuristic transformation changes semantics!
-- A safe automatic conversion requires proving that blocked_customers.id
-- cannot produce NULLs (e.g., a NOT NULL constraint or an IS NOT NULL
-- filter in the subquery). A nullable outer column also needs
-- explicit handling:
SELECT * FROM orders o
WHERE NOT EXISTS (
  SELECT 1 FROM blocked_customers b
  WHERE b.id = o.customer_id
)
AND o.customer_id IS NOT NULL;  -- Explicit NULL handling for the outer column
```

Pushing aggregates below outer joins requires careful semantic analysis:
```sql
-- Potentially unsafe aggregate pushdown
-- Original: Aggregate after left join
SELECT c.id, SUM(o.total)
FROM customers c
LEFT JOIN orders o ON c.id = o.customer_id
GROUP BY c.id;
-- Customers with no orders: SUM(o.total) = NULL

-- Heuristic: Push aggregate into subquery
SELECT c.id, order_totals.total_sum
FROM customers c
LEFT JOIN (
  SELECT customer_id, SUM(total) as total_sum
  FROM orders
  GROUP BY customer_id
) order_totals ON c.id = order_totals.customer_id;
-- Customers with no orders: total_sum = NULL (from LEFT JOIN)
-- Same semantics! Transformation is safe here.

-- BUT consider COUNT:
SELECT c.id, COUNT(o.id)
FROM customers c
LEFT JOIN orders o ON c.id = o.customer_id
GROUP BY c.id;
-- Customers with no orders: COUNT(o.id) = 0 (NULL o.id values aren't counted)

-- Naive pushdown:
SELECT c.id, order_counts.cnt
FROM customers c
LEFT JOIN (
  SELECT customer_id, COUNT(id) as cnt
  FROM orders
  GROUP BY customer_id
) order_counts ON c.id = order_counts.customer_id;
-- Customers with no orders: cnt = NULL (no row in subquery!)
-- WRONG semantics! Should be 0, not NULL!

-- Correct transformation requires COALESCE:
SELECT c.id, COALESCE(order_counts.cnt, 0)
FROM customers c
LEFT JOIN (...) order_counts ON ...
```

The order of projection and DISTINCT can affect results:
```sql
-- Table: events (user_id, event_type, event_data)
-- Multiple events per user, event_data may vary

-- Query: Distinct users with their (any) event data
SELECT DISTINCT user_id, event_data
FROM events
WHERE event_type = 'login';
-- Returns: Multiple rows per user (different event_data values)

-- Correct "late projection" interpretation:
-- 1. Filter to 'login' events
-- 2. Apply DISTINCT on (user_id, event_data), i.e. all selected columns
-- Result: Multiple rows per user

-- If projection were pushed too far, e.g. down to user_id alone
-- before DISTINCT:
-- 1. Filter to 'login' events
-- 2. Project to (user_id), then apply DISTINCT
-- Result: One row per user, event_data lost
-- Optimizers don't make this particular mistake,
-- but the same principle matters for GROUP BY

-- Related issue with GROUP BY:
SELECT user_id, event_data  -- Which event_data for each user?
FROM events
GROUP BY user_id;

-- This query is actually invalid in standard SQL!
-- event_data is non-deterministic for each group
-- Some databases allow it (returns arbitrary row)
-- Optimizer can't safely optimize non-deterministic selections
```

Production optimizers undergo extensive testing with edge cases: empty tables, NULL-heavy data, single-row tables, all-identical values, extreme selectivities, and pathological distributions. Despite this, bugs in obscure edge cases are discovered regularly, especially after new heuristics are added.
When heuristics fail, practitioners have several strategies to guide the optimizer toward better plans.
The first defense against poor plans is accurate statistics:
```sql
-- PostgreSQL: Update statistics
ANALYZE table_name;

-- MySQL: Update statistics
ANALYZE TABLE table_name;

-- SQL Server: Update statistics
UPDATE STATISTICS table_name;

-- Oracle: Gather statistics
EXEC DBMS_STATS.GATHER_TABLE_STATS('schema', 'table_name');

-- Extended statistics for correlated columns (PostgreSQL):
CREATE STATISTICS stat_name (dependencies)
  ON department, job_title FROM employees;
ANALYZE employees;

-- Histogram creation for skewed columns:
-- Most databases create automatically during ANALYZE
-- Some allow explicit histogram specification

-- Increase statistics detail on critical columns, then refresh (PostgreSQL):
ALTER TABLE orders ALTER COLUMN product_id SET STATISTICS 1000;
ANALYZE orders;
```

When statistics aren't enough, hints can override optimizer decisions:
```sql
-- PostgreSQL: Disable specific operations
SET enable_seqscan = off;   -- Force index usage
SET enable_hashjoin = off;  -- Force merge or nested loop

-- MySQL: Optimizer hints
SELECT /*+ NO_HASH_JOIN(o) */ *
FROM orders o
JOIN customers c ON o.customer_id = c.id;

SELECT /*+ INDEX(orders idx_orders_date) */ *
FROM orders
WHERE order_date = '2024-01-15';

-- SQL Server: Query hints
SELECT * FROM orders o
JOIN customers c ON o.customer_id = c.id
OPTION (MERGE JOIN);  -- Force merge join

SELECT * FROM orders
WITH (INDEX(idx_orders_date))  -- Force index
WHERE order_date = '2024-01-15';

-- Oracle: Hints
SELECT /*+ LEADING(c o) USE_HASH(o) */ *
FROM customers c
JOIN orders o ON c.id = o.customer_id;

-- PostgreSQL: pg_hint_plan extension
/*+ SeqScan(o) HashJoin(o c) Leading(c o) */
SELECT * FROM orders o JOIN customers c ON ...
```

Sometimes the best solution is rewriting the query to be more optimizer-friendly:
```sql
-- Problem: Optimizer doesn't push predicate into view
-- Original:
SELECT * FROM complex_view WHERE category = 'Electronics';

-- Fix: Inline view and add predicate directly
SELECT ... FROM base_table1 t1
JOIN base_table2 t2 ON ...
WHERE t2.category = 'Electronics';  -- Optimizer can now push

-- Problem: Subquery correlation prevents optimization
-- Original:
SELECT *,
  (SELECT SUM(amount) FROM payments p WHERE p.order_id = o.id)
FROM orders o;

-- Fix: Use LEFT JOIN for better optimization
SELECT o.*, p.total_payments
FROM orders o
LEFT JOIN (
  SELECT order_id, SUM(amount) as total_payments
  FROM payments
  GROUP BY order_id
) p ON p.order_id = o.id;

-- Problem: OR prevents predicate pushdown
-- Original:
SELECT * FROM orders o
JOIN customers c ON o.customer_id = c.id
WHERE c.region = 'West' OR o.priority = 'high';

-- Fix: Use UNION to enable pushdown
SELECT * FROM orders o
JOIN customers c ON o.customer_id = c.id
WHERE c.region = 'West'
UNION
SELECT * FROM orders o
JOIN customers c ON o.customer_id = c.id
WHERE o.priority = 'high';
```

Query hints lock in a plan that may become suboptimal as data grows or distribution changes. Use hints sparingly, document why each hint exists, and periodically review whether hints are still necessary. Prefer fixing underlying issues (statistics, schema design) over permanent hints.
Heuristic optimization provides powerful, reliable improvements for most queries. But understanding its limitations is essential for diagnosing performance problems and knowing when to intervene.
Module Complete:
You've now completed the Heuristic Optimization module. You understand the rule-based optimization paradigm, common heuristic transformations, the deep mechanics of selection and projection pushdown, and the limitations that necessitate cost-based approaches. This foundation prepares you for studying cost-based optimization in the next module.
You now have comprehensive knowledge of heuristic query optimization—its power, its techniques, and its limitations. This understanding is crucial for both working with database systems effectively and appreciating the design of modern hybrid optimizers.