Loading content...
Hierarchical data is everywhere. Organization charts show employees reporting to managers who report to executives. File systems nest folders within folders. Product categories contain subcategories. Bill of materials list components made from sub-components. Social networks link friends to friends of friends.
Standard SQL struggles with these structures. Without recursion, you'd need separate queries for each level—and you often don't know how many levels exist. Recursive CTEs solve this elegantly by allowing a CTE to reference itself, iteratively building results until a termination condition is met.
By the end of this page, you will understand recursive CTE syntax and execution mechanics, master the anchor-recursive pattern, navigate hierarchical data like org charts and bill of materials, prevent infinite loops, and apply recursion to real-world problems including graph traversal and series generation.
To understand why recursive CTEs are necessary, consider a classic hierarchical data problem: an organization chart.
The Data Structure:
123456789101112131415161718192021222324252627282930313233343536
-- Self-referential table: each employee has a manager (another employee)CREATE TABLE employees ( employee_id INT PRIMARY KEY, name VARCHAR(100), title VARCHAR(100), manager_id INT REFERENCES employees(employee_id), hire_date DATE); -- Sample data representing a corporate hierarchyINSERT INTO employees VALUES (1, 'Alice Chen', 'CEO', NULL, '2015-01-15'), (2, 'Bob Smith', 'VP Engineering', 1, '2016-03-01'), (3, 'Carol Davis', 'VP Sales', 1, '2016-04-15'), (4, 'Dan Lee', 'Engineering Manager', 2, '2017-06-01'), (5, 'Eve Wilson', 'Sales Manager', 3, '2017-08-15'), (6, 'Frank Brown', 'Senior Engineer', 4, '2018-02-01'), (7, 'Grace Kim', 'Engineer', 4, '2019-03-15'), (8, 'Henry Patel', 'Sales Rep', 5, '2019-07-01'), (9, 'Iris Wong', 'Junior Engineer', 6, '2020-01-15'), (10, 'Jack Miller', 'Intern', 7, '2023-06-01'); /* Hierarchy (CEO at top): Alice (CEO) ├── Bob (VP Eng) │ └── Dan (Eng Mgr) │ ├── Frank (Sr Eng) │ │ └── Iris (Jr Eng) │ └── Grace (Eng) │ └── Jack (Intern) └── Carol (VP Sales) └── Eve (Sales Mgr) └── Henry (Sales Rep)*/The Problem with Non-Recursive Approaches:
Suppose you want to find all employees under Bob (employee_id = 2), no matter how many levels deep. Without recursion:
123456789101112131415161718192021222324252627282930313233343536
-- Level 1: Direct reports to BobSELECT * FROM employees WHERE manager_id = 2;-- Returns: Dan -- Level 2: Reports to DanSELECT * FROM employees WHERE manager_id = 4;-- Returns: Frank, Grace -- Level 3: Reports to Frank or GraceSELECT * FROM employees WHERE manager_id IN (6, 7);-- Returns: Iris, Jack -- But what if there's a Level 4, 5, 6...?-- You don't know when to stop, and you need separate queries for each level! -- Attempted solution: hardcode multiple joins (ugly and limited)SELECT e1.name as level1FROM employees e1WHERE e1.manager_id = 2 UNION ALL SELECT e2.name as level2FROM employees e2JOIN employees e1 ON e2.manager_id = e1.employee_idWHERE e1.manager_id = 2 UNION ALL SELECT e3.name as level3FROM employees e3JOIN employees e2 ON e3.manager_id = e2.employee_idJOIN employees e1 ON e2.manager_id = e1.employee_idWHERE e1.manager_id = 2; -- This is unmaintainable, brittle, and fails when depth exceeds hardcoded levelsHierarchies have unknown depth. A company might have 3 levels today and 12 levels tomorrow. Hardcoding joins for each level is fragile. Recursive CTEs solve this by continuing automatically until no more rows are found.
A recursive CTE has a specific structure with two parts connected by UNION ALL:
1234567891011121314151617181920
-- Recursive CTE Syntax StructureWITH RECURSIVE cte_name AS ( -- PART 1: ANCHOR MEMBER -- The starting point - a non-recursive query -- This defines the initial row(s) before recursion begins SELECT initial_columns FROM base_table WHERE starting_condition UNION ALL -- or UNION (deduplicates, slower) -- PART 2: RECURSIVE MEMBER -- References the CTE itself to build additional rows -- Executes repeatedly until it returns no rows SELECT derived_columns FROM some_table INNER JOIN cte_name ON join_condition -- Self-reference! WHERE continuation_condition)SELECT * FROM cte_name;| Component | Purpose | Requirements |
|---|---|---|
| RECURSIVE keyword | Enables self-reference | Required in most databases (optional in SQL Server) |
| Anchor member | Defines starting row(s) | Must NOT reference the CTE itself |
| UNION ALL/UNION | Combines anchor and recursive results | UNION ALL is typical; UNION deduplicates |
| Recursive member | Generates additional row(s) each iteration | MUST reference the CTE exactly once |
| Termination | When recursive member returns 0 rows | Implicit; prevent infinite loops! |
12345678910111213141516171819202122232425262728293031323334353637383940414243444546
-- EXAMPLE: Find all employees under Bob (employee_id = 2)WITH RECURSIVE subordinates AS ( -- ANCHOR: Start with Bob himself SELECT employee_id, name, title, manager_id, 0 AS depth, -- Track how deep in hierarchy name AS path -- Build a path string FROM employees WHERE employee_id = 2 UNION ALL -- RECURSIVE: Find employees whose manager is someone already in result SELECT e.employee_id, e.name, e.title, e.manager_id, s.depth + 1, s.path || ' → ' || e.name FROM employees e INNER JOIN subordinates s ON e.manager_id = s.employee_id)SELECT employee_id, REPEAT(' ', depth) || name as indented_name, -- Visual indentation title, depth, pathFROM subordinatesORDER BY path; /*RESULT:employee_id | indented_name | title | depth | path---------------------------------------------------------------------------2 | Bob Smith | VP Engineering | 0 | Bob Smith4 | Dan Lee | Engineering Mgr | 1 | Bob Smith → Dan Lee6 | Frank Brown | Senior Engineer | 2 | Bob Smith → Dan Lee → Frank Brown9 | Iris Wong | Junior Engineer | 3 | Bob Smith → Dan Lee → Frank Brown → Iris Wong7 | Grace Kim | Engineer | 2 | Bob Smith → Dan Lee → Grace Kim10 | Jack Miller | Intern | 3 | Bob Smith → Dan Lee → Grace Kim → Jack Miller*/Understanding how recursive CTEs execute is crucial for writing correct queries and predicting performance. The execution follows an iterative pattern:
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354
-- Tracing execution for: subordinates of employee_id = 2 WITH RECURSIVE subordinates AS ( SELECT employee_id, name, manager_id, 0 as depth FROM employees WHERE employee_id = 2 UNION ALL SELECT e.employee_id, e.name, e.manager_id, s.depth + 1 FROM employees e INNER JOIN subordinates s ON e.manager_id = s.employee_id)SELECT * FROM subordinates; /*EXECUTION TRACE: ═══ ITERATION 0 (Anchor) ═══Working Table after anchor:| employee_id | name | manager_id | depth ||-------------|------------|------------|-------|| 2 | Bob Smith | 1 | 0 | ═══ ITERATION 1 ═══Recursive member joins employees to working table:→ Finds: Dan (manager_id = 2)New working table rows:| 4 | Dan Lee | 2 | 1 | ═══ ITERATION 2 ═══Recursive member joins employees to working table (now containing Dan):→ Finds: Frank, Grace (manager_id = 4)New working table rows:| 6 | Frank Brown| 4 | 2 || 7 | Grace Kim | 4 | 2 | ═══ ITERATION 3 ═══Recursive member joins employees to working table (now Frank, Grace):→ Finds: Iris (manager_id = 6), Jack (manager_id = 7)New working table rows:| 9 | Iris Wong | 6 | 3 || 10 | Jack Miller| 7 | 3 | ═══ ITERATION 4 ═══Recursive member joins employees to working table (now Iris, Jack):→ Finds: nobody has manager_id = 9 or 10New working table rows: (empty) ═══ TERMINATION ═══Recursive member returned 0 rows → recursion stops ═══ FINAL RESULT ═══UNION of all iterations (6 rows total)*/Each iteration, only the NEW rows drive the next iteration (working table). But the FINAL result includes ALL rows from all iterations. This is why the recursive member sees only the latest level, not all accumulated rows—preventing redundant re-processing.
Hierarchical data traversal is the primary use case for recursive CTEs. Let's explore the essential patterns.
Find all descendants (direct and indirect children) of a given node:
123456789101112131415161718192021222324252627282930
-- Find all subcategories under 'Electronics' (category_id = 5)WITH RECURSIVE category_tree AS ( -- Anchor: Start with Electronics SELECT category_id, category_name, parent_category_id, 1 AS level, ARRAY[category_id] AS path_ids FROM categories WHERE category_id = 5 UNION ALL -- Recursive: Find children of current level SELECT c.category_id, c.category_name, c.parent_category_id, ct.level + 1, ct.path_ids || c.category_id FROM categories c INNER JOIN category_tree ct ON c.parent_category_id = ct.category_id)SELECT REPEAT('--', level - 1) || category_name AS hierarchy, level, path_idsFROM category_treeORDER BY path_ids;Recursive CTEs can run forever if the recursion never terminates. This happens when:
An infinite recursive CTE will consume all available CPU and memory until the database kills it or the server crashes. Always implement safeguards when working with data that might contain cycles.
1234567891011121314151617181920212223242526272829303132333435363738394041424344454647484950515253545556575859606162636465666768697071727374757677
-- PROBLEM: Cyclic data causes infinite loop-- Imagine: Alice manages Bob, Bob manages Carol, Carol manages Alice (!) -- SAFEGUARD 1: Maximum depth limitWITH RECURSIVE dangerous_tree AS ( SELECT employee_id, name, manager_id, 0 AS depth FROM employees WHERE employee_id = 1 UNION ALL SELECT e.employee_id, e.name, e.manager_id, dt.depth + 1 FROM employees e INNER JOIN dangerous_tree dt ON e.manager_id = dt.employee_id WHERE dt.depth < 100 -- SAFEGUARD: Stop after 100 levels)SELECT * FROM dangerous_tree; -- SAFEGUARD 2: Track visited nodes (PostgreSQL)WITH RECURSIVE safe_tree AS ( SELECT employee_id, name, manager_id, ARRAY[employee_id] AS visited -- Track visited IDs FROM employees WHERE employee_id = 1 UNION ALL SELECT e.employee_id, e.name, e.manager_id, st.visited || e.employee_id FROM employees e INNER JOIN safe_tree st ON e.manager_id = st.employee_id WHERE e.employee_id != ALL(st.visited) -- Not already visited!)SELECT * FROM safe_tree; -- SAFEGUARD 3: Track path as string (cross-platform)WITH RECURSIVE safe_tree AS ( SELECT employee_id, name, manager_id, CAST(employee_id AS VARCHAR(1000)) AS path FROM employees WHERE employee_id = 1 UNION ALL SELECT e.employee_id, e.name, e.manager_id, st.path || ',' || CAST(e.employee_id AS VARCHAR(10)) FROM employees e INNER JOIN safe_tree st ON e.manager_id = st.employee_id WHERE POSITION(CAST(e.employee_id AS VARCHAR) IN st.path) = 0 -- Not in path)SELECT * FROM safe_tree; -- DATABASE-SPECIFIC: CYCLE detection clause (PostgreSQL 14+, SQL:2016)WITH RECURSIVE tree AS ( SELECT employee_id, name, manager_id FROM employees WHERE manager_id IS NULL UNION ALL SELECT e.employee_id, e.name, e.manager_id FROM employees e INNER JOIN tree t ON e.manager_id = t.employee_id)CYCLE employee_id SET is_cycle USING cycle_path -- Built-in protection!SELECT * FROM tree WHERE NOT is_cycle;Recursive CTEs aren't limited to tree structures. They can solve any problem involving iteration or accumulation—series generation, graph traversal, running calculations, and more.
Generate number or date series without needing a pre-populated table:
1234567891011121314151617181920212223242526272829303132333435363738394041424344454647484950
-- Generate numbers 1 to 100WITH RECURSIVE numbers AS ( SELECT 1 AS n UNION ALL SELECT n + 1 FROM numbers WHERE n < 100)SELECT n FROM numbers; -- Generate dates for the past 30 daysWITH RECURSIVE date_series AS ( SELECT CURRENT_DATE AS dt UNION ALL SELECT dt - INTERVAL '1 day' FROM date_series WHERE dt > CURRENT_DATE - INTERVAL '30 days')SELECT dt FROM date_series ORDER BY dt; -- Generate hourly time slots for a dayWITH RECURSIVE hours AS ( SELECT TIMESTAMP '2024-01-15 00:00:00' AS slot_start UNION ALL SELECT slot_start + INTERVAL '1 hour' FROM hours WHERE slot_start < TIMESTAMP '2024-01-15 23:00:00')SELECT slot_start, slot_start + INTERVAL '1 hour' AS slot_endFROM hours; -- Practical use: Fill gaps in time series dataWITH RECURSIVE all_dates AS ( SELECT DATE '2024-01-01' AS dt UNION ALL SELECT dt + 1 FROM all_dates WHERE dt < DATE '2024-01-31'),daily_sales AS ( SELECT order_date, SUM(amount) as revenue FROM orders WHERE order_date BETWEEN '2024-01-01' AND '2024-01-31' GROUP BY order_date)-- Every day appears, even if no salesSELECT ad.dt, COALESCE(ds.revenue, 0) as revenueFROM all_dates adLEFT JOIN daily_sales ds ON ad.dt = ds.order_dateORDER BY ad.dt;Recursive CTEs can be expensive. Each iteration involves re-evaluating the recursive member, and deep hierarchies or wide graphs can produce enormous intermediate results.
123456789101112131415161718192021222324252627282930313233343536373839404142434445
-- OPTIMIZATION 1: Limit depth earlyWITH RECURSIVE tree AS ( SELECT id, parent_id, 0 AS depth FROM nodes WHERE parent_id IS NULL UNION ALL SELECT n.id, n.parent_id, t.depth + 1 FROM nodes n INNER JOIN tree t ON n.parent_id = t.id WHERE t.depth < 5 -- Stop at depth 5, not after finding millions of rows)SELECT * FROM tree WHERE depth = 5; -- Filter in main query after collecting -- OPTIMIZATION 2: Carry only needed columns-- Bad: Wide CTEWITH RECURSIVE heavy AS ( SELECT * FROM big_table WHERE id = 1 -- All 50 columns! UNION ALL SELECT bt.* FROM big_table bt INNER JOIN heavy h ON bt.parent_id = h.id)SELECT id, name FROM heavy; -- Good: Narrow CTEWITH RECURSIVE light AS ( SELECT id, parent_id FROM big_table WHERE id = 1 -- Only 2 columns UNION ALL SELECT bt.id, bt.parent_id FROM big_table bt INNER JOIN light l ON bt.parent_id = l.id)SELECT bt.id, bt.name -- Get other columns at the endFROM light lJOIN big_table bt ON l.id = bt.id; -- OPTIMIZATION 3: Add indexes on join columnsCREATE INDEX idx_parent_id ON nodes(parent_id);-- The base table is indexed; the recursive join benefits -- OPTIMIZATION 4: Use UNION ALL unless deduplication is necessary-- UNION ALL is usually faster; only use UNION if duplicates are possible and problematicWITH RECURSIVE tree AS ( ... UNION ALL -- Not UNION ...)For frequently-traversed hierarchies, consider storing a 'materialized path' column (e.g., '/1/2/4/6/9' for Iris). This enables fast subtree queries with LIKE '/1/2/4/%' without recursion. Trade-off: path must be updated when hierarchy changes.
We've explored the most powerful feature of CTEs—recursion. Let's consolidate the key insights:
What's Next:
Now that you've mastered CTEs—from basic syntax through recursion—the final page provides a comprehensive comparison: CTE vs Subquery. When should you use each? What are the trade-offs? How do they differ in performance, readability, and capability? We'll consolidate everything into practical decision guidelines.
You now understand recursive CTE syntax, execution mechanics, hierarchical traversal patterns, cycle prevention, and non-hierarchical recursion applications. You can traverse trees, generate series, and navigate graphs—capabilities beyond standard SQL.