In programming, we rarely write all our code in a single function. We decompose complex logic into named functions, each with a clear purpose, each testable in isolation. This modular approach makes code maintainable, understandable, and reusable.
SQL, historically, lacked this capability. Every derived table was an anonymous inline subquery—unnamed, untestable, and invisible to the surrounding query's context. Named subqueries change everything. CTEs allow you to give names to intermediate result sets, transforming SQL from a monolithic block into a structured, modular composition of clearly-defined data transformations.
By the end of this page, you will understand how to think about CTEs as modular building blocks, when to extract subqueries into named CTEs versus keeping them inline, how to design CTE chains that tell a coherent data story, and patterns for building complex analytical queries from simple, composable parts.
A CTE is fundamentally a named subquery—a complete SELECT statement given an identifier that can be referenced elsewhere in the query. But calling them "named subqueries" understates their transformative impact on how we write and think about SQL.
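To ground that definition, here is a minimal sketch of the shape (the `recent_orders` name and the specific columns are illustrative): a WITH clause declares the named result set, and the main query then references it exactly as it would a table.

```sql
-- Minimal CTE shape: name an intermediate result, then query it like a table
-- (the recent_orders CTE and its columns are illustrative)
WITH recent_orders AS (
    SELECT order_id, customer_id, amount
    FROM orders
    WHERE order_date >= CURRENT_DATE - INTERVAL '30 days'
)
SELECT customer_id, SUM(amount) as recent_spend
FROM recent_orders
GROUP BY customer_id;
```

The same pattern scales to any number of named steps, as the examples throughout this page show.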
Consider what naming does in programming:
`calculate_tax(amount)` tells you something; `(amount * 0.0825)` makes you guess.

The paradigm shift:
Without CTEs, SQL is expression-oriented—you build complex expressions by nesting simpler ones. The outer expression's reader must mentally unwrap each layer.
With CTEs, SQL becomes declaration-oriented—you declare named intermediate results, then compose them. The reader follows a sequence of named transformations.
```sql
-- EXPRESSION-ORIENTED (nested, anonymous)
-- Reader must decode from inside-out
SELECT
    department_name,
    (SELECT AVG(salary)
     FROM employees e
     WHERE e.department_id = d.department_id
       AND e.employee_id IN (
           SELECT employee_id
           FROM performance_reviews
           WHERE rating >= 4
             AND review_date >= DATE_SUB(CURRENT_DATE, INTERVAL 1 YEAR)
       )) as avg_top_performer_salary
FROM departments d;

-- DECLARATION-ORIENTED (named, sequential)
-- Reader follows a narrative
WITH recent_top_performers AS (
    SELECT employee_id
    FROM performance_reviews
    WHERE rating >= 4
      AND review_date >= CURRENT_DATE - INTERVAL '1 year'
),
top_performer_salaries AS (
    SELECT e.department_id, e.salary
    FROM employees e
    INNER JOIN recent_top_performers rtp USING (employee_id)
),
department_averages AS (
    SELECT department_id, AVG(salary) as avg_salary
    FROM top_performer_salaries
    GROUP BY department_id
)
SELECT
    d.department_name,
    da.avg_salary as avg_top_performer_salary
FROM departments d
LEFT JOIN department_averages da USING (department_id);
```

Can a colleague understand your query's purpose by reading only the CTE names and the final SELECT? If yes, your names are good. If they need to read every CTE's body, your names need work.
Let's dissect what makes an effective named subquery. A well-designed CTE has specific characteristics that make it useful as a modular component.
```sql
-- A well-designed named subquery has these properties:

WITH quarterly_revenue_by_product (
    -- 1. EXPLICIT COLUMN NAMES (when beneficial)
    product_id,
    product_name,
    category,
    quarter,
    revenue,
    units_sold
) AS (
    -- 2. SINGLE RESPONSIBILITY
    -- Does one thing: calculates quarterly revenue per product
    SELECT
        p.product_id,
        p.product_name,
        p.category,
        DATE_TRUNC('quarter', o.order_date) as quarter,
        SUM(oi.quantity * oi.unit_price) as revenue,
        SUM(oi.quantity) as units_sold
    FROM products p
    INNER JOIN order_items oi ON p.product_id = oi.product_id
    INNER JOIN orders o ON oi.order_id = o.order_id
    WHERE o.status = 'completed'
      -- 3. APPROPRIATE SCOPE
      -- Not overly broad (all history) or overly narrow (hardcoded dates)
      AND o.order_date >= DATE_TRUNC('year', CURRENT_DATE - INTERVAL '2 years')
    GROUP BY p.product_id, p.product_name, p.category, DATE_TRUNC('quarter', o.order_date)
)
-- 4. CLEAR CONTRACT
-- Users of this CTE know exactly what columns they get
-- and what each row represents (one product-quarter combination)
SELECT
    product_name,
    quarter,
    revenue,
    revenue - LAG(revenue) OVER (
        PARTITION BY product_id
        ORDER BY quarter
    ) as revenue_change
FROM quarterly_revenue_by_product
WHERE category = 'Electronics'
ORDER BY quarter, revenue DESC;
```

| Property | Description | Why It Matters |
|---|---|---|
| Single Responsibility | CTE does exactly one logical operation | Easier to understand, test, and modify independently |
| Clear Contract | Predictable columns and row semantics | Users don't need to read the implementation |
| Appropriate Scope | Neither too broad nor too narrow | Reusable across multiple main queries |
| Meaningful Name | Name describes WHAT, not HOW | Self-documenting; reduces need for comments |
| Explicit Columns (when needed) | Column list in CTE definition | Prevents ambiguity with expressions/calculations |
The "What vs How" naming principle:
A CTE name should describe what result set it represents, not how it computes that result:
| ❌ How (Implementation-Focused) | ✅ What (Intent-Focused) |
|---|---|
| `orders_joined_with_customers` | `customer_orders` |
| `sales_grouped_by_month` | `monthly_sales` |
| `products_filtered_by_active` | `active_products` |
| `users_where_login_recent` | `recently_active_users` |
| `aggregated_stats` | `department_performance_metrics` |
Not every subquery should become a CTE. Overusing CTEs leads to verbose queries where the structure obscures rather than clarifies. Extraction pays off when a subquery is referenced more than once or when its logic is complex enough to carry real business meaning; trivial lookups read better inline:

- A simple `WHERE id IN (SELECT id FROM other_table)` needs no CTE
- An `EXISTS (SELECT 1 FROM ...)` check is clear enough inline
```sql
-- ❌ OVER-EXTRACTION: CTE adds verbosity without value
WITH active_filter AS (
    SELECT user_id FROM users WHERE status = 'active'
)
SELECT * FROM orders
WHERE user_id IN (SELECT user_id FROM active_filter);

-- ✅ BETTER: Simple enough to be inline
SELECT * FROM orders
WHERE user_id IN (SELECT user_id FROM users WHERE status = 'active');

-- ✅ GOOD EXTRACTION: Multiple references
WITH active_users AS (
    SELECT user_id, username, email, region
    FROM users
    WHERE status = 'active'
      AND last_login >= CURRENT_DATE - INTERVAL '30 days'
)
SELECT
    r.region_name,
    COUNT(au.user_id) as active_count,
    (SELECT SUM(amount)
     FROM orders o
     WHERE o.user_id IN (SELECT user_id
                         FROM active_users a2
                         WHERE a2.region = r.region_id)) as region_revenue
FROM regions r
LEFT JOIN active_users au ON r.region_id = au.region
GROUP BY r.region_id, r.region_name;

-- ✅ GOOD EXTRACTION: Complex logic with business meaning
WITH customer_lifetime_value AS (
    -- Complex calculation deserves a name
    SELECT
        c.customer_id,
        c.customer_name,
        c.acquisition_date,
        SUM(o.amount) as total_revenue,
        COUNT(DISTINCT o.order_id) as total_orders,
        AVG(o.amount) as avg_order_value,
        MAX(o.order_date) as last_order_date,
        DATEDIFF(day, c.acquisition_date, MAX(o.order_date)) as customer_lifespan_days,
        SUM(o.amount) / NULLIF(
            DATEDIFF(month, c.acquisition_date, CURRENT_DATE), 0
        ) as monthly_revenue_rate
    FROM customers c
    LEFT JOIN orders o ON c.customer_id = o.customer_id
    GROUP BY c.customer_id, c.customer_name, c.acquisition_date
)
SELECT *
FROM customer_lifetime_value
WHERE monthly_revenue_rate > 100
ORDER BY total_revenue DESC;
```

The most effective use of named subqueries is to tell a data story—a logical narrative that transforms raw data into actionable insights through a sequence of named steps. Each CTE represents a chapter in this story.
The narrative pattern:
```sql
-- A COMPLETE DATA STORY: Identifying at-risk high-value customers
-- Business question: Which premium customers are showing signs of churn?

WITH
-- Chapter 1: Define our universe (base data)
premium_customers AS (
    SELECT
        customer_id,
        customer_name,
        email,
        account_manager_id,
        DATE_DIFF('day', created_at, CURRENT_DATE) as tenure_days
    FROM customers
    WHERE tier = 'premium'
      AND status = 'active'
),

-- Chapter 2: Understand their history (enrichment)
customer_order_history AS (
    SELECT
        pc.customer_id,
        pc.customer_name,
        pc.email,
        pc.account_manager_id,
        pc.tenure_days,
        COUNT(o.order_id) as total_orders,
        SUM(o.amount) as lifetime_value,
        AVG(o.amount) as avg_order_value,
        MAX(o.order_date) as last_order_date,
        DATE_DIFF('day', MAX(o.order_date), CURRENT_DATE) as days_since_last_order
    FROM premium_customers pc
    LEFT JOIN orders o ON pc.customer_id = o.customer_id
    GROUP BY pc.customer_id, pc.customer_name, pc.email, pc.account_manager_id, pc.tenure_days
),

-- Chapter 3: Calculate behavioral signals (transformation)
customer_engagement_signals AS (
    SELECT
        coh.*,
        -- Calculate expected order frequency
        CASE
            WHEN tenure_days > 90 THEN total_orders * 90.0 / tenure_days
            ELSE total_orders
        END as orders_per_90_days,
        -- Calculate recency score (0-100, higher = more recent = better)
        GREATEST(0, 100 - days_since_last_order * 2) as recency_score,
        -- Calculate value score (percentile within cohort)
        PERCENT_RANK() OVER (ORDER BY lifetime_value) * 100 as value_percentile
    FROM customer_order_history coh
),

-- Chapter 4: Apply churn risk criteria (focus)
at_risk_customers AS (
    SELECT
        *,
        -- Risk scoring algorithm
        CASE
            WHEN days_since_last_order > 60 AND orders_per_90_days > 3 THEN 'CRITICAL'
            WHEN days_since_last_order > 45 AND value_percentile > 75 THEN 'HIGH'
            WHEN days_since_last_order > 30 AND value_percentile > 50 THEN 'MEDIUM'
            WHEN days_since_last_order > 21 THEN 'LOW'
            ELSE 'STABLE'
        END as risk_level
    FROM customer_engagement_signals
    -- Only customers who previously ordered regularly
    WHERE total_orders >= 3
      AND tenure_days >= 60
),

-- Chapter 5: Prepare actionable output (aggregation/formatting)
prioritized_risk_report AS (
    SELECT
        arc.customer_id,
        arc.customer_name,
        arc.email,
        am.account_manager_name,
        arc.lifetime_value,
        arc.last_order_date,
        arc.days_since_last_order,
        arc.risk_level,
        -- Calculate priority score for sorting
        CASE arc.risk_level
            WHEN 'CRITICAL' THEN 100
            WHEN 'HIGH' THEN 75
            WHEN 'MEDIUM' THEN 50
            WHEN 'LOW' THEN 25
            ELSE 0
        END + (arc.value_percentile / 2) as priority_score
    FROM at_risk_customers arc
    LEFT JOIN account_managers am ON arc.account_manager_id = am.account_manager_id
    WHERE arc.risk_level != 'STABLE'
)

-- The Final Reveal: Actionable results
SELECT
    customer_id,
    customer_name,
    email,
    account_manager_name,
    lifetime_value,
    last_order_date,
    days_since_last_order,
    risk_level
FROM prioritized_risk_report
ORDER BY priority_score DESC, lifetime_value DESC
LIMIT 100;
```

Notice how someone can understand this query's purpose by reading only the CTE names: premium_customers → customer_order_history → customer_engagement_signals → at_risk_customers → prioritized_risk_report. The narrative flows logically, each step building on the previous.
Like function signatures in programming, CTEs have implicit "interfaces"—the columns they expose and the semantics of their rows. Well-designed CTE interfaces make your query components reusable and understandable.
| Principle | Description | Example |
|---|---|---|
| Include keys | Always include natural/surrogate keys for joining | customer_id, order_id |
| Name calculations | Give computed columns descriptive names | avg_order_value, days_since_last_login |
| Preserve granularity | Document what each row represents | One row per customer-month |
| Limit scope | Only include columns needed downstream | Avoid SELECT * |
| Use consistent types | Dates as dates, amounts as decimals | No stringly-typed dates |
```sql
-- ❌ POOR INTERFACE: Unclear what we're getting
WITH data AS (
    SELECT * FROM orders o
    JOIN customers c ON o.customer_id = c.customer_id
)
SELECT * FROM data;
-- What columns exist? What does each row represent?

-- ✅ GOOD INTERFACE: Clear contract
WITH customer_order_summary (
    -- Explicit column list documents the interface
    customer_id,      -- PK: unique per row
    total_orders,     -- Count of completed orders
    total_revenue,    -- Sum in USD
    first_order_date, -- Earliest order
    last_order_date,  -- Most recent order
    avg_order_value   -- Mean order value in USD
) AS (
    SELECT
        customer_id,
        COUNT(*) as total_orders,
        SUM(amount) as total_revenue,
        MIN(order_date) as first_order_date,
        MAX(order_date) as last_order_date,
        AVG(amount) as avg_order_value
    FROM orders
    WHERE status = 'completed'
    GROUP BY customer_id
)
-- Consumer knows exactly what they're getting
SELECT customer_id, total_revenue, avg_order_value
FROM customer_order_summary
WHERE total_orders >= 5;

-- ✅ GOOD: Documenting row semantics in the name
WITH
-- Name indicates: one row per product per day
daily_product_sales AS (
    SELECT
        product_id,
        DATE(order_date) as sale_date,
        SUM(quantity) as units_sold,
        SUM(quantity * unit_price) as revenue
    FROM order_items oi
    JOIN orders o USING (order_id)
    GROUP BY product_id, DATE(order_date)
),
-- Name indicates: one row per product per month
monthly_product_sales AS (
    SELECT
        product_id,
        DATE_TRUNC('month', sale_date) as sale_month,
        SUM(units_sold) as units_sold,
        SUM(revenue) as revenue,
        COUNT(*) as selling_days
    FROM daily_product_sales
    GROUP BY product_id, DATE_TRUNC('month', sale_date)
)
SELECT * FROM monthly_product_sales;
```

Always be clear about what each row represents. A CTE named 'order_data' is ambiguous—does it have one row per order, per order item, per customer, or per day? Names like 'order_line_items' or 'daily_order_totals' eliminate this ambiguity.
One of the most practical benefits of named subqueries is testability. Each CTE can be validated independently before being combined into the final query. This is particularly valuable when debugging complex queries.
```sql
-- The full query we're debugging
WITH base_orders AS (
    SELECT order_id, customer_id, amount, order_date
    FROM orders
    WHERE order_date >= '2024-01-01'
      AND status = 'completed'
),
customer_aggregates AS (
    SELECT
        customer_id,
        COUNT(*) as order_count,
        SUM(amount) as total_spent,
        AVG(amount) as avg_order
    FROM base_orders
    GROUP BY customer_id
),
high_value_customers AS (
    SELECT customer_id, total_spent, order_count
    FROM customer_aggregates
    WHERE total_spent > 1000
      AND order_count >= 3
)
SELECT
    c.customer_name,
    hvc.total_spent,
    hvc.order_count
FROM high_value_customers hvc
JOIN customers c USING (customer_id);

-- TESTING STRATEGY: Build up incrementally

-- TEST 1: Validate base_orders
WITH base_orders AS (
    SELECT order_id, customer_id, amount, order_date
    FROM orders
    WHERE order_date >= '2024-01-01'
      AND status = 'completed'
)
-- Check row count and sample
SELECT
    COUNT(*) as total_rows,
    MIN(order_date) as earliest_date,
    MAX(order_date) as latest_date,
    COUNT(DISTINCT customer_id) as unique_customers
FROM base_orders;

-- TEST 2: Validate customer_aggregates
WITH base_orders AS (
    SELECT order_id, customer_id, amount, order_date
    FROM orders
    WHERE order_date >= '2024-01-01'
      AND status = 'completed'
),
customer_aggregates AS (
    SELECT
        customer_id,
        COUNT(*) as order_count,
        SUM(amount) as total_spent,
        AVG(amount) as avg_order
    FROM base_orders
    GROUP BY customer_id
)
-- Verify aggregation logic for a known customer
SELECT * FROM customer_aggregates
WHERE customer_id = 'KNOWN_TEST_CUSTOMER_ID';
-- Cross-check: manually verify against raw orders

-- TEST 3: Validate threshold logic
WITH base_orders AS (...),
customer_aggregates AS (...),
high_value_customers AS (
    SELECT customer_id, total_spent, order_count
    FROM customer_aggregates
    WHERE total_spent > 1000
      AND order_count >= 3
)
-- Check edge cases
SELECT
    COUNT(*) as qualifying_customers,
    MIN(total_spent) as min_spent_in_set,
    MIN(order_count) as min_orders_in_set
FROM high_value_customers;
-- Verify: min_spent should be > 1000, min_orders should be >= 3
```

During debugging, temporarily change your main query to `SELECT * FROM [cte_name] LIMIT 100`. This lets you inspect any intermediate step without modifying the CTE definitions. Once verified, restore the original main query.
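As a concrete illustration of that tip, here is a minimal sketch reusing the CTE definitions from the query above: the WITH clause is left untouched, and only the final SELECT is temporarily swapped to inspect one intermediate step.

```sql
-- Same CTE definitions as the full query above; only the final SELECT changes
WITH base_orders AS (
    SELECT order_id, customer_id, amount, order_date
    FROM orders
    WHERE order_date >= '2024-01-01'
      AND status = 'completed'
),
customer_aggregates AS (
    SELECT
        customer_id,
        COUNT(*) as order_count,
        SUM(amount) as total_spent,
        AVG(amount) as avg_order
    FROM base_orders
    GROUP BY customer_id
)
-- Temporary debug output: inspect this intermediate step, then restore the original SELECT
SELECT * FROM customer_aggregates LIMIT 100;
```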
Over time, certain CTE patterns emerge repeatedly. Recognizing these patterns accelerates query development and ensures consistent, battle-tested approaches.
The Filter Pattern isolates row selection logic into a named CTE, making the filtering criteria explicit and reusable.
```sql
-- Filter Pattern: Named row selection
WITH eligible_products AS (
    SELECT product_id, product_name, category_id
    FROM products
    WHERE status = 'active'
      AND inventory_count > 0
      AND category_id NOT IN (
          SELECT category_id FROM restricted_categories
      )
)
SELECT
    ep.product_name,
    COUNT(oi.order_item_id) as times_ordered,
    SUM(oi.quantity) as total_units
FROM eligible_products ep
LEFT JOIN order_items oi ON ep.product_id = oi.product_id
GROUP BY ep.product_id, ep.product_name;
```

We've explored CTEs as named, modular building blocks for SQL queries. Let's consolidate the key insights:

- A CTE is a named subquery; name it for what it represents, not how it is computed.
- Extract a subquery when it is referenced more than once or encapsulates meaningful business logic; keep trivial lookups inline.
- Chain CTEs into a narrative of named steps that a reader can follow from base data to final result.
- Treat each CTE's columns and row granularity as an explicit interface for the rest of the query.
- Validate CTEs incrementally—each named step can be inspected on its own before the final composition.
What's Next:
Now that you understand CTEs as named, modular subqueries, the next page explores Multiple CTEs—how to define and compose multiple CTEs in a single query, manage dependencies between them, and build sophisticated data pipelines entirely within a single SQL statement.
You now understand how to think about CTEs as modular building blocks, when to extract logic into named subqueries, how to design CTE interfaces, and patterns for building complex queries from simple, composable parts. You're ready to explore multi-CTE compositions.