Every SQL query you execute follows an intricate journey from human-readable text to machine-executable operations. For decades, databases relied on interpreted execution—traversing operator trees at runtime, invoking virtual function calls for each tuple processed. This approach, while flexible, left significant performance on the table.
Query compilation represents a fundamental paradigm shift. Instead of interpreting queries, modern databases compile them into optimized machine code, eliminating interpretation overhead and unlocking performance gains that were previously unattainable. This technique has become the defining characteristic of high-performance analytical databases and is increasingly adopted by traditional OLTP systems.
By the end of this page, you will understand the fundamental principles of query compilation, including the distinction between interpretation and compilation, the phases of the compilation pipeline, code generation strategies, and how leading database systems implement compiled query execution.
To appreciate query compilation, we must first understand what it replaces. Traditional database query execution follows the Volcano iterator model (also known as the pull-based model), introduced by Goetz Graefe in the 1990s. In this model:
- Each operator implements an iterator interface: `open()`, `next()`, and `close()`
- To produce results, the root operator calls `next()` on its child, and each operator in turn calls `next()` on its own children, one tuple at a time, until data is exhausted

While elegant and composable, this approach introduces substantial overhead:
Virtual function call overhead: Each next() call is typically a virtual function invocation. For a query processing millions of tuples, this translates to millions of indirect calls—each incurring branch misprediction penalties and instruction cache pollution.
Tuple-at-a-time processing: Processing one tuple per next() call defeats modern CPU optimizations. Tight inner loops cannot leverage SIMD instructions, and function call overhead dominates actual computation time.
Interpretation overhead: The executor must repeatedly examine operator types, dispatch to appropriate handlers, and manage generic data structures—work that could be eliminated with compile-time knowledge.
```cpp
// Traditional Volcano Iterator Model
// Each operator inherits from a base class with virtual next()

class Operator {
public:
    virtual ~Operator() = default;
    virtual void open() = 0;
    virtual Tuple* next() = 0;  // Virtual call overhead!
    virtual void close() = 0;
};

class TableScan : public Operator {
    Table* table;
    size_t position = 0;
public:
    TableScan(Table* t) : table(t) {}
    void open() override { position = 0; }
    Tuple* next() override {
        // Each call processes exactly ONE tuple
        if (position >= table->size()) return nullptr;
        return table->getTuple(position++);
    }
    void close() override {}
};

class Filter : public Operator {
    Operator* child;
    Predicate* predicate;
public:
    Filter(Operator* c, Predicate* p) : child(c), predicate(p) {}
    Tuple* next() override {
        // Another virtual call, plus predicate evaluation
        while (Tuple* tuple = child->next()) {   // Virtual call!
            if (predicate->evaluate(tuple)) {    // Another virtual call!
                return tuple;
            }
        }
        return nullptr;
    }
    // ... open/close implementations
};

// Query execution: millions of virtual calls for large tables
void executeQuery(Operator* root) {
    root->open();
    while (Tuple* tuple = root->next()) {  // Virtual call per tuple!
        outputTuple(tuple);
    }
    root->close();
}
```

Studies show that interpreted execution spends only 10-20% of CPU time on actual data processing. The remaining 80-90% goes to interpretation overhead—virtual function dispatch, type checking, data structure management, and memory indirection. Query compilation aims to eliminate this tax entirely.
Query compilation transforms a declarative SQL query into imperative, executable code that can be run directly by the CPU without interpretation overhead. The compiled code is specifically generated for the exact query at hand, allowing aggressive optimizations impossible with generic interpreted execution.
The core insight:
At query compilation time, we know everything about the query: the exact tables and schemas involved, the types and offsets of every column, the specific predicates and constants, and the chosen join order and physical operators.

This complete knowledge enables generating specialized code rather than relying on generic, parameterized operators.
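To make this concrete, here is a minimal sketch (all names hypothetical) contrasting a generic comparison that must dispatch on the operator at runtime with the specialized form a query compiler can emit once the column type, operator, and constant are known:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Generic interpreted path: the operator is discovered at runtime.
enum class CmpOp { GT, LT, EQ };

bool eval_generic(CmpOp op, int64_t lhs, int64_t rhs) {
    switch (op) {                  // Runtime dispatch on every tuple
        case CmpOp::GT: return lhs > rhs;
        case CmpOp::LT: return lhs < rhs;
        case CmpOp::EQ: return lhs == rhs;
    }
    return false;
}

// Specialized path: at query compile time we KNOW the predicate is
// "salary > 50000", so the generated code hard-wires the type, the
// operator, and the constant. No dispatch, trivially inlinable.
inline bool eval_specialized(int64_t salary) {
    return salary > 50000;
}

int count_matches(const std::vector<int64_t>& salaries) {
    int n = 0;
    for (int64_t s : salaries)
        if (eval_specialized(s)) ++n;   // compiles to a tight compare loop
    return n;
}
```

The specialized version gives the compiler a plain compare-and-branch loop to optimize, which is exactly what generated query code looks like.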
Query compilation follows a multi-stage pipeline that transforms SQL text into executable machine code. Understanding this pipeline is essential for grasping how databases achieve high-performance execution.
Stage 1: Parsing and Analysis
The SQL query is parsed into an Abstract Syntax Tree (AST), then analyzed for semantic correctness. This includes name resolution, type checking, and authorization verification. The output is a validated query representation.
Stage 2: Logical Planning
The query is transformed into a logical plan expressed in relational algebra. This plan represents what the query computes, not how it will be executed.
Stage 3: Optimization
The query optimizer explores equivalent logical plans and selects the most efficient physical plan based on cost estimation. This includes selecting access methods, join algorithms, and operator ordering.
Stage 4: Code Generation
The physical plan is transformed into executable code. This is where compilation diverges from interpretation—instead of building an operator tree, we generate specialized code.
Stage 5: Compilation and Execution
The generated code is compiled (via JIT or ahead-of-time compilation) and executed. The result is streamed to the client.
| Stage | Input | Output | Key Transformations |
|---|---|---|---|
| Parsing | SQL text | AST | Lexical analysis, syntax validation, tree construction |
| Semantic Analysis | AST | Validated AST | Name resolution, type checking, privilege verification |
| Logical Planning | Validated AST | Logical Plan | Relational algebra translation, view expansion |
| Optimization | Logical Plan | Physical Plan | Cost estimation, plan enumeration, access path selection |
| Code Generation | Physical Plan | IR/Source Code | Operator fusion, specialization, vectorization |
| Compilation | IR/Source Code | Machine Code | JIT compilation, register allocation, instruction selection |
| Execution | Machine Code | Query Results | Direct CPU execution, minimal interpretation |
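The table above can be read as a chain of transformations, each stage consuming the previous stage's output. A schematic sketch, with hypothetical stage types and a toy "query language" (`SELECT <n>`) standing in for real parser, optimizer, and codegen machinery:

```cpp
#include <cassert>
#include <functional>
#include <string>

// Hypothetical stage types modeling the pipeline above.
struct Ast          { long literal; };
struct LogicalPlan  { long literal; };
struct PhysicalPlan { long literal; };
using MachineCode = std::function<long()>;   // stands in for JIT'd native code

Ast parse(const std::string& sql) {          // Stage 1: SQL text -> AST
    return { std::stol(sql.substr(sql.find(' ') + 1)) };
}
LogicalPlan toLogical(const Ast& a)         { return { a.literal }; }  // Stage 2
PhysicalPlan optimize(const LogicalPlan& l) { return { l.literal }; }  // Stage 3
MachineCode codegenAndJit(const PhysicalPlan& p) {                     // Stages 4-5
    long v = p.literal;
    return [v] { return v; };   // "compiled" query: executes directly
}

long runQuery(const std::string& sql) {
    return codegenAndJit(optimize(toLogical(parse(sql))))();
}
```

The point of the sketch is the shape, not the stubs: each stage has a well-defined input and output, and only the final artifact is executed.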
Query compilation adds upfront latency (typically 10-100ms) before query execution begins. This is acceptable for long-running analytical queries where compilation time is amortized over billions of tuples. For short OLTP queries, databases use prepared statements and plan caching to avoid repeated compilation.
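Plan caching, as mentioned above, amortizes that upfront cost. A minimal sketch of a compiled-plan cache keyed by query text (all names hypothetical; the lambda stands in for real code generation):

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <unordered_map>

// Compiling is expensive (10-100ms), so we pay it once per distinct
// query shape and reuse the compiled function on subsequent executions.
using CompiledQuery = std::function<long()>;

class PlanCache {
    std::unordered_map<std::string, CompiledQuery> cache_;
public:
    int compilations = 0;   // instrumentation for illustration

    CompiledQuery getOrCompile(const std::string& sql) {
        auto it = cache_.find(sql);
        if (it != cache_.end()) return it->second;   // hit: skip compilation

        ++compilations;                              // miss: compile once
        CompiledQuery fn = [] { return 7L; };        // stand-in for codegen+JIT
        cache_.emplace(sql, fn);
        return fn;
    }
};
```

A real cache would key on a normalized plan (so parameter values don't fragment it) and handle invalidation on schema changes, which is precisely the ground prepared statements cover.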
There are multiple approaches to generating executable code from query plans. Each strategy offers different trade-offs between compilation speed, execution speed, and implementation complexity.
Approach 1: Transpilation to C/C++
Generate C or C++ source code, then invoke a system compiler (gcc, clang). This leverages decades of compiler optimization but incurs high compilation latency (seconds per query).
Approach 2: LLVM JIT Compilation
Generate LLVM Intermediate Representation (IR) and use LLVM's JIT compiler. Faster than transpilation (~100ms), with excellent optimization quality.
Approach 3: Bytecode Interpretation
Generate bytecode for a custom virtual machine. Extremely fast code generation but lower execution performance than native compilation.
Approach 4: Adaptive Compilation
Start with interpretation or bytecode, then compile to native code for hot queries. Balances compilation latency with execution performance.
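Approach 3 can be sketched concretely. Below is a tiny stack-based bytecode VM for predicate expressions; the opcodes and instruction layout are invented for illustration. Generating such bytecode from a plan takes microseconds, but each instruction still pays a dispatch cost at runtime—the trade-off noted above:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical opcodes for a minimal expression VM.
enum Op : uint8_t { PUSH_CONST, LOAD_COL, CMP_GT, HALT };

struct Instr { Op op; int64_t arg; };

// Evaluates the bytecode against one row of column values.
bool run_bytecode(const std::vector<Instr>& code,
                  const std::vector<int64_t>& row) {
    std::vector<int64_t> stack;
    for (const Instr& ins : code) {
        switch (ins.op) {                     // per-instruction dispatch
            case PUSH_CONST: stack.push_back(ins.arg); break;
            case LOAD_COL:   stack.push_back(row[ins.arg]); break;
            case CMP_GT: {
                int64_t b = stack.back(); stack.pop_back();
                int64_t a = stack.back(); stack.pop_back();
                stack.push_back(a > b ? 1 : 0);
                break;
            }
            case HALT: return stack.back() != 0;
        }
    }
    return !stack.empty() && stack.back() != 0;
}
```

For example, the predicate `col0 > 50000` compiles to the sequence LOAD_COL 0, PUSH_CONST 50000, CMP_GT, HALT. An adaptive engine (Approach 4) would count executions of such bytecode and promote hot queries to native compilation.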
```cpp
// Example: Generating C code for a simple query
// SELECT name, salary FROM employees WHERE dept_id = 5 AND salary > 50000

// Generated code eliminates all interpretation overhead:
void execute_query_12345(
    Table* employees,
    OutputBuffer* output)
{
    // Column offsets resolved at compile time
    constexpr int NAME_OFFSET = 0;
    constexpr int SALARY_OFFSET = 32;
    constexpr int DEPT_OFFSET = 40;

    // Scan with inlined predicates - no virtual calls
    for (size_t i = 0; i < employees->num_rows; ++i) {
        char* row = employees->data + (i * employees->row_size);

        // Predicate evaluation - fully inlined
        int32_t dept_id = *reinterpret_cast<int32_t*>(row + DEPT_OFFSET);
        if (dept_id != 5) continue;  // Constant folded

        int64_t salary = *reinterpret_cast<int64_t*>(row + SALARY_OFFSET);
        if (salary <= 50000) continue;  // Constant folded

        // Project - only needed columns
        output->append(row + NAME_OFFSET, 32);  // name
        output->append(&salary, 8);             // salary
    }
}

// Contrast with interpreted version which would require:
// - Virtual calls to TableScan::next()
// - Virtual calls to Filter::next() with predicate object
// - Virtual calls to Project::next() with column list
// - Type dispatch for each column access
// - Runtime column offset lookup
```

| Strategy | Compile Time | Execution Speed | Optimization Quality | Implementation Complexity |
|---|---|---|---|---|
| Transpilation (C/C++) | 1-10 seconds | Excellent | Best (mature compilers) | Moderate |
| LLVM JIT | 50-200ms | Excellent | Excellent | High |
| Custom JIT | 5-20ms | Very Good | Good | Very High |
| Bytecode + VM | 1-5ms | Good | Limited | Moderate |
| Adaptive | Variable | Adapts to workload | Varies | Very High |
Query compilation often employs the push model (also called data-centric or producer-driven execution), which inverts the traditional Volcano pull model. Instead of operators pulling data from children, the producer pushes data toward consumers.
Why push is better for compilation:
In the push model, control flow is inverted. The innermost loop is the data source (scan), and all downstream operators are inlined into that loop. This enables loop fusion—combining multiple operators into a single, cache-efficient loop.
The key insight (from Thomas Neumann's seminal work):
Data should be pushed from one operator to the next, keeping tuples in CPU registers as long as possible.
This approach minimizes materialization, maximizes register utilization, and enables aggressive compiler optimizations.
```cpp
// Push Model with Loop Fusion
// Query: SELECT SUM(price) FROM orders WHERE status = 'shipped'

// PULL MODEL (Volcano) - operator boundaries prevent optimization
class Sum {
    Operator* child;
    int64_t sum = 0;
public:
    void execute() {
        while (Tuple* t = child->next()) {   // Virtual call
            sum += t->getInt64("price");     // Virtual call
        }
    }
};

// PUSH MODEL (Compiled) - single fused loop
void execute_sum_of_shipped_orders(Table* orders, int64_t* result) {
    int64_t sum = 0;  // Register variable!

    // All operators fused into single loop
    for (size_t i = 0; i < orders->num_rows; ++i) {
        // Table scan
        auto* row = orders->getRow(i);

        // Filter (inlined)
        if (row->status != Status::SHIPPED) continue;

        // Aggregation (inlined) - accumulator stays in register
        sum += row->price;
    }
    *result = sum;
}

// Benefits:
// 1. No function call overhead
// 2. 'sum' stays in CPU register throughout
// 3. No intermediate materialization
// 4. Compiler can auto-vectorize the loop
// 5. Excellent cache locality
```

Not all operators can be fused. Pipeline breakers like sorts, hash builds, and subquery results require materialization, breaking the pipeline. Compiled databases identify pipeline boundaries and generate separate code fragments for each pipeline, passing materialized results between them.
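A hash join illustrates such a pipeline boundary. The sketch below (table and column names hypothetical) shows the two code fragments a compiler would generate: pipeline 1 ends at the hash build, which materializes its input, and pipeline 2 is a second fused loop that probes it:

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

// Query shape: SELECT b.payload FROM probe p JOIN build b ON p.key = b.key

// Pipeline 1: scan the build side and materialize into a hash table.
// The hash build is the pipeline breaker - tuples cannot flow past it.
std::unordered_map<int64_t, int64_t>
pipeline1_build(const std::vector<std::pair<int64_t, int64_t>>& build) {
    std::unordered_map<int64_t, int64_t> ht;
    for (const auto& [key, payload] : build)
        ht[key] = payload;                    // fused scan + build loop
    return ht;
}

// Pipeline 2: scan the probe side, probe the hash table, emit matches.
// Scan + probe + projection are again fused into one loop.
std::vector<int64_t>
pipeline2_probe(const std::vector<int64_t>& probe,
                const std::unordered_map<int64_t, int64_t>& ht) {
    std::vector<int64_t> out;
    for (int64_t key : probe) {
        auto it = ht.find(key);
        if (it != ht.end()) out.push_back(it->second);  // inlined join + project
    }
    return out;
}
```

Each pipeline is its own tight loop; only the hash table crosses the boundary between them.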
Modern CPUs support SIMD (Single Instruction, Multiple Data) instructions that operate on multiple values simultaneously. A single AVX-512 instruction can process 16 32-bit integers or 8 64-bit integers in parallel.
Vectorization adapts query execution to exploit SIMD capabilities. Rather than processing one tuple at a time, vectorized execution processes vectors (batches) of column values, enabling the CPU to parallelize computation.
Query compilation can generate vectorized code that processes column values in batches, evaluates predicates across entire SIMD registers at once, and replaces per-tuple branches with selection masks.
```cpp
// Vectorized Predicate Evaluation using AVX-512
// Evaluates: salary > 50000 for 16 integers simultaneously

#include <immintrin.h>

// Scalar version - processes 1 value per iteration
void filter_salary_scalar(
    const int32_t* salaries, size_t count, bool* result)
{
    for (size_t i = 0; i < count; ++i) {
        result[i] = salaries[i] > 50000;
    }
}

// Vectorized version - processes 16 values per iteration
void filter_salary_simd(
    const int32_t* salaries, size_t count, bool* result)
{
    // Broadcast threshold to all lanes
    __m512i threshold = _mm512_set1_epi32(50000);

    size_t i = 0;
    for (; i + 16 <= count; i += 16) {
        // Load 16 salary values
        __m512i values = _mm512_loadu_si512(&salaries[i]);

        // Compare all 16 simultaneously (single instruction!)
        __mmask16 mask = _mm512_cmpgt_epi32_mask(values, threshold);

        // Expand mask to 16 bytes (1 where true, 0 where false)
        // and store all 16 boolean results at once
        __m128i bytes = _mm_maskz_mov_epi8(mask, _mm_set1_epi8(1));
        _mm_storeu_si128(reinterpret_cast<__m128i*>(&result[i]), bytes);
    }

    // Handle remaining elements
    for (; i < count; ++i) {
        result[i] = salaries[i] > 50000;
    }
}

// Performance: SIMD version achieves ~10-16x throughput
// on predicate evaluation compared to scalar code
```

| Instruction Set | Register Width | 32-bit Ints per Op | 64-bit Ints per Op | Availability |
|---|---|---|---|---|
| SSE2 | 128-bit | 4 | 2 | Universal (2001+) |
| AVX2 | 256-bit | 8 | 4 | Common (2013+) |
| AVX-512 | 512-bit | 16 | 8 | Modern servers (2017+) |
| ARM NEON | 128-bit | 4 | 2 | ARM processors |
| SVE/SVE2 | Variable | Variable | Variable | ARM v8.2+ |
When query code is generated as tight loops over columnar data, modern compilers (LLVM, GCC) can automatically apply SIMD optimizations. This means compiled queries often gain vectorization 'for free' without explicit SIMD intrinsics, as long as the loop structure is amenable to auto-vectorization.
Query compilation has been adopted by numerous database systems, each with distinct implementation strategies. Understanding these implementations provides insight into the practical considerations of query compilation.
| Database | Compilation Strategy | Target | Notable Features |
|---|---|---|---|
| HyPer/Umbra | LLVM JIT | Native code | Pioneered data-centric compilation; morsel-driven parallelism |
| Apache Spark | Whole-stage codegen (Janino) | JVM bytecode | Tungsten engine; fuses multiple operators |
| ClickHouse | LLVM JIT (optional) | Native code | Vectorized + compiled hybrid; column-oriented |
| DuckDB | Vectorized interpretation | Vector batches | Optimized for embedded use; favors vectorization over JIT compilation |
| PostgreSQL | JIT via LLVM | Native code | Expression compilation; optional for complex queries |
| MemSQL/SingleStore | LLVM JIT | Native code | Full query compilation; compiled plans cached for reuse |
| Presto/Trino | Bytecode generation | JVM bytecode | Adaptive compilation based on query complexity |
Case Study: HyPer's Data-Centric Compilation
HyPer (developed by Thomas Neumann at TU Munich) pioneered modern query compilation. Its key innovations:
Data-centric execution: Tuples are kept in CPU registers as long as possible, pushed through pipelines without materialization.
LLVM-based code generation: Queries are translated to LLVM IR, then JIT-compiled to native code. Compilation takes ~100ms per query.
Morsel-driven parallelism: Work is divided into morsels (small chunks), enabling parallel execution with good load balancing.
Adaptive re-compilation: Execution can switch between interpreted and compiled modes based on runtime characteristics.
HyPer demonstrated that compiled queries can achieve 10-100x performance improvements over interpreted execution for analytical workloads.
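The morsel-driven scheduling idea can be sketched as follows: input is carved into small fixed-size morsels, and worker threads claim the next morsel through an atomic counter, which yields natural load balancing. Sizes and names below are illustrative, not HyPer's actual values:

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>
#include <vector>

constexpr size_t MORSEL_SIZE = 1024;  // illustrative morsel size

int64_t parallel_sum(const std::vector<int32_t>& col, unsigned num_threads) {
    std::atomic<size_t> next_morsel{0};
    std::atomic<int64_t> total{0};

    auto worker = [&] {
        for (;;) {
            // Claim the next unprocessed morsel
            size_t start = next_morsel.fetch_add(MORSEL_SIZE);
            if (start >= col.size()) break;
            size_t end = std::min(start + MORSEL_SIZE, col.size());

            int64_t local = 0;                  // per-thread accumulator
            for (size_t i = start; i < end; ++i) local += col[i];
            total.fetch_add(local);             // publish once per morsel
        }
    };

    std::vector<std::thread> threads;
    for (unsigned t = 0; t < num_threads; ++t) threads.emplace_back(worker);
    for (auto& th : threads) th.join();
    return total.load();
}
```

Because fast threads simply claim more morsels, skew in per-morsel cost balances out without any central scheduler.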
PostgreSQL 11+ includes optional JIT compilation via LLVM. When enabled (jit = on), PostgreSQL compiles expression evaluation, tuple deforming, and other hot paths. JIT is triggered when estimated query cost exceeds jit_above_cost (default: 100000). This hybrid approach offers compilation benefits without universal compilation overhead.
We've explored the fundamental concepts of query compilation—a transformative technique that converts SQL queries into highly optimized machine code. The key insights:

- Interpreted (Volcano-style) execution pays heavily for virtual dispatch and tuple-at-a-time processing
- Compilation exploits complete compile-time knowledge of the query to generate specialized code
- Code generation strategies (transpilation, LLVM JIT, bytecode, adaptive) trade compilation latency against execution speed and implementation complexity
- The push model enables loop fusion, keeping tuples in CPU registers across operator boundaries
- Tight generated loops unlock SIMD vectorization, often automatically via the backend compiler
- Systems such as HyPer, Spark, ClickHouse, and PostgreSQL apply these ideas with different trade-offs
What's next:
Compiled queries are most valuable when compilation cost is amortized over repeated executions. The next page explores prepared statements—the mechanism by which applications reuse compiled query plans, avoiding redundant parsing and optimization while maintaining security against SQL injection.
You now understand the fundamentals of query compilation, including why interpretation is costly, how compilation eliminates overhead, and how modern databases generate and execute compiled queries. Next, we'll explore prepared statements and their role in efficient query execution.