In the realm of database query execution, selection is the operation that filters rows based on predicates—the fundamental act of answering "which rows satisfy this condition?" Before we explore sophisticated index-based techniques, we must deeply understand the most fundamental selection algorithm: the linear scan, also known as a table scan or sequential scan.
The linear scan is deceptively simple in concept—read every row and check the predicate. Yet this simplicity belies profound engineering implications. Understanding linear scans is essential because they represent the baseline against which all other access methods are measured, they remain the optimal choice in many real-world scenarios, and their behavior defines the lower bound of what any selection algorithm must achieve.
By the end of this page, you will understand the complete mechanics of linear scan implementation, analyze its I/O and CPU cost models with mathematical precision, recognize when linear scans outperform index-based alternatives, and appreciate the optimization techniques that make sequential scans remarkably efficient in production systems.
A linear scan algorithm operates on a conceptually straightforward principle: systematically read every tuple in a relation, evaluate the selection predicate against each tuple, and return those tuples that satisfy the predicate. However, the physical implementation involves intricate coordination between the storage layer, buffer pool, and query executor.
The Execution Model:
When the query executor initiates a linear scan, it follows a carefully orchestrated sequence of operations that spans multiple layers of the database system architecture, from the executor itself through the buffer pool down to the storage subsystem.
Physical Storage Considerations:
The efficiency of linear scan is intimately tied to how data is physically organized on storage. In heap-organized tables (the most common storage model), tuples are stored in pages in insertion order without any particular sorting. This has critical implications:
Spatial Locality: Consecutive tuples reside in adjacent memory locations within a page, enabling efficient CPU cache utilization during predicate evaluation.
I/O Coalescing: The storage subsystem can coalesce multiple page reads into larger sequential I/O operations, dramatically improving throughput compared to random access patterns.
Prefetching Optimization: The predictable access pattern enables aggressive read-ahead prefetching, where the storage layer anticipates future page requests and loads them before they are needed.
```
-- Conceptual Linear Scan Algorithm
-- For relation R with selection predicate P

PROCEDURE LinearScan(R, P):
    -- Phase 1: Initialization
    cursor ← OpenHeapScan(R)
    result ← EmptyResultSet()

    -- Phase 2: Sequential Page Processing
    WHILE HasNextPage(cursor):
        page ← FetchNextPage(cursor)  -- Buffer pool access

        -- Phase 3: Tuple Iteration Within Page
        FOR EACH tuple IN page.tuples:
            -- Phase 4: Predicate Evaluation
            IF EvaluatePredicate(tuple, P):
                -- Phase 5: Result Collection
                result.Add(tuple)

        -- Release page pin when done (in buffer pool)
        UnpinPage(page)

    -- Phase 6: Cleanup
    CloseScan(cursor)
    RETURN result

-- Example execution for: SELECT * FROM Employees WHERE salary > 100000
-- R = Employees table
-- P = (salary > 100000) predicate
```

Modern database systems process data one page at a time, not one tuple at a time. This page-oriented architecture aligns with the fundamental unit of disk I/O and buffer management. The query executor never directly accesses disk—all data access flows through the buffer pool, which manages the mapping between logical page identifiers and physical memory frames.
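To make the page-at-a-time loop concrete, here is a minimal runnable sketch in Python. It is a toy model under stated assumptions: lists stand in for heap-file pages, there is no buffer pool, and the helper names (`make_heap`, `linear_scan`) are illustrative rather than any real engine's API.

```python
PAGE_SIZE = 4  # tuples per page in this toy heap file

def make_heap(tuples):
    """Model a heap file as a list of pages, each page a list of tuples."""
    return [tuples[i:i + PAGE_SIZE] for i in range(0, len(tuples), PAGE_SIZE)]

def linear_scan(heap, predicate):
    """Mirror of the pseudocode: sequential pages, then tuples, then filter."""
    result = []
    for page in heap:               # Phase 2: sequential page processing
        for tup in page:            # Phase 3: tuple iteration within page
            if predicate(tup):      # Phase 4: predicate evaluation
                result.append(tup)  # Phase 5: result collection
    return result

employees = make_heap([
    {"name": "Ana", "salary": 120_000},
    {"name": "Bob", "salary": 85_000},
    {"name": "Cy",  "salary": 150_000},
    {"name": "Dee", "salary": 60_000},
    {"name": "Eve", "salary": 101_000},
])

# SELECT * FROM Employees WHERE salary > 100000
matches = linear_scan(employees, lambda t: t["salary"] > 100_000)
print([e["name"] for e in matches])  # ['Ana', 'Cy', 'Eve']
```

Note that the loop's cost is the same whether zero or all tuples qualify: every page is visited regardless.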
The dominant cost factor in linear scan operations is disk I/O. Understanding the precise I/O behavior requires analyzing both the best-case and worst-case scenarios, as well as the factors that influence actual performance in production systems.
Fundamental Cost Formula:
Let us define the key parameters: B is the number of pages in the relation, t_seek is the time to position the disk head at the start of a read, and t_transfer is the time to transfer one page once positioned.
The I/O cost of a linear scan equals the number of pages that must be read from disk. In the worst case (cold buffer pool):
| Scenario | I/O Cost | Explanation |
|---|---|---|
| Cold buffer pool (no pages cached) | B page reads | Every page must be fetched from disk |
| Warm buffer pool (all pages cached) | 0 page reads | All pages served from memory |
| Partial caching (fraction f cached) | (1-f) × B page reads | Only uncached pages require disk I/O |
| Sequential read optimization | B × t_transfer + t_seek | Single seek, followed by sequential transfer |
| Random read (worst case) | B × (t_transfer + t_seek) | Each page requires separate seek |
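The rows of this table translate into a few lines of Python. This is a back-of-the-envelope cost sketch, not a real planner: the function names and device constants are illustrative assumptions.

```python
def linear_scan_io_pages(total_pages, cached_fraction=0.0):
    """Pages that must be fetched from disk, given a fraction already cached."""
    return (1.0 - cached_fraction) * total_pages

def scan_time_ms(total_pages, t_transfer_ms, t_seek_ms, sequential=True):
    """Elapsed I/O time: one seek amortized if sequential, one seek per page if random."""
    if sequential:
        return t_seek_ms + total_pages * t_transfer_ms
    return total_pages * (t_seek_ms + t_transfer_ms)

# Cold, warm, and partially cached buffer pool for a 10,000-page table:
print(linear_scan_io_pages(10_000, cached_fraction=0.0))   # 10000.0 page reads
print(linear_scan_io_pages(10_000, cached_fraction=1.0))   # 0.0 page reads
print(linear_scan_io_pages(10_000, cached_fraction=0.25))  # 7500.0 page reads

# Sequential vs. random on a spinning disk (5 ms seek, 0.1 ms transfer, illustrative):
print(scan_time_ms(10_000, 0.1, 5.0, sequential=True))   # ≈ 1005 ms
print(scan_time_ms(10_000, 0.1, 5.0, sequential=False))  # ≈ 51000 ms
```

The two time formulas are exactly the last two rows of the table; the page-count function captures the first three.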
Sequential vs. Random I/O:
The distinction between sequential and random I/O is crucial for understanding linear scan performance. When data pages are stored contiguously on disk (which heap files typically maintain), the linear scan benefits from sequential I/O optimization:
Single Seek Amortization: Only one disk head positioning operation is needed for the entire scan, amortizing the seek cost across all pages.
Read-Ahead Prefetching: The storage subsystem detects the sequential access pattern and proactively loads upcoming pages into memory before they are requested.
Large Block Transfers: Modern I/O subsystems coalesce multiple consecutive page requests into larger transfer units (often 1MB or more), maximizing disk bandwidth utilization.
Concrete Example:
Consider a table with 10,000 pages, using illustrative device parameters: on an SSD, t_transfer ≈ 0.1 ms per page with negligible seek time; on a spinning disk, t_seek ≈ 5 ms and t_transfer ≈ 0.1 ms.

SSD: a sequential scan costs roughly 10,000 × 0.1 ms ≈ 1 second; random access costs somewhat more (smaller I/O requests, no prefetching), but there is no mechanical seek penalty.

Spinning disk, sequential: one seek plus 10,000 transfers ≈ 5 ms + 1,000 ms ≈ 1 second.

Spinning disk, random: 10,000 × (5 ms + 0.1 ms) ≈ 51 seconds.

The roughly 50× penalty for random access on a spinning disk (under these illustrative parameters) makes sequential scan's predictable access pattern extremely valuable.
In production systems, the effective I/O cost depends heavily on buffer pool hit rates. For frequently accessed tables, many or all pages may already reside in memory, reducing effective I/O cost dramatically. Query optimizers must estimate buffer pool hit rates when comparing linear scan costs against index-based alternatives.
```sql
-- I/O Cost Calculation for Linear Scan

-- Given parameters:
--   Pages in relation (B)    = 10,000
--   Page transfer time (SSD) = 0.1 ms
--   Seek time (SSD)          = ~0 ms

-- Sequential Scan I/O Cost:
--   Total Pages   = B = 10,000
--   Seeks         = 1 (for sequential read)
--   Transfer Time = B × t_transfer = 10,000 × 0.1ms = 1000ms = 1 sec

-- Cost Formula (general):
--   Sequential: t_seek + B × t_transfer
--   Random:     B × (t_seek + t_transfer)

-- PostgreSQL EXPLAIN output showing scan cost:
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM employees WHERE department = 'Engineering';

-- Output interpretation:
--   Seq Scan on employees  (cost=0.00..1542.00 rows=15000 width=64)
--                                ↑        ↑
--                          startup cost  total cost
--
--   Buffers: shared hit=842 read=700
--                         ↑       ↑
--               pages in cache  pages from disk

-- The 'cost' unit is typically in "page fetches" or abstract units
-- calibrated to the specific system's I/O characteristics
```

While I/O cost dominates for disk-based databases, CPU cost becomes increasingly significant as storage gets faster, as more of the working set stays resident in the buffer pool, and as predicates grow more complex.
CPU Cost Components:
The CPU cost of a linear scan comprises several distinct components that accumulate per tuple: advancing to and decoding the next tuple within the page, evaluating the selection predicate against it, and copying each qualifying tuple into the result set.
Per-Tuple CPU Cost Model:
Let T be the number of tuples in the relation, s the selectivity (the fraction of tuples satisfying the predicate), c_tuple the cost of advancing to and decoding one tuple, c_pred the cost of one predicate evaluation, and c_output the cost of copying one qualifying tuple to the output. Then:
Total CPU cost = T × (c_tuple + c_pred) + (s × T) × c_output
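The cost model above translates directly into code. A sketch with arbitrary illustrative unit costs (the parameter values here are assumptions, not measured numbers):

```python
def linear_scan_cpu_cost(T, s, c_tuple, c_pred, c_output):
    """Total CPU cost = T × (c_tuple + c_pred) + (s × T) × c_output."""
    return T * (c_tuple + c_pred) + (s * T) * c_output

# 1,000,000 tuples, 25% selectivity, unit costs in abstract "operations":
cost = linear_scan_cpu_cost(1_000_000, 0.25, c_tuple=1.0, c_pred=2.0, c_output=5.0)
print(cost)  # 4250000.0: every tuple pays iteration + predicate; only matches pay output
```

Doubling c_pred raises the total for every tuple, while doubling c_output matters only in proportion to selectivity, which is why expensive predicates hurt even highly selective scans.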
Practical Implications:
The linear relationship between CPU cost and tuple count means that linear scans have O(T) CPU complexity. This becomes problematic for large tables even when data is entirely cached in memory. However, modern CPUs can evaluate millions of simple predicates per second, making CPU cost negligible for moderately-sized tables with simple predicates.
| Predicate Type | Example | Relative CPU Cost | Notes |
|---|---|---|---|
| Simple equality | id = 42 | 1× (baseline) | Direct comparison, highly optimized |
| Range comparison | salary > 50000 | 1-2× | Single comparison operation |
| String equality | name = 'Alice' | 2-5× | Character-by-character comparison |
| LIKE with prefix | name LIKE 'A%' | 3-10× | Prefix match + length check |
| LIKE with wildcard | name LIKE '%son%' | 10-100× | Requires substring search |
| Regular expression | name ~ '^[A-Z].*' | 50-500× | Regex engine invocation |
| User-defined function | my_check(data) | 100-10000× | Function call + arbitrary logic |
A query with a regex predicate on a cached table can be CPU-bound rather than I/O-bound. The optimizer must account for predicate complexity when estimating total cost. This is why many databases track 'operator cost' statistics for different comparison types.
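The relative numbers in the table vary by engine and hardware, but the structural point is easy to demonstrate: the scan loop is identical for every predicate, so total cost scales with per-tuple predicate work. A pure-Python sketch (the row layout and predicate set are illustrative assumptions):

```python
import re

# 10,000 synthetic rows standing in for a fully cached table
rows = [{"id": i, "name": f"user{i}", "salary": 40_000 + 7 * i} for i in range(10_000)]

def scan(rows, predicate):
    """The same linear scan loop, regardless of predicate complexity."""
    return [r for r in rows if predicate(r)]

# Predicates of increasing per-tuple cost, mirroring the table above:
eq  = lambda r: r["id"] == 42                     # simple equality
rng = lambda r: r["salary"] > 50_000              # range comparison
pat = re.compile(r"^user9\d{3}$")
rex = lambda r: pat.match(r["name"]) is not None  # regex predicate

print(len(scan(rows, eq)))   # 1
print(len(scan(rows, rng)))  # 8571
print(len(scan(rows, rex)))  # 1000  (user9000 .. user9999)
```

Timing these variants with `timeit` on a large row count shows the regex scan dominated by CPU, matching the observation that cached-table queries can be CPU-bound rather than I/O-bound.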
Despite its apparent simplicity, linear scan implementations incorporate sophisticated optimizations that dramatically improve real-world performance. Database vendors have refined these techniques over decades of engineering.
1. Prefetching and Read-Ahead:
Modern linear scan implementations employ multi-level prefetching to hide I/O latency:
```
-- Prefetching Strategy in Linear Scan

-- Level 1: OS-level read-ahead
-- The operating system detects sequential access patterns
-- and prefetches pages into the filesystem cache.
-- Typically configured via: blockdev --setra <sectors> <device>

-- Level 2: Database buffer pool prefetching
-- The database explicitly requests future pages before they're needed.

PROCEDURE LinearScanWithPrefetch(R, P):
    cursor ← OpenHeapScan(R)
    prefetch_distance ← 32  -- Look ahead 32 pages
    result ← EmptyResultSet()

    -- Issue initial prefetch requests
    FOR i FROM 1 TO prefetch_distance:
        IF HasPage(cursor, i):
            AsyncPrefetch(cursor, i)  -- Non-blocking prefetch

    current_page ← 0
    WHILE HasNextPage(cursor):
        page ← FetchNextPage(cursor)  -- Should be in buffer pool now

        -- Maintain prefetch window
        IF HasPage(cursor, current_page + prefetch_distance):
            AsyncPrefetch(cursor, current_page + prefetch_distance)

        -- Normal tuple processing
        FOR EACH tuple IN page.tuples:
            IF EvaluatePredicate(tuple, P):
                result.Add(tuple)

        UnpinPage(page)
        current_page ← current_page + 1

    RETURN result

-- Prefetching reduces effective I/O latency from ~0.1ms to near-zero
-- by overlapping I/O with CPU processing.
```

2. SIMD-Accelerated Predicate Evaluation:
Modern databases leverage Single Instruction Multiple Data (SIMD) instructions to evaluate predicates on multiple tuples simultaneously. This is particularly effective for fixed-width numeric columns and simple comparison predicates, where a single vector instruction can compare many packed values at once.
3. Late Materialization:
Instead of constructing full tuples immediately, late materialization defers column access until absolutely necessary. For highly selective queries, this avoids accessing columns that will never be used in the output.
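A small sketch of the idea, assuming a column-store-style layout where each column lives in its own array (the column names and layout here are hypothetical):

```python
# Columns stored separately; the predicate touches only `salaries`.
ids      = list(range(8))
names    = [f"emp{i}" for i in range(8)]
salaries = [90_000, 120_000, 80_000, 150_000, 95_000, 130_000, 70_000, 110_000]

def late_materialized_scan(threshold):
    # Phase 1: scan only the predicate column, collecting matching positions.
    positions = [i for i, s in enumerate(salaries) if s > threshold]
    # Phase 2: materialize full tuples only for matching positions, so the
    # `ids` and `names` columns are never touched for non-qualifying rows.
    return [(ids[i], names[i], salaries[i]) for i in positions]

print(late_materialized_scan(100_000))
# [(1, 'emp1', 120000), (3, 'emp3', 150000), (5, 'emp5', 130000), (7, 'emp7', 110000)]
```

For a highly selective predicate, phase 2 touches only a handful of positions, which is exactly the saving late materialization is after.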
4. Parallel Sequential Scan:
Multiple worker threads can scan different portions of the table simultaneously. The scan is divided into page ranges, with each worker responsible for a portion:
```sql
-- Parallel Sequential Scan Strategy

-- PostgreSQL example enabling parallel scan:
SET max_parallel_workers_per_gather = 4;
SET parallel_tuple_cost = 0.01;
SET parallel_setup_cost = 1000;

-- Query that may use parallel scan:
EXPLAIN (ANALYZE)
SELECT COUNT(*) FROM large_table WHERE status = 'active';

-- Output showing parallel execution:
-- Gather  (cost=1000.00..15432.00 rows=1 width=8)
--   Workers Planned: 4
--   Workers Launched: 4
--   ->  Partial Aggregate  (cost=0.00..14432.00 rows=1 width=8)
--         ->  Parallel Seq Scan on large_table
--               Filter: (status = 'active')
--               Rows Removed by Filter: 125000
--               Workers: 4

-- Each worker scans approximately (Total Pages / Workers) pages
-- Speedup approaches linear with worker count for I/O-bound scans
-- CPU-bound scans may see diminishing returns due to synchronization
```

These optimizations compound multiplicatively. A parallel scan with 4 workers, combined with prefetching and SIMD predicate evaluation, can execute 10-20× faster than a naive single-threaded implementation, even though the algorithmic complexity remains O(B) for I/O and O(T) for CPU.
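The page-range partitioning strategy can be sketched in Python with a thread pool. This toy statically assigns contiguous ranges; real engines (including PostgreSQL) hand out pages dynamically so fast workers are not idled by slow ones.

```python
from concurrent.futures import ThreadPoolExecutor

PAGE_SIZE = 100  # tuples per "page" in this toy model
table = [{"status": "active" if i % 8 == 0 else "inactive"} for i in range(100_000)]
pages = [table[i:i + PAGE_SIZE] for i in range(0, len(table), PAGE_SIZE)]

def scan_page_range(page_range):
    """One worker: scan its pages sequentially, return a partial count."""
    return sum(1 for page in page_range for row in page if row["status"] == "active")

def parallel_seq_scan(pages, workers=4):
    # Split the page list into `workers` contiguous ranges.
    chunk = (len(pages) + workers - 1) // workers
    ranges = [pages[i:i + chunk] for i in range(0, len(pages), chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # The final sum plays the role of Gather + aggregate over partial results.
        return sum(pool.map(scan_page_range, ranges))

print(parallel_seq_scan(pages))  # 12500
```

The result is identical for any worker count; only elapsed time changes, and (as noted above) for CPU-bound Python code the speedup is limited.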
Contrary to intuition, linear scans frequently outperform index-based access methods. Understanding when to prefer linear scans is crucial for query optimization and index design. The optimizer must evaluate multiple factors to make this determination.
The Selectivity Threshold:
The most important factor is predicate selectivity—the fraction of tuples satisfying the selection condition. Index scans incur random I/O overhead, while linear scans benefit from sequential I/O. At some selectivity threshold, the random I/O cost of index access exceeds the sequential I/O cost of a full scan.
| Storage Type | Typical Threshold | Reasoning |
|---|---|---|
| Spinning disk (HDD) | 5-10% | High seek penalty makes even moderate hit rates favor sequential I/O |
| Solid-state drive (SSD) | 10-20% | Lower seek penalty raises threshold, but random I/O still slower |
| NVMe SSD | 20-30% | Fast random I/O further raises threshold |
| In-memory tables | 30-50% | No I/O penalty; CPU overhead of index traversal becomes dominant cost |
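The break-even point can be derived from planner-style page cost constants. What follows is a deliberately pessimistic sketch, assuming each matching row triggers one random page fetch; the constants mirror PostgreSQL's default seq_page_cost/random_page_cost ratio, but the model itself is illustrative.

```python
SEQ_PAGE_COST = 1.0     # cost to read one page sequentially
RANDOM_PAGE_COST = 4.0  # cost to read one page at random (PostgreSQL's default ratio)

def seq_scan_cost(B):
    return B * SEQ_PAGE_COST

def index_scan_cost(T, selectivity):
    # Worst case: every matching row causes one random page fetch.
    return selectivity * T * RANDOM_PAGE_COST

def breakeven_selectivity(B, T):
    """Selectivity at which the worst-case index scan matches a full scan's cost."""
    return (B * SEQ_PAGE_COST) / (T * RANDOM_PAGE_COST)

B, T = 10_000, 1_000_000  # 10,000 pages, 100 tuples per page
print(f"{breakeven_selectivity(B, T):.2%}")  # 0.25%
```

This worst-case model yields a much lower threshold than the rule-of-thumb table above; real optimizers credit the index scan when matching rows cluster on shared pages and when pages are already cached, pushing the practical threshold toward the quoted ranges.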
Scenarios Where Linear Scan Wins:
Small tables: When the entire relation occupies only a few pages, the scan completes before an index traversal would pay for itself.

Low-selectivity predicates: Conditions such as WHERE status IN ('active', 'pending') that match a large fraction of rows; fetching each match through an index would cost more random I/O than one sequential pass.

No suitable index: When no index covers the predicate columns, a scan is the only available access path.

Modern query optimizers estimate the cost of both index scan and linear scan paths, choosing whichever has lower estimated cost. This decision depends on selectivity estimates, table statistics, buffer pool state, and hardware characteristics. Inaccurate statistics can lead to suboptimal choices—a common source of query performance problems.
While the core linear scan algorithm is universal, different database systems implement it with subtle variations that affect performance characteristics. Understanding these variations helps when tuning queries across different platforms.
PostgreSQL's Sequential Scan:
```sql
-- PostgreSQL Sequential Scan Characteristics

-- Force sequential scan for demonstration:
SET enable_indexscan = off;
SET enable_bitmapscan = off;

-- View execution details:
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT * FROM orders WHERE total > 1000;

-- Sample output:
-- Seq Scan on orders  (cost=0.00..35784.00 rows=142857 width=84)
--                     (actual time=0.025..312.456 rows=143102 loops=1)
--   Filter: (total > 1000)
--   Rows Removed by Filter: 856898
--   Buffers: shared hit=18234 read=7654

-- Key PostgreSQL features:
-- 1. Buffer ring: Large scans use a 256KB ring buffer to avoid
--    polluting the shared buffer pool
-- 2. Synchronized scans: Multiple concurrent scans can share I/O
-- 3. Visibility map: Lets VACUUM skip all-visible pages
-- 4. Parallel execution: Configurable worker count for large scans
```

MySQL/InnoDB's Table Scan:
```sql
-- MySQL InnoDB Table Scan Characteristics

-- Force table scan:
SELECT /*+ NO_INDEX(orders) */ * FROM orders WHERE total > 1000;

-- View execution plan:
EXPLAIN FORMAT=TREE
SELECT * FROM orders WHERE total > 1000;

-- Sample output:
-- -> Filter: (orders.total > 1000)  (cost=85432.10 rows=142857)
--     -> Table scan on orders  (cost=85432.10 rows=1000000)

-- Key MySQL/InnoDB features:
-- 1. Clustered index scan: InnoDB tables are organized as a B+tree
--    indexed by primary key; "table scan" = clustered index scan
-- 2. Read-ahead: Automatic prefetching triggered by sequential access
-- 3. Change buffer: Delayed secondary index updates don't affect scan
-- 4. Buffer pool: Single unified cache for data and indexes
```

Oracle's Full Table Scan:
```sql
-- Oracle Full Table Scan Characteristics

-- Force full table scan:
SELECT /*+ FULL(orders) */ * FROM orders WHERE total > 1000;

-- View execution plan:
EXPLAIN PLAN FOR SELECT * FROM orders WHERE total > 1000;
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

-- Sample output:
-- | Id | Operation         | Name   | Rows | Bytes | Cost (%CPU)|
-- |  0 | SELECT STATEMENT  |        | 143K |   11M | 25432   (1)|
-- |  1 | TABLE ACCESS FULL | ORDERS | 143K |   11M | 25432   (1)|

-- Key Oracle features:
-- 1. Direct path reads: Large scans bypass buffer cache, reading
--    directly from disk to PGA
-- 2. Parallel query: Automatic parallelization for large scans
-- 3. DB_FILE_MULTIBLOCK_READ_COUNT: Configures pages per I/O
-- 4. In-Memory option: Can scan columnar in-memory representation
```

| Feature | PostgreSQL | MySQL/InnoDB | Oracle |
|---|---|---|---|
| Storage organization | Heap (unordered) | Clustered (PK order) | Heap (unordered) |
| Buffer management | Ring buffer for large scans | Buffer pool LRU | Direct path option |
| Parallel execution | Parallel workers | Limited parallel DML | Parallel query slaves |
| Prefetching | Explicit + OS-level | Read-ahead threads | Multi-block reads |
| Visibility handling | Visibility map + hint bits | MVCC with undo logs | MVCC with undo tablespace |
Armed with deep understanding of linear scan mechanics, let us synthesize practical guidelines for database practitioners. These guidelines help in query tuning, index design decisions, and capacity planning.
Query Tuning Guidelines:
Linear scans are not inherently bad—they're a legitimate access method with specific optimal use cases. The goal is not to eliminate all table scans, but to ensure each scan is appropriate for its context. A table scan returning 50% of a large table is correct; one returning 0.001% is probably a missing index.
Performance Monitoring Queries:
```sql
-- PostgreSQL: Find tables with excessive sequential scans

SELECT
    schemaname,
    relname AS table_name,
    seq_scan,
    seq_tup_read,
    idx_scan,
    CASE WHEN seq_scan > 0
         THEN ROUND(seq_tup_read::numeric / seq_scan, 2)
         ELSE 0
    END AS avg_rows_per_scan,
    n_live_tup AS estimated_rows
FROM pg_stat_user_tables
WHERE seq_scan > 100                              -- Tables with many scans
  AND n_live_tup > 10000                          -- Larger tables
  AND (idx_scan = 0 OR seq_scan > idx_scan * 10)  -- Heavily favoring scans
ORDER BY seq_tup_read DESC
LIMIT 20;

-- Interpretation:
-- High seq_scan with low avg_rows_per_scan suggests
-- inefficient scans that should use indexes.
-- High seq_scan with high avg_rows_per_scan is likely
-- intentional (analytical queries on large tables).

-- MySQL: Similar analysis using performance_schema
SELECT
    object_schema,
    object_name,
    count_read AS table_scan_rows_read,
    count_fetch AS index_rows_fetched
FROM performance_schema.table_io_waits_summary_by_table
WHERE object_schema NOT IN ('mysql', 'performance_schema', 'sys')
ORDER BY count_read DESC;
```

We have explored the linear scan algorithm in comprehensive depth, transforming what might seem like a trivial operation into a nuanced understanding of database access fundamentals: the page-at-a-time execution model, the sequential I/O cost model, the per-tuple CPU cost model, the optimizations that make sequential scans fast in practice, and the selectivity thresholds that determine when a scan beats an index.
What's Next:
With linear scan as our baseline understanding, we proceed to explore Index Selection—how databases leverage indexes to avoid full table scans when predicates are selective. Understanding the contrast between these access methods is fundamental to query optimization.
You now possess expert-level understanding of linear scan implementation—the foundational access method against which all others are measured. This knowledge forms the essential baseline for analyzing index-based selection strategies and making informed access path decisions.