Every database query ultimately needs to examine data stored on disk. The table scan—also known as a full table scan or sequential scan—is the most fundamental access method in relational database systems. It works exactly as its name suggests: the database reads every single row in a table, examines each one, and returns those that satisfy the query conditions.
While table scans are often viewed negatively (associated with poor performance), they are far more nuanced than their reputation suggests. Understanding when table scans are appropriate—and when they are catastrophic—is essential knowledge for any database professional.
By the end of this page, you will understand exactly how table scans work at the storage layer, why they can be both the best and worst access method choice, how to recognize when table scans are appropriate, and how modern databases optimize sequential access patterns for maximum throughput.
A table scan is conceptually simple but involves sophisticated machinery at the storage engine level. Let's trace exactly what happens when a database executes a table scan:
Step 1: Page-Level Organization
Database tables are not stored as flat files but as collections of fixed-size pages (typically 4KB, 8KB, or 16KB depending on the DBMS). Each page contains multiple rows, along with metadata such as page headers, slot directories, and free space pointers.
When executing a table scan, the database doesn't read individual rows—it reads entire pages. This is crucial because disk I/O operates at the block level, not the byte level. Reading one byte costs the same as reading 8KB if that's your page size.
```
function TableScan(table, predicate):
    result = []
    // Iterate through all pages belonging to this table
    for each page_id in table.page_list:
        // Read page from disk into buffer pool
        page = buffer_pool.fetch(page_id)
        // Iterate through all rows (tuples) in the page
        for each row in page.rows:
            // Apply the WHERE clause predicate
            if predicate.evaluate(row) == true:
                result.append(row)
        // Page may be unpinned from buffer (LRU eviction)
        buffer_pool.unpin(page_id)
    return result
```

Step 2: Buffer Pool Interaction
The database doesn't directly read from disk for every page. Instead, it uses a buffer pool—an in-memory cache of recently accessed pages. When a table scan requests a page:

- If the page is already cached (a buffer hit), it is returned directly from memory.
- If not (a buffer miss), the page is read from disk—possibly evicting another cached page—and then returned.
For a fresh table scan on a cold cache, nearly every page will be a buffer miss. For repeated scans of the same table, many pages may already be cached.
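The hit-or-miss behavior can be sketched with a toy LRU buffer pool. This is a minimal Python illustration; `BufferPool` and its methods are invented for this sketch, not any engine's real API:

```python
from collections import OrderedDict

class BufferPool:
    """Minimal LRU buffer pool sketch: fetch returns a cached page or
    simulates a disk read on a miss, evicting the least recently used page."""
    def __init__(self, capacity, disk):
        self.capacity = capacity
        self.disk = disk          # simulated disk: page_id -> page contents
        self.cache = OrderedDict()
        self.hits = 0
        self.misses = 0

    def fetch(self, page_id):
        if page_id in self.cache:
            self.hits += 1
            self.cache.move_to_end(page_id)   # mark as recently used
            return self.cache[page_id]
        self.misses += 1
        page = self.disk[page_id]             # simulated disk I/O
        if len(self.cache) >= self.capacity:
            self.cache.popitem(last=False)    # evict the LRU page
        self.cache[page_id] = page
        return page

# A cold scan of 10 pages misses every time; a second scan of the
# same pages (pool large enough) hits every time.
pool = BufferPool(capacity=16, disk={i: f"page-{i}" for i in range(10)})
for i in range(10):
    pool.fetch(i)
print(pool.hits, pool.misses)  # 0 10
for i in range(10):
    pool.fetch(i)
print(pool.hits, pool.misses)  # 10 10
```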
Step 3: Sequential Access Optimization
Modern disk subsystems and operating systems are highly optimized for sequential access patterns. When a database detects that it's reading pages sequentially (adjacent page_id values), several optimizations kick in:

- Read-ahead (prefetching): the database or OS fetches upcoming pages before they are requested, overlapping I/O with processing.
- Request merging: the I/O scheduler combines adjacent block requests into fewer, larger reads.
- Minimal seek overhead: on spinning disks, the head stays on track instead of repositioning for every read.
On traditional hard drives, sequential reads can achieve 100-200 MB/s, while random reads may only achieve 1-5 MB/s due to seek overhead. This 20-100x performance difference is why table scans often outperform index scans when a large fraction of the table must be read.
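The throughput gap translates directly into scan times. A quick back-of-the-envelope calculation, using mid-range values from the figures above:

```python
table_mb = 1000                     # a 1 GB table

seq_mb_per_s = 150                  # mid-range sequential HDD throughput
rand_mb_per_s = 3                   # mid-range random HDD throughput

seq_seconds = table_mb / seq_mb_per_s
rand_seconds = table_mb / rand_mb_per_s

print(f"sequential: {seq_seconds:.1f}s, random: {rand_seconds:.1f}s, "
      f"ratio: {rand_seconds / seq_seconds:.0f}x")
# sequential: 6.7s, random: 333.3s, ratio: 50x
```

Reading the same gigabyte via random page accesses takes 50x longer at these throughput figures, which is exactly the regime where a sequential table scan beats chasing index pointers.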
Understanding the cost of a table scan is essential for query optimization. The cost model must account for both I/O and CPU components.
I/O Cost
The I/O cost of a table scan is straightforward: you must read every page of the table. If a table has B pages, and each page read costs t_I/O time units, the I/O cost is:
I/O Cost = B × t_I/O
However, this formula assumes all pages require disk I/O. In reality, some pages may be in the buffer pool. A more accurate model incorporates the buffer hit rate (h):
I/O Cost = B × (1 - h) × t_disk + B × h × t_memory
Where:
- t_disk ≈ 5-10 ms for HDD, 0.1-0.5 ms for SSD
- t_memory ≈ 0.0001 ms (100 nanoseconds)
- h varies from 0% (cold cache) to approaching 100% (hot table)

| Scenario | Pages (B) | Buffer Hit Rate | Disk I/O Time | Total I/O Cost |
|---|---|---|---|---|
| Small table, cold cache | 100 | 0% | 5ms/page | 500ms |
| Small table, warm cache | 100 | 80% | 5ms/page | 100ms |
| Large table, cold cache | 100,000 | 0% | 5ms/page | 500 seconds (8.3 min) |
| Large table, SSD | 100,000 | 0% | 0.2ms/page | 20 seconds |
| Large table, fully cached | 100,000 | 100% | N/A | 10ms (memory only) |
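The buffer-aware formula reproduces the table above directly. A small helper function (hypothetical, for illustration; t_memory defaults to the 0.0001 ms figure from the text):

```python
def io_cost_ms(pages, hit_rate, t_disk_ms, t_memory_ms=0.0001):
    """I/O cost = B * (1 - h) * t_disk + B * h * t_memory, in milliseconds."""
    return pages * (1 - hit_rate) * t_disk_ms + pages * hit_rate * t_memory_ms

print(io_cost_ms(100, 0.0, 5))        # 500.0 ms   (small table, cold cache)
print(io_cost_ms(100, 0.8, 5))        # ~100 ms    (small table, warm cache)
print(io_cost_ms(100_000, 0.0, 5))    # 500000.0 ms = 500 s (large table, cold HDD)
print(io_cost_ms(100_000, 0.0, 0.2))  # ~20000 ms = 20 s    (large table, SSD)
print(io_cost_ms(100_000, 1.0, 5))    # ~10 ms              (fully cached)
```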
CPU Cost
Beyond I/O, the CPU must examine every row to evaluate the WHERE clause predicate. If a table has N rows, and evaluating the predicate costs t_cpu per row:
CPU Cost = N × t_cpu
For simple predicates (equality checks, comparisons), t_cpu is negligible. For complex predicates involving string matching, function calls, or subqueries, CPU cost can become significant.
Total Cost Formula
Combining both components:
Total Table Scan Cost = Pages × t_I/O + Rows × t_cpu
Or with buffer considerations:
Total Cost = Pages × [(1-h) × t_disk + h × t_memory] + Rows × t_cpu
For typical OLTP workloads on spinning disks, I/O cost dominates by 4-6 orders of magnitude. A 5ms disk read is 50,000 times slower than a 100-nanosecond memory access. This is why reducing I/O is the primary focus of query optimization.
Despite their reputation, table scans are often the best choice. Understanding these scenarios is crucial for proper query tuning.
Scenario 1: Low Selectivity (Large Result Fractions)
When a query must return a large fraction of the table's rows (typically >10-20%), table scans outperform index scans. Consider queries that return 30-40% of the rows:
```sql
-- Returns ~40% of rows (assume status distribution is skewed)
-- Table scan likely optimal here
SELECT * FROM orders WHERE status != 'pending';

-- Returns all rows - table scan is the ONLY option
SELECT * FROM orders;

-- Aggregation over entire table - no alternative to full scan
SELECT COUNT(*), SUM(total_amount) FROM orders;
```

Scenario 2: Small Tables
For very small tables (fitting in a few pages), table scans are almost always optimal. The overhead of using an index—traversing B-tree levels, following pointers—exceeds the cost of just reading all rows directly.
Rule of thumb: If a table fits in 10-20 pages (80-320KB at 8KB page size), index overhead likely exceeds index benefit.
| Table Size | Pages (8KB) | Rows (~100B each) | Recommended Access |
|---|---|---|---|
| < 100 KB | < 12 | < 1,000 | Always table scan |
| 100 KB - 1 MB | 12 - 125 | 1,000 - 10,000 | Table scan unless selectivity < 1% |
| 1 MB - 100 MB | 125 - 12,500 | 10K - 1M | Depends on selectivity (~5-15% threshold) |
| > 100 MB | > 12,500 | > 1M | Index scan for selectivity < 10-20% |
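The selectivity thresholds above can be derived from a simplified cost comparison: a table scan reads every page once, while an unclustered index scan pays a B+-tree descent plus, in the worst case, one heap page fetch per matching row. A sketch under those assumptions (a deliberately crude model, not any real optimizer's):

```python
import math

def scan_pages(total_pages):
    # A table scan always reads every page exactly once.
    return total_pages

def index_pages(total_rows, selectivity):
    # Simplified: B+-tree descent (~log base 100 of the row count) plus
    # one heap page per matching row (worst case: no clustering).
    tree_levels = max(1, math.ceil(math.log(total_rows, 100)))
    return tree_levels + math.ceil(total_rows * selectivity)

def break_even_selectivity(total_rows, rows_per_page=100):
    total_pages = math.ceil(total_rows / rows_per_page)
    s = 0.0
    while index_pages(total_rows, s) < scan_pages(total_pages) and s < 1.0:
        s += 0.0001
    return s

# With ~100 rows/page and no clustering, the index stops paying off at
# around 1% selectivity: each extra matching row can cost a whole page read.
print(round(break_even_selectivity(1_000_000), 3))  # 0.01
```

Real optimizers use richer models (clustering factor, cached fractions, correlated columns), which pushes the practical break-even point up toward the 5-20% range the table shows.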
Scenario 3: No Suitable Index Exists
If no index supports the query's filter conditions, a table scan is the only option. This is the most common reason for table scans in practice.
Scenario 4: Clustered Table Organization
If the table is physically sorted by (clustered on) the column being queried, a table scan can efficiently skip irrelevant sections through early termination.
Scenario 5: Analytical Workloads (OLAP)
Data warehousing and analytics queries typically aggregate, transform, or analyze entire data sets. These workloads are inherently full-scan operations, which is why columnar databases optimize specifically for this pattern.
Not every table scan indicates a missing index. Some queries genuinely need to examine most of the table. Adding an index may actually slow these queries down while consuming storage and slowing writes. Always analyze selectivity before creating indexes.
While table scans have legitimate uses, they can also cripple system performance. Recognizing these anti-patterns is critical.
Anti-Pattern 1: Needle in a Haystack
Querying for a single row (or tiny fraction of rows) in a large table via table scan is the classic performance disaster:
```sql
-- Finding one user in 10 million: O(n) vs O(log n)
-- Without index: Reads potentially ALL 10 million rows
-- With index: Reads ~4 tree levels + 1 data page = 5 pages
SELECT * FROM users WHERE user_id = 12345;

-- Pagination on large tables without proper indexing
-- This gets progressively worse as offset increases
SELECT * FROM logs ORDER BY created_at LIMIT 20 OFFSET 1000000;

-- Join with table scan on large table
-- Forces full scan for EACH row in the driving table
SELECT o.*, c.name FROM orders o JOIN customers c ON c.customer_id = o.customer_id;
-- If customers has no index on customer_id, disaster ensues
```

Anti-Pattern 2: Table Scan in Loop / Nested Operations
When a table scan occurs inside a nested loop join or application-level iteration, the cost multiplies:
Total Cost = Outer Loop Iterations × Table Scan Cost
If the outer loop runs 10,000 times and each table scan takes 1 second, total time is 10,000 seconds (2.8 hours)!
Anti-Pattern 3: Buffer Pool Pollution
Large table scans can evict other frequently-accessed pages from the buffer pool, causing collateral damage to unrelated queries. A single analytical query scanning a terabyte table can flush the entire cache, suddenly making all OLTP queries hit disk.
Modern databases address this with scan-resistant buffer pools that use separate buffers for sequential scans, or algorithms like LRU-K that don't promote sequential scan pages as aggressively.
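The ring-buffer idea can be sketched as follows: sequential scans recycle a small, fixed set of slots, so even a huge scan cannot evict the main cache. This is an illustrative toy, loosely modeled on the ring-buffer strategy PostgreSQL uses for large scans, not a real implementation:

```python
from collections import OrderedDict

class ScanResistantPool:
    """Toy scan-resistant buffer pool: sequential reads go through a small
    ring buffer so a large scan cannot pollute the main LRU cache."""
    def __init__(self, main_capacity, ring_capacity):
        self.main = OrderedDict()         # LRU cache for normal accesses
        self.main_capacity = main_capacity
        self.ring = []                    # fixed-size ring for scans
        self.ring_capacity = ring_capacity

    def fetch(self, page_id, sequential=False):
        if sequential:
            self.ring.append(page_id)
            if len(self.ring) > self.ring_capacity:
                self.ring.pop(0)          # recycle the oldest ring slot
            return
        if page_id in self.main:
            self.main.move_to_end(page_id)
            return
        if len(self.main) >= self.main_capacity:
            self.main.popitem(last=False)
        self.main[page_id] = True

pool = ScanResistantPool(main_capacity=100, ring_capacity=8)
for p in range(100):
    pool.fetch(p)                     # hot OLTP pages fill the main pool
for p in range(1000, 11_000):
    pool.fetch(p, sequential=True)    # a 10,000-page scan stays in the ring
print(len(pool.main), len(pool.ring))  # 100 8 -- the hot pages survive
```

With a plain shared LRU, the 10,000-page scan would have flushed all 100 hot pages; here it only ever occupies 8 ring slots.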
The most dangerous table scans are those that work fine in development (small data) but fail in production (large data). Always test queries with production-scale data volumes. A 10ms query on 1,000 rows becomes a 10,000ms query on 1,000,000 rows if it's doing a table scan.
Modern databases implement several optimizations and variants of the basic table scan.
Parallel Sequential Scan
For large tables, databases can divide the table into segments and scan multiple segments concurrently using separate worker processes. PostgreSQL, Oracle, and SQL Server all support parallel table scans.
With P parallel workers and B pages:
Parallel Scan Time ≈ (B / P) × t_I/O
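Plugging numbers into the formula shows the idealized speedup. This assumes perfect work division and zero coordination overhead:

```python
def parallel_scan_seconds(pages, workers, t_io_ms):
    """Idealized model from the text: time ≈ (B / P) * t_I/O."""
    return (pages / workers) * t_io_ms / 1000

# A 100,000-page table at 5 ms/page: 500 s serial, 125 s with 4 workers.
print(parallel_scan_seconds(100_000, 1, 5))   # 500.0
print(parallel_scan_seconds(100_000, 4, 5))   # 125.0
```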
However, parallelism is limited by:

- I/O bandwidth: all workers share the same storage device, so once its throughput is saturated, extra workers add nothing.
- Worker startup and coordination overhead, which can exceed the benefit for small tables.
- Configured limits on worker counts (e.g. max_parallel_workers_per_gather in PostgreSQL).
```sql
-- PostgreSQL parallel scan execution plan
EXPLAIN (ANALYZE) SELECT COUNT(*) FROM large_table;

/*
Finalize Aggregate (actual=1 loops=1)
  -> Gather (actual=4 loops=1)
       Workers Planned: 3
       Workers Launched: 3
       -> Partial Aggregate (actual=1 loops=4)
            -> Parallel Seq Scan on large_table (actual=250000 loops=4)
               -- Each worker scans ~250K rows of the 1M total
*/
```

Synchronized Scans
When multiple queries scan the same large table concurrently, modern databases can synchronize them. Instead of each query starting from page 1, a new scan joins an already-in-progress scan at its current position, wraps around the table, and finishes where it started.
This provides:

- Better buffer pool reuse: pages read by one scan are consumed by the others while still cached.
- Reduced total disk I/O when multiple scans of the same table overlap in time.
Sample Scans
For approximate analytics, databases can scan a random sample of pages rather than all pages:
```sql
-- PostgreSQL tablesample
SELECT AVG(amount) FROM orders TABLESAMPLE SYSTEM (10); -- 10% of pages
```
Sample scans trade accuracy for speed—useful for exploratory data analysis.
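The trade-off can be simulated. Averaging a 10% page-level sample, in the spirit of TABLESAMPLE SYSTEM (10), gets close to the exact answer while touching a tenth of the pages (synthetic data, illustrative only):

```python
import random

random.seed(42)
# Simulated table: 10,000 pages x 100 rows of order amounts in [0, 1000).
pages = [[random.uniform(0, 1000) for _ in range(100)] for _ in range(10_000)]

# Full scan: exact average over every row.
all_rows = [amount for page in pages for amount in page]
exact = sum(all_rows) / len(all_rows)

# Page-level 10% sample: pick whole pages at random (as SYSTEM sampling
# does) and average only their rows.
sampled_pages = random.sample(pages, k=len(pages) // 10)
sampled_rows = [amount for page in sampled_pages for amount in page]
estimate = sum(sampled_rows) / len(sampled_rows)

print(f"exact={exact:.1f} estimate={estimate:.1f}")
# The estimate lands within a few units of the true mean (~500) while
# reading only 10% of the pages.
```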
Each major RDBMS implements table scans with subtle differences. Understanding these helps in cross-platform optimization.
| DBMS | Operator Name | Parallel Support | Special Features |
|---|---|---|---|
| PostgreSQL | Seq Scan / Parallel Seq Scan | Yes (configurable workers) | Synchronized scans, TABLESAMPLE |
| MySQL (InnoDB) | Full Table Scan | Limited (8.0+) | Clustered index scan (always uses PK order) |
| Oracle | TABLE ACCESS FULL | Yes (DOP parameter) | Direct path reads, serial/parallel modes |
| SQL Server | Table Scan / Clustered Index Scan | Yes (MaxDOP) | Batch mode, columnstore integration |
| SQLite | SCAN TABLE | No | Very simple implementation |
PostgreSQL Details
PostgreSQL's sequential scan reads the table's pages in physical order. (The visibility map is used by vacuum and index-only scans, not to skip pages during a sequential scan.) The seq_page_cost parameter (default 1.0) lets you tune the optimizer's estimated cost of a sequential page read relative to a random one (random_page_cost, default 4.0).
MySQL/InnoDB Details
InnoDB tables are organized as clustered indexes (B+ trees ordered by primary key). A 'full table scan' in InnoDB is actually a full clustered index scan—it reads leaf pages in primary key order, not physical order. This means:

- Rows come back in primary key order, not insertion or on-disk order.
- Scan throughput degrades as the clustered index becomes fragmented, since logically adjacent leaf pages may not be physically adjacent.
- There is no separate heap to scan; the table is the leaf level of the B+ tree.
Oracle Details
Oracle distinguishes between buffered scans (read through buffer cache) and direct path reads (bypass cache, read directly to process memory). Direct path reads are faster for large scans because they avoid buffer pool contention, but they skip the cache entirely.
In execution plans, look for 'Seq Scan' (PostgreSQL), 'Full Table Scan' or 'type: ALL' (MySQL), 'TABLE ACCESS FULL' (Oracle), or 'Table Scan' (SQL Server). These all indicate the full table scan access method.
Let's examine real execution plans and analyze when table scans are appropriate.
```sql
-- Create a sample table with 1 million rows
CREATE TABLE orders (
    order_id SERIAL PRIMARY KEY,
    customer_id INTEGER NOT NULL,
    status VARCHAR(20) NOT NULL,
    total_amount DECIMAL(10,2) NOT NULL,
    created_at TIMESTAMP DEFAULT NOW()
);

-- Insert test data
INSERT INTO orders (customer_id, status, total_amount)
SELECT
    (random() * 10000)::int,
    (ARRAY['pending','processing','shipped','delivered'])[1 + (random()*3)::int],
    (random() * 1000)::decimal(10,2)
FROM generate_series(1, 1000000);

-- Force statistics update
ANALYZE orders;

-- Example 1: Full aggregation (table scan is optimal)
EXPLAIN (ANALYZE, BUFFERS) SELECT COUNT(*), AVG(total_amount) FROM orders;

/*
Aggregate (cost=20834.00..20834.01 rows=1) (actual time=347.123..347.124 rows=1)
  Buffers: shared hit=8334
  -> Seq Scan on orders (cost=0.00..18334.00 rows=1000000) (actual time=0.011..127.456 rows=1000000)
     Buffers: shared hit=8334
Planning Time: 0.052 ms
Execution Time: 347.156 ms
*/

-- Example 2: Low selectivity (table scan appropriate)
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM orders WHERE status = 'delivered';

/*
Seq Scan on orders (cost=0.00..20834.00 rows=250032) (actual time=0.017..312.456 rows=250089)
  Filter: ((status)::text = 'delivered'::text)
  Rows Removed by Filter: 749911
  Buffers: shared hit=8334
*/
-- ~25% selectivity = table scan is faster than index + heap lookups

-- Example 3: High selectivity (index would be better)
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM orders WHERE order_id = 500000;

/*
Seq Scan on orders (cost=0.00..20834.00 rows=1) (actual time=156.234..223.456 rows=1)
  Filter: (order_id = 500000)
  Rows Removed by Filter: 999999
  Buffers: shared hit=8334
*/
-- Finding 1 row scanned ALL 1 million - catastrophically inefficient!

-- Create index to fix
CREATE INDEX idx_orders_id ON orders(order_id);

EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM orders WHERE order_id = 500000;

/*
Index Scan using idx_orders_id on orders (cost=0.42..8.44 rows=1) (actual time=0.019..0.020 rows=1)
  Index Cond: (order_id = 500000)
  Buffers: shared hit=4
*/
-- Now: 4 buffer hits instead of 8334!
```

You now have comprehensive knowledge of table scans—the most fundamental access method in relational databases.
What's Next:
Now that we understand table scans, we'll explore index scans—the primary alternative access method. Index scans trade sequential access simplicity for targeted, selective data retrieval, enabling efficient lookups in tables of any size.
You now understand table scans at a deep level—their mechanics, cost models, optimal use cases, and dangerous anti-patterns. This knowledge is fundamental to understanding all other access methods and query optimization decisions.