In the world of database performance, few phenomena are as simultaneously simple to understand and devastating in impact as the full table scan. Also known as a sequential scan or table scan, this operation reads every single row in a table to find the data you need—regardless of whether you're looking for one row or a million.
Imagine searching for a single book in a library by walking through every aisle, examining every shelf, and reading the title of every book—even when you know the exact title you need. That's a full table scan. A librarian who knows the Dewey Decimal System (an index) finds your book in seconds. Without that organizational system, finding a single book in a million-volume library could take days.
Full table scans are the #1 cause of slow queries in production databases. They're often invisible—queries work perfectly during development with 1,000 rows but collapse under 10 million rows in production. Understanding when they occur, why they're harmful, and how to eliminate them is foundational to writing performant SQL.
By the end of this page, you will understand exactly what happens during a full table scan, recognize the conditions that trigger them, diagnose them using execution plans, and apply proven strategies to eliminate or mitigate their impact. You'll also understand the rare situations where full table scans are actually the optimal choice.
To truly understand full table scans, we must first understand how data is physically stored in database systems and what happens mechanically when the database engine executes a scan.
Physical Storage Model:
Relational databases store data in fixed-size units called pages (or blocks). The typical page size is 8KB (SQL Server, PostgreSQL, Oracle) or 16KB (MySQL InnoDB). Each page contains:
- A page header with metadata about the page (type, checksums, free-space pointers)
- The row data itself
- A row directory (slot array) that maps slot numbers to row offsets within the page
- Free space reserved for future inserts and updates
Pages are organized into extents (groups of 8 contiguous pages in SQL Server, for example), and extents belong to segments or tablespaces. When you create a table, the database allocates extents as needed to store your data.
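To make these quantities concrete, PostgreSQL exposes each table's page and row counts in the pg_class catalog; a quick check, reusing the customers table from the examples that follow:

```sql
-- PostgreSQL: inspect a table's physical footprint
SELECT relname,
       relpages,    -- 8KB pages allocated to the table
       reltuples    -- planner's estimate of the row count
FROM pg_class
WHERE relname = 'customers';

-- Human-readable total size, including indexes and TOAST data:
SELECT pg_size_pretty(pg_total_relation_size('customers'));
```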
The Scan Mechanism:
During a full table scan, the database engine:
1. Identifies all extents belonging to the table.
2. Loads each page into the buffer pool, one I/O operation per page not already cached.
3. Evaluates the filter predicate against every row on every page.
4. Returns the matching rows only after all pages have been processed.
```sql
-- This query triggers a full table scan if no index exists on email
SELECT customer_id, first_name, last_name, email
FROM customers
WHERE email = 'john.doe@example.com';

-- What happens internally:
-- 1. Database identifies all extents belonging to 'customers' table
-- 2. For each page in each extent:
--    a. Load page into buffer pool (I/O operation)
--    b. For each row on the page:
--       - Read the email column value
--       - Compare to 'john.doe@example.com'
--       - If match, add to result set
-- 3. Continue until ALL pages have been processed
-- 4. Return matching row(s)

-- With 10 million customers stored across 125,000 pages:
-- The database must read ALL 125,000 pages to find ONE email.
```

The devastating inefficiency of full table scans comes from I/O amplification. To find a single customer by email in a 10-million-row table, the database may need to read ~1GB of data from disk. With an index, it might read only 3-4 pages (~32KB)—a 30,000x improvement. At scale, this difference determines whether queries take milliseconds or minutes.
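Those figures are simple arithmetic on the page counts, assuming 8KB pages:

$$
125{,}000 \times 8\,\mathrm{KB} \approx 1\,\mathrm{GB}
\qquad \text{vs.} \qquad
4 \times 8\,\mathrm{KB} = 32\,\mathrm{KB},
\qquad
\frac{125{,}000 \ \text{pages}}{4 \ \text{pages}} \approx 31{,}000\times
$$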
Buffer Pool Considerations:
When a page is read from disk, it's stored in the buffer pool (or buffer cache)—a region of memory that caches frequently accessed pages. If a page is already in the buffer pool, no disk I/O is required.
However, full table scans are particularly harmful to the buffer pool:
- Scanned pages flood the cache, evicting hot, frequently-used pages.
- The scanned pages themselves are typically read once and never touched again.
- After a large scan completes, other queries suffer cache misses until their working set is reloaded.
Some databases mitigate this with small table scans (keeping tiny tables in cache) or direct path reads (bypassing the buffer pool entirely for large scans). But these are band-aids, not solutions.
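You can observe this pollution directly in PostgreSQL, assuming the pg_buffercache extension is available; this sketch lists which relations currently occupy the cache:

```sql
-- PostgreSQL: which relations currently occupy the buffer pool?
CREATE EXTENSION IF NOT EXISTS pg_buffercache;

SELECT c.relname,
       count(*) AS buffered_pages,
       pg_size_pretty(count(*) * 8192) AS buffered_size
FROM pg_buffercache b
JOIN pg_class c
  ON b.relfilenode = pg_relation_filenode(c.oid)
 AND b.reldatabase = (SELECT oid FROM pg_database
                      WHERE datname = current_database())
GROUP BY c.relname
ORDER BY buffered_pages DESC
LIMIT 10;
```

Running it before and after a large scan shows how much of the cache a single table can claim.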
Full table scans don't happen randomly—they're triggered by specific conditions that the query optimizer evaluates. Understanding these conditions is essential for prevention.
The Query Optimizer's Decision:
When you submit a query, the optimizer generates multiple potential execution plans and estimates the cost of each. It chooses the plan with the lowest estimated cost. A full table scan is selected when:
- No suitable index exists on the filtered, joined, or sorted columns.
- An index exists, but the predicate matches a large fraction of the table (low selectivity), so a sequential read is estimated to be cheaper than many index lookups.
- A function is applied to the indexed column (e.g., UPPER(email) = 'JOHN@EXAMPLE.COM'), which prevents index usage. (Covered in detail in Page 3.)
- The predicate uses a leading wildcard: LIKE '%smith%' cannot use standard B-tree indexes because the search isn't prefix-based.
- An implicit type conversion is applied to the column rather than the literal.
- An OR spans different columns that no single index can serve.

Let's examine each scenario in detail:
```sql
-- Scenario 1: Missing Index (most common)
SELECT * FROM orders WHERE customer_name = 'Acme Corp';
-- No index on customer_name → full table scan

-- Scenario 2: Low Selectivity (index exists but ignored)
SELECT * FROM users WHERE is_active = 1;
-- If 90% of users are active, scanning is cheaper than
-- reading 90% of index entries + row lookups

-- Scenario 3: Leading Wildcard
SELECT * FROM products WHERE description LIKE '%wireless%';
-- Cannot seek in B-tree; must scan all values

-- Scenario 4: Function on Column
SELECT * FROM employees WHERE YEAR(hire_date) = 2023;
-- Index on hire_date is useless; function evaluated per-row

-- Scenario 5: Implicit Conversion
SELECT * FROM accounts WHERE account_number = 12345;
-- If account_number is VARCHAR, the integer 12345 causes conversion
-- String-to-int conversion breaks index usage

-- Scenario 6: Complex OR
SELECT * FROM orders WHERE customer_id = 100 OR shipping_zip = '90210';
-- Unless both columns are indexed AND the optimizer uses Index Merge,
-- this may trigger a full scan
```

Databases typically switch from index seek to table scan when a query retrieves more than 10-30% of rows. This threshold varies by database and is influenced by clustering factor, row size, and index structure. The optimizer uses statistics to estimate selectivity—if statistics are stale, it may make the wrong choice.
Identifying full table scans is the critical first step toward eliminating them. The primary tool for detection is the execution plan (also called query plan or explain plan), which reveals exactly how the database will execute your query.
Reading Execution Plans:
Every major database provides a mechanism to view execution plans. The full table scan appears with different names:
| Database | Full Scan Indicator | Command |
|---|---|---|
| PostgreSQL | Seq Scan | EXPLAIN ANALYZE |
| MySQL | ALL in type column | EXPLAIN |
| SQL Server | Table Scan or Clustered Index Scan | SET SHOWPLAN_TEXT ON or graphical plan |
| Oracle | TABLE ACCESS FULL | EXPLAIN PLAN FOR + DBMS_XPLAN |
Let's examine detection in each major database:
```sql
-- PostgreSQL: Detecting full table scans with EXPLAIN ANALYZE
EXPLAIN ANALYZE
SELECT * FROM customers WHERE email = 'test@example.com';

-- Output showing a full table scan (Seq Scan):
/*
Seq Scan on customers  (cost=0.00..1834.00 rows=1 width=256)
                       (actual time=15.432..18.201 rows=1 loops=1)
  Filter: ((email)::text = 'test@example.com'::text)
  Rows Removed by Filter: 99999
Planning Time: 0.152 ms
Execution Time: 18.245 ms
*/

-- Key indicators:
-- 1. "Seq Scan" = Sequential Scan = Full Table Scan
-- 2. "Rows Removed by Filter: 99999" — read 100K rows to find 1
-- 3. High actual time relative to rows returned

-- With an index, the same query shows:
/*
Index Scan using idx_customers_email on customers
  (cost=0.42..8.44 rows=1 width=256)
  (actual time=0.028..0.029 rows=1 loops=1)
  Index Cond: ((email)::text = 'test@example.com'::text)
Planning Time: 0.185 ms
Execution Time: 0.052 ms
*/

-- Performance difference: 18ms vs 0.05ms (360x faster)
```

In SQL Server and some other databases, a 'Clustered Index Scan' is functionally equivalent to a table scan—it reads every row in the table through the clustered index structure. Don't be fooled by the word 'Index'; if you see 'SCAN' (not 'SEEK'), it's reading the entire object.
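For comparison, here is what the same detection looks like in MySQL; the tabular output is abbreviated and the row count illustrative:

```sql
-- MySQL: "ALL" in the type column means a full table scan
EXPLAIN SELECT * FROM customers WHERE email = 'test@example.com';

-- +----+-------------+-----------+------+---------------+------+-------+-------------+
-- | id | select_type | table     | type | possible_keys | key  | rows  | Extra       |
-- +----+-------------+-----------+------+---------------+------+-------+-------------+
-- |  1 | SIMPLE      | customers | ALL  | NULL          | NULL | 99741 | Using where |
-- +----+-------------+-----------+------+---------------+------+-------+-------------+

-- After CREATE INDEX idx_customers_email ON customers(email),
-- type typically becomes "ref", key names the index, and rows drops to 1.
```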
The impact of full table scans extends far beyond the individual query. Understanding their full cost helps prioritize optimization efforts.
Direct Costs:
| Resource | Impact | Scaling Behavior |
|---|---|---|
| Disk I/O | Reads entire table from storage | Linear with table size—10x data = 10x I/O |
| CPU | Evaluates predicate against every row | Linear with row count |
| Memory (Buffer Pool) | Loads all pages into cache, displacing useful data | Pollutes cache with single-use pages |
| Query Duration | Proportional to table size, not result size | 10x data = ~10x slower, regardless of selectivity |
| Network | If scanning on remote storage (cloud/SAN) | Bandwidth and latency amplified |
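These direct costs are precisely what the optimizer models. PostgreSQL, for instance, estimates a sequential scan roughly as follows (the parameter names are real planner settings, shown with their defaults):

$$
\mathrm{cost_{seq}} \approx \underbrace{\mathtt{seq\_page\_cost}}_{1.0} \cdot N_{\mathrm{pages}}
\;+\; \bigl(\underbrace{\mathtt{cpu\_tuple\_cost}}_{0.01} + \underbrace{\mathtt{cpu\_operator\_cost}}_{0.0025}\bigr) \cdot N_{\mathrm{rows}}
$$

This is consistent with the earlier EXPLAIN output: a table of roughly 584 pages and 100,000 rows gives 584 + (0.01 + 0.0025) × 100,000 = 1834, matching the cost=0.00..1834.00 estimate.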
Indirect/Systemic Costs:
The damage extends beyond the offending query:
Lock Contention — Long-running scans may hold locks that block other queries. In SQL Server, a scan in serializable isolation can escalate to table-level locks.
Buffer Pool Pollution — Scanned pages evict frequently-used pages from cache. After a large scan completes, other queries experience cache misses, degrading overall system performance.
Resource Starvation — CPU and I/O consumed by scans are unavailable for other workloads. A single bad query can degrade an entire database server.
Query Queue Delays — When scans consume resources, subsequent queries wait longer to execute. This cascades into application-level timeouts and user-perceived slowness.
Unpredictable Performance — Scans that were 'fast enough' at launch become unacceptable as data grows. This creates hidden technical debt that manifests as production incidents.
Full table scans often go unnoticed in development—test databases are small. The query takes 50ms on 1,000 rows. But with 5 million production rows, that same query takes 2 minutes. This is why performance testing with production-scale data is essential.
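One practical countermeasure is to synthesize production-scale volumes in the test database before measuring; a PostgreSQL sketch with illustrative table and column names:

```sql
-- Generate 5 million synthetic customers so plans behave as they will in production
INSERT INTO customers (first_name, last_name, email)
SELECT 'First' || g,
       'Last'  || g,
       'user'  || g || '@example.com'
FROM generate_series(1, 5000000) AS g;

-- Refresh statistics so the optimizer sees the new table size
ANALYZE customers;
```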
Quantifying the Cost (Real-World Example):
Consider a production e-commerce system:
| Metric | With Full Scan | With Index |
|---|---|---|
| Table Size | 20 million orders | 20 million orders |
| Storage | ~40 GB | ~40 GB |
| Query Time | 45 seconds | 3 milliseconds |
| I/O Operations | ~5 million page reads | 4 page reads |
| CPU Time | 12 seconds | < 1 millisecond |
| Memory Impact | Evicts 2GB from buffer pool | 32 KB cached |
| Concurrent Capacity | 2 queries/minute | 10,000 queries/second |
The same hardware can serve 300,000x more queries per second when full table scans are eliminated. This isn't optimization—it's the difference between a system that works and one that doesn't.
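The 300,000x figure is just the ratio of the two throughput numbers:

$$
\frac{10{,}000 \ \text{queries/s}}{2 \ \text{queries/min}}
= \frac{10{,}000}{1/30} = 300{,}000
$$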
Once detected, eliminating full table scans typically involves one of several strategies. The right approach depends on why the scan is occurring.
Strategy 1: Create an Appropriate Index
The most common solution. If a query filters, joins, or orders on a column (or set of columns) that lacks an index, create one:
```sql
-- Problem: Full table scan on email lookup
SELECT * FROM customers WHERE email = 'test@example.com';

-- Solution: Create an index on the email column
CREATE INDEX idx_customers_email ON customers(email);

-- Now the query uses an index seek instead of a table scan.

-- For composite predicates, create a composite index:
SELECT * FROM orders WHERE customer_id = 100 AND status = 'pending';

-- Create composite index (order matters for leftmost prefix rule)
CREATE INDEX idx_orders_customer_status ON orders(customer_id, status);

-- For ORDER BY or range queries, ensure index supports the access pattern:
SELECT * FROM transactions
WHERE created_at >= '2024-01-01'
ORDER BY created_at;

CREATE INDEX idx_transactions_created ON transactions(created_at);
```

Strategy 2: Rewrite the Query to Enable Index Usage
Some queries are written in ways that prevent index usage even when an index exists. Rewriting can unlock index access:
```sql
-- Problem: Function on column prevents index use
SELECT * FROM employees WHERE YEAR(hire_date) = 2023;

-- Solution: Rewrite as range predicate (index-friendly)
SELECT * FROM employees
WHERE hire_date >= '2023-01-01'
  AND hire_date < '2024-01-01';

-- Problem: Leading wildcard prevents index use
SELECT * FROM products WHERE name LIKE '%wireless%';

-- Solution: Use a full-text search index (MySQL syntax shown)
CREATE FULLTEXT INDEX idx_products_name ON products(name);
SELECT * FROM products WHERE MATCH(name) AGAINST('wireless');

-- Alternative: Pre-compute searchable tokens
ALTER TABLE products ADD COLUMN search_tokens TEXT;
-- Populate with keywords, create index on search_tokens

-- Problem: OR across different columns
SELECT * FROM orders WHERE customer_id = 100 OR shipping_zip = '90210';

-- Solution 1: Split into UNION (each uses its own index)
SELECT * FROM orders WHERE customer_id = 100
UNION
SELECT * FROM orders WHERE shipping_zip = '90210';

-- Solution 2: Create indexes on both columns
CREATE INDEX idx_orders_customer ON orders(customer_id);
CREATE INDEX idx_orders_zip ON orders(shipping_zip);
-- Optimizer may use Index Merge strategy
```

Strategy 3: Use Covering Indexes
A covering index contains all columns needed by the query. The database can satisfy the query entirely from the index without accessing the table (no bookmark lookup or heap fetch):
```sql
-- Query retrieving specific columns
SELECT customer_id, email, status
FROM customers
WHERE email = 'test@example.com';

-- Standard index (requires table lookup for other columns)
CREATE INDEX idx_customers_email ON customers(email);
-- Process: Index seek on email → lookup row for customer_id, status

-- Covering index: SQL Server and PostgreSQL 11+ support INCLUDE
CREATE INDEX idx_customers_email_covering
ON customers(email) INCLUDE (customer_id, status);
-- Process: Index seek on email → return columns directly from index
-- No table access required: faster and less I/O

-- In databases without INCLUDE (e.g., MySQL), widen the key instead:
CREATE INDEX idx_customers_email_covered
ON customers(email, customer_id, status);
```

Strategy 4: Partition Large Tables
For very large tables, partitioning divides data into smaller physical segments. Queries that filter on the partition key access only relevant partitions (partition pruning):
```sql
-- Create partitioned table (PostgreSQL example)
CREATE TABLE orders (
    order_id    BIGINT,
    customer_id INT,
    order_date  DATE,
    total       DECIMAL(10,2)
) PARTITION BY RANGE (order_date);

-- Create partitions by year
CREATE TABLE orders_2022 PARTITION OF orders
    FOR VALUES FROM ('2022-01-01') TO ('2023-01-01');
CREATE TABLE orders_2023 PARTITION OF orders
    FOR VALUES FROM ('2023-01-01') TO ('2024-01-01');
CREATE TABLE orders_2024 PARTITION OF orders
    FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');

-- Query with partition key in predicate
SELECT * FROM orders WHERE order_date >= '2024-01-01';
-- Only scans orders_2024 partition, not entire table

-- Without partitioning: 100M rows scanned
-- With partitioning: 2M rows scanned (only current year)
-- 50x reduction in scan scope
```

While full table scans are generally harmful, there are legitimate scenarios where they're the optimal choice. Understanding these exceptions prevents over-indexing and misguided optimization efforts.
Legitimate Use Cases:
```sql
-- Example 1: Very small lookup table (100 rows)
SELECT * FROM country_codes WHERE code = 'US';
-- Table has 100 rows, fits in 1 page. Index overhead isn't worth it.
-- Full scan reads 1 page; index would read 2+ pages (index + table).

-- Example 2: Low selectivity (predicate matches most rows)
SELECT * FROM users WHERE is_verified = 1;
-- If 80% of users are verified, index scan would require:
-- 800,000 index lookups + 800,000 random table accesses
-- Full scan reads table sequentially: faster for high-percentage access.

-- Example 3: Aggregate over entire table
SELECT AVG(order_total), COUNT(*) FROM orders;
-- Must read every row to compute average. No index helps.
-- Full scan is the only sensible strategy.

-- Example 4: Reporting query on entire dataset
SELECT DATE_TRUNC('month', order_date) AS month,
       SUM(total) AS monthly_revenue
FROM orders
GROUP BY DATE_TRUNC('month', order_date);
-- Aggregating all orders by month: full scan is appropriate.

-- Example 5: Bulk data export
COPY customers TO '/backup/customers.csv' WITH CSV HEADER;
-- Exporting all rows; full scan is the only option.
```

The crossover point where scans beat indexes varies by database, hardware, and table characteristics. Generally, queries returning <5% of rows should use indexes; >30% often warrant scans. The 5-30% range is a gray zone where the optimizer's statistics-based decision should be respected.
Forcing the Optimizer's Hand:
In rare cases, you may know something the optimizer doesn't (stale statistics, unusual data patterns). Most databases offer hints to override optimizer decisions:
```sql
-- PostgreSQL: Disable sequential scans to force index use
SET enable_seqscan = off;
SELECT * FROM customers WHERE status = 'active';
SET enable_seqscan = on;

-- MySQL: Use index hint
SELECT * FROM customers FORCE INDEX (idx_status) WHERE status = 'active';

-- SQL Server: Table hint
SELECT * FROM customers WITH (INDEX(idx_status)) WHERE status = 'active';

-- Oracle: Optimizer hint
SELECT /*+ INDEX(customers idx_status) */ * FROM customers WHERE status = 'active';
```
Warning: Hints are a last resort. They bypass the optimizer's intelligence and create maintenance burden. Always prefer fixing statistics, rewriting queries, or adjusting indexes before using hints.
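Since stale statistics are the most common reason the optimizer "knows less than you do", refreshing them should come before any hint; the command varies by database:

```sql
-- Refresh optimizer statistics instead of hinting (pick your database's command):
ANALYZE customers;                                        -- PostgreSQL
-- ANALYZE TABLE customers;                               -- MySQL
-- UPDATE STATISTICS customers;                           -- SQL Server
-- EXEC DBMS_STATS.GATHER_TABLE_STATS(USER, 'CUSTOMERS'); -- Oracle
```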
Eliminating existing full table scans is only half the battle. You need systems to prevent new ones from reaching production and to detect regressions.
Proactive Strategies:
- Review the execution plan of every new query before it ships.
- Performance-test against production-scale data, not 1,000-row fixtures.
- Keep optimizer statistics fresh so plan choices stay accurate.
- Continuously monitor slow-query logs and missing-index reports, as the queries below illustrate.
```sql
-- PostgreSQL: Find the most expensive queries (candidates for plan review)
-- Note: pg_stat_statements stores query text, not plans; confirming a Seq Scan
-- still requires EXPLAIN (or the auto_explain module shown below)
SELECT query, calls, mean_exec_time, total_exec_time
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 20;

-- PostgreSQL: Tables whose statistics haven't been updated recently
SELECT schemaname, relname, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
WHERE last_analyze IS NULL
   OR last_analyze < NOW() - INTERVAL '7 days'
ORDER BY n_live_tup DESC;

-- SQL Server: Find missing indexes suggested by the optimizer
SELECT mig.index_group_handle,
       mid.statement AS table_name,
       mid.equality_columns,
       mid.inequality_columns,
       mid.included_columns,
       migs.user_seeks,
       migs.avg_total_user_cost * migs.avg_user_impact AS improvement_measure
FROM sys.dm_db_missing_index_groups mig
JOIN sys.dm_db_missing_index_group_stats migs
  ON migs.group_handle = mig.index_group_handle
JOIN sys.dm_db_missing_index_details mid
  ON mig.index_handle = mid.index_handle
ORDER BY improvement_measure DESC;

-- MySQL: Find queries doing full scans from the slow query log
-- Enable: SET GLOBAL slow_query_log = 'ON';
--         SET GLOBAL log_queries_not_using_indexes = 'ON';
```

Modern observability tools (Datadog, New Relic, AWS Performance Insights, pganalyze) can automatically detect and alert on queries showing full table scans. Integrate these into your monitoring stack to catch issues before users do.
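For PostgreSQL specifically, the auto_explain module logs the actual plan of any slow statement, making Seq Scans visible in production without manual EXPLAIN runs; a minimal session-level sketch (thresholds are illustrative):

```sql
-- Load the module and log plans for anything slower than 500 ms
LOAD 'auto_explain';
SET auto_explain.log_min_duration = '500ms';
SET auto_explain.log_analyze = on;   -- include actual row counts and timings

-- From here on, slow statements in this session write their plans
-- (including any "Seq Scan" nodes) to the server log.
```

To enable it server-wide, set the same parameters in postgresql.conf and add the module to shared_preload_libraries or session_preload_libraries.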
Full table scans are the most common and impactful performance issue in database systems. Understanding them thoroughly is essential for any engineer working with data at scale.
You now understand full table scans—what they are, why they happen, how to detect them, and how to eliminate them. Next, we'll examine implicit type conversions: subtle mismatches that silently disable indexes and cause unexpected scans.