In the world of database performance, few phenomena are as simultaneously simple to understand and devastating in impact as the full table scan. Also known as a sequential scan or table scan, this operation reads every single row in a table to find the data you need—regardless of whether you're looking for one row or a million.
Imagine searching for a single book in a library by walking through every aisle, examining every shelf, and reading the title of every book—even when you know the exact title you need. That's a full table scan. A librarian who knows the Dewey Decimal System (an index) finds your book in seconds. Without that organizational system, finding a single book in a million-volume library could take days.
Full table scans are the #1 cause of slow queries in production databases. They're often invisible—queries work perfectly during development with 1,000 rows but collapse under 10 million rows in production. Understanding when they occur, why they're harmful, and how to eliminate them is foundational to writing performant SQL.
By the end of this page, you will understand exactly what happens during a full table scan, recognize the conditions that trigger them, diagnose them using execution plans, and apply proven strategies to eliminate or mitigate their impact. You'll also understand the rare situations where full table scans are actually the optimal choice.
To truly understand full table scans, we must first understand how data is physically stored in database systems and what happens mechanically when the database engine executes a scan.
Physical Storage Model:
Relational databases store data in fixed-size units called pages (or blocks). The typical page size is 8KB (SQL Server, PostgreSQL, Oracle) or 16KB (MySQL InnoDB). Each page contains:
- A page header with metadata about the page (type, checksums, free-space pointers)
- The row data itself
- A row directory (slot array) that maps slot numbers to row offsets within the page
- Free space reserved for future inserts and updates
Pages are organized into extents (groups of 8 contiguous pages in SQL Server, for example), and extents belong to segments or tablespaces. When you create a table, the database allocates extents as needed to store your data.
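To make these quantities concrete, PostgreSQL exposes each table's page and row counts in the pg_class catalog; a quick check, reusing the customers table from the examples that follow:

```sql
-- PostgreSQL: inspect a table's physical footprint
SELECT relname,
       relpages,    -- 8KB pages allocated to the table
       reltuples    -- planner's estimate of the row count
FROM pg_class
WHERE relname = 'customers';

-- Human-readable total size, including indexes and TOAST data:
SELECT pg_size_pretty(pg_total_relation_size('customers'));
```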
The Scan Mechanism:
During a full table scan, the database engine:
1. Identifies all extents belonging to the table.
2. Loads each page into the buffer pool, one I/O operation per page not already cached.
3. Evaluates the filter predicate against every row on every page.
4. Returns the matching rows only after all pages have been processed.
```sql
-- This query triggers a full table scan if no index exists on email
SELECT customer_id, first_name, last_name, email
FROM customers
WHERE email = 'john.doe@example.com';

-- What happens internally:
-- 1. Database identifies all extents belonging to 'customers' table
-- 2. For each page in each extent:
--    a. Load page into buffer pool (I/O operation)
--    b. For each row on the page:
--       - Read the email column value
--       - Compare to 'john.doe@example.com'
--       - If match, add to result set
-- 3. Continue until ALL pages have been processed
-- 4. Return matching row(s)

-- With 10 million customers stored across 125,000 pages:
-- The database must read ALL 125,000 pages to find ONE email.
```

The devastating inefficiency of full table scans comes from I/O amplification. To find a single customer by email in a 10-million-row table, the database may need to read ~1GB of data from disk. With an index, it might read only 3-4 pages (~32KB)—a 30,000x improvement. At scale, this difference determines whether queries take milliseconds or minutes.
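Those figures are simple arithmetic on the page counts, assuming 8KB pages:

$$
125{,}000 \times 8\,\mathrm{KB} \approx 1\,\mathrm{GB}
\qquad \text{vs.} \qquad
4 \times 8\,\mathrm{KB} = 32\,\mathrm{KB},
\qquad
\frac{125{,}000 \ \text{pages}}{4 \ \text{pages}} \approx 31{,}000\times
$$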
Buffer Pool Considerations:
When a page is read from disk, it's stored in the buffer pool (or buffer cache)—a region of memory that caches frequently accessed pages. If a page is already in the buffer pool, no disk I/O is required.
However, full table scans are particularly harmful to the buffer pool:
- Scanned pages flood the cache, evicting hot, frequently-used pages.
- The scanned pages themselves are typically read once and never touched again.
- After a large scan completes, other queries suffer cache misses until their working set is reloaded.
Some databases mitigate this with small table scans (keeping tiny tables in cache) or direct path reads (bypassing the buffer pool entirely for large scans). But these are band-aids, not solutions.
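You can observe this pollution directly in PostgreSQL, assuming the pg_buffercache extension is available; this sketch lists which relations currently occupy the cache:

```sql
-- PostgreSQL: which relations currently occupy the buffer pool?
CREATE EXTENSION IF NOT EXISTS pg_buffercache;

SELECT c.relname,
       count(*) AS buffered_pages,
       pg_size_pretty(count(*) * 8192) AS buffered_size
FROM pg_buffercache b
JOIN pg_class c
  ON b.relfilenode = pg_relation_filenode(c.oid)
 AND b.reldatabase = (SELECT oid FROM pg_database
                      WHERE datname = current_database())
GROUP BY c.relname
ORDER BY buffered_pages DESC
LIMIT 10;
```

Running it before and after a large scan shows how much of the cache a single table can claim.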
Full table scans don't happen randomly—they're triggered by specific conditions that the query optimizer evaluates. Understanding these conditions is essential for prevention.
The Query Optimizer's Decision:
When you submit a query, the optimizer generates multiple potential execution plans and estimates the cost of each. It chooses the plan with the lowest estimated cost. A full table scan is selected when:
- No suitable index exists on the filtered, joined, or sorted columns.
- An index exists, but the predicate matches a large fraction of the table (low selectivity), so a sequential read is estimated to be cheaper than many index lookups.
- A function is applied to the indexed column (e.g., UPPER(email) = 'JOHN@EXAMPLE.COM'), which prevents index usage. (Covered in detail in Page 3.)
- The predicate uses a leading wildcard: LIKE '%smith%' cannot use standard B-tree indexes because the search isn't prefix-based.
- An implicit type conversion is applied to the column rather than the literal.
- An OR spans different columns that no single index can serve.

Let's examine each scenario in detail:
```sql
-- Scenario 1: Missing Index (most common)
SELECT * FROM orders WHERE customer_name = 'Acme Corp';
-- No index on customer_name → full table scan

-- Scenario 2: Low Selectivity (index exists but ignored)
SELECT * FROM users WHERE is_active = 1;
-- If 90% of users are active, scanning is cheaper than
-- reading 90% of index entries + row lookups

-- Scenario 3: Leading Wildcard
SELECT * FROM products WHERE description LIKE '%wireless%';
-- Cannot seek in B-tree; must scan all values

-- Scenario 4: Function on Column
SELECT * FROM employees WHERE YEAR(hire_date) = 2023;
-- Index on hire_date is useless; function evaluated per-row

-- Scenario 5: Implicit Conversion
SELECT * FROM accounts WHERE account_number = 12345;
-- If account_number is VARCHAR, the integer 12345 causes conversion
-- String-to-int conversion breaks index usage

-- Scenario 6: Complex OR
SELECT * FROM orders WHERE customer_id = 100 OR shipping_zip = '90210';
-- Unless both columns are indexed AND the optimizer uses Index Merge,
-- this may trigger a full scan
```

Databases typically switch from index seek to table scan when a query retrieves more than 10-30% of rows. This threshold varies by database and is influenced by clustering factor, row size, and index structure. The optimizer uses statistics to estimate selectivity—if statistics are stale, it may make the wrong choice.
Identifying full table scans is the critical first step toward eliminating them. The primary tool for detection is the execution plan (also called query plan or explain plan), which reveals exactly how the database will execute your query.
Reading Execution Plans:
Every major database provides a mechanism to view execution plans. The full table scan appears with different names:
| Database | Full Scan Indicator | Command |
|---|---|---|
| PostgreSQL | Seq Scan | EXPLAIN ANALYZE |
| MySQL | ALL in type column | EXPLAIN |
| SQL Server | Table Scan or Clustered Index Scan | SET SHOWPLAN_TEXT ON or graphical plan |
| Oracle | TABLE ACCESS FULL | EXPLAIN PLAN FOR + DBMS_XPLAN |
Let's examine detection in each major database:
```sql
-- PostgreSQL: Detecting full table scans with EXPLAIN ANALYZE
EXPLAIN ANALYZE
SELECT * FROM customers WHERE email = 'test@example.com';

-- Output showing a full table scan (Seq Scan):
/*
Seq Scan on customers  (cost=0.00..1834.00 rows=1 width=256)
                       (actual time=15.432..18.201 rows=1 loops=1)
  Filter: ((email)::text = 'test@example.com'::text)
  Rows Removed by Filter: 99999
Planning Time: 0.152 ms
Execution Time: 18.245 ms
*/

-- Key indicators:
-- 1. "Seq Scan" = Sequential Scan = Full Table Scan
-- 2. "Rows Removed by Filter: 99999" — read 100K rows to find 1
-- 3. High actual time relative to rows returned

-- With an index, the same query shows:
/*
Index Scan using idx_customers_email on customers
  (cost=0.42..8.44 rows=1 width=256)
  (actual time=0.028..0.029 rows=1 loops=1)
  Index Cond: ((email)::text = 'test@example.com'::text)
Planning Time: 0.185 ms
Execution Time: 0.052 ms
*/

-- Performance difference: 18ms vs 0.05ms (360x faster)
```

In SQL Server and some other databases, a 'Clustered Index Scan' is functionally equivalent to a table scan—it reads every row in the table through the clustered index structure. Don't be fooled by the word 'Index'; if you see 'SCAN' (not 'SEEK'), it's reading the entire object.
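For comparison, here is what the same detection looks like in MySQL; the tabular output is abbreviated and the row count illustrative:

```sql
-- MySQL: "ALL" in the type column means a full table scan
EXPLAIN SELECT * FROM customers WHERE email = 'test@example.com';

-- +----+-------------+-----------+------+---------------+------+-------+-------------+
-- | id | select_type | table     | type | possible_keys | key  | rows  | Extra       |
-- +----+-------------+-----------+------+---------------+------+-------+-------------+
-- |  1 | SIMPLE      | customers | ALL  | NULL          | NULL | 99741 | Using where |
-- +----+-------------+-----------+------+---------------+------+-------+-------------+

-- After CREATE INDEX idx_customers_email ON customers(email),
-- type typically becomes "ref", key names the index, and rows drops to 1.
```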
The impact of full table scans extends far beyond the individual query. Understanding their full cost helps prioritize optimization efforts.
Direct Costs:
| Resource | Impact | Scaling Behavior |
|---|---|---|
| Disk I/O | Reads entire table from storage | Linear with table size—10x data = 10x I/O |
| CPU | Evaluates predicate against every row | Linear with row count |
| Memory (Buffer Pool) | Loads all pages into cache, displacing useful data | Pollutes cache with single-use pages |
| Query Duration | Proportional to table size, not result size | 10x data = ~10x slower, regardless of selectivity |
| Network | If scanning on remote storage (cloud/SAN) | Bandwidth and latency amplified |
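These direct costs are precisely what the optimizer models. PostgreSQL, for instance, estimates a sequential scan roughly as follows (the parameter names are real planner settings, shown with their defaults):

$$
\mathrm{cost_{seq}} \approx \underbrace{\mathtt{seq\_page\_cost}}_{1.0} \cdot N_{\mathrm{pages}}
\;+\; \bigl(\underbrace{\mathtt{cpu\_tuple\_cost}}_{0.01} + \underbrace{\mathtt{cpu\_operator\_cost}}_{0.0025}\bigr) \cdot N_{\mathrm{rows}}
$$

This is consistent with the earlier EXPLAIN output: a table of roughly 584 pages and 100,000 rows gives 584 + (0.01 + 0.0025) × 100,000 = 1834, matching the cost=0.00..1834.00 estimate.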
Indirect/Systemic Costs:
The damage extends beyond the offending query:
Lock Contention — Long-running scans may hold locks that block other queries. In SQL Server, a scan in serializable isolation can escalate to table-level locks.
Buffer Pool Pollution — Scanned pages evict frequently-used pages from cache. After a large scan completes, other queries experience cache misses, degrading overall system performance.
Resource Starvation — CPU and I/O consumed by scans are unavailable for other workloads. A single bad query can degrade an entire database server.
Query Queue Delays — When scans consume resources, subsequent queries wait longer to execute. This cascades into application-level timeouts and user-perceived slowness.
Unpredictable Performance — Scans that were 'fast enough' at launch become unacceptable as data grows. This creates hidden technical debt that manifests as production incidents.
Full table scans often go unnoticed in development—test databases are small. The query takes 50ms on 1,000 rows. But with 5 million production rows, that same query takes 2 minutes. This is why performance testing with production-scale data is essential.
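One practical countermeasure is to synthesize production-scale volumes in the test database before measuring; a PostgreSQL sketch with illustrative table and column names:

```sql
-- Generate 5 million synthetic customers so plans behave as they will in production
INSERT INTO customers (first_name, last_name, email)
SELECT 'First' || g,
       'Last'  || g,
       'user'  || g || '@example.com'
FROM generate_series(1, 5000000) AS g;

-- Refresh statistics so the optimizer sees the new table size
ANALYZE customers;
```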
Quantifying the Cost (Real-World Example):
Consider a production e-commerce system:
| Metric | With Full Scan | With Index |
|---|---|---|
| Table Size | 20 million orders | 20 million orders |
| Storage | ~40 GB | ~40 GB |
| Query Time | 45 seconds | 3 milliseconds |
| I/O Operations | ~5 million page reads | 4 page reads |
| CPU Time | 12 seconds | < 1 millisecond |
| Memory Impact | Evicts 2GB from buffer pool | 32 KB cached |
| Concurrent Capacity | 2 queries/minute | 10,000 queries/second |
The same hardware can serve 300,000x more queries per second when full table scans are eliminated. This isn't optimization—it's the difference between a system that works and one that doesn't.
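The 300,000x figure is just the ratio of the two throughput numbers:

$$
\frac{10{,}000 \ \text{queries/s}}{2 \ \text{queries/min}}
= \frac{10{,}000}{1/30} = 300{,}000
$$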
Once detected, eliminating full table scans typically involves one of several strategies. The right approach depends on why the scan is occurring.
Strategy 1: Create an Appropriate Index
The most common solution. If a query filters, joins, or orders on a column (or set of columns) that lacks an index, create one:
```sql
-- Problem: Full table scan on email lookup
SELECT * FROM customers WHERE email = 'test@example.com';

-- Solution: Create an index on the email column
CREATE INDEX idx_customers_email ON customers(email);

-- Now the query uses an index seek instead of a table scan.

-- For composite predicates, create a composite index:
SELECT * FROM orders WHERE customer_id = 100 AND status = 'pending';

-- Create composite index (order matters for leftmost prefix rule)
CREATE INDEX idx_orders_customer_status ON orders(customer_id, status);

-- For ORDER BY or range queries, ensure index supports the access pattern:
SELECT * FROM transactions
WHERE created_at >= '2024-01-01'
ORDER BY created_at;

CREATE INDEX idx_transactions_created ON transactions(created_at);
```

Strategy 2: Rewrite the Query to Enable Index Usage
Some queries are written in ways that prevent index usage even when an index exists. Rewriting can unlock index access:
```sql
-- Problem: Function on column prevents index use
SELECT * FROM employees WHERE YEAR(hire_date) = 2023;

-- Solution: Rewrite as range predicate (index-friendly)
SELECT * FROM employees
WHERE hire_date >= '2023-01-01'
  AND hire_date < '2024-01-01';

-- Problem: Leading wildcard prevents index use
SELECT * FROM products WHERE name LIKE '%wireless%';

-- Solution: Use a full-text search index (MySQL syntax shown)
CREATE FULLTEXT INDEX idx_products_name ON products(name);
SELECT * FROM products WHERE MATCH(name) AGAINST('wireless');

-- Alternative: Pre-compute searchable tokens
ALTER TABLE products ADD COLUMN search_tokens TEXT;
-- Populate with keywords, create index on search_tokens

-- Problem: OR across different columns
SELECT * FROM orders WHERE customer_id = 100 OR shipping_zip = '90210';

-- Solution 1: Split into UNION (each uses its own index)
SELECT * FROM orders WHERE customer_id = 100
UNION
SELECT * FROM orders WHERE shipping_zip = '90210';

-- Solution 2: Create indexes on both columns
CREATE INDEX idx_orders_customer ON orders(customer_id);
CREATE INDEX idx_orders_zip ON orders(shipping_zip);
-- Optimizer may use Index Merge strategy
```

Strategy 3: Use Covering Indexes
A covering index contains all columns needed by the query. The database can satisfy the query entirely from the index without accessing the table (no bookmark lookup or heap fetch):
```sql
-- Query retrieving specific columns
SELECT customer_id, email, status
FROM customers
WHERE email = 'test@example.com';

-- Standard index (requires table lookup for other columns)
CREATE INDEX idx_customers_email ON customers(email);
-- Process: Index seek on email → lookup row for customer_id, status

-- Covering index: SQL Server and PostgreSQL 11+ support INCLUDE
CREATE INDEX idx_customers_email_covering
ON customers(email) INCLUDE (customer_id, status);
-- Process: Index seek on email → return columns directly from index
-- No table access required: faster and less I/O

-- In databases without INCLUDE (e.g., MySQL), widen the key instead:
CREATE INDEX idx_customers_email_covered
ON customers(email, customer_id, status);
```

Strategy 4: Partition Large Tables
For very large tables, partitioning divides data into smaller physical segments. Queries that filter on the partition key access only relevant partitions (partition pruning):
```sql
-- Create partitioned table (PostgreSQL example)
CREATE TABLE orders (
    order_id    BIGINT,
    customer_id INT,
    order_date  DATE,
    total       DECIMAL(10,2)
) PARTITION BY RANGE (order_date);

-- Create partitions by year
CREATE TABLE orders_2022 PARTITION OF orders
    FOR VALUES FROM ('2022-01-01') TO ('2023-01-01');
CREATE TABLE orders_2023 PARTITION OF orders
    FOR VALUES FROM ('2023-01-01') TO ('2024-01-01');
CREATE TABLE orders_2024 PARTITION OF orders
    FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');

-- Query with partition key in predicate
SELECT * FROM orders WHERE order_date >= '2024-01-01';
-- Only scans orders_2024 partition, not entire table

-- Without partitioning: 100M rows scanned
-- With partitioning: 2M rows scanned (only current year)
-- 50x reduction in scan scope
```

While full table scans are generally harmful, there are legitimate scenarios where they're the optimal choice. Understanding these exceptions prevents over-indexing and misguided optimization efforts.
Legitimate Use Cases:
```sql
-- Example 1: Very small lookup table (100 rows)
SELECT * FROM country_codes WHERE code = 'US';
-- Table has 100 rows, fits in 1 page. Index overhead isn't worth it.
-- Full scan reads 1 page; index would read 2+ pages (index + table).

-- Example 2: Low selectivity (predicate matches most rows)
SELECT * FROM users WHERE is_verified = 1;
-- If 80% of users are verified, index scan would require:
-- 800,000 index lookups + 800,000 random table accesses
-- Full scan reads table sequentially: faster for high-percentage access.

-- Example 3: Aggregate over entire table
SELECT AVG(order_total), COUNT(*) FROM orders;
-- Must read every row to compute average. No index helps.
-- Full scan is the only sensible strategy.

-- Example 4: Reporting query on entire dataset
SELECT DATE_TRUNC('month', order_date) AS month,
       SUM(total) AS monthly_revenue
FROM orders
GROUP BY DATE_TRUNC('month', order_date);
-- Aggregating all orders by month: full scan is appropriate.

-- Example 5: Bulk data export
COPY customers TO '/backup/customers.csv' WITH CSV HEADER;
-- Exporting all rows; full scan is the only option.
```

The crossover point where scans beat indexes varies by database, hardware, and table characteristics. Generally, queries returning <5% of rows should use indexes; >30% often warrant scans. The 5-30% range is a gray zone where the optimizer's statistics-based decision should be respected.
Forcing the Optimizer's Hand:
In rare cases, you may know something the optimizer doesn't (stale statistics, unusual data patterns). Most databases offer hints to override optimizer decisions:
```sql
-- PostgreSQL: Disable sequential scans to force index use
SET enable_seqscan = off;
SELECT * FROM customers WHERE status = 'active';
SET enable_seqscan = on;

-- MySQL: Use index hint
SELECT * FROM customers FORCE INDEX (idx_status) WHERE status = 'active';

-- SQL Server: Table hint
SELECT * FROM customers WITH (INDEX(idx_status)) WHERE status = 'active';

-- Oracle: Optimizer hint
SELECT /*+ INDEX(customers idx_status) */ * FROM customers WHERE status = 'active';
```
Warning: Hints are a last resort. They bypass the optimizer's intelligence and create maintenance burden. Always prefer fixing statistics, rewriting queries, or adjusting indexes before using hints.
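Since stale statistics are the most common reason the optimizer "knows less than you do", refreshing them should come before any hint; the command varies by database:

```sql
-- Refresh optimizer statistics instead of hinting (pick your database's command):
ANALYZE customers;                                        -- PostgreSQL
-- ANALYZE TABLE customers;                               -- MySQL
-- UPDATE STATISTICS customers;                           -- SQL Server
-- EXEC DBMS_STATS.GATHER_TABLE_STATS(USER, 'CUSTOMERS'); -- Oracle
```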
Eliminating existing full table scans is only half the battle. You need systems to prevent new ones from reaching production and to detect regressions.
Proactive Strategies:
- Review the execution plan of every new query before it ships.
- Performance-test against production-scale data, not 1,000-row fixtures.
- Keep optimizer statistics fresh so plan choices stay accurate.
- Continuously monitor slow-query logs and missing-index reports, as the queries below illustrate.
```sql
-- PostgreSQL: Find the most expensive queries (candidates for plan review)
-- Note: pg_stat_statements stores query text, not plans; confirming a Seq Scan
-- still requires EXPLAIN (or the auto_explain module shown below)
SELECT query, calls, mean_exec_time, total_exec_time
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 20;

-- PostgreSQL: Tables whose statistics haven't been updated recently
SELECT schemaname, relname, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
WHERE last_analyze IS NULL
   OR last_analyze < NOW() - INTERVAL '7 days'
ORDER BY n_live_tup DESC;

-- SQL Server: Find missing indexes suggested by the optimizer
SELECT mig.index_group_handle,
       mid.statement AS table_name,
       mid.equality_columns,
       mid.inequality_columns,
       mid.included_columns,
       migs.user_seeks,
       migs.avg_total_user_cost * migs.avg_user_impact AS improvement_measure
FROM sys.dm_db_missing_index_groups mig
JOIN sys.dm_db_missing_index_group_stats migs
  ON migs.group_handle = mig.index_group_handle
JOIN sys.dm_db_missing_index_details mid
  ON mig.index_handle = mid.index_handle
ORDER BY improvement_measure DESC;

-- MySQL: Find queries doing full scans from the slow query log
-- Enable: SET GLOBAL slow_query_log = 'ON';
--         SET GLOBAL log_queries_not_using_indexes = 'ON';
```

Modern observability tools (Datadog, New Relic, AWS Performance Insights, pganalyze) can automatically detect and alert on queries showing full table scans. Integrate these into your monitoring stack to catch issues before users do.
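For PostgreSQL specifically, the auto_explain module logs the actual plan of any slow statement, making Seq Scans visible in production without manual EXPLAIN runs; a minimal session-level sketch (thresholds are illustrative):

```sql
-- Load the module and log plans for anything slower than 500 ms
LOAD 'auto_explain';
SET auto_explain.log_min_duration = '500ms';
SET auto_explain.log_analyze = on;   -- include actual row counts and timings

-- From here on, slow statements in this session write their plans
-- (including any "Seq Scan" nodes) to the server log.
```

To enable it server-wide, set the same parameters in postgresql.conf and add the module to shared_preload_libraries or session_preload_libraries.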
Full table scans are the most common and impactful performance issue in database systems. Understanding them thoroughly is essential for any engineer working with data at scale.
You now understand full table scans—what they are, why they happen, how to detect them, and how to eliminate them. Next, we'll examine implicit type conversions: subtle mismatches that silently disable indexes and cause unexpected scans.