Few SQL practices are as universally discouraged—yet as widely used—as SELECT *. This seemingly innocent shortcut appears in countless production systems, tutorials, and even documentation examples. It's quick to type and appears to 'just work.' But beneath this convenience lurks a cascade of performance problems that can cripple database systems at scale.
Understanding why SELECT * is problematic—and more importantly, how to avoid it—is one of the most impactful optimizations any developer can learn. The principles explored here touch every aspect of database performance: network bandwidth, memory utilization, query planning, index optimization, and long-term maintainability.
By the end of this page, you will understand the full spectrum of problems caused by SELECT *, learn to quantify its impact on your systems, master techniques for column selection optimization, and recognize the few legitimate scenarios where SELECT * is acceptable.
Before examining the problems, let's understand exactly what happens when the database engine encounters SELECT *. This knowledge is fundamental to appreciating why explicit column selection matters.
Query Expansion at Parse Time:
When you write SELECT * FROM employees, the database doesn't literally fetch 'everything.' During the parsing phase, the query processor expands the asterisk into an explicit list of every column in the table, in their defined order. If the employees table has columns id, first_name, last_name, email, department_id, salary, hire_date, manager_id, address, and phone, the query becomes:
```sql
-- What you write:
SELECT * FROM employees;

-- What the database actually executes:
SELECT id, first_name, last_name, email, department_id, salary,
       hire_date, manager_id, address, phone
FROM employees;
```
Metadata Lookup Overhead:
To perform this expansion, the database must:

- Look up the table definition in the system catalog
- Retrieve the full list of column names in their defined order
- Verify the caller's permissions on every column
This metadata lookup adds latency—small per query, but significant at high volumes. More critically, it creates a dependency on the current schema state that can cause problems we'll explore shortly.
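The expansion is observable from any database driver: the cursor metadata for a SELECT * query lists every column in its defined order. A minimal sketch using SQLite; the table and columns are illustrative:

```python
import sqlite3

# SELECT * is expanded at parse time into the table's full column list,
# in defined order -- visible via the cursor's result metadata.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (id INTEGER, first_name TEXT, last_name TEXT, email TEXT)"
)
cursor = conn.execute("SELECT * FROM employees")
cols = [d[0] for d in cursor.description]
print(cols)  # ['id', 'first_name', 'last_name', 'email'] -- defined order
```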
Dynamic vs. Static Column Lists:
With SELECT *, your query's column list is dynamic—it changes whenever the table schema changes. With explicit column selection, your query is static—it returns exactly what you specified, regardless of schema evolution. This distinction has profound implications for application stability.
Using SELECT * creates invisible coupling between your application code and the database schema. The query 'looks' simple, but it's actually a function of the table's current column definitions—a dependency that's easy to forget until it causes production failures.
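The dynamic/static distinction can be demonstrated in a few lines. This sketch (SQLite, with a hypothetical products table) shows SELECT * silently changing shape after a schema change while an explicit column list does not:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER, name TEXT, price REAL)")
conn.execute("INSERT INTO products VALUES (1, 'Widget', 9.99)")

before = conn.execute("SELECT * FROM products").fetchone()
print(len(before))  # 3 columns

# A DBA later adds a column; no application code changes.
conn.execute("ALTER TABLE products ADD COLUMN description TEXT")

after = conn.execute("SELECT * FROM products").fetchone()
print(len(after))  # now 4 columns -- the result shape changed underneath us

explicit = conn.execute("SELECT id, name, price FROM products").fetchone()
print(len(explicit))  # still 3 columns, regardless of schema evolution
```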
The most immediate cost of SELECT * is the sheer volume of data transferred. Every unnecessary column consumes network bandwidth, increases disk I/O, and occupies memory buffers. Let's quantify this impact.
Case Study: User Profile Retrieval
Consider a typical web application that needs to display a user's name on a dashboard. The users table contains:
| Column | Data Type | Typical Size (bytes) | Needed for Display? |
|---|---|---|---|
| id | BIGINT | 8 | No |
| username | VARCHAR(50) | 25 avg | Yes |
| email | VARCHAR(255) | 30 avg | No |
| password_hash | CHAR(60) | 60 | Never |
| first_name | VARCHAR(50) | 15 avg | Yes |
| last_name | VARCHAR(50) | 15 avg | Yes |
| profile_photo | BLOB | 500,000 avg | No |
| bio | TEXT | 2,000 avg | No |
| created_at | TIMESTAMP | 8 | No |
| last_login | TIMESTAMP | 8 | No |
| settings_json | JSON | 5,000 avg | No |
The multiplication effect:

- Data actually needed per row: ~55 bytes (username, first_name, last_name)
- Data fetched per row by SELECT *: ~507 KB, dominated by profile_photo, settings_json, and bio

This roughly 9,000x difference compounds across your application:

- Every dashboard load repeats the waste
- Thousands of concurrent users multiply it
- Millions of requests per day accumulate it
*(Figure: SELECT * transfers unnecessary data)*

For high-traffic applications, the cumulative impact of SELECT * across hundreds of queries can consume terabytes of unnecessary bandwidth monthly.
Large object columns (BLOB, TEXT, JSON) are particularly dangerous with SELECT *. Even if you never use the data, the database must read it from disk (often from separate storage pages), transfer it over the network, and allocate memory for it. One forgotten TEXT column can dwarf the cost of all other columns combined.
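The back-of-envelope arithmetic behind the 9,000x figure, using the average sizes from the table above (all values are the illustrative averages, not measurements):

```python
# Sum of the "Typical Size" column for every field in the users table:
# id + username + email + password_hash + first_name + last_name
# + profile_photo + bio + created_at + last_login + settings_json
full_row = 8 + 25 + 30 + 60 + 15 + 15 + 500_000 + 2_000 + 8 + 8 + 5_000

# Only three fields are needed for the dashboard display:
needed = 25 + 15 + 15  # username + first_name + last_name

print(full_row)           # 507169 bytes, ~507 KB per row
print(full_row // needed) # 9221 -- roughly the "9,000x" figure
```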
Beyond network transfer, SELECT * affects database memory management in ways that ripple throughout system performance.
Buffer Pool Contamination:
Most databases use a buffer pool (or cache) to keep frequently accessed data pages in memory. When you request unnecessary columns, the database must load their data pages into the buffer pool, potentially evicting more useful data.
Consider a query fetching order IDs for processing:
```sql
-- Anti-pattern: Loads entire rows including large description TEXT
SELECT * FROM orders WHERE status = 'pending';

-- Optimized: Only loads pages containing order IDs
SELECT order_id FROM orders WHERE status = 'pending';
```
If the orders table includes a description TEXT column averaging 5KB per order, the first query loads orders of magnitude more data into the buffer pool than the second. This means:

- Useful pages are evicted to make room for description data nobody reads
- The cache hit ratio drops for every other query on the system
- Subsequent queries pay extra disk I/O to reload the evicted pages
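The data-volume gap between the two queries above can be sketched with SQLite and a toy orders table (row counts and the ~5 KB description are illustrative; the string-length of each fetched row is used as a crude proxy for bytes moved):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER, status TEXT, description TEXT)"
)
# 100 pending orders, each carrying a ~5 KB description nobody displays.
conn.executemany(
    "INSERT INTO orders VALUES (?, 'pending', ?)",
    [(i, "x" * 5000) for i in range(100)],
)

wide = sum(
    len(str(row))
    for row in conn.execute("SELECT * FROM orders WHERE status = 'pending'")
)
narrow = sum(
    len(str(row))
    for row in conn.execute("SELECT order_id FROM orders WHERE status = 'pending'")
)
print(wide // narrow)  # hundreds of times more data for SELECT *
```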
Result Set Memory Allocation:
The database must allocate memory for the entire result set before transferring it. With SELECT *, this allocation includes space for columns you never examine (the figures below assume a 10,000-row result):
| Query Type | Columns Retrieved | Memory Allocated | Actual Data Used |
|---|---|---|---|
| SELECT * | 20 columns (1.5 KB/row) | 15 MB | 500 KB (3 columns) |
| SELECT a, b, c | 3 columns (50 bytes/row) | 500 KB | 500 KB (100%) |
Client-Side Memory Waste:
The problem extends to your application. ORMs and database drivers allocate objects or arrays to hold query results. With SELECT *, every result object contains fields for unused columns:
*(Figure: ResultSet row allocates all column objects)*

For long-running services processing millions of rows daily, this waste accumulates into gigabytes of unnecessary memory allocation and garbage collection overhead.
Memory waste from SELECT * compounds across the entire request lifecycle: database buffer pool → database result buffer → network buffers → driver buffers → application memory → serialization buffers. Each layer allocates space for columns you never use.
Perhaps the most significant—yet least understood—impact of SELECT * is on query execution plans and index utilization. This is where the difference between SELECT * and explicit column selection moves from linear waste to exponential slowdown.
Destroying Covering Index Potential:
A covering index (also called an index-only scan) is one of the most powerful optimizations available. When all columns needed by a query exist in an index, the database can satisfy the query entirely from the index without touching the table data at all.
Consider a query to find employee email addresses by department:
```sql
-- Assume this covering index exists:
CREATE INDEX idx_emp_dept_email ON employees(department_id, email);

-- Query 1: Cannot use covering index - must access table
SELECT * FROM employees WHERE department_id = 5;
-- Execution: Index seek + Table lookup for EVERY row (expensive)

-- Query 2: Uses covering index exclusively - never touches table
SELECT email FROM employees WHERE department_id = 5;
-- Execution: Index-only scan (10-100x faster)
```
Performance difference explained:
With the covering index approach:

- The index stores department_id and email together in sorted order
- The database reads only the compact index pages
- The table's data pages are never touched

With SELECT *:

- The index locates matching rows, but contains only two of the columns
- Every matching row requires a separate table lookup to fetch the rest
- For 10,000 matching rows, that means 10,000 additional random I/O operations
| Operation | SELECT * | SELECT email |
|---|---|---|
| Index scan time | 5 ms | 5 ms |
| Table lookups required | 10,000 random seeks | 0 |
| Disk I/O (worst case) | 10,000 × 10ms = 100 sec | None |
| Disk I/O (cached) | 10,000 × 0.1ms = 1 sec | None |
| Total query time (worst) | ~100 seconds | 5 ms |
| Total query time (cached) | ~1 second | 5 ms |
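The covering-index effect is easy to observe with SQLite's EXPLAIN QUERY PLAN (used here purely as an illustration; the table and index names mirror the example above). When only indexed columns are requested, the plan reports a covering-index scan; with SELECT *, it does not:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (id INTEGER, department_id INTEGER, email TEXT, salary REAL)"
)
conn.execute(
    "CREATE INDEX idx_emp_dept_email ON employees(department_id, email)"
)

def plan(sql):
    """Concatenate the 'detail' column of EXPLAIN QUERY PLAN output."""
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

# Only indexed columns requested: satisfied from the index alone.
print(plan("SELECT email FROM employees WHERE department_id = 5"))

# SELECT * also needs id and salary, so the table rows must be visited.
print(plan("SELECT * FROM employees WHERE department_id = 5"))
```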
Optimizer Decision Influence:
The query optimizer considers all requested columns when choosing an execution plan. With SELECT *, the optimizer knows it must eventually retrieve all columns, which influences its decisions:

- Index-only plans are ruled out, since no index covers every column
- Wider intermediate rows make sorts and joins more likely to spill to disk
- Join strategies that materialize intermediate results become more expensive
By requesting only needed columns, you give the optimizer more freedom to choose efficient plans.
EXPLAIN plans don't always show the full cost difference. A plan might show 'index seek' for both queries, but the SELECT * version incurs thousands of additional table lookups that may not be visible in the plan cost estimates. Always test with actual execution times.
Beyond immediate performance, SELECT * creates long-term maintainability challenges that compound as systems evolve.
The Fragile Query Problem:
When a schema changes, queries using SELECT * behave unpredictably:
```sql
-- Original table
CREATE TABLE products (id INT, name VARCHAR(100), price DECIMAL(10,2));

-- Application code expects: [id, name, price]
SELECT * FROM products;

-- Six months later, DBA adds columns:
ALTER TABLE products ADD COLUMN description TEXT;       -- 10KB average
ALTER TABLE products ADD COLUMN image_url VARCHAR(500);
ALTER TABLE products ADD COLUMN inventory_count INT;
ALTER TABLE products ADD COLUMN metadata JSON;          -- 50KB average

-- Same query now returns 60KB per row instead of 200 bytes
-- Application silently slows down by 300x
-- No code change, no deployment, no alert
```
Symptoms of this problem:

- Performance degrades after a schema migration, with no application deploy to blame
- Network and memory usage climb gradually as columns accumulate
- The root cause is hard to trace because "nothing changed" in the application code
Column Order Dependencies:
Some application code implicitly depends on column order when processing SELECT * results:
```python
# Dangerous: Assumes column positions
cursor.execute("SELECT * FROM users")
for row in cursor.fetchall():
    user_id = row[0]   # Assumes id is first
    username = row[1]  # Assumes username is second
    email = row[2]     # Assumes email is third

# If DBA reorders columns or adds a column before 'email',
# this code silently produces wrong results

# Safe: Explicit column selection
cursor.execute("SELECT id, username, email FROM users")
for row in cursor.fetchall():
    user_id = row[0]   # Always id
    username = row[1]  # Always username
    email = row[2]     # Always email
```
Documentation and Readability:
SELECT * tells readers nothing about the query's purpose, whereas an explicit column list documents exactly which data the surrounding code depends on.
Months later, when debugging or optimizing, explicit columns provide essential context that SELECT * obscures.
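As a complement to the positional-access pitfall shown earlier, most drivers also support access by column name, which is immune to both column reordering and newly added columns. A sketch using sqlite3.Row (the users table and data are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # rows become name-addressable
conn.execute("CREATE TABLE users (id INTEGER, username TEXT, email TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'ada', 'ada@example.com')")

row = conn.execute("SELECT id, username, email FROM users").fetchone()
print(row["email"])  # access by name, not by position
```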
SELECT * can inadvertently expose sensitive columns. When new sensitive data (SSN, password hashes, financial info) is added to a table, all SELECT * queries immediately start returning it. This has caused real data breaches when developers assumed SELECT * results wouldn't contain sensitive data.
Now that we understand the problems, let's establish concrete practices for effective column selection.
Rule 1: Always Specify Columns Explicitly
Make explicit column selection your default, not an optimization added later:
```sql
-- Instead of:
SELECT * FROM orders WHERE customer_id = 123;

-- Write:
SELECT order_id, order_date, total_amount, status
FROM orders
WHERE customer_id = 123;

-- Benefits:
-- 1. Documents exactly what data is needed
-- 2. Enables covering index optimization
-- 3. Immune to schema changes
-- 4. Uses minimal network and memory
```
Rule 2: Request Only What You Process
Audit your code to ensure every selected column is actually used:
```javascript
// Bad: Selects 20 columns, uses 3
const users = await db.query('SELECT * FROM users WHERE active = true');
const names = users.map(u => `${u.first_name} ${u.last_name}`);

// Good: Selects exactly what's needed
const users = await db.query(
  'SELECT first_name, last_name FROM users WHERE active = true'
);
const names = users.map(u => `${u.first_name} ${u.last_name}`);
```
Rule 3: Use Table Aliases in Joins
When joining tables, always qualify column names to avoid ambiguity and ensure clarity:
```sql
-- Bad: Ambiguous and fetches all columns from both tables
SELECT *
FROM orders o
JOIN customers c ON o.customer_id = c.id;

-- Good: Clear, minimal, and optimizable
SELECT
    o.order_id,
    o.order_date,
    o.total_amount,
    c.customer_name,
    c.email
FROM orders o
JOIN customers c ON o.customer_id = c.id;
```
Rule 4: Create Views for Common Column Sets
If multiple queries need the same column subset, encapsulate it in a view:
```sql
-- Create a view for the common display fields
CREATE VIEW customer_display AS
SELECT id, customer_name, email, phone, city, country
FROM customers;

-- Queries can now use the view, knowing it contains minimum necessary fields
SELECT * FROM customer_display WHERE country = 'USA';

-- Note: SELECT * from a well-designed view is acceptable because
-- the view itself enforces column discipline
```
Despite everything discussed, there are legitimate scenarios where SELECT * is appropriate. Understanding these exceptions helps apply the rules pragmatically.
Ad-Hoc Query Exploration:
During development or debugging, SELECT * helps you understand table structure and data:
```sql
-- Acceptable: Quick exploration during development
SELECT * FROM unknown_table LIMIT 10;

-- Acceptable: Debugging to see all data related to an issue
SELECT * FROM orders WHERE order_id = 12345;

-- Acceptable: Export/backup scenarios where all data is needed
SELECT * INTO backup_table FROM production_table;

-- Acceptable: Subquery where outer query filters columns
SELECT id, name FROM (
    SELECT * FROM employees WHERE department = 'Engineering'
) AS eng_employees;
```
EXISTS Subqueries:
SELECT * in an EXISTS clause is optimized away by smart query engines:
```sql
-- Acceptable: The SELECT * is never actually executed
SELECT customer_name
FROM customers c
WHERE EXISTS (
    SELECT * FROM orders o WHERE o.customer_id = c.id
);

-- The optimizer recognizes this as an existence check
-- No columns are actually retrieved from the orders table
-- Equivalent to SELECT 1 or SELECT NULL
```
Row Count Queries:
For COUNT(*), the asterisk means 'count rows' rather than 'select all columns':
```sql
-- COUNT(*) is a special case - counts rows, doesn't retrieve columns
SELECT COUNT(*) FROM orders WHERE status = 'pending';

-- This is fully optimized and does NOT suffer from SELECT * problems
-- The database can use any available index or table statistics
-- No actual row data is retrieved

-- Note: COUNT(1) and COUNT(*) are equivalent in modern databases
```
If the query will reach production code, specify columns explicitly. If you're exploring data interactively or writing a one-time script, SELECT * is fine. The distinction is: will this query be executed thousands or millions of times?
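A quick sanity check of the two special cases above, sketched with SQLite and toy data: EXISTS (SELECT *) behaves as a pure existence test, and COUNT(1) matches COUNT(*):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER, customer_name TEXT);
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, status TEXT);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO orders VALUES (10, 1, 'pending'), (11, 1, 'shipped');
""")

# EXISTS (SELECT *) vs EXISTS (SELECT 1): identical results.
exists_star = conn.execute("""
    SELECT customer_name FROM customers c
    WHERE EXISTS (SELECT * FROM orders o WHERE o.customer_id = c.id)
""").fetchall()
exists_one = conn.execute("""
    SELECT customer_name FROM customers c
    WHERE EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.id)
""").fetchall()
print(exists_star == exists_one)  # True -- only Ada has orders either way

# COUNT(*) vs COUNT(1): identical row counts.
count_star = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
count_one = conn.execute("SELECT COUNT(1) FROM orders").fetchone()[0]
print(count_star == count_one)  # True
```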
To convince stakeholders or measure improvement in your systems, use these techniques to quantify SELECT * impact.
Query Comparison Testing:
```sql
-- PostgreSQL: Compare execution statistics
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT * FROM orders WHERE customer_id = 1000;

EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT order_id, order_date, total FROM orders WHERE customer_id = 1000;

-- Key metrics to compare:
-- - 'Buffers: shared hit/read' - I/O operations
-- - 'actual time' - execution duration
-- - 'rows' vs 'width' - data volume

-- MySQL: Enable profiling
SET profiling = 1;
SELECT * FROM orders WHERE customer_id = 1000;
SELECT order_id, order_date, total FROM orders WHERE customer_id = 1000;
SHOW PROFILES; -- Compare query durations
```
Network Transfer Measurement:
```sql
-- PostgreSQL: Estimate row sizes
SELECT
    pg_column_size(t.*) AS row_size_bytes,
    pg_column_size((t.order_id, t.order_date, t.total)) AS minimal_size_bytes
FROM orders t
LIMIT 100;

-- MySQL: Get column sizes
SELECT
    column_name,
    data_type,
    character_maximum_length,
    numeric_precision
FROM information_schema.columns
WHERE table_name = 'orders';

-- Calculate total transfer size difference
-- Row size × Row count = Total bytes transferred
```
| Metric | How to Measure | Target Improvement |
|---|---|---|
| Query execution time | EXPLAIN ANALYZE | 50% reduction |
| Buffer/page reads | BUFFERS in EXPLAIN | 70% reduction |
| Network bytes | Row size × count | 80% reduction |
| Memory allocation | Application profiler | 60% reduction |
| Client processing time | Application timing | 30% reduction |
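The same before/after methodology can be wrapped in a minimal application-side harness. This sketch uses an in-memory SQLite table (table, column sizes, and row count are all illustrative) and times both query shapes with the standard library:

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER, order_date TEXT, total REAL, description TEXT)"
)
# 5,000 orders, each dragging along a ~2 KB description column.
conn.executemany(
    "INSERT INTO orders VALUES (?, '2024-01-01', 9.99, ?)",
    [(i, "x" * 2000) for i in range(5000)],
)

def timed_fetch(sql):
    """Return (elapsed_seconds, rows) for a query."""
    start = time.perf_counter()
    rows = conn.execute(sql).fetchall()
    return time.perf_counter() - start, rows

t_star, rows_star = timed_fetch("SELECT * FROM orders")
t_cols, rows_cols = timed_fetch("SELECT order_id, order_date, total FROM orders")

# Same row count either way -- only the payload per row differs.
print(len(rows_star), len(rows_cols))
print(f"SELECT *: {t_star:.4f}s, explicit columns: {t_cols:.4f}s")
```
Timings vary by machine and cache state, so treat the printed durations as directional rather than exact; the point is that the comparison lives in your own application code, where client-side processing cost is included.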
Search your codebase for 'SELECT *' patterns. Each instance is a potential quick win. Prioritize queries that: (1) execute frequently, (2) touch tables with large columns, and (3) return many rows. These offer the highest return on optimization effort.
We've explored one of the most impactful yet simplest query optimizations available. Let's consolidate the key principles:

- SELECT * expands at parse time into every column, coupling queries to the current schema
- Unneeded columns waste network bandwidth, buffer pool space, and client memory
- Explicit column lists enable covering indexes and give the optimizer better plans
- Explicit columns stay stable under schema changes and document intent
- SELECT * remains acceptable for ad-hoc exploration, EXISTS subqueries, and COUNT(*)
What's next:
The next page explores another fundamental optimization: limiting result sets. You'll learn techniques for efficient pagination, the performance implications of unbounded queries, and strategies for processing large data sets without overwhelming your systems.
You now understand why SELECT * is one of the most common SQL anti-patterns and how explicit column selection can dramatically improve query performance. This single practice—applied consistently—can reduce network traffic, memory usage, and query times by orders of magnitude.