One of the most dangerous patterns in production databases is the unbounded query—a SELECT statement with no practical limit on how many rows it can return. In development with small datasets, these queries work fine. In production with millions of rows, they become ticking time bombs.
A single unbounded query can:

- Exhaust application and database memory as the full result set is materialized
- Tie up connections and CPU long enough to cause timeouts in unrelated requests
- Degrade the user experience from instant responses to multi-second delays
- Trigger cascading failures once data volume crosses a threshold
Mastering result limiting techniques is essential for building robust, production-grade database applications that perform consistently regardless of data growth.
By the end of this page, you'll master LIMIT/OFFSET pagination and its limitations, understand keyset (cursor-based) pagination for high-performance scenarios, learn TOP-N query optimization patterns, and develop strategies for safely processing large result sets.
Before diving into solutions, let's understand exactly what happens when a query returns more data than systems can handle.
Memory Allocation Chain:
When you execute SELECT * FROM large_table (no LIMIT), the following sequence occurs:

1. The database plans and executes the query with no upper bound on the result size.
2. Every matching row is read and streamed to the client over the network.
3. Most client drivers buffer the entire result set in memory before returning control to your code.
4. The application then materializes each row as objects, multiplying the memory footprint.
5. Memory pressure grows with the row count until responses slow to a crawl or the process runs out of memory.
Real-World Failure Scenario:
Consider an admin dashboard that displays 'recent orders' using this query:
```sql
-- Innocent-looking query
SELECT order_id, customer_name, total, order_date
FROM orders
ORDER BY order_date DESC;

-- Day 1: 100 orders returned - works perfectly
-- Month 6: 50,000 orders - starts feeling slow
-- Year 2: 500,000 orders - application timeouts
-- Year 5: 5,000,000 orders - cascading system failure

-- The query never changed - but the data kept growing
```

| Time Frame | Row Count | Response Time | Memory Usage | User Experience |
|---|---|---|---|---|
| Launch | 100 | 50 ms | 10 KB | Instant |
| 6 months | 50,000 | 2 seconds | 5 MB | Noticeable delay |
| 1 year | 200,000 | 12 seconds | 20 MB | Frustrating |
| 2 years | 500,000 | 45 seconds | 50 MB | Timeouts begin |
| 5 years | 5,000,000 | Timeout/Crash | 500 MB+ | System failure |
Unbounded queries are especially dangerous because they 'work' during development and early production. The failure is gradual and often attributed to 'the database being slow' rather than the actual cause: queries that scale linearly with data volume.
The LIMIT clause is the first line of defense against unbounded queries. It caps the number of rows returned regardless of how many match the WHERE clause.
Standard Syntax Variations:
Different database systems use different syntax for the same concept:
```sql
-- PostgreSQL, MySQL, SQLite: LIMIT syntax
SELECT order_id, total FROM orders
ORDER BY order_date DESC
LIMIT 100;

-- SQL Server: TOP syntax
SELECT TOP 100 order_id, total FROM orders
ORDER BY order_date DESC;

-- Oracle 12c+: FETCH FIRST syntax (ANSI SQL:2008)
SELECT order_id, total FROM orders
ORDER BY order_date DESC
FETCH FIRST 100 ROWS ONLY;

-- Oracle (traditional): ROWNUM in subquery
SELECT * FROM (
  SELECT order_id, total FROM orders ORDER BY order_date DESC
) WHERE ROWNUM <= 100;

-- DB2: FETCH FIRST syntax
SELECT order_id, total FROM orders
ORDER BY order_date DESC
FETCH FIRST 100 ROWS ONLY;
```

How LIMIT Optimizes Query Execution:
When the optimizer sees a LIMIT clause, it can apply several optimizations:
```sql
-- Without LIMIT: Full table scan + complete sort
-- Complexity: O(n) scan + O(n log n) sort
SELECT order_id, total FROM orders
ORDER BY order_date DESC;

-- With LIMIT: Can use index + early termination
-- Complexity: O(k) where k is the limit, if an index exists on order_date
SELECT order_id, total FROM orders
ORDER BY order_date DESC
LIMIT 10;

-- Execution plan difference:
-- Without LIMIT: "Sort (cost=50000) -> Seq Scan (cost=40000)"
-- With LIMIT: "Limit -> Index Scan Backward (cost=0.5)"
```

For LIMIT to provide maximum optimization, ensure the ORDER BY columns have a matching index. An index on (order_date DESC) allows the database to return the top N rows by simply reading the first N index entries—no sorting required.
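To confirm this behavior on your own schema, a minimal check might look like the sketch below (PostgreSQL syntax; the index name is illustrative, and the exact plan text varies by engine and version):

```sql
-- Assumed supporting index for ORDER BY order_date DESC ... LIMIT N
CREATE INDEX idx_orders_order_date_desc ON orders (order_date DESC);

-- With the index in place, the plan should show a Limit node over an
-- index scan instead of a full sort over a sequential scan
EXPLAIN
SELECT order_id, total
FROM orders
ORDER BY order_date DESC
LIMIT 10;
```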
The most common pagination approach combines LIMIT with OFFSET, allowing users to navigate through result pages. While straightforward to implement, this approach has significant performance implications at scale.
Basic OFFSET Pagination:
```sql
-- Page 1: Get first 20 results
SELECT order_id, customer_name, total
FROM orders
ORDER BY order_date DESC
LIMIT 20 OFFSET 0;

-- Page 2: Skip first 20, get next 20
SELECT order_id, customer_name, total
FROM orders
ORDER BY order_date DESC
LIMIT 20 OFFSET 20;

-- Page 50: Skip first 980, get next 20
SELECT order_id, customer_name, total
FROM orders
ORDER BY order_date DESC
LIMIT 20 OFFSET 980;

-- Page 5000: Skip first 99,980 rows!
SELECT order_id, customer_name, total
FROM orders
ORDER BY order_date DESC
LIMIT 20 OFFSET 99980;
```

The OFFSET Performance Problem:
Here's the critical issue: the database must still process all skipped rows. OFFSET 99980 doesn't magically jump to row 99,981—it reads and discards 99,980 rows first.
| Page Number | OFFSET Value | Rows Processed | Rows Returned | Efficiency |
|---|---|---|---|---|
| 1 | 0 | 20 | 20 | 100% |
| 10 | 180 | 200 | 20 | 10% |
| 100 | 1,980 | 2,000 | 20 | 1% |
| 1,000 | 19,980 | 20,000 | 20 | 0.1% |
| 10,000 | 199,980 | 200,000 | 20 | 0.01% |
Visualization of the Problem:
Think of OFFSET like reading a book to find page 500. You don't have a table of contents (index for the specific page), so you must flip through 499 pages to reach your destination. Every 'next page' request requires flipping from page 1 again.
Performance Measurement:
```sql
-- PostgreSQL: Measure execution time at different offsets
EXPLAIN ANALYZE SELECT * FROM orders ORDER BY created_at DESC LIMIT 20 OFFSET 0;
-- Execution Time: 0.5 ms

EXPLAIN ANALYZE SELECT * FROM orders ORDER BY created_at DESC LIMIT 20 OFFSET 10000;
-- Execution Time: 25 ms

EXPLAIN ANALYZE SELECT * FROM orders ORDER BY created_at DESC LIMIT 20 OFFSET 100000;
-- Execution Time: 250 ms

EXPLAIN ANALYZE SELECT * FROM orders ORDER BY created_at DESC LIMIT 20 OFFSET 1000000;
-- Execution Time: 2500 ms (2.5 seconds!)
```

Query time with OFFSET grows linearly with the offset value. Page 10,000 takes 10,000x longer than page 1. For large datasets with deep pagination, OFFSET-based pagination becomes unusable and can cause database resource exhaustion.
Keyset pagination (also called cursor-based or seek pagination) solves the OFFSET performance problem by using values from the last seen row to fetch the next page. Each page costs a single index seek plus the page's rows, so performance stays effectively constant no matter how deep into the dataset you navigate.
The Keyset Concept:
Instead of saying 'skip N rows,' you say 'get rows after this value.'
```sql
-- Setup: Orders table with order_id as unique, monotonically increasing

-- Page 1: Get first 20 orders
SELECT order_id, customer_name, total, order_date
FROM orders
ORDER BY order_id DESC
LIMIT 20;
-- Returns order_ids 1000, 999, 998, ..., 981
-- Remember last_seen_id = 981

-- Page 2: Get next 20 orders after id 981
SELECT order_id, customer_name, total, order_date
FROM orders
WHERE order_id < 981  -- Keyset condition
ORDER BY order_id DESC
LIMIT 20;
-- Returns order_ids 980, 979, 978, ..., 961
-- Remember last_seen_id = 961

-- Page 5000: Still instant! Just a different keyset value
SELECT order_id, customer_name, total, order_date
FROM orders
WHERE order_id < 123  -- Last seen id from page 4999
ORDER BY order_id DESC
LIMIT 20;
-- Execution time: Same as page 1! (~0.5 ms)
```

Why Keyset Pagination is O(1):
With OFFSET, the database:

- Walks the index or sorted result from the very beginning
- Reads and discards every one of the OFFSET rows
- Only then returns the LIMIT rows you asked for
With Keyset, the database:

- Seeks directly to the keyset value in the index
- Reads the next LIMIT rows in index order
- Stops immediately
No rows are skipped—the index provides direct access to the starting position.
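To see this difference directly, compare the plans for a deep OFFSET page and the equivalent keyset query. The sketch below uses PostgreSQL's EXPLAIN ANALYZE and an illustrative boundary value; the exact plan output depends on your engine, version, and data:

```sql
-- OFFSET: the plan still reads and discards all skipped rows before returning 20
EXPLAIN ANALYZE
SELECT order_id, total
FROM orders
ORDER BY order_id DESC
LIMIT 20 OFFSET 100000;

-- Keyset: the index seek starts at the boundary value, so only about 20 rows are read
EXPLAIN ANALYZE
SELECT order_id, total
FROM orders
WHERE order_id < 900000  -- illustrative last_seen_id from the previous page
ORDER BY order_id DESC
LIMIT 20;
```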
Compound Keyset for Non-Unique Columns:
When paginating by a non-unique column (like order_date), use a compound keyset with a unique tiebreaker:
```sql
-- Problem: Multiple orders can have the same order_date
-- Solution: Use (order_date, order_id) as a compound keyset

-- Page 1
SELECT order_id, customer_name, total, order_date
FROM orders
ORDER BY order_date DESC, order_id DESC
LIMIT 20;
-- Last row: order_date='2024-01-15', order_id=5432

-- Page 2: Row comparison for compound keyset
SELECT order_id, customer_name, total, order_date
FROM orders
WHERE (order_date, order_id) < ('2024-01-15', 5432)
ORDER BY order_date DESC, order_id DESC
LIMIT 20;

-- Alternative syntax (more compatible):
SELECT order_id, customer_name, total, order_date
FROM orders
WHERE order_date < '2024-01-15'
   OR (order_date = '2024-01-15' AND order_id < 5432)
ORDER BY order_date DESC, order_id DESC
LIMIT 20;
```

For optimal keyset pagination performance, create a compound index matching your ORDER BY: CREATE INDEX idx_orders_date_id ON orders(order_date DESC, order_id DESC). This lets the database seek straight to the keyset position and walk the index in order, with no sorting and no skipped rows.
A common pattern is retrieving the 'top N' records by some criteria—highest revenue customers, most recent orders, best-rated products. These queries are highly optimizable when structured correctly.
Top-N Sort Optimization:
Modern query optimizers recognize TOP-N patterns and use specialized algorithms:
```sql
-- Find top 10 highest-value orders
SELECT order_id, customer_name, total
FROM orders
ORDER BY total DESC
LIMIT 10;

-- Without optimization: Full sort O(n log n)
-- 1. Scan all 10 million rows
-- 2. Sort all 10 million by total
-- 3. Return first 10

-- With Top-N optimization: Heap-based O(n log k) where k=10
-- 1. Maintain a min-heap of size 10
-- 2. Scan rows, each taking O(log 10) to maintain the heap
-- 3. Return heap contents

-- For 10 million rows:
-- Full sort: 10M * log(10M) = 230 million comparisons
-- Top-N heap: 10M * log(10) = 33 million comparisons (7x faster)
```

Index-Backed Top-N:
When an index exists on the ORDER BY column, Top-N becomes even more efficient:
```sql
-- Assuming index: CREATE INDEX idx_orders_total ON orders(total DESC)

SELECT order_id, total
FROM orders
ORDER BY total DESC
LIMIT 10;

-- Execution plan: "Limit -> Index Scan idx_orders_total"
-- The database reads exactly 10 index entries and stops
-- Complexity: O(k) where k is the limit - constant time!

-- Compare to WHERE-filtered Top-N:
SELECT order_id, total
FROM orders
WHERE status = 'completed'
ORDER BY total DESC
LIMIT 10;

-- Now needs an index on (status, total DESC) for optimal performance
-- CREATE INDEX idx_orders_status_total ON orders(status, total DESC)
```

Top-N Per Group (A Common Challenge):
Retrieving top N items for each category requires special techniques:
```sql
-- Goal: Top 3 highest-value orders per customer

-- Method 1: Window functions (most elegant)
SELECT customer_id, order_id, total
FROM (
  SELECT
    customer_id, order_id, total,
    ROW_NUMBER() OVER (
      PARTITION BY customer_id
      ORDER BY total DESC
    ) as rn
  FROM orders
) ranked
WHERE rn <= 3;

-- Method 2: LATERAL join (PostgreSQL; SQL Server uses the equivalent CROSS APPLY)
SELECT c.customer_id, top_orders.*
FROM customers c
CROSS JOIN LATERAL (
  SELECT order_id, total
  FROM orders o
  WHERE o.customer_id = c.customer_id
  ORDER BY total DESC
  LIMIT 3
) top_orders;

-- Method 3: Correlated subquery
SELECT o1.customer_id, o1.order_id, o1.total
FROM orders o1
WHERE (
  SELECT COUNT(*)
  FROM orders o2
  WHERE o2.customer_id = o1.customer_id
    AND o2.total > o1.total
) < 3;
```

Window functions are typically most efficient for Top-N per group on modern databases. LATERAL JOIN excels when each group is accessed via an index. Test all approaches with your specific data distribution—performance varies significantly based on group sizes and cardinality.
Beyond pagination, result limiting is a defensive programming technique to protect your systems from unexpected data volumes.
Always-Limit Pattern:
Establish a maximum result limit for all queries, even those not explicitly paginated:
```javascript
// Application-level defensive limits
const MAX_RESULTS = 10000;

async function findOrders(filters) {
  const query = buildQuery(filters);

  // Always append a safety limit
  const safeQuery = query + ` LIMIT ${MAX_RESULTS + 1}`;
  const results = await db.execute(safeQuery);

  // Detect if we hit the limit (indicating unbounded data)
  if (results.length > MAX_RESULTS) {
    logger.warn('Query returned more than MAX_RESULTS', {
      filters,
      rowCount: results.length
    });

    // Option 1: Return truncated results with warning
    return {
      data: results.slice(0, MAX_RESULTS),
      truncated: true,
      message: 'Results limited to 10,000 records'
    };

    // Option 2: Throw error requiring more specific filters
    // throw new TooManyResultsError('Please refine your search');
  }

  return { data: results, truncated: false };
}
```

Database-Level Safeguards:
Some databases support configuration-level limits:
```sql
-- MySQL: Set max rows in results
SET GLOBAL sql_select_limit = 10000;

-- PostgreSQL: Use statement timeout as a safety net
SET statement_timeout = '30s';

-- SQL Server: Row goal hint (influences optimizer, not a hard limit)
SELECT order_id, total FROM orders
OPTION (FAST 100);  -- Optimize for returning the first 100 quickly

-- Oracle: Row limit in session
ALTER SESSION SET SQL_SELECT_LIMIT = 10000;
```

Streaming and Chunked Processing:
For batch jobs that must process all rows, use streaming or chunked approaches:
```python
# Bad: Load all rows into memory
def process_all_orders_dangerous():
    orders = db.execute("SELECT * FROM orders")  # 10 million rows!
    for order in orders:  # Memory exhausted before the loop starts
        process(order)

# Good: Stream with a server-side cursor
def process_all_orders_streaming():
    with db.cursor(name='order_cursor') as cursor:
        cursor.execute("SELECT * FROM orders")
        while True:
            batch = cursor.fetchmany(1000)  # Fetch 1000 at a time
            if not batch:
                break
            for order in batch:
                process(order)

# Good: Keyset-based chunking
def process_all_orders_keyset():
    last_id = 0
    while True:
        batch = db.execute("""
            SELECT * FROM orders
            WHERE order_id > %s
            ORDER BY order_id
            LIMIT 1000
        """, [last_id])
        if not batch:
            break
        for order in batch:
            process(order)
        last_id = batch[-1]['order_id']
```

Server-side cursors maintain state on the database, consuming resources. Keyset chunking is stateless but requires an indexed ordering column. For long-running batch jobs, keyset chunking is more resilient to connection failures and can be easily parallelized.
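Parallelizing keyset chunking usually means splitting the key space into contiguous ranges up front and giving each worker its own range. A rough sketch, assuming an indexed order_id (the worker count and bind parameters are illustrative):

```sql
-- One-time setup: split the id space into 4 contiguous buckets (full scan, run once)
SELECT bucket,
       min(order_id) AS range_start,
       max(order_id) AS range_end
FROM (
  SELECT order_id,
         ntile(4) OVER (ORDER BY order_id) AS bucket
  FROM orders
) t
GROUP BY bucket
ORDER BY bucket;

-- Each worker then runs the keyset loop only inside its own range
SELECT *
FROM orders
WHERE order_id > :last_seen_id   -- starts at the worker's range_start - 1
  AND order_id <= :range_end     -- never crosses into another worker's range
ORDER BY order_id
LIMIT 1000;
```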
Even with good intentions, result limiting has common pitfalls. Here are patterns to avoid and their solutions.
Pitfall 1: Incorrect Total Count with LIMIT:
```sql
-- Wrong: Getting total count and page in separate queries (race condition)
SELECT COUNT(*) FROM orders WHERE status = 'pending';  -- Returns 5000
SELECT * FROM orders WHERE status = 'pending' LIMIT 20;  -- Might be different!

-- Better: Use a window function for the total
SELECT
  order_id,
  total,
  COUNT(*) OVER() as total_count  -- Includes the total in each row
FROM orders
WHERE status = 'pending'
LIMIT 20;

-- Best for large tables: Approximate count
-- PostgreSQL
SELECT reltuples::bigint FROM pg_class WHERE relname = 'orders';

-- With filters, use EXPLAIN to get a row estimate
EXPLAIN SELECT * FROM orders WHERE status = 'pending';
-- Read "rows=" from the output for the estimate
```

Pitfall 2: LIMIT Without ORDER BY:
```sql
-- Dangerous: No ORDER BY means unpredictable results
SELECT order_id, total FROM orders LIMIT 10;

-- Which 10 orders? Could be any 10!
-- Results may differ between:
-- - Different executions
-- - Different database replicas
-- - Before and after VACUUM/optimize

-- Always specify ORDER BY with LIMIT
SELECT order_id, total FROM orders
ORDER BY order_id  -- Deterministic, repeatable
LIMIT 10;

-- Exception: When you truly don't care which rows (sampling)
SELECT * FROM orders TABLESAMPLE SYSTEM(0.1);  -- Random 0.1%
```

Pitfall 3: Duplicate/Missing Rows During Pagination:
```sql
-- Scenario: User is on page 5, a new order is inserted
-- With OFFSET pagination:
--   Page 5: OFFSET 80 LIMIT 20 returns rows 81-100
--   New row inserted, pushes all rows down
--   User clicks "Next"
--   Page 6: OFFSET 100 LIMIT 20 - row 100 appears again (duplicate!)

-- Solution 1: Keyset pagination (inherently consistent)
SELECT * FROM orders
WHERE order_id > :last_seen_id
ORDER BY order_id
LIMIT 20;

-- Solution 2: Read consistency with a timestamp snapshot
SELECT * FROM orders
WHERE created_at < :query_start_time  -- Freeze point
ORDER BY created_at DESC
LIMIT 20 OFFSET 80;

-- Solution 3: Explicit transaction isolation (expensive)
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SELECT * FROM orders ORDER BY order_id LIMIT 20 OFFSET 0;
-- Keep the transaction open for all pages (not recommended)
```

Pagination bugs often only manifest at scale. Test with production-size data and realistic concurrent write load. A pagination system that works perfectly with 1,000 rows may break with 1,000,000 rows and 100 writes/second.
For complex scenarios, these advanced patterns provide efficient result limiting.
Deferred Join Pattern:
When selecting many columns but filtering on few, fetch IDs first:
```sql
-- Slow: Fetches all columns during the pagination scan
SELECT o.*, c.name, c.email, p.title
FROM orders o
JOIN customers c ON o.customer_id = c.id
JOIN products p ON o.product_id = p.id
WHERE o.status = 'pending'
ORDER BY o.created_at DESC
LIMIT 20 OFFSET 10000;

-- Fast: Deferred join - pagination on IDs only
SELECT o.*, c.name, c.email, p.title
FROM (
  SELECT order_id
  FROM orders
  WHERE status = 'pending'
  ORDER BY created_at DESC
  LIMIT 20 OFFSET 10000
) page
JOIN orders o ON o.order_id = page.order_id
JOIN customers c ON o.customer_id = c.id
JOIN products p ON o.product_id = p.id;

-- Why faster: The inner query scans a small index; the outer query
-- touches only 20 rows across all tables
```

Materialized Pagination:
For frequently-accessed paginated views, pre-compute page assignments:
```sql
-- Create a materialized view with page numbers
CREATE MATERIALIZED VIEW orders_paginated AS
SELECT
  order_id, customer_id, total, order_date,
  (ROW_NUMBER() OVER (ORDER BY order_date DESC) - 1) / 20 + 1 as page_num
FROM orders
WHERE status = 'active';

-- Create an index on the page number
CREATE INDEX idx_orders_page ON orders_paginated(page_num);

-- Fetching any page is now O(1)
SELECT order_id, customer_id, total, order_date
FROM orders_paginated
WHERE page_num = 5000;  -- Instant!

-- Refresh periodically
REFRESH MATERIALIZED VIEW orders_paginated;
```

Bidirectional Keyset Navigation:
Enabling both 'previous' and 'next' page navigation with keyset:
```sql
-- Current page centered on order_id = 500
-- Display ordering: newest first (ORDER BY order_id DESC)

-- Next page (older orders)
SELECT order_id, total, order_date
FROM orders
WHERE order_id < 481  -- Last id from current page
ORDER BY order_id DESC
LIMIT 20;

-- Previous page (newer orders)
-- Reverse the query, then re-reverse the results
SELECT * FROM (
  SELECT order_id, total, order_date
  FROM orders
  WHERE order_id > 500  -- First id from current page
  ORDER BY order_id ASC  -- Opposite direction
  LIMIT 20
) prev
ORDER BY order_id DESC;  -- Restore display order
```

Return cursor tokens in your API response: { data: [...], nextCursor: 'eyJpZCI6NDgxfQ==', prevCursor: 'eyJpZCI6NTAxfQ==' }. Base64-encode the keyset values. This hides implementation details from clients and keeps navigation efficient; note that base64 only obscures the values, so sign or validate tokens if cursor tampering is a concern.
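Because the tokens are plain base64, they are easy to inspect while debugging. For example, in PostgreSQL (standard encode/decode functions; the token is the illustrative one from above):

```sql
-- Decode a cursor token back to its JSON keyset payload
SELECT convert_from(decode('eyJpZCI6NDgxfQ==', 'base64'), 'UTF8') AS cursor_payload;
-- => {"id":481}

-- Encode a keyset value into a token the same way
SELECT encode(convert_to('{"id":481}', 'UTF8'), 'base64') AS next_cursor;
```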
Result limiting is fundamental to building database applications that remain performant as data grows. Let's consolidate the key principles:

- Never ship an unbounded query; every SELECT that can grow with the data needs a LIMIT or an application-level cap.
- Always pair LIMIT with ORDER BY so results are deterministic and repeatable.
- Use LIMIT/OFFSET only for shallow pagination; switch to keyset (cursor-based) pagination when users can page deeply.
- Index the columns in your ORDER BY so Top-N and pagination queries can stop after reading a handful of index entries.
- For batch jobs, process rows with streaming cursors or keyset-based chunks rather than loading everything into memory.
What's next:
The next page explores efficient join techniques. You'll learn how join order affects performance, strategies for optimizing multi-table queries, and when to restructure queries to leverage different join algorithms.
You now understand result limiting as both a feature (pagination) and a defensive practice (resource protection). These techniques ensure your queries remain performant at any scale, protecting both user experience and system stability.