Your application is running slow. You check the database—CPU usage is normal, no long-running queries. You check the app server—it's waiting on the database. What's happening?
You enable query logging and discover the horrifying truth: to render a single page listing 50 blog posts with their authors, your application executed 51 separate database queries. One query to fetch the posts, then 50 individual queries to fetch each author.
This is the N+1 query problem: the most common performance anti-pattern in application development. It's called 'N+1' because you execute 1 query to get a list of N items, then N additional queries to get related data for each item. The result is query volume that grows linearly with your data: double the posts, double the queries.
The insidious part of N+1 is that it is invisible in code review and harmless on small development datasets—it only surfaces as data grows in production.
By the end of this page, you will understand exactly how N+1 problems arise, recognize them in ORMs and raw SQL, implement eager loading and batching solutions, and establish practices to prevent N+1 issues in your codebase.
The N+1 pattern emerges from a seemingly innocent data access pattern: retrieving a collection, then accessing related data for each item.
The Naive Pattern:
1. Query: Get all posts → Returns N posts
2. For each post:
Query: Get author where id = post.author_id → N queries
─────────────────────────────────────────
Total: 1 + N queries
Concrete Example:
```python
# Python with SQLAlchemy ORM - THE PROBLEM

# Query 1: Fetch all posts
posts = session.query(Post).all()

for post in posts:
    # Queries 2 through N+1: for each post, fetch the author
    print(f"Title: {post.title}")
    print(f"Author: {post.author.name}")  # <-- Triggers lazy load!
    #                ^^^^^^^^^^^
    # SQLAlchemy executes: SELECT * FROM authors WHERE id = ?
    # This happens for EVERY post in the loop!

# If we have 100 posts:
# Query 1:   SELECT * FROM posts;                   (1 query)
# Query 2:   SELECT * FROM authors WHERE id = 1;
# Query 3:   SELECT * FROM authors WHERE id = 2;
# Query 4:   SELECT * FROM authors WHERE id = 3;
# ...
# Query 101: SELECT * FROM authors WHERE id = 100;  (100 queries)
# ─────────────────────────────────────────────────
# TOTAL: 101 queries for what should be 1-2 queries!
```

Why This Happens:
Most ORMs use lazy loading by default for related entities. When you access post.author, the ORM checks whether the author has been loaded yet; if not, it issues a query on the spot. This is convenient for single-object access—you only load what you need. But in loops, it's catastrophic.
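The check-then-query behavior can be sketched with a plain Python descriptor. This is a minimal illustration of the mechanism, not any real ORM's implementation; the `LazyAuthor`, `fetch_author`, and `QUERIES` names are hypothetical.

```python
# Minimal sketch of ORM-style lazy loading: the related object is
# fetched on first attribute access, one query per parent object.
# All names here are illustrative, not from a real ORM.

QUERIES = []  # stand-in for a real database round-trip log

def fetch_author(author_id):
    QUERIES.append(f"SELECT * FROM authors WHERE id = {author_id}")
    return {"id": author_id, "name": f"author-{author_id}"}

class LazyAuthor:
    """Descriptor that loads the related author on first access."""
    def __get__(self, post, owner):
        if post is None:
            return self
        if post._author is None:          # not loaded yet?
            post._author = fetch_author(post.author_id)  # issues a query
        return post._author

class Post:
    author = LazyAuthor()

    def __init__(self, post_id, author_id):
        self.id = post_id
        self.author_id = author_id
        self._author = None  # nothing loaded up front

posts = [Post(i, i) for i in range(3)]
for p in posts:
    _ = p.author["name"]  # each first access triggers one "query"

print(len(QUERIES))  # one lazy-load query per post
```

Accessing `p.author` a second time returns the cached `_author` without another query, which is exactly why lazy loading feels cheap in single-object code paths and only hurts in loops over fresh objects.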
The Mathematical Reality:
| Posts | Queries (N+1) | Round-trips @ 5ms each | Total Latency |
|---|---|---|---|
| 10 | 11 | 55ms | Acceptable |
| 100 | 101 | 505ms | Sluggish |
| 1,000 | 1,001 | 5,005ms | Unusable |
| 10,000 | 10,001 | 50,005ms | Application failure |
The efficient alternative (2 queries with JOIN or IN clause) would take ~10ms regardless of N.
N+1 isn't just about query execution time—it's about network round-trips. Each query requires: network request, connection handling, query parsing, execution, and network response. Even with a 5ms query, 1,000 round-trips add 5 seconds of unavoidable latency. Cloud databases with higher network latency suffer even more.
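The round-trip arithmetic behind the table above can be checked with a quick back-of-envelope model. It assumes a flat 5 ms per round trip, as in the examples; real latencies vary.

```python
# Back-of-envelope latency model for N+1 vs. batched loading.
# Assumes a fixed 5 ms cost per database round-trip (an assumption,
# matching the table in the text).

ROUND_TRIP_MS = 5

def n_plus_one_latency_ms(n):
    """1 query for the list + n queries for related rows."""
    return (1 + n) * ROUND_TRIP_MS

def batched_latency_ms(n):
    """2 queries (list + IN-clause batch), independent of n."""
    return 2 * ROUND_TRIP_MS

for n in (10, 100, 1_000, 10_000):
    print(f"{n:>6} items: N+1 = {n_plus_one_latency_ms(n):>6} ms, "
          f"batched = {batched_latency_ms(n)} ms")
```

The N+1 cost grows linearly with N while the batched cost stays constant, which is why the table's 10,000-row case goes from 50 seconds to roughly 10 ms.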
Every ORM can produce N+1 queries. Understanding how it manifests in your technology stack is essential for prevention.
The Common Thread:
All ORMs support lazy loading (load related data on-demand) and eager loading (load related data upfront). N+1 occurs when you iterate over lazily-loaded relationships.
```python
# ==========================================
# SQLAlchemy - The N+1 Problem
# ==========================================

# PROBLEM: Lazy loading in loop
posts = session.query(Post).all()
for post in posts:
    print(post.author.name)  # N additional queries

# SOLUTION 1: joinedload (LEFT OUTER JOIN)
from sqlalchemy.orm import joinedload
posts = session.query(Post).options(joinedload(Post.author)).all()
# Generates: SELECT posts.*, authors.*
#            FROM posts LEFT OUTER JOIN authors ON posts.author_id = authors.id
# Single query returns all data

# SOLUTION 2: subqueryload (separate query, but batched)
from sqlalchemy.orm import subqueryload
posts = session.query(Post).options(subqueryload(Post.comments)).all()
# Query 1: SELECT * FROM posts
# Query 2: SELECT * FROM comments WHERE post_id IN (1, 2, 3, ...)
# 2 queries total, regardless of N

# SOLUTION 3: selectinload (recommended for collections)
from sqlalchemy.orm import selectinload
posts = session.query(Post).options(selectinload(Post.comments)).all()
# Same as subqueryload but uses IN clause

# ==========================================
# Django ORM - The N+1 Problem
# ==========================================

# PROBLEM: Lazy loading in loop
posts = Post.objects.all()
for post in posts:
    print(post.author.name)  # N additional queries

# SOLUTION 1: select_related (for ForeignKey/OneToOne - uses JOIN)
posts = Post.objects.select_related('author').all()
# Single query with JOIN

# SOLUTION 2: prefetch_related (for ManyToMany/reverse FK - uses IN)
posts = Post.objects.prefetch_related('comments').all()
# 2 queries: one for posts, one for all related comments

# SOLUTION 3: Combine both for complex relationships
posts = Post.objects.select_related('author').prefetch_related('tags', 'comments').all()
# 3 queries: posts+authors JOIN, tags IN, comments IN
```

Lazy loading is convenient for development but dangerous in production. Some teams disable lazy loading entirely, forcing explicit eager loading for all relationships. This makes N+1 issues visible at development time: accessing a related entity that was never loaded fails immediately instead of silently querying.
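In SQLAlchemy, this "fail loudly" approach can be applied per query with `raiseload`. Below is a minimal, self-contained sketch using an in-memory SQLite database; the `Post`/`Author` models are illustrative.

```python
# Sketch: SQLAlchemy raiseload turns silent lazy loads into errors.
# Uses in-memory SQLite; Post/Author are minimal illustrative models.

from sqlalchemy import create_engine, Column, Integer, String, ForeignKey
from sqlalchemy.orm import (declarative_base, relationship, sessionmaker,
                            raiseload, joinedload)
from sqlalchemy.exc import InvalidRequestError

Base = declarative_base()

class Author(Base):
    __tablename__ = "authors"
    id = Column(Integer, primary_key=True)
    name = Column(String)

class Post(Base):
    __tablename__ = "posts"
    id = Column(Integer, primary_key=True)
    title = Column(String)
    author_id = Column(Integer, ForeignKey("authors.id"))
    author = relationship(Author)

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()
session.add(Author(id=1, name="Ada"))
session.add(Post(id=1, title="Hello", author_id=1))
session.commit()
session.expunge_all()

# raiseload('*') makes ANY lazy access an immediate error
posts = session.query(Post).options(raiseload("*")).all()
try:
    _ = posts[0].author
    outcome = "lazy load allowed"
except InvalidRequestError:
    outcome = "lazy load blocked"
print(outcome)

# Explicit eager loading still works as usual
posts = session.query(Post).options(joinedload(Post.author)).all()
print(posts[0].author.name)
```

Running a test suite with `raiseload("*")` applied to list queries surfaces every would-be N+1 as a failing test rather than a slow page.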
N+1 problems aren't exclusive to ORMs. They occur in any code that executes queries in loops—including raw SQL and stored procedures.
N+1 in Application Layer Raw SQL:
```python
# Python example with raw SQL

# PROBLEM: Query in a loop
cursor.execute("SELECT id, title, author_id FROM posts")
posts = cursor.fetchall()

for post in posts:
    # Executing a query for each post - N+1!
    cursor.execute(
        "SELECT name FROM authors WHERE id = %s",
        (post['author_id'],)
    )
    author = cursor.fetchone()
    print(f"{post['title']} by {author['name']}")

# SOLUTION 1: Single JOIN query
cursor.execute("""
    SELECT p.id, p.title, a.name AS author_name
    FROM posts p
    JOIN authors a ON p.author_id = a.id
""")
posts_with_authors = cursor.fetchall()
for row in posts_with_authors:
    print(f"{row['title']} by {row['author_name']}")
# 1 query instead of N+1

# SOLUTION 2: Batch with IN clause
cursor.execute("SELECT id, title, author_id FROM posts")
posts = cursor.fetchall()
author_ids = tuple(set(p['author_id'] for p in posts))

cursor.execute(
    "SELECT id, name FROM authors WHERE id IN %s",
    (author_ids,)
)
authors = {a['id']: a['name'] for a in cursor.fetchall()}

for post in posts:
    print(f"{post['title']} by {authors[post['author_id']]}")
# 2 queries regardless of N
```

N+1 in Stored Procedures (Cursor-Based Loops):
```sql
-- PROBLEM: Stored procedure with cursor loop
CREATE PROCEDURE ProcessOrders()
BEGIN
    DECLARE done INT DEFAULT FALSE;
    DECLARE order_id INT;
    DECLARE customer_name VARCHAR(100);

    -- Cursor to iterate through orders
    DECLARE order_cursor CURSOR FOR
        SELECT id FROM orders WHERE status = 'pending';
    DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = TRUE;

    OPEN order_cursor;
    read_loop: LOOP
        FETCH order_cursor INTO order_id;
        IF done THEN
            LEAVE read_loop;
        END IF;

        -- N+1: Query per order to get customer name
        SELECT name INTO customer_name
        FROM customers
        WHERE id = (SELECT customer_id FROM orders WHERE id = order_id);

        -- Process the order...
    END LOOP;
    CLOSE order_cursor;
END;

-- SOLUTION: Avoid cursor, use set-based operations
CREATE PROCEDURE ProcessOrdersBetter()
BEGIN
    -- Get all data in one query
    DECLARE result CURSOR FOR
        SELECT o.id, o.total, c.name AS customer_name
        FROM orders o
        JOIN customers c ON o.customer_id = c.id
        WHERE o.status = 'pending';

    -- Or better: Avoid cursors entirely with set-based logic
    -- Most cursor operations can be rewritten as UPDATE/INSERT with JOIN
    UPDATE orders o
    JOIN customers c ON o.customer_id = c.id
    SET o.processed_by = c.sales_rep,
        o.status = 'processing'
    WHERE o.status = 'pending';
    -- Single set-based operation, no N+1
END;
```

Row-by-row cursor processing in stored procedures is a common N+1 source. Any query executed inside a cursor loop multiplies database operations. Prefer set-based operations (UPDATE with JOIN, INSERT...SELECT) over cursor iteration.
N+1 problems are notoriously hard to detect because they're invisible at the code level and only manifest at runtime. Effective detection requires multiple approaches.
Detection Methods:
```python
# ==========================================
# Python: Query Counting in Tests
# ==========================================

from sqlalchemy import event
from sqlalchemy.engine import Engine
import threading

# Thread-local query counter
query_counter = threading.local()

@event.listens_for(Engine, "before_cursor_execute")
def count_query(conn, cursor, statement, parameters, context, executemany):
    if not hasattr(query_counter, 'count'):
        query_counter.count = 0
    query_counter.count += 1

# Context manager for counting queries
from contextlib import contextmanager

@contextmanager
def count_queries():
    query_counter.count = 0
    yield query_counter

# Usage in tests
def test_no_n1_in_post_list():
    with count_queries() as counter:
        posts = session.query(Post).options(joinedload(Post.author)).all()
        for post in posts:
            _ = post.author.name

    # Assert reasonable query count (should be 1 or 2, not N+1)
    assert counter.count <= 2, f"N+1 detected: {counter.count} queries"

# ==========================================
# Django: django-debug-toolbar
# ==========================================
# In settings.py:
INSTALLED_APPS = [
    ...
    'debug_toolbar',
]
# The toolbar shows query count and detects duplicates

# ==========================================
# Rails: bullet gem
# ==========================================
# In Gemfile:
# gem 'bullet', group: 'development'

# In config/environments/development.rb:
# Bullet.enable = true
# Bullet.alert = true       # Browser alert on N+1
# Bullet.add_footer = true  # Show in page footer
```
```sql
-- PostgreSQL: Find potential N+1 from query patterns
-- Look for repeated similar queries in pg_stat_statements

SELECT
    query,
    calls,
    total_exec_time,
    mean_exec_time,
    calls * mean_exec_time AS total_impact
FROM pg_stat_statements
WHERE query LIKE 'SELECT%WHERE%id = $1%'  -- Parameterized single-row lookups
ORDER BY calls DESC
LIMIT 20;

-- High "calls" with low "mean_exec_time" suggests N+1
-- e.g., 10,000 calls of 0.5ms each = N+1 pattern

-- MySQL: Enable general query log temporarily
SET GLOBAL general_log = 'ON';
SET GLOBAL log_output = 'TABLE';

-- Review queries
SELECT * FROM mysql.general_log ORDER BY event_time DESC LIMIT 1000;

-- Look for patterns like:
-- SELECT * FROM authors WHERE id = 1;
-- SELECT * FROM authors WHERE id = 2;
-- SELECT * FROM authors WHERE id = 3;
-- ...

-- SQL Server: Query Store
SELECT
    q.query_id,
    qt.query_sql_text,
    rs.count_executions,
    rs.avg_duration / 1000 AS avg_ms
FROM sys.query_store_query q
JOIN sys.query_store_query_text qt ON q.query_text_id = qt.query_text_id
JOIN sys.query_store_plan p ON q.query_id = p.query_id
JOIN sys.query_store_runtime_stats rs ON p.plan_id = rs.plan_id
WHERE qt.query_sql_text LIKE '%WHERE%id%=%'
ORDER BY rs.count_executions DESC;
```

One of the most effective N+1 prevention techniques is writing tests that assert a maximum query count. If you expect a view to execute 2 queries, write a test that fails if it executes more. This catches N+1 regressions immediately.
Eager loading is the primary solution for N+1 problems. Instead of loading related data on-demand (lazy), you load it upfront in the initial query.
Eager Loading Strategies:

There are three main approaches: JOIN-based loading (one combined query), batch loading with an IN clause (two queries), and the DataLoader pattern (one batched query per entity type).

When to Use Each:
| Strategy | Best For | Queries | Memory Usage | Watch Out For |
|---|---|---|---|---|
| JOIN | Single-value relationships (ManyToOne) | 1 | Higher (duplicate parent data) | Cartesian explosion with multiple collections |
| Batch/IN Clause | Collection relationships (OneToMany) | 2 | Lower | Large IN clauses may hit limits |
| DataLoader | GraphQL/complex graphs | 1 per entity type | Lowest | Complexity of caching/batching logic |
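The "large IN clauses may hit limits" caveat in the table is easy to work around by chunking the ID list. A sketch, assuming a generic parameterized query function (`run_query` here is a hypothetical stand-in for your database call):

```python
# Batch loading with chunked IN clauses, to stay under database
# parameter limits (e.g. SQLite's historical default of ~999 host
# parameters). `run_query(sql, params)` is a hypothetical stand-in
# for a real DB-API call returning (id, name) rows.

CHUNK_SIZE = 500

def chunked(ids, size=CHUNK_SIZE):
    """Yield successive fixed-size slices of the ID list."""
    ids = list(ids)
    for i in range(0, len(ids), size):
        yield ids[i:i + size]

def load_authors_by_id(author_ids, run_query):
    """Batch-load authors, issuing one IN query per chunk."""
    authors = {}
    for chunk in chunked(set(author_ids)):
        placeholders = ", ".join("?" for _ in chunk)
        rows = run_query(
            f"SELECT id, name FROM authors WHERE id IN ({placeholders})",
            chunk,
        )
        authors.update({row[0]: row[1] for row in rows})
    return authors

# Usage with a fake query function standing in for the database:
def fake_run_query(sql, params):
    return [(i, f"author-{i}") for i in params]

result = load_authors_by_id(range(1200), fake_run_query)
print(len(result))  # 1200 authors loaded via 3 chunked queries
```

Even with chunking, the query count is `ceil(unique_ids / CHUNK_SIZE)`, a small constant in practice, rather than one query per row.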
```sql
-- ==========================================
-- Strategy 1: JOIN-based loading
-- ==========================================

-- Load posts with authors in single query
SELECT
    p.id AS post_id,
    p.title,
    p.content,
    a.id AS author_id,
    a.name AS author_name,
    a.email AS author_email
FROM posts p
JOIN authors a ON p.author_id = a.id;

-- Result: One row per post, with author embedded
-- ORM hydrates this into Post objects with populated Author

-- ==========================================
-- Strategy 2: Batch/IN Clause loading
-- ==========================================

-- Query 1: Get all posts
SELECT id, title, content, author_id FROM posts;

-- Application collects author_ids: [1, 2, 3, 5, 8, 13, ...]

-- Query 2: Get all authors for those posts
SELECT id, name, email FROM authors WHERE id IN (1, 2, 3, 5, 8, 13, ...);

-- Application merges in memory
-- 2 queries total, regardless of post count

-- ==========================================
-- Avoiding Cartesian Explosion
-- ==========================================

-- PROBLEM: Multiple JOINs on collections
SELECT p.id, p.title, c.id AS comment_id, c.text, t.id AS tag_id, t.name
FROM posts p
LEFT JOIN comments c ON p.id = c.post_id
LEFT JOIN post_tags pt ON p.id = pt.post_id
LEFT JOIN tags t ON pt.tag_id = t.id;

-- If a post has 10 comments and 5 tags:
-- Result: 10 × 5 = 50 rows per post!
-- 100 posts × 50 rows = 5,000 total rows
-- This is called "cartesian explosion"

-- SOLUTION: Separate queries for each collection
SELECT id, title FROM posts;                                    -- 1 query
SELECT id, text, post_id FROM comments WHERE post_id IN (...);  -- 1 query
SELECT t.id, t.name, pt.post_id
FROM tags t
JOIN post_tags pt ON t.id = pt.tag_id
WHERE pt.post_id IN (...);                                      -- 1 query
-- 3 compact result sets, no cartesian explosion
```

When eager loading multiple collections via JOINs, result sets grow multiplicatively: 100 posts × 10 comments × 5 tags = 5,000 rows for just 100 posts. Modern ORMs (EF Core's AsSplitQuery, Hibernate's @Fetch(SUBSELECT)) offer options to avoid this.
When eager loading isn't practical (e.g., dynamic queries, GraphQL resolvers, or complex conditional loading), batching and DataLoader patterns provide an alternative.
The DataLoader Pattern:
Originally developed for GraphQL at Facebook, DataLoader is a utility that collects individual load calls made within one execution tick, batches them into a single query, and caches results for the rest of the request.
```typescript
// JavaScript/TypeScript DataLoader Example

import DataLoader from 'dataloader';

// Create a DataLoader for authors
const authorLoader = new DataLoader(async (authorIds: number[]) => {
  // This function is called once with ALL requested IDs batched together
  console.log(`Loading authors: ${authorIds}`);

  // Single database query for all requested authors
  const authors = await db.query(
    'SELECT * FROM authors WHERE id IN (?)',
    [authorIds]
  );

  // Return authors in same order as requested IDs
  const authorsById = new Map(authors.map(a => [a.id, a]));
  return authorIds.map(id => authorsById.get(id) || null);
});

// Usage in resolvers
async function getPostsWithAuthors() {
  const posts = await db.query('SELECT * FROM posts');

  // Even though we call load() N times, DataLoader batches into 1 query
  const postsWithAuthors = await Promise.all(
    posts.map(async (post) => ({
      ...post,
      author: await authorLoader.load(post.authorId)  // Batched!
    }))
  );

  return postsWithAuthors;
}

// What happens:
// 1. Posts query executes (1 query)
// 2. 100 calls to authorLoader.load() accumulate
// 3. DataLoader batches into: SELECT * FROM authors WHERE id IN (1,2,3,4,...)
// 4. Total: 2 queries instead of 101

// ==========================================
// DataLoader with Caching
// ==========================================

// DataLoader also caches within a request
const posts = [
  { id: 1, authorId: 5 },
  { id: 2, authorId: 5 },  // Same author as post 1
  { id: 3, authorId: 5 },  // Same author again
];

// Only loads author 5 once, returns cached result for subsequent calls
posts.forEach(async (post) => {
  const author = await authorLoader.load(post.authorId);
  // First call: loads from DB
  // Subsequent calls: returns cached
});
```
```python
# Python: Manual Batching Pattern

from collections import defaultdict

def get_posts_with_details(post_ids):
    """
    Fetch posts with authors and comments using manual batching.
    Avoids N+1 by batching related entity lookups.
    """
    # Step 1: Load all posts
    posts = db.query(
        "SELECT * FROM posts WHERE id IN %s",
        (tuple(post_ids),)
    )
    if not posts:
        return []

    # Step 2: Extract foreign keys for batch loading
    author_ids = list(set(p['author_id'] for p in posts))

    # Step 3: Batch load all authors
    authors = db.query(
        "SELECT * FROM authors WHERE id IN %s",
        (tuple(author_ids),)
    )
    authors_by_id = {a['id']: a for a in authors}

    # Step 4: Batch load all comments
    comments = db.query(
        "SELECT * FROM comments WHERE post_id IN %s",
        (tuple(p['id'] for p in posts),)
    )
    comments_by_post = defaultdict(list)
    for comment in comments:
        comments_by_post[comment['post_id']].append(comment)

    # Step 5: Assemble results
    result = []
    for post in posts:
        result.append({
            **post,
            'author': authors_by_id.get(post['author_id']),
            'comments': comments_by_post.get(post['id'], [])
        })
    return result

# Total queries: 3 (posts, authors, comments)
# Instead of: 1 + N (authors) + N (comments) = 2N + 1 queries
```

While DataLoader was created for GraphQL, its pattern applies anywhere: REST APIs with embedded resources, batch processing jobs, or any scenario where multiple independent code paths request the same type of data. The key is batching requests within a single "request context" (HTTP request, background job, etc.).
Fixing individual N+1 issues is important, but establishing practices to prevent them systematically is even more valuable.
Organizational Practices:

These include code-review checklists that flag queries inside loops, query-count assertions in CI, query logging in development, and ORM configuration that surfaces lazy loads as errors.
```python
# ==========================================
# Python/Django: Prevent by detecting at runtime
# ==========================================

# settings.py - strict mode for catching N+1 in development
if DEBUG:
    # Raise exception on lazy loading (forces explicit eager loading)
    import warnings
    warnings.filterwarnings(
        'error',
        r".*accessing.*lazy.*",
        category=DeprecationWarning
    )

# ==========================================
# SQLAlchemy: Disable lazy loading globally
# ==========================================

from sqlalchemy.orm import configure_mappers

# After all models are defined
configure_mappers()

# Make all relationships "raise" on lazy access
for mapper in Base.registry.mappers:
    for rel in mapper.relationships:
        rel.lazy = 'raise'  # Raises exception if accessed without loading

# Now accessing unloaded relationships raises an error:
posts = session.query(Post).all()
posts[0].author.name  # InvalidRequestError: 'Post.author' is not available

# Force explicit loading:
posts = session.query(Post).options(joinedload(Post.author)).all()
posts[0].author.name  # Works

# ==========================================
# Testing: Query Count Assertions
# ==========================================

import pytest
from django.test.utils import CaptureQueriesContext

@pytest.mark.django_db
def test_post_list_query_count(client, django_assert_max_num_queries):
    # Create test data
    create_posts(count=50)

    # Assert maximum query count
    with django_assert_max_num_queries(3):  # posts + authors + comments
        response = client.get('/api/posts/')
        assert response.status_code == 200
    # If more than 3 queries execute, test fails with actual count

# ==========================================
# Logging: Make queries visible
# ==========================================

# Django settings.py
LOGGING = {
    'version': 1,
    'handlers': {
        'console': {'class': 'logging.StreamHandler'},
    },
    'loggers': {
        'django.db.backends': {
            'level': 'DEBUG',  # Shows all SQL
            'handlers': ['console'],
        },
    },
}
```

Configuring your ORM to raise exceptions on lazy loading eliminates N+1 by forcing developers to explicitly declare what data they need. This is aggressive but highly effective—it's impossible to introduce N+1 if lazy access is forbidden.
The N+1 query problem is perhaps the most common performance anti-pattern in application development. Its prevalence, invisibility, and catastrophic scaling behavior make it essential to understand and prevent.
Congratulations! You've completed the module on Common Performance Issues. You now understand the major SQL performance anti-patterns: full table scans, implicit conversions, functions on indexed columns, correlated subquery problems, and the N+1 query problem. Armed with this knowledge, you can identify, diagnose, and resolve the most prevalent performance issues in database systems.