Loading learning content...
Many real-world queries ask: Is this value one of these? Find orders with status 'pending', 'processing', or 'shipped'. Show products in categories 5, 12, or 23. List employees in the Engineering, Product, or Design departments. The IN operator answers these questions elegantly, testing whether a value belongs to a specified set.
IN is more than syntactic convenience—it's a semantic statement about set membership that the query optimizer can recognize and optimize. Understanding IN deeply means understanding its relationship to equality, its NULL handling quirks, and its performance characteristics across different use cases.
By the end of this page, you will master the IN operator's syntax with literal lists and subqueries, understand the critical NULL handling difference between IN and NOT IN, and know how to write set membership queries that perform well at scale.
The IN operator tests whether a value matches any value in a specified set. The set can be a literal list or the result of a subquery.
Basic Syntax:
value IN (value1, value2, value3, ...)
value IN (SELECT column FROM table WHERE ...)
The expression value IN (a, b, c) is logically equivalent to:
value = a OR value = b OR value = c
12345678910111213141516171819
-- Literal list of valuesSELECT * FROM orders WHERE status IN ('pending', 'processing', 'shipped');-- Returns orders with any of these three statuses SELECT * FROM products WHERE category_id IN (5, 12, 23, 45);-- Products in any of these categories SELECT * FROM employees WHERE department IN ('Engineering', 'Product', 'Design');-- Employees in any of these departments -- Single value (valid but unusual)SELECT * FROM orders WHERE status IN ('pending');-- Equivalent to: status = 'pending' -- Empty list behavior varies by databaseSELECT * FROM orders WHERE status IN ();-- PostgreSQL: Syntax error-- MySQL: Returns no rows (equivalent to FALSE)-- Some databases: Syntax errorType Compatibility:
All values in the IN list must be compatible with the column's data type. Mixed types may cause implicit conversions or errors:
123456789101112131415161718
-- Consistent types (correct)SELECT * FROM products WHERE id IN (1, 2, 3, 4, 5);SELECT * FROM users WHERE email IN ('a@x.com', 'b@x.com', 'c@x.com'); -- Mixed types (may work but risky)SELECT * FROM products WHERE id IN (1, 2, '3', 4);-- '3' may be converted to integer, but behavior varies -- Type mismatch (avoid)SELECT * FROM products WHERE id IN ('one', 'two');-- Error or unexpected results depending on database -- Dates in IN listsSELECT * FROM events WHERE event_date IN ('2024-01-15', '2024-02-20', '2024-03-25');-- String literals converted to dates -- Better: Explicit type for claritySELECT * FROM events WHERE event_date IN (DATE '2024-01-15', DATE '2024-02-20');For long IN lists, format with one value per line. This makes code reviews easier, diffs cleaner, and maintenance simpler. Some teams even sort the values alphabetically or numerically for consistency.
IN truly shines when combined with subqueries. This enables dynamic set membership testing against data computed at query time.
Basic Subquery Usage:
1234567891011121314151617181920212223242526
-- Find orders for high-value customersSELECT * FROM orders WHERE customer_id IN ( SELECT customer_id FROM customers WHERE lifetime_value > 100000); -- Products in popular categoriesSELECT * FROM products WHERE category_id IN ( SELECT category_id FROM order_items GROUP BY category_id HAVING COUNT(*) > 1000); -- Employees who manage someoneSELECT * FROM employees WHERE employee_id IN ( SELECT DISTINCT manager_id FROM employees WHERE manager_id IS NOT NULL); -- Users who placed orders this monthSELECT * FROM users WHERE user_id IN ( SELECT DISTINCT user_id FROM orders WHERE order_date >= DATE_TRUNC('month', CURRENT_DATE));Correlated vs Non-Correlated Subqueries:
Most IN subqueries are non-correlated—they execute once, independent of the outer query. Correlated IN subqueries reference the outer query and execute once per outer row (generally avoid these with IN):
1234567891011121314151617
-- Non-correlated (preferred): Subquery runs onceSELECT * FROM orders WHERE customer_id IN (SELECT customer_id FROM vip_customers);-- Subquery result is computed once and reused -- Correlated: Subquery runs once per outer row (avoid with IN)SELECT * FROM orders oWHERE o.product_id IN ( SELECT p.product_id FROM products p WHERE p.vendor_id = o.vendor_id -- References outer query);-- Subquery runs for each row in orders - potentially slow -- Better rewrite as JOIN:SELECT o.* FROM orders oINNER JOIN products p ON o.product_id = p.product_id AND p.vendor_id = o.vendor_id;An IN subquery must return exactly one column. WHERE id IN (SELECT id, name FROM ...) is an error. The returned column's type must be compatible with the compared value.
NOT IN tests whether a value is not in the specified set. It's logically equivalent to:
value <> a AND value <> b AND value <> c
Note the AND (not OR) and the inequality operators. This equivalence has profound implications for NULL handling.
12345678910111213141516171819
-- Basic NOT IN usageSELECT * FROM orders WHERE status NOT IN ('cancelled', 'returned', 'refunded');-- Orders that weren't cancelled, returned, or refunded SELECT * FROM products WHERE category_id NOT IN (1, 2, 3);-- Products outside categories 1, 2, and 3 -- NOT IN with subquerySELECT * FROM customers WHERE customer_id NOT IN ( SELECT DISTINCT customer_id FROM orders);-- Customers who have never placed an order -- Employees without direct reportsSELECT * FROM employees WHERE employee_id NOT IN ( SELECT DISTINCT manager_id FROM employees WHERE manager_id IS NOT NULL);NOT IN returns no rows if the set contains ANY NULL value! Because value <> NULL is UNKNOWN, and TRUE AND UNKNOWN is UNKNOWN, a single NULL in the IN list contaminates the entire result. This is one of the most common SQL bugs.
123456789101112131415161718192021222324252627282930313233
-- THE NOT IN NULL TRAP-- Subquery: SELECT manager_id FROM employees-- Returns: 101, 102, 103, NULL (some employees have no manager) SELECT * FROM employees WHERE employee_id NOT IN (SELECT manager_id FROM employees);-- RETURNS: **NO ROWS!**-- Even though many employee_ids are not 101, 102, or 103 -- Why? Expanded logic:-- employee_id NOT IN (101, 102, 103, NULL)-- = employee_id <> 101 AND employee_id <> 102 AND employee_id <> 103 AND employee_id <> NULL-- = TRUE AND TRUE AND TRUE AND UNKNOWN-- = UNKNOWN-- And WHERE only includes rows where the result is TRUE -- SOLUTION 1: Filter NULLs in subquerySELECT * FROM employees WHERE employee_id NOT IN ( SELECT manager_id FROM employees WHERE manager_id IS NOT NULL); -- SOLUTION 2: Use NOT EXISTS insteadSELECT * FROM employees eWHERE NOT EXISTS ( SELECT 1 FROM employees m WHERE m.manager_id = e.employee_id); -- SOLUTION 3: Use LEFT JOIN with NULL checkSELECT e.* FROM employees eLEFT JOIN (SELECT DISTINCT manager_id FROM employees) m ON e.employee_id = m.manager_idWHERE m.manager_id IS NULL;| Expression | Set Contains NULL? | Result for non-NULL value not in set |
|---|---|---|
x IN (1, 2, NULL) | Yes | UNKNOWN (if x ∉ {1,2}) |
x NOT IN (1, 2, NULL) | Yes | UNKNOWN (NEVER TRUE!) |
x IN (1, 2) | No | FALSE (if x ∉ {1,2}) |
x NOT IN (1, 2) | No | TRUE (if x ∉ {1,2}) |
SQL supports testing whether a tuple (multiple columns together) belongs to a set of tuples. This is called row constructor syntax or composite IN.
Syntax:
(column1, column2) IN ((val1a, val1b), (val2a, val2b), ...)
(column1, column2) IN (SELECT col1, col2 FROM ...)
123456789101112131415161718192021222324252627
-- Composite key lookupSELECT * FROM order_items WHERE (order_id, product_id) IN ((1001, 'SKU-A'), (1001, 'SKU-B'), (1002, 'SKU-A'));-- Find specific order-product combinations -- Using subquerySELECT * FROM order_items WHERE (order_id, product_id) IN ( SELECT order_id, product_id FROM returns);-- Find order items that were returned -- Alternative to complex JOIN conditionsSELECT * FROM shipments WHERE (origin_city, dest_city) IN ( SELECT hub_city, spoke_city FROM hub_routes); -- Historical data matchingSELECT * FROM prices WHERE (product_id, effective_date) IN ( SELECT product_id, MAX(effective_date) FROM prices WHERE effective_date <= CURRENT_DATE GROUP BY product_id);-- Current price for each product (latest effective date)Multi-column IN can be more efficient than equivalent AND/OR combinations. The optimizer can use composite indexes directly. For example, (a, b) IN ((1,2), (3,4)) can use an index on (a, b) as a single range scan rather than multiple seeks.
Database Support:
Multi-column IN is standard SQL and supported by PostgreSQL, MySQL, Oracle, and SQL Server (2022+). Older SQL Server versions require workarounds.
123456789101112
-- SQL Server (before 2022) workaround - use EXISTSSELECT * FROM order_items oiWHERE EXISTS ( SELECT 1 FROM (VALUES (1001, 'SKU-A'), (1001, 'SKU-B'), (1002, 'SKU-A')) AS targets(order_id, product_id) WHERE oi.order_id = targets.order_id AND oi.product_id = targets.product_id); -- Or use concatenation (less efficient, type-sensitive)SELECT * FROM order_items WHERE CONCAT(order_id, '-', product_id) IN ('1001-SKU-A', '1001-SKU-B', '1002-SKU-A');-- Warning: Concatenation can have type coercion issuesThree SQL constructs can express the same "find rows related to another set" logic: IN, EXISTS, and JOIN. Understanding their differences helps you choose the right tool.
Semantic Comparison:
123456789101112131415
-- Goal: Find customers who have placed orders -- Using INSELECT * FROM customers WHERE customer_id IN (SELECT customer_id FROM orders); -- Using EXISTS SELECT * FROM customers cWHERE EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.customer_id); -- Using JOIN (DISTINCT required if customers have multiple orders)SELECT DISTINCT c.* FROM customers cINNER JOIN orders o ON c.customer_id = o.customer_id; -- All three return the same customers, but with different performance profiles| Aspect | IN | EXISTS | JOIN |
|---|---|---|---|
| NULL handling | Dangerous with NOT IN | Safe with NULLs | Safe with NULLs |
| Duplicates | Automatic dedup | Automatic dedup | May need DISTINCT |
| Correlated subquery | Possible but slow | Designed for this | N/A |
| Readability | Simple, declarative | More verbose | Explicit relationship |
| Optimizer handling | Often converts to semi-join | Semi-join pattern | Full join semantics |
When to Use Each:
123456789101112131415161718
-- NOT IN is dangerous with NULLs-- BAD: May return no rows if subquery has NULLsSELECT * FROM customers WHERE customer_id NOT IN (SELECT customer_id FROM blacklist); -- GOOD: NOT EXISTS handles NULLs correctlySELECT * FROM customers cWHERE NOT EXISTS ( SELECT 1 FROM blacklist b WHERE b.customer_id = c.customer_id); -- GOOD: LEFT JOIN with NULL checkSELECT c.* FROM customers cLEFT JOIN blacklist b ON c.customer_id = b.customer_idWHERE b.customer_id IS NULL; -- Rule of thumb: Prefer NOT EXISTS over NOT IN-- unless you're absolutely certain the subquery cannot return NULLIN's performance depends on list size, subquery complexity, indexing, and optimizer behavior.
Literal List Performance:
1234567891011121314151617181920
-- Small lists: Efficient index seeksSELECT * FROM products WHERE id IN (1, 2, 3, 4, 5);-- Optimizer converts to: id=1 OR id=2 OR id=3 OR id=4 OR id=5-- With index, executes as multiple index seeks -- Larger lists: May switch strategiesSELECT * FROM products WHERE id IN (1, 2, 3, ... 100 values ...);-- Optimizer may:-- 1. Convert to range scan if values are contiguous-- 2. Build temporary table and perform semi-join-- 3. Fall back to IN-list iteration -- Very large lists: Performance degradesSELECT * FROM products WHERE id IN (1, 2, 3, ... 10000 values ...);-- Parser overhead; may hit limits-- Consider: Temporary table + JOIN instead -- Check execution planEXPLAIN SELECT * FROM products WHERE id IN (1, 2, 3, 4, 5);-- Look for: Index Seek / Index Scan / Hash Semi JoinMost databases have practical limits on IN list size. SQL Server: ~2,100 parameters. Oracle: 1,000 items per IN clause. PostgreSQL: No hard limit but performance degrades. For very large sets, load data into a temporary table and JOIN.
Subquery IN Performance:
Modern optimizers typically convert IN subqueries to semi-joins, which are highly efficient:
1234567891011121314151617181920
-- IN subquery often becomes semi-joinSELECT * FROM orders WHERE customer_id IN (SELECT customer_id FROM vip_customers); -- Explain may show:-- -> Hash Semi Join -- -> Seq Scan on orders-- -> Hash -- -> Seq Scan on vip_customers -- Semi-join properties:-- - Stops after first match (doesn't fetch all matching rows)-- - Automatically deduplicates-- - Often uses hash or merge join strategies -- Index on subquery's column helpsCREATE INDEX idx_vip_customer_id ON vip_customers(customer_id); -- For NOT IN, use NOT EXISTS to avoid NULL issues-- NOT EXISTS also converts to anti-join, which is efficient| Scenario | Typical Optimization | Tips |
|---|---|---|
| Small literal list (< 10) | Multiple index seeks | Ensure index exists on column |
| Medium literal list (< 100) | Index range scan or hash lookup | Watch for plan changes |
| Large literal list (> 100) | Temp table + hash join | Consider explicit temp table |
| Non-correlated subquery | Semi-join (hash or merge) | Index subquery's result column |
| Correlated subquery | Per-row execution (slow!) | Rewrite as JOIN or non-correlated |
IN can test expressions, not just column values. This enables flexible filtering but may impact performance.
Expression on Left Side:
1234567891011121314151617181920212223
-- Expression as test valueSELECT * FROM products WHERE UPPER(category) IN ('ELECTRONICS', 'APPLIANCES', 'GADGETS'); SELECT * FROM orders WHERE EXTRACT(MONTH FROM order_date) IN (1, 2, 3); -- Q1 orders SELECT * FROM transactions WHERE ABS(amount) IN (100, 200, 500); -- Specific absolute amounts -- Computed columnSELECT * FROM order_items WHERE quantity * unit_price IN ( SELECT target_amount FROM promotion_thresholds); -- WARNING: Expressions on columns often prevent index usage-- The above queries may require full table scans -- Sargable alternatives:-- Instead of: UPPER(category) IN ('ELECTRONICS', ...)-- Use: category IN ('Electronics', 'electronics', 'ELECTRONICS', ...)-- Or: Create functional index on UPPER(category)Expressions in the IN List:
1234567891011121314151617181920212223242526272829
-- Expressions as list valuesSELECT * FROM orders WHERE status IN ( 'pending', CASE WHEN @include_processing THEN 'processing' ELSE NULL END, 'shipped');-- Dynamic list based on parameter (NULL in list has no effect for IN) -- Function results in listSELECT * FROM events WHERE event_date IN ( CURRENT_DATE, CURRENT_DATE - INTERVAL '1 day', CURRENT_DATE - INTERVAL '2 days');-- Today and previous two days -- Scalar subqueries in listSELECT * FROM employees WHERE department_id IN ( (SELECT department_id FROM departments WHERE name = 'Engineering'), (SELECT department_id FROM departments WHERE name = 'Product'));-- Though better written as single subquery:SELECT * FROM employees WHERE department_id IN ( SELECT department_id FROM departments WHERE name IN ('Engineering', 'Product'));Including NULL in an IN list has no effect—x IN (1, NULL, 3) matches 1 and 3, but never matches 'because of NULL'. The NULL is essentially ignored. However, remember that NOT IN with NULL in the list returns no matches!
Proper type handling in IN lists prevents subtle bugs and ensures optimal performance.
String IN Lists:
1234567891011121314151617
-- Case sensitivity depends on collationSELECT * FROM products WHERE category IN ('Electronics', 'Appliances');-- May or may not match 'electronics' depending on collation -- Whitespace mattersSELECT * FROM users WHERE role IN ('admin', 'admin ', ' admin');-- These are three different values in most databases-- Ensure consistent data or TRIM() the column -- Unicode and encodingSELECT * FROM locations WHERE country IN (N'日本', N'中国', N'한국');-- Use N'' prefix for Unicode literals in SQL Server -- Empty stringsSELECT * FROM contacts WHERE phone IN ('', NULL);-- Empty string is NOT the same as NULL-- This finds empty string phones, but the NULL in the list is ignoredNumeric IN Lists:
123456789101112131415
-- Integer listSELECT * FROM products WHERE id IN (1, 2, 3, 4, 5); -- Decimal precisionSELECT * FROM prices WHERE amount IN (9.99, 19.99, 29.99);-- Exact matches required for DECIMAL/NUMERIC types -- Floating-point cautionSELECT * FROM measurements WHERE reading IN (1.1, 2.2, 3.3);-- May miss values due to floating-point representation-- Consider: reading BETWEEN 1.09 AND 1.11 for 1.1 -- Mixed integer/decimalSELECT * FROM products WHERE price IN (10, 20.00, 30);-- Works due to implicit conversion, but explicit is saferDate/Time IN Lists:
123456789101112131415161718192021222324
-- DATE columnSELECT * FROM holidays WHERE holiday_date IN ('2024-01-01', '2024-12-25', '2024-07-04'); -- TIMESTAMP column: Time component matters!SELECT * FROM events WHERE event_time IN ( '2024-06-15 09:00:00', '2024-06-15 14:00:00', '2024-06-15 18:00:00');-- Only matches exact timestamps -- For "anytime on these dates", use date extraction or BETWEENSELECT * FROM events WHERE DATE(event_time) IN ('2024-06-15', '2024-06-16');-- OrSELECT * FROM events WHERE event_time >= '2024-06-15' AND event_time < '2024-06-17'; -- Timezone considerationsSELECT * FROM meetings WHERE start_time AT TIME ZONE 'UTC' IN ( '2024-06-15 14:00:00+00', '2024-06-16 14:00:00+00');The IN operator provides expressive power for set membership queries, but demands careful attention to NULL handling and performance optimization.
What's Next:
With set membership queries mastered, we complete our WHERE clause exploration with LIKE patterns—SQL's mechanism for flexible string matching. You'll learn wildcard syntax, escape sequences, pattern design for performance, and alternatives for complex pattern matching needs.
You now understand the IN operator comprehensively—from basic list membership to subquery semantics to the critical NOT IN NULL trap. Set membership queries are now a reliable tool in your SQL arsenal.