Loading content...
In the world of programming, we work with concrete values—integers, strings, booleans. We're comfortable with the binary nature of most operations: something either equals something else, or it doesn't. A variable either has a value, or it's initialized to zero, an empty string, or false.
Then we encounter NULL in SQL, and everything we thought we knew about logic breaks down.
NULL is not zero. NULL is not an empty string. NULL is not false. NULL is not even equal to NULL. It's something far more subtle and philosophically profound: it represents the absence of a value, a marker indicating unknown or not applicable information.
Understanding NULL is not merely a technical requirement—it's essential to writing correct SQL queries. Misunderstanding NULL leads to silent bugs, incorrect query results, and data integrity issues that can persist undetected for months or years.
By the end of this page, you will understand the philosophical foundations of NULL, the historical context of its introduction to relational databases, the different semantic meanings NULL can represent, and why NULL requires special handling in every SQL operation.
NULL was introduced to the relational model by E.F. Codd in the early 1970s. Codd recognized a fundamental problem: in any real-world database, there will be situations where information is simply not available.
Consider a customer database with a phone number field. What do you store when:
Before NULL, developers used arbitrary placeholder values: empty strings, zeros, special sentinel values like -1 or 9999999999. This approach created serious problems:
0 really zero, or does it mean 'unknown'?Codd proposed NULL as a universal marker for missing information, distinct from all regular data values. The database system would recognize NULL and handle it specially, enforcing consistent behavior across all operations. This was a revolutionary idea that fundamentally changed how databases handle incomplete data.
The elegance of NULL lies in its universality.
Instead of requiring every application to handle missing data in its own ad-hoc way, NULL provides a database-level solution. The DBMS knows that NULL means 'missing' and applies consistent rules:
= or <> operatorsThis consistency is a major advantage of the relational model. No matter who writes the query, NULL behaves the same way.
NULL doesn't have a single meaning—it can represent several different semantic states, all unified under one marker. Understanding these different meanings is crucial for proper database design and query writing.
The Three Interpretations of NULL:
| Interpretation | Meaning | Example |
|---|---|---|
| Unknown | A value exists but we don't know it | A customer's birth date that hasn't been collected yet |
| Not Applicable | The attribute doesn't apply to this entity | Spouse name for an unmarried person |
| Not Yet Assigned | A value will be assigned later | Employee's project assignment before project starts |
SQL uses a single NULL marker for all three interpretations. This means the database cannot distinguish between 'unknown birth date' and 'birth date not applicable'. Some researchers (including Codd himself in later work) proposed multiple NULL types to handle these cases differently, but standard SQL retains the single-NULL approach for simplicity.
Why does this matter?
The semantic meaning of NULL affects how you should design your schema and interpret query results:
Unknown values suggest the data might be obtained later:
-- Customer hasn't provided phone yet
INSERT INTO customers (name, phone)
VALUES ('Alice Smith', NULL);
Not applicable values suggest the attribute shouldn't exist for this entity:
-- Single-member households don't have a spouse
-- NULL in spouse_name doesn't mean 'unknown spouse'
INSERT INTO households (member_name, spouse_name)
VALUES ('Bob Jones', NULL);
Not yet assigned values suggest a pending operation:
-- New employee awaiting manager assignment
INSERT INTO employees (name, manager_id)
VALUES ('Carol Lee', NULL);
In each case, NULL appears in the database identically, but the business meaning is different. As a query writer, you must understand the context to interpret results correctly.
This is the most counterintuitive aspect of NULL and the source of endless bugs: NULL is not a value—it's a marker indicating the absence of a value.
When you ask 'Is NULL equal to NULL?', you're asking the wrong question. It's like asking 'Is the unknown equal to the unknown?' We don't know what either value is, so we can't determine if they're equal.
This has profound implications for SQL queries:
123456789101112131415161718
-- These comparisons might surprise you: -- This is NOT TRUE (it evaluates to UNKNOWN)SELECT * FROM employees WHERE commission = NULL; -- This is also NOT TRUE (still UNKNOWN)SELECT * FROM employees WHERE NULL = NULL; -- This always returns 0 rows, regardless of dataSELECT * FROM employees WHERE commission = NULL; -- Even more surprising:SELECT * FROM employees WHERE NOT (commission = NULL);-- This ALSO returns 0 rows! NOT UNKNOWN = UNKNOWN -- The correct way to check for NULL:SELECT * FROM employees WHERE commission IS NULL;SELECT * FROM employees WHERE commission IS NOT NULL;Using column = NULL instead of column IS NULL is one of the most common SQL bugs. It silently returns zero rows instead of throwing an error, making it particularly insidious. Modern SQL linters flag this as a warning, but it still appears constantly in production code.
The philosophical foundation:
Think of NULL as representing a set of possible values—we don't know which one. When you compare NULL to 5, you're really asking: 'Is some unknown value equal to 5?' The answer is: 'We don't know—it might be 5, or it might not.'
This uncertainty propagates through all operations:
| Expression | Result | Reasoning |
|---|---|---|
NULL = NULL | UNKNOWN | Both values are unknown; we can't determine equality |
NULL <> NULL | UNKNOWN | Same reasoning—we can't determine inequality either |
NULL > 5 | UNKNOWN | The unknown value might be greater, or might not |
NULL + 10 | NULL | Unknown value plus 10 is still unknown |
NULL AND TRUE | UNKNOWN | If NULL were FALSE, result would be FALSE; if TRUE, result would be TRUE |
NULL OR TRUE | TRUE | Regardless of NULL's value, OR with TRUE is TRUE |
NULL behavior varies slightly depending on where it appears in SQL operations. Understanding these contexts helps you predict and control query behavior.
12345678910111213141516171819202122232425262728293031323334353637383940
-- Sample data for examplesCREATE TABLE sales ( id INT PRIMARY KEY, product VARCHAR(50), amount DECIMAL(10,2), region VARCHAR(50)); INSERT INTO sales VALUES (1, 'Widget', 100.00, 'North'),(2, 'Widget', NULL, 'South'),(3, 'Gadget', 200.00, NULL),(4, 'Gadget', 150.00, 'North'),(5, 'Widget', NULL, NULL); -- Aggregate behavior with NULLSELECT COUNT(*) as total_rows, -- Returns 5 COUNT(amount) as rows_with_amount, -- Returns 3 (ignores 2 NULLs) SUM(amount) as total_amount, -- Returns 450.00 (ignores NULLs) AVG(amount) as avg_amount -- Returns 150.00 (450/3, not 450/5)FROM sales; -- GROUP BY with NULLSELECT region, COUNT(*) as countFROM salesGROUP BY region;-- Results:-- North: 2-- South: 1 -- NULL: 2 (both NULL regions grouped together) -- ORDER BY with NULLSELECT product, amount FROM sales ORDER BY amount NULLS LAST; -- NULL amounts appear at the end SELECT product, amount FROM sales ORDER BY amount NULLS FIRST; -- NULL amounts appear at the startNotice that AVG ignores NULL values in both numerator and denominator. If you have sales amounts of 100, NULL, 200, NULL, 150, the AVG is (100+200+150)/3 = 150, not (100+200+150)/5 = 90. This is usually the desired behavior, but can surprise developers expecting the 5-row average.
The ability to allow or prohibit NULL values is a fundamental tool for maintaining data integrity. Schema designers must carefully consider which columns can be NULL and which cannot.
1234567891011121314151617181920212223242526272829303132
-- Schema design with NULL considerationsCREATE TABLE employees ( -- Never NULL - primary identifier employee_id INT PRIMARY KEY, -- Never NULL - essential employee info first_name VARCHAR(50) NOT NULL, last_name VARCHAR(50) NOT NULL, email VARCHAR(100) NOT NULL UNIQUE, hire_date DATE NOT NULL, department_id INT NOT NULL, -- Nullable - optional information middle_name VARCHAR(50), -- Not everyone has one phone VARCHAR(20), -- Not required for employment personal_email VARCHAR(100), -- Optional secondary contact -- Nullable - not applicable cases manager_id INT, -- CEO has no manager termination_date DATE, -- Only for former employees -- Nullable - not yet assigned office_id INT, -- Remote workers may not have office parking_spot VARCHAR(10), -- Not all employees get parking -- Nullable with semantic meaning commission_rate DECIMAL(5,2), -- Only sales staff have commission FOREIGN KEY (department_id) REFERENCES departments(id), FOREIGN KEY (manager_id) REFERENCES employees(employee_id), FOREIGN KEY (office_id) REFERENCES offices(id));Default to NOT NULL unless you have a specific reason to allow NULL. This forces you to think about what value should exist for every row and prevents accidental NULL insertion. When you do allow NULL, document why in your schema comments.
While NULL semantics are standardized in ANSI SQL, different database systems have subtle variations in behavior. Understanding these differences is crucial when working across multiple platforms or migrating between systems.
| Behavior | PostgreSQL | MySQL | Oracle | SQL Server |
|---|---|---|---|---|
| Empty string = NULL | No | No | Yes | No |
| NULL in UNIQUE constraint | Multiple allowed | Multiple allowed | One allowed | One allowed (default) |
| String concatenation with NULL | Returns NULL | Returns NULL | Returns NULL | Returns NULL (CONCAT_NULL_YIELDS_NULL) |
| NULL ordering default | NULLS LAST | NULLS FIRST (ASC) | NULLS LAST | NULLS FIRST |
| Boolean NULL | Full 3VL support | Limited | No native boolean | Limited |
Oracle treats empty strings ('') as NULL. This is a significant deviation from the SQL standard. If you insert an empty string, Oracle stores NULL. This affects migrations, comparisons, and LENGTH() calculations. Always test string handling carefully when working with Oracle.
1234567891011121314151617181920212223242526272829303132
-- PostgreSQL / MySQL / SQL ServerINSERT INTO test (name) VALUES ('');SELECT * FROM test WHERE name = ''; -- Finds the rowSELECT * FROM test WHERE name IS NULL; -- Does NOT find the row -- OracleINSERT INTO test (name) VALUES (''); -- Stores as NULL!SELECT * FROM test WHERE name = ''; -- Returns NO rowsSELECT * FROM test WHERE name IS NULL; -- FINDS the row -- UNIQUE constraint differencesCREATE TABLE unique_test ( id INT PRIMARY KEY, code VARCHAR(10) UNIQUE); -- PostgreSQL/MySQL: Both inserts succeed-- Oracle/SQL Server: Second insert fails (duplicate NULL)INSERT INTO unique_test VALUES (1, NULL);INSERT INTO unique_test VALUES (2, NULL); -- May fail depending on DBMS -- Controlling ORDER BY NULL placement-- ANSI SQL (PostgreSQL, Oracle):SELECT * FROM employees ORDER BY commission NULLS FIRST;SELECT * FROM employees ORDER BY commission NULLS LAST; -- MySQL workaround:SELECT * FROM employees ORDER BY commission IS NULL, commission; -- SQL Server workaround:SELECT * FROM employees ORDER BY CASE WHEN commission IS NULL THEN 1 ELSE 0 END, commission;Let's address and correct the most common misconceptions about NULL that trip up developers at all experience levels.
OR column IS NULL if you want NULL rows.WHERE x NOT IN (1, 2, NULL) returns zero rows always! The NULL comparison returns UNKNOWN, which propagates to exclude all rows.12345678910111213141516171819202122232425262728293031
-- The infamous NOT IN with NULL trapCREATE TABLE products (id INT, category_id INT);CREATE TABLE excluded_categories (id INT); INSERT INTO products VALUES (1, 10), (2, 20), (3, 30);INSERT INTO excluded_categories VALUES (20), (NULL); -- You want products not in excluded categories-- This returns ZERO rows! (not 1 and 3 as expected)SELECT * FROM products WHERE category_id NOT IN (SELECT id FROM excluded_categories); -- Why? The subquery returns (20, NULL)-- For product 1 (category_id = 10):-- 10 NOT IN (20, NULL) -- = NOT (10 = 20 OR 10 = NULL)-- = NOT (FALSE OR UNKNOWN)-- = NOT UNKNOWN-- = UNKNOWN ← Row excluded! -- The fix: Filter out NULLs or use NOT EXISTSSELECT * FROM products WHERE category_id NOT IN ( SELECT id FROM excluded_categories WHERE id IS NOT NULL); -- Or better, use NOT EXISTS (handles NULL gracefully)SELECT * FROM products pWHERE NOT EXISTS ( SELECT 1 FROM excluded_categories e WHERE e.id = p.category_id);The NOT IN with NULL subquery is perhaps the most dangerous NULL pitfall in SQL. It silently returns zero rows instead of what you expect. Many experienced developers have been bitten by this. Always prefer NOT EXISTS or explicitly filter NULLs from IN subqueries.
We've explored the philosophical and practical foundations of NULL. Let's consolidate the key insights:
What's next:
Now that we understand what NULL means conceptually, we'll dive into the specific operators used to detect NULL values: IS NULL and IS NOT NULL. We'll explore their syntax, behavior in complex expressions, and best practices for NULL detection.
You now understand the philosophical foundations of NULL in SQL databases. NULL is not zero, not empty, and not even equal to itself—it's a marker for missing information that requires special handling throughout SQL operations. Next, we'll learn the specific syntax for detecting and handling NULL values.