Null Handling - Learning Module

Loading content...

0/252

NULL Meaning: Understanding the Absence of Data

The Most Misunderstood Value in Databases

In the world of programming, we work with concrete values—integers, strings, booleans. We're comfortable with the binary nature of most operations: something either equals something else, or it doesn't. A variable either has a value, or it's initialized to zero, an empty string, or false.

Then we encounter NULL in SQL, and everything we thought we knew about logic breaks down.

NULL is not zero. NULL is not an empty string. NULL is not false. NULL is not even equal to NULL. It's something far more subtle and philosophically profound: it represents the absence of a value, a marker indicating unknown or not applicable information.

Understanding NULL is not merely a technical requirement—it's essential to writing correct SQL queries. Misunderstanding NULL leads to silent bugs, incorrect query results, and data integrity issues that can persist undetected for months or years.

What You Will Learn

By the end of this page, you will understand the philosophical foundations of NULL, the historical context of its introduction to relational databases, the different semantic meanings NULL can represent, and why NULL requires special handling in every SQL operation.

The Invention of NULL

NULL was introduced to the relational model by E.F. Codd in the early 1970s. Codd recognized a fundamental problem: in any real-world database, there will be situations where information is simply not available.

Consider a customer database with a phone number field. What do you store when:

A customer refuses to provide their phone number?
The phone number field was added to the system after the customer registered?
The customer's phone number is temporarily unknown pending verification?

Before NULL, developers used arbitrary placeholder values: empty strings, zeros, special sentinel values like -1 or 9999999999. This approach created serious problems:

Semantic confusion: Is a salary of 0 really zero, or does it mean 'unknown'?
Computational errors: Average calculations including fake zeros produce wrong results
Query complexity: Every query must check for special placeholder values
Inconsistency: Different developers choose different placeholders for the same meaning

Codd's Vision

Codd proposed NULL as a universal marker for missing information, distinct from all regular data values. The database system would recognize NULL and handle it specially, enforcing consistent behavior across all operations. This was a revolutionary idea that fundamentally changed how databases handle incomplete data.

The elegance of NULL lies in its universality.

Instead of requiring every application to handle missing data in its own ad-hoc way, NULL provides a database-level solution. The DBMS knows that NULL means 'missing' and applies consistent rules:

NULL cannot be compared with = or <> operators
NULL propagates through arithmetic operations
Aggregate functions have defined behavior for NULL
NULL handling is standardized across all SQL implementations

This consistency is a major advantage of the relational model. No matter who writes the query, NULL behaves the same way.

What NULL Actually Means

NULL doesn't have a single meaning—it can represent several different semantic states, all unified under one marker. Understanding these different meanings is crucial for proper database design and query writing.

The Three Interpretations of NULL:

Semantic Meanings of NULL
Interpretation	Meaning	Example
Unknown	A value exists but we don't know it	A customer's birth date that hasn't been collected yet
Not Applicable	The attribute doesn't apply to this entity	Spouse name for an unmarried person
Not Yet Assigned	A value will be assigned later	Employee's project assignment before project starts

The Single-NULL Limitation

SQL uses a single NULL marker for all three interpretations. This means the database cannot distinguish between 'unknown birth date' and 'birth date not applicable'. Some researchers (including Codd himself in later work) proposed multiple NULL types to handle these cases differently, but standard SQL retains the single-NULL approach for simplicity.

Why does this matter?

The semantic meaning of NULL affects how you should design your schema and interpret query results:

Unknown values suggest the data might be obtained later:

-- Customer hasn't provided phone yet
INSERT INTO customers (name, phone) 
VALUES ('Alice Smith', NULL);

Not applicable values suggest the attribute shouldn't exist for this entity:

-- Single-member households don't have a spouse
-- NULL in spouse_name doesn't mean 'unknown spouse'
INSERT INTO households (member_name, spouse_name) 
VALUES ('Bob Jones', NULL);

Not yet assigned values suggest a pending operation:

-- New employee awaiting manager assignment
INSERT INTO employees (name, manager_id) 
VALUES ('Carol Lee', NULL);

In each case, NULL appears in the database identically, but the business meaning is different. As a query writer, you must understand the context to interpret results correctly.

NULL Is Not a Value

This is the most counterintuitive aspect of NULL and the source of endless bugs: NULL is not a value—it's a marker indicating the absence of a value.

When you ask 'Is NULL equal to NULL?', you're asking the wrong question. It's like asking 'Is the unknown equal to the unknown?' We don't know what either value is, so we can't determine if they're equal.

This has profound implications for SQL queries:

null_comparison_examples.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
-- These comparisons might surprise you:
 
-- This is NOT TRUE (it evaluates to UNKNOWN)
SELECT * FROM employees WHERE commission = NULL;
 
-- This is also NOT TRUE (still UNKNOWN)
SELECT * FROM employees WHERE NULL = NULL;
 
-- This always returns 0 rows, regardless of data
SELECT * FROM employees WHERE commission = NULL;
 
-- Even more surprising:
SELECT * FROM employees WHERE NOT (commission = NULL);
-- This ALSO returns 0 rows! NOT UNKNOWN = UNKNOWN
 
-- The correct way to check for NULL:
SELECT * FROM employees WHERE commission IS NULL;
SELECT * FROM employees WHERE commission IS NOT NULL;

The Cardinal Sin of NULL

Using column = NULL instead of column IS NULL is one of the most common SQL bugs. It silently returns zero rows instead of throwing an error, making it particularly insidious. Modern SQL linters flag this as a warning, but it still appears constantly in production code.

The philosophical foundation:

Think of NULL as representing a set of possible values—we don't know which one. When you compare NULL to 5, you're really asking: 'Is some unknown value equal to 5?' The answer is: 'We don't know—it might be 5, or it might not.'

This uncertainty propagates through all operations:

Expression	Result	Reasoning
`NULL = NULL`	UNKNOWN	Both values are unknown; we can't determine equality
`NULL <> NULL`	UNKNOWN	Same reasoning—we can't determine inequality either
`NULL > 5`	UNKNOWN	The unknown value might be greater, or might not
`NULL + 10`	NULL	Unknown value plus 10 is still unknown
`NULL AND TRUE`	UNKNOWN	If NULL were FALSE, result would be FALSE; if TRUE, result would be TRUE
`NULL OR TRUE`	TRUE	Regardless of NULL's value, OR with TRUE is TRUE

NULL in Different Database Contexts

NULL behavior varies slightly depending on where it appears in SQL operations. Understanding these contexts helps you predict and control query behavior.

NULL Behavior Across SQL Contexts

•In WHERE Clauses — Rows where the condition evaluates to UNKNOWN are excluded from results. Only rows where the condition is TRUE pass through. This means NULL values often 'disappear' unexpectedly.
•In Aggregate Functions — Most aggregates (SUM, AVG, MIN, MAX) ignore NULL values. COUNT(*) counts all rows, but COUNT(column) counts only non-NULL values. This distinction is critical for accurate calculations.
•In DISTINCT Operations — Two NULL values are considered equal for purposes of DISTINCT. This seems contradictory to NULL ≠ NULL, but is a practical concession to usability.
•In GROUP BY — Similar to DISTINCT, NULL values are grouped together into a single group. All rows with NULL in the grouping column appear in one group.
•In ORDER BY — The position of NULL values (first or last) varies by database. ANSI SQL defines NULLS FIRST and NULLS LAST clauses to control this explicitly.
•In Indexes — Most databases include NULL values in indexes, but behavior varies. Some indexes may not include NULLs, affecting query performance.

null_context_examples.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
-- Sample data for examples
CREATE TABLE sales (
    id INT PRIMARY KEY,
    product VARCHAR(50),
    amount DECIMAL(10,2),
    region VARCHAR(50)
);
 
INSERT INTO sales VALUES 
(1, 'Widget', 100.00, 'North'),
(2, 'Widget', NULL, 'South'),
(3, 'Gadget', 200.00, NULL),
(4, 'Gadget', 150.00, 'North'),
(5, 'Widget', NULL, NULL);
 
-- Aggregate behavior with NULL
SELECT 
    COUNT(*) as total_rows,           -- Returns 5
    COUNT(amount) as rows_with_amount, -- Returns 3 (ignores 2 NULLs)
    SUM(amount) as total_amount,       -- Returns 450.00 (ignores NULLs)
    AVG(amount) as avg_amount          -- Returns 150.00 (450/3, not 450/5)
FROM sales;
 
-- GROUP BY with NULL
SELECT region, COUNT(*) as count
FROM sales
GROUP BY region;
-- Results:
-- North: 2
-- South: 1  
-- NULL:  2  (both NULL regions grouped together)
 
-- ORDER BY with NULL
SELECT product, amount 
FROM sales 
ORDER BY amount NULLS LAST; -- NULL amounts appear at the end
 
SELECT product, amount 
FROM sales 
ORDER BY amount NULLS FIRST; -- NULL amounts appear at the start

The AVG Gotcha

Notice that AVG ignores NULL values in both numerator and denominator. If you have sales amounts of 100, NULL, 200, NULL, 150, the AVG is (100+200+150)/3 = 150, not (100+200+150)/5 = 90. This is usually the desired behavior, but can surprise developers expecting the 5-row average.

NULL and Data Integrity

The ability to allow or prohibit NULL values is a fundamental tool for maintaining data integrity. Schema designers must carefully consider which columns can be NULL and which cannot.

When to Prohibit NULL

•Primary key columns — By definition cannot be NULL
•Required business data — Customer name, order date, etc.
•Columns in unique constraints — Depending on requirements
•Status indicators — Every order should have a status
•Audit columns — created_at, updated_at should always exist

When to Allow NULL

•Optional information — Middle name, phone number
•Not-yet-known values — Delivery date before shipping
•Not-applicable cases — Spouse name for single person
•Foreign keys for optional relationships — Manager for CEO
•Historical data — Fields added after initial records

null_constraints.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
-- Schema design with NULL considerations
CREATE TABLE employees (
    -- Never NULL - primary identifier
    employee_id INT PRIMARY KEY,
    
    -- Never NULL - essential employee info
    first_name VARCHAR(50) NOT NULL,
    last_name VARCHAR(50) NOT NULL,
    email VARCHAR(100) NOT NULL UNIQUE,
    hire_date DATE NOT NULL,
    department_id INT NOT NULL,
    
    -- Nullable - optional information
    middle_name VARCHAR(50),        -- Not everyone has one
    phone VARCHAR(20),              -- Not required for employment
    personal_email VARCHAR(100),    -- Optional secondary contact
    
    -- Nullable - not applicable cases
    manager_id INT,                 -- CEO has no manager
    termination_date DATE,          -- Only for former employees
    
    -- Nullable - not yet assigned
    office_id INT,                  -- Remote workers may not have office
    parking_spot VARCHAR(10),       -- Not all employees get parking
    
    -- Nullable with semantic meaning
    commission_rate DECIMAL(5,2),   -- Only sales staff have commission
    
    FOREIGN KEY (department_id) REFERENCES departments(id),
    FOREIGN KEY (manager_id) REFERENCES employees(employee_id),
    FOREIGN KEY (office_id) REFERENCES offices(id)
);

Design Principle

Default to NOT NULL unless you have a specific reason to allow NULL. This forces you to think about what value should exist for every row and prevents accidental NULL insertion. When you do allow NULL, document why in your schema comments.

NULL Across Database Systems

While NULL semantics are standardized in ANSI SQL, different database systems have subtle variations in behavior. Understanding these differences is crucial when working across multiple platforms or migrating between systems.

NULL Handling Variations Across Major DBMS
Behavior	PostgreSQL	MySQL	Oracle	SQL Server
Empty string = NULL	No	No	Yes	No
NULL in UNIQUE constraint	Multiple allowed	Multiple allowed	One allowed	One allowed (default)
String concatenation with NULL	Returns NULL	Returns NULL	Returns NULL	Returns NULL (CONCAT_NULL_YIELDS_NULL)
NULL ordering default	NULLS LAST	NULLS FIRST (ASC)	NULLS LAST	NULLS FIRST
Boolean NULL	Full 3VL support	Limited	No native boolean	Limited

Oracle's Empty String Behavior

Oracle treats empty strings ('') as NULL. This is a significant deviation from the SQL standard. If you insert an empty string, Oracle stores NULL. This affects migrations, comparisons, and LENGTH() calculations. Always test string handling carefully when working with Oracle.

vendor_differences.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
-- PostgreSQL / MySQL / SQL Server
INSERT INTO test (name) VALUES ('');
SELECT * FROM test WHERE name = '';  -- Finds the row
SELECT * FROM test WHERE name IS NULL;  -- Does NOT find the row
 
-- Oracle
INSERT INTO test (name) VALUES ('');  -- Stores as NULL!
SELECT * FROM test WHERE name = '';  -- Returns NO rows
SELECT * FROM test WHERE name IS NULL;  -- FINDS the row
 
-- UNIQUE constraint differences
CREATE TABLE unique_test (
    id INT PRIMARY KEY,
    code VARCHAR(10) UNIQUE
);
 
-- PostgreSQL/MySQL: Both inserts succeed
-- Oracle/SQL Server: Second insert fails (duplicate NULL)
INSERT INTO unique_test VALUES (1, NULL);
INSERT INTO unique_test VALUES (2, NULL);  -- May fail depending on DBMS
 
-- Controlling ORDER BY NULL placement
-- ANSI SQL (PostgreSQL, Oracle):
SELECT * FROM employees ORDER BY commission NULLS FIRST;
SELECT * FROM employees ORDER BY commission NULLS LAST;
 
-- MySQL workaround:
SELECT * FROM employees ORDER BY commission IS NULL, commission;
 
-- SQL Server workaround:
SELECT * FROM employees 
ORDER BY CASE WHEN commission IS NULL THEN 1 ELSE 0 END, commission;

Common NULL Misconceptions

Let's address and correct the most common misconceptions about NULL that trip up developers at all experience levels.

NULL Misconceptions Debunked

•"NULL means zero or empty" — WRONG. NULL means absence of any value. Zero is a known value. Empty string is a known value (except in Oracle). NULL is lack of value.
•"NULL = NULL is true" — WRONG. NULL = NULL evaluates to UNKNOWN, not TRUE. You must use IS NULL operator for NULL comparisons.
•"WHERE column != value" returns NULL rows — WRONG. When column is NULL, the comparison returns UNKNOWN, and the row is excluded. You must explicitly include OR column IS NULL if you want NULL rows.
•"Count(*) and COUNT(column) are the same" — WRONG. COUNT(*) counts all rows. COUNT(column) counts only non-NULL values in that column. They differ when NULLs exist.
•"NULL is falsy like in programming languages" — WRONG. NULL is not false. In WHERE clauses, only TRUE passes. UNKNOWN (including NOT NULL) does not pass. This is three-valued logic, not boolean logic.
•"NOT IN handles NULL correctly" — DANGEROUS. WHERE x NOT IN (1, 2, NULL) returns zero rows always! The NULL comparison returns UNKNOWN, which propagates to exclude all rows.

not_in_null_trap.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
-- The infamous NOT IN with NULL trap
CREATE TABLE products (id INT, category_id INT);
CREATE TABLE excluded_categories (id INT);
 
INSERT INTO products VALUES (1, 10), (2, 20), (3, 30);
INSERT INTO excluded_categories VALUES (20), (NULL);
 
-- You want products not in excluded categories
-- This returns ZERO rows! (not 1 and 3 as expected)
SELECT * FROM products 
WHERE category_id NOT IN (SELECT id FROM excluded_categories);
 
-- Why? The subquery returns (20, NULL)
-- For product 1 (category_id = 10):
--   10 NOT IN (20, NULL) 
--   = NOT (10 = 20 OR 10 = NULL)
--   = NOT (FALSE OR UNKNOWN)
--   = NOT UNKNOWN
--   = UNKNOWN ← Row excluded!
 
-- The fix: Filter out NULLs or use NOT EXISTS
SELECT * FROM products 
WHERE category_id NOT IN (
    SELECT id FROM excluded_categories WHERE id IS NOT NULL
);
 
-- Or better, use NOT EXISTS (handles NULL gracefully)
SELECT * FROM products p
WHERE NOT EXISTS (
    SELECT 1 FROM excluded_categories e WHERE e.id = p.category_id
);

The NOT IN Trap

The NOT IN with NULL subquery is perhaps the most dangerous NULL pitfall in SQL. It silently returns zero rows instead of what you expect. Many experienced developers have been bitten by this. Always prefer NOT EXISTS or explicitly filter NULLs from IN subqueries.

Summary: The Essence of NULL

We've explored the philosophical and practical foundations of NULL. Let's consolidate the key insights:

Key Takeaways

•NULL is not a value — It's a marker indicating the absence of a value, representing unknown, not applicable, or not-yet-assigned information.
•NULL cannot be compared with = or <> — Always use IS NULL and IS NOT NULL operators for NULL checks.
•NULL propagates through operations — Any arithmetic or comparison with NULL typically yields NULL or UNKNOWN.
•NULL behaves specially in aggregates — Most aggregate functions ignore NULL values; COUNT(*) vs COUNT(column) is a critical distinction.
•NULL handling varies by DBMS — Oracle's empty-string-as-NULL and UNIQUE constraint behaviors differ from other systems.
•NOT IN with NULL is dangerous — Always filter NULLs or use NOT EXISTS as a safer alternative.

What's next:

Now that we understand what NULL means conceptually, we'll dive into the specific operators used to detect NULL values: IS NULL and IS NOT NULL. We'll explore their syntax, behavior in complex expressions, and best practices for NULL detection.

Page Complete

You now understand the philosophical foundations of NULL in SQL databases. NULL is not zero, not empty, and not even equal to itself—it's a marker for missing information that requires special handling throughout SQL operations. Next, we'll learn the specific syntax for detecting and handling NULL values.