Loading content...
Imagine you're processing sequential IDs from 0 to n, and you discover one is missing. How do you find it? The obvious approach—marking which numbers you've seen—requires O(n) additional space. But with XOR, you can find the missing number using just a single integer variable.
This problem is a classic interview question that appears at companies from Google to Amazon. It elegantly demonstrates how XOR can solve problems that seem to inherently require extra memory.
By the end of this page, you will understand multiple approaches to the missing number problem (sum-based, XOR-based, and hash-based), implement the O(1) space XOR solution, and recognize how to combine XOR with index values for sequence problems.
Problem Statement:
Given an array nums containing n distinct numbers in the range [0, n], return the one number in the range that is missing from the array.
Key Observations:
n elementsConstraints:
Examples:
| Input Array | n | Range | Missing Number |
|---|---|---|---|
| [3, 0, 1] | 3 | [0, 3] | 2 |
| [0, 1] | 2 | [0, 2] | 2 |
| [9, 6, 4, 2, 3, 5, 7, 0, 1] | 9 | [0, 9] | 8 |
| [0] | 1 | [0, 1] | 1 |
| [1] | 1 | [0, 1] | 0 |
Notice that the array has n elements but the range is [0, n], which has n+1 possible values. This is why exactly one number must be missing—the pigeonhole principle guarantees it.
Before diving into the XOR solution, let's survey all approaches to understand why some are preferred over others.
Approach 1: Hash Set
def missing_hash(nums):
s = set(nums)
for i in range(len(nums) + 1):
if i not in s:
return i
Approach 2: Sorting
def missing_sort(nums):
nums.sort()
for i, num in enumerate(nums):
if i != num:
return i
return len(nums)
Approach 3: Sum Formula
def missing_sum(nums):
n = len(nums)
expected = n * (n + 1) // 2
actual = sum(nums)
return expected - actual
Approach 4: XOR (Our Focus)
| Approach | Time | Space | Overflow Risk? | Modifies Input? |
|---|---|---|---|---|
| Hash Set | O(n) | O(n) | No | No |
| Sorting | O(n log n) | O(1)* | No | Yes |
| Sum Formula | O(n) | O(1) | Yes | No |
| XOR | O(n) | O(1) | No | No |
The key insight is to compare what we have with what we should have. If we XOR all array elements with all numbers from 0 to n, duplicates will cancel, leaving only the missing number.
Why does this work?
Mathematical View:
XOR(complement) = XOR(0, 1, 2, ..., n) ⊕ XOR(nums)
= (0 ⊕ 1 ⊕ 2 ⊕ ... ⊕ missing ⊕ ... ⊕ n) ⊕ (all except missing)
= missing (everything else cancels)
This is a generalization of the single-number problem. There, duplicates in the array cancel. Here, we CREATE duplicates by XORing with the expected sequence, causing present numbers to cancel and the missing number to remain.
There are two common ways to implement the XOR solution:
Approach A: Two-Pass (Cleaner)
Approach B: Single-Pass (More Efficient)
Both achieve O(n) time and O(1) space, but single-pass is more elegant.
1234567891011121314151617181920212223242526272829303132333435363738394041424344454647484950515253545556575859606162636465666768697071
def find_missing_two_pass(nums: list[int]) -> int: """ Two-pass approach: XOR expected sequence, then XOR actual elements. Time: O(n), Space: O(1) """ n = len(nums) # XOR all numbers from 0 to n xor_expected = 0 for i in range(n + 1): xor_expected ^= i # XOR all array elements xor_actual = 0 for num in nums: xor_actual ^= num # Missing number is the difference return xor_expected ^ xor_actual def find_missing_single_pass(nums: list[int]) -> int: """ Single-pass approach: XOR indices and elements together. The insight is that we can pair each index with its element. If the array were complete (ignoring the missing one), each number would XOR with its index position. We initialize with n because indices only go from 0 to n-1, but the range is [0, n]. This handles the case where n is missing. Time: O(n), Space: O(1) """ n = len(nums) result = n # Start with n (handles case where n is missing) for i in range(n): result ^= i ^ nums[i] return result # Functional approach using reducefrom functools import reducefrom operator import xor def find_missing_functional(nums: list[int]) -> int: """Functional single-pass approach.""" n = len(nums) return reduce(xor, range(n + 1)) ^ reduce(xor, nums) # Test casestest_cases = [ ([3, 0, 1], 2), ([0, 1], 2), ([9, 6, 4, 2, 3, 5, 7, 0, 1], 8), ([0], 1), ([1], 0), ([1, 2, 3], 0), # Missing 0 ([0, 1, 2], 3), # Missing n] print("Testing find_missing:")for nums, expected in test_cases: result1 = find_missing_two_pass(nums) result2 = find_missing_single_pass(nums) status = "✓" if result1 == expected == result2 else "✗" print(f" {status} {nums} → {result1} (expected {expected})")Let's trace through the single-pass algorithm with [3, 0, 1] where the missing number is 2.
Setup:
Initialization: result = n = 3
| i | nums[i] | i XOR nums[i] | result (before) | result (after) | Binary |
|---|---|---|---|---|---|
| 0 | 3 | 0 ^ 3 = 3 | 3 | 3 ^ 3 = 0 | 000 |
| 1 | 0 | 1 ^ 0 = 1 | 0 | 0 ^ 1 = 1 | 001 |
| 2 | 1 | 2 ^ 1 = 3 | 1 | 1 ^ 3 = 2 | 010 |
Result: 2 ✓
Mathematical Verification:
result = n ^ (0 ^ nums[0]) ^ (1 ^ nums[1]) ^ (2 ^ nums[2])
= 3 ^ (0 ^ 3) ^ (1 ^ 0) ^ (2 ^ 1)
= 3 ^ 0 ^ 3 ^ 1 ^ 0 ^ 2 ^ 1
= (3 ^ 3) ^ (1 ^ 1) ^ (0 ^ 0) ^ 2
= 0 ^ 0 ^ 0 ^ 2
= 2
What's happening?
Each number from 0 to n-1 appears either:
The missing number appears only as an index (if missing < n) or only as the initial value n (if missing = n).
A common question is: why do we initialize result = n instead of result = 0?
The Problem:
In the single-pass loop, we iterate i from 0 to n-1. But the range of possible numbers is 0 to n. The number n never appears as an index!
If the missing number is n:
All indices 0 to n-1 are present in both the index sequence and the array. Without initializing to n, we'd never include n in our XOR, and we'd incorrectly return 0.
Example: [0, 1, 2] with n = 3
Missing number is 3.
With result = n = 3:
result = 3 ^ (0 ^ 0) ^ (1 ^ 1) ^ (2 ^ 2)
= 3 ^ 0 ^ 0 ^ 0
= 3 ✓
With result = 0 (wrong):
result = 0 ^ (0 ^ 0) ^ (1 ^ 1) ^ (2 ^ 2)
= 0 ✗
By initializing with n, we effectively extend our XOR to include n without needing an extra iteration. It's as if we added a "virtual" index n at the end. This is a common pattern in XOR problems: handle the boundary case through initial value rather than special-case code.
Both XOR and sum-based approaches achieve O(n) time and O(1) space. When should you prefer one over the other?
Sum Formula:
def missing_sum(nums):
n = len(nums)
expected = n * (n + 1) // 2 # Sum of 0 to n
return expected - sum(nums)
XOR:
def missing_xor(nums):
result = len(nums)
for i, num in enumerate(nums):
result ^= i ^ num
return result
Overflow Analysis:
For the sum formula with n elements:
XOR never overflows because it operates bit-by-bit without increasing magnitude.
Practical Recommendation:
In Python/JavaScript, use whichever you find clearer. In Java/C++, prefer XOR for safety, or explicitly use long for the sum approach.
Let's verify the algorithm handles all edge cases correctly.
Edge Case 1: Missing 0
Input: [1, 2, 3] (n = 3)
result = 3
i=0: result = 3 ^ 0 ^ 1 = 2
i=1: result = 2 ^ 1 ^ 2 = 1
i=2: result = 1 ^ 2 ^ 3 = 0 ✓
Edge Case 2: Missing n (last element)
Input: [0, 1, 2] (n = 3)
result = 3
i=0: result = 3 ^ 0 ^ 0 = 3
i=1: result = 3 ^ 1 ^ 1 = 3
i=2: result = 3 ^ 2 ^ 2 = 3 ✓
Edge Case 3: Single Element
Input: [0] (n = 1, missing 1)
result = 1
i=0: result = 1 ^ 0 ^ 0 = 1 ✓
Input: [1] (n = 1, missing 0)
result = 1
i=0: result = 1 ^ 0 ^ 1 = 0 ✓
Edge Case 4: Large Input
No special handling needed—XOR processes elements regardless of size. The algorithm remains O(n) and O(1).
123456789101112131415161718192021222324252627282930313233343536373839
def find_missing(nums: list[int]) -> int: result = len(nums) for i, num in enumerate(nums): result ^= i ^ num return result # Comprehensive edge case testingedge_cases = [ # Missing first element {"input": [1], "expected": 0, "desc": "Missing 0, n=1"}, {"input": [1, 2, 3], "expected": 0, "desc": "Missing 0, n=3"}, # Missing last element (n) {"input": [0], "expected": 1, "desc": "Missing n=1"}, {"input": [0, 1, 2], "expected": 3, "desc": "Missing n=3"}, {"input": [0, 1, 2, 3, 4], "expected": 5, "desc": "Missing n=5"}, # Missing middle element {"input": [0, 2], "expected": 1, "desc": "Missing middle"}, {"input": [0, 1, 3, 4], "expected": 2, "desc": "Missing 2 from 0-4"}, # Shuffled input {"input": [3, 0, 1], "expected": 2, "desc": "Shuffled, missing 2"}, {"input": [9, 6, 4, 2, 3, 5, 7, 0, 1], "expected": 8, "desc": "Large shuffled"}, # Large input (verify no overflow) {"input": list(range(10000)) + [10001], "expected": 10000, "desc": "Skip 10000"},] print("Edge Case Testing:")print("=" * 60)for case in edge_cases: # Fix the last test case - it should be 0 to n except missing result = find_missing(case["input"]) status = "✓" if result == case["expected"] else "✗" desc = case['desc'] print(f"{status} {desc}: {result} (expected {case['expected']})")The missing number technique extends to several related problems:
Variation 1: Missing Number in Range [1, n]
If the range is [1, n] instead of [0, n], adjust initialization:
def missing_one_to_n(nums):
n = len(nums) + 1 # n is the max value, not array length
result = n
for i, num in enumerate(nums):
result ^= (i + 1) ^ num # indices shifted by 1
return result
Variation 2: Find Two Missing Numbers
If two numbers are missing from [0, n], we can use the two-singles technique from the previous page:
Variation 3: Missing Element in Sorted Array
If the array is sorted, binary search is more efficient (O(log n)):
def missing_sorted(nums):
lo, hi = 0, len(nums)
while lo < hi:
mid = (lo + hi) // 2
if nums[mid] > mid:
hi = mid
else:
lo = mid + 1
return lo
XOR is ideal for unsorted arrays where O(n) is acceptable. For sorted arrays, binary search is better. For finding multiple missing elements, XOR extends but may require partitioning. Always consider the problem constraints.
The missing number problem appears in various practical scenarios:
Application 1: Data Integrity
When processing sequential records (transaction IDs, packet sequence numbers), detecting a missing entry quickly:
# Verify all packet sequence numbers 0 to n received
def find_missing_packet(received_seq_nums, expected_count):
result = expected_count
for i, seq in enumerate(received_seq_nums):
result ^= i ^ seq
return result if result <= expected_count else None
Application 2: Database Consistency
Checking for gaps in auto-increment IDs after data migration.
Application 3: File System Verification
Verifying all numbered files (image_001.jpg, image_002.jpg, ...) are present.
Application 4: Distributed Systems
In distributed message queues, detecting lost messages when all messages should have sequential IDs.
This is a common interview problem. Here's how to nail it:
Step 1: Clarify Constraints
"Is the range guaranteed to be [0, n]? Are all numbers distinct? What's the size limit?"
Step 2: Mention Multiple Approaches
"I can think of hash set (O(n) space), sorting (O(n log n)), sum formula, and XOR. The last two are O(1) space. XOR avoids overflow, so I'll use that."
Step 3: Explain the Insight
"If I XOR all indices with all elements, present numbers appear twice and cancel. The missing number appears only in the index sequence, so it survives."
Step 4: Handle the n Case
"I initialize with n because indices only go to n-1 but the range includes n."
Step 5: Write Clean Code
def missingNumber(nums):
result = len(nums)
for i, num in enumerate(nums):
result ^= i ^ num
return result
Step 6: Trace an Example
Walk through [3, 0, 1] → 2
Interviewers want to see: (1) knowledge of multiple approaches, (2) ability to analyze trade-offs, (3) understanding of why XOR works, (4) handling of the n edge case, and (5) clean, correct code. Demonstrating all five signals a strong candidate.
result ^= i ^ nums[i] for all iModule Complete!
You've now mastered the fundamental XOR patterns for detection problems:
These techniques form the core of XOR-based problem solving and appear frequently in interviews and competitive programming. Practice applying them to variations, and they'll become second nature.
Congratulations! You've completed Module 8: XOR Patterns — Finding Missing/Duplicate. You now have a deep, rigorous understanding of how XOR's mathematical properties enable elegant O(1) space solutions for detection problems that seem to require much more.