Xor Patterns - Learning Module

Loading content...

0/279

Find Missing Number in Sequence

The Missing Piece of the Puzzle

Imagine you're processing sequential IDs from 0 to n, and you discover one is missing. How do you find it? The obvious approach—marking which numbers you've seen—requires O(n) additional space. But with XOR, you can find the missing number using just a single integer variable.

This problem is a classic interview question that appears at companies from Google to Amazon. It elegantly demonstrates how XOR can solve problems that seem to inherently require extra memory.

What You Will Learn

By the end of this page, you will understand multiple approaches to the missing number problem (sum-based, XOR-based, and hash-based), implement the O(1) space XOR solution, and recognize how to combine XOR with index values for sequence problems.

Problem Formalization

Problem Statement:

Given an array nums containing n distinct numbers in the range [0, n], return the one number in the range that is missing from the array.

Key Observations:

The array has n elements
Numbers are from 0 to n (n+1 possible values)
Exactly one number is missing
All numbers are distinct

Constraints:

n == nums.length
1 ≤ n ≤ 10⁴
0 ≤ nums[i] ≤ n
All numbers are unique

Examples:

Example Inputs and Outputs
Input Array	n	Range	Missing Number
[3, 0, 1]	3	[0, 3]	2
[0, 1]	2	[0, 2]	2
[9, 6, 4, 2, 3, 5, 7, 0, 1]	9	[0, 9]	8
[0]	1	[0, 1]	1
[1]	1	[0, 1]	0

Subtle Constraint

Notice that the array has n elements but the range is [0, n], which has n+1 possible values. This is why exactly one number must be missing—the pigeonhole principle guarantees it.

Approach Survey

Before diving into the XOR solution, let's survey all approaches to understand why some are preferred over others.

Approach 1: Hash Set

def missing_hash(nums):
    s = set(nums)
    for i in range(len(nums) + 1):
        if i not in s:
            return i

Time: O(n)
Space: O(n)
Pros: Simple, clear
Cons: Uses extra space

Approach 2: Sorting

def missing_sort(nums):
    nums.sort()
    for i, num in enumerate(nums):
        if i != num:
            return i
    return len(nums)

Time: O(n log n)
Space: O(1) or O(n) depending on sort
Pros: Easy to understand
Cons: Modifies input, slower time complexity

Approach 3: Sum Formula

def missing_sum(nums):
    n = len(nums)
    expected = n * (n + 1) // 2
    actual = sum(nums)
    return expected - actual

Time: O(n)
Space: O(1)
Pros: Very simple
Cons: Potential integer overflow for large n

Approach 4: XOR (Our Focus)

Time: O(n)
Space: O(1)
Pros: No overflow risk, constant space
Cons: Requires understanding XOR properties

Approach Comparison
Approach	Time	Space	Overflow Risk?	Modifies Input?
Hash Set	O(n)	O(n)	No	No
Sorting	O(n log n)	O(1)*	No	Yes
Sum Formula	O(n)	O(1)	Yes	No
XOR	O(n)	O(1)	No	No

The XOR Insight

The key insight is to compare what we have with what we should have. If we XOR all array elements with all numbers from 0 to n, duplicates will cancel, leaving only the missing number.

Why does this work?

Every number from 0 to n should appear exactly once in the complete sequence
Our array has all numbers except one
If we XOR: (array elements) ⊕ (0 to n)
- Numbers that exist appear twice (once in array, once in 0..n) → cancel to 0
- The missing number appears only in 0..n → survives

Mathematical View:

XOR(complement) = XOR(0, 1, 2, ..., n) ⊕ XOR(nums)
                = (0 ⊕ 1 ⊕ 2 ⊕ ... ⊕ missing ⊕ ... ⊕ n) ⊕ (all except missing)
                = missing  (everything else cancels)

The Pattern

This is a generalization of the single-number problem. There, duplicates in the array cancel. Here, we CREATE duplicates by XORing with the expected sequence, causing present numbers to cancel and the missing number to remain.

Implementation

There are two common ways to implement the XOR solution:

Approach A: Two-Pass (Cleaner)

XOR all numbers from 0 to n
XOR all array elements
XOR results together

Approach B: Single-Pass (More Efficient)

Initialize result with n (special case for last element)
XOR each index with its corresponding array element

Both achieve O(n) time and O(1) space, but single-pass is more elegant.

missing_number.py
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
def find_missing_two_pass(nums: list[int]) -> int:
    """
    Two-pass approach: XOR expected sequence, then XOR actual elements.
    
    Time: O(n), Space: O(1)
    """
    n = len(nums)
    
    # XOR all numbers from 0 to n
    xor_expected = 0
    for i in range(n + 1):
        xor_expected ^= i
    
    # XOR all array elements
    xor_actual = 0
    for num in nums:
        xor_actual ^= num
    
    # Missing number is the difference
    return xor_expected ^ xor_actual
 
 
def find_missing_single_pass(nums: list[int]) -> int:
    """
    Single-pass approach: XOR indices and elements together.
    
    The insight is that we can pair each index with its element.
    If the array were complete (ignoring the missing one), each
    number would XOR with its index position.
    
    We initialize with n because indices only go from 0 to n-1,
    but the range is [0, n]. This handles the case where n is missing.
    
    Time: O(n), Space: O(1)
    """
    n = len(nums)
    result = n  # Start with n (handles case where n is missing)
    
    for i in range(n):
        result ^= i ^ nums[i]
    
    return result
 
 
# Functional approach using reduce
from functools import reduce
from operator import xor
 
def find_missing_functional(nums: list[int]) -> int:
    """Functional single-pass approach."""
    n = len(nums)
    return reduce(xor, range(n + 1)) ^ reduce(xor, nums)
 
 
# Test cases
test_cases = [
    ([3, 0, 1], 2),
    ([0, 1], 2),
    ([9, 6, 4, 2, 3, 5, 7, 0, 1], 8),
    ([0], 1),
    ([1], 0),
    ([1, 2, 3], 0),  # Missing 0
    ([0, 1, 2], 3),  # Missing n
]
 
print("Testing find_missing:")
for nums, expected in test_cases:
    result1 = find_missing_two_pass(nums)
    result2 = find_missing_single_pass(nums)
    status = "✓" if result1 == expected == result2 else "✗"
    print(f"  {status} {nums} → {result1} (expected {expected})")

Step-by-Step Execution Trace

Let's trace through the single-pass algorithm with [3, 0, 1] where the missing number is 2.

Setup:

n = 3 (array length)
Range = [0, 1, 2, 3]
Missing = 2

Initialization: result = n = 3

Execution Trace for [3, 0, 1]
i	nums[i]	i XOR nums[i]	result (before)	result (after)	Binary
0	3	0 ^ 3 = 3	3	3 ^ 3 = 0	000
1	0	1 ^ 0 = 1	0	0 ^ 1 = 1	001
2	1	2 ^ 1 = 3	1	1 ^ 3 = 2	010

Result: 2 ✓

Mathematical Verification:

result = n ^ (0 ^ nums[0]) ^ (1 ^ nums[1]) ^ (2 ^ nums[2])
       = 3 ^ (0 ^ 3) ^ (1 ^ 0) ^ (2 ^ 1)
       = 3 ^ 0 ^ 3 ^ 1 ^ 0 ^ 2 ^ 1
       = (3 ^ 3) ^ (1 ^ 1) ^ (0 ^ 0) ^ 2
       = 0 ^ 0 ^ 0 ^ 2
       = 2

What's happening?

Each number from 0 to n-1 appears either:

As an index AND in the array (appears twice, cancels)
As an index but NOT in the array (missing number!)

The missing number appears only as an index (if missing < n) or only as the initial value n (if missing = n).

Why Initialize with n?

A common question is: why do we initialize result = n instead of result = 0?

The Problem:

In the single-pass loop, we iterate i from 0 to n-1. But the range of possible numbers is 0 to n. The number n never appears as an index!

If the missing number is n:

All indices 0 to n-1 are present in both the index sequence and the array. Without initializing to n, we'd never include n in our XOR, and we'd incorrectly return 0.

Example: [0, 1, 2] with n = 3

Missing number is 3.

With result = n = 3:
result = 3 ^ (0 ^ 0) ^ (1 ^ 1) ^ (2 ^ 2)
       = 3 ^ 0 ^ 0 ^ 0
       = 3 ✓

With result = 0 (wrong):
result = 0 ^ (0 ^ 0) ^ (1 ^ 1) ^ (2 ^ 2)
       = 0 ✗

The Elegant Trick

By initializing with n, we effectively extend our XOR to include n without needing an extra iteration. It's as if we added a "virtual" index n at the end. This is a common pattern in XOR problems: handle the boundary case through initial value rather than special-case code.

XOR vs Sum Formula: When to Use Which

Both XOR and sum-based approaches achieve O(n) time and O(1) space. When should you prefer one over the other?

Sum Formula:

def missing_sum(nums):
    n = len(nums)
    expected = n * (n + 1) // 2  # Sum of 0 to n
    return expected - sum(nums)

XOR:

def missing_xor(nums):
    result = len(nums)
    for i, num in enumerate(nums):
        result ^= i ^ num
    return result

Choose Sum When

•n is small (no overflow risk)
•You want the simplest possible code
•You need to explain to a non-technical audience
•The language handles big integers automatically (Python)

Choose XOR When

•n can be large (overflow protection)
•You're in a language with fixed-width integers
•You want to demonstrate bit manipulation skill
•Interview asks for XOR specifically

Overflow Analysis:

For the sum formula with n elements:

Sum = n × (n + 1) / 2
For 32-bit signed integers, overflow at n ≈ 65,536
For 64-bit integers, safe up to n ≈ 4 billion

XOR never overflows because it operates bit-by-bit without increasing magnitude.

Practical Recommendation:

In Python/JavaScript, use whichever you find clearer. In Java/C++, prefer XOR for safety, or explicitly use long for the sum approach.

Edge Cases

Let's verify the algorithm handles all edge cases correctly.

Edge Case 1: Missing 0

Input: [1, 2, 3] (n = 3)
result = 3
i=0: result = 3 ^ 0 ^ 1 = 2
i=1: result = 2 ^ 1 ^ 2 = 1
i=2: result = 1 ^ 2 ^ 3 = 0 ✓

Edge Case 2: Missing n (last element)

Input: [0, 1, 2] (n = 3)
result = 3
i=0: result = 3 ^ 0 ^ 0 = 3
i=1: result = 3 ^ 1 ^ 1 = 3
i=2: result = 3 ^ 2 ^ 2 = 3 ✓

Edge Case 3: Single Element

Input: [0] (n = 1, missing 1)
result = 1
i=0: result = 1 ^ 0 ^ 0 = 1 ✓

Input: [1] (n = 1, missing 0)
result = 1
i=0: result = 1 ^ 0 ^ 1 = 0 ✓

Edge Case 4: Large Input

No special handling needed—XOR processes elements regardless of size. The algorithm remains O(n) and O(1).

edge_cases_missing.py
Python
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
def find_missing(nums: list[int]) -> int:
    result = len(nums)
    for i, num in enumerate(nums):
        result ^= i ^ num
    return result
 
 
# Comprehensive edge case testing
edge_cases = [
    # Missing first element
    {"input": [1], "expected": 0, "desc": "Missing 0, n=1"},
    {"input": [1, 2, 3], "expected": 0, "desc": "Missing 0, n=3"},
    
    # Missing last element (n)
    {"input": [0], "expected": 1, "desc": "Missing n=1"},
    {"input": [0, 1, 2], "expected": 3, "desc": "Missing n=3"},
    {"input": [0, 1, 2, 3, 4], "expected": 5, "desc": "Missing n=5"},
    
    # Missing middle element
    {"input": [0, 2], "expected": 1, "desc": "Missing middle"},
    {"input": [0, 1, 3, 4], "expected": 2, "desc": "Missing 2 from 0-4"},
    
    # Shuffled input
    {"input": [3, 0, 1], "expected": 2, "desc": "Shuffled, missing 2"},
    {"input": [9, 6, 4, 2, 3, 5, 7, 0, 1], "expected": 8, "desc": "Large shuffled"},
    
    # Large input (verify no overflow)
    {"input": list(range(10000)) + [10001], "expected": 10000, 
     "desc": "Skip 10000"},
]
 
print("Edge Case Testing:")
print("=" * 60)
for case in edge_cases:
    # Fix the last test case - it should be 0 to n except missing
    result = find_missing(case["input"])
    status = "✓" if result == case["expected"] else "✗"
    desc = case['desc']
    print(f"{status} {desc}: {result} (expected {case['expected']})")

Variations and Extensions

The missing number technique extends to several related problems:

Variation 1: Missing Number in Range [1, n]

If the range is [1, n] instead of [0, n], adjust initialization:

def missing_one_to_n(nums):
    n = len(nums) + 1  # n is the max value, not array length
    result = n
    for i, num in enumerate(nums):
        result ^= (i + 1) ^ num  # indices shifted by 1
    return result

Variation 2: Find Two Missing Numbers

If two numbers are missing from [0, n], we can use the two-singles technique from the previous page:

XOR all with expected sequence → A ⊕ B
Find distinguishing bit
Partition and XOR each group

Variation 3: Missing Element in Sorted Array

If the array is sorted, binary search is more efficient (O(log n)):

def missing_sorted(nums):
    lo, hi = 0, len(nums)
    while lo < hi:
        mid = (lo + hi) // 2
        if nums[mid] > mid:
            hi = mid
        else:
            lo = mid + 1
    return lo

Choose the Right Tool

XOR is ideal for unsorted arrays where O(n) is acceptable. For sorted arrays, binary search is better. For finding multiple missing elements, XOR extends but may require partitioning. Always consider the problem constraints.

Real-World Applications

The missing number problem appears in various practical scenarios:

Application 1: Data Integrity

When processing sequential records (transaction IDs, packet sequence numbers), detecting a missing entry quickly:

# Verify all packet sequence numbers 0 to n received
def find_missing_packet(received_seq_nums, expected_count):
    result = expected_count
    for i, seq in enumerate(received_seq_nums):
        result ^= i ^ seq
    return result if result <= expected_count else None

Application 2: Database Consistency

Checking for gaps in auto-increment IDs after data migration.

Application 3: File System Verification

Verifying all numbered files (image_001.jpg, image_002.jpg, ...) are present.

Application 4: Distributed Systems

In distributed message queues, detecting lost messages when all messages should have sequential IDs.

Why XOR Shines in Production

•Streaming-compatible: Can process elements one at a time without storing
•Memory-efficient: Constant space regardless of sequence length
•Overflow-safe: No risk with large sequences
•Parallelizable: XOR is associative, allowing parallel reduction
•Simple to implement: Few lines of code, low bug surface

Interview Tips

This is a common interview problem. Here's how to nail it:

Step 1: Clarify Constraints

"Is the range guaranteed to be [0, n]? Are all numbers distinct? What's the size limit?"

Step 2: Mention Multiple Approaches

"I can think of hash set (O(n) space), sorting (O(n log n)), sum formula, and XOR. The last two are O(1) space. XOR avoids overflow, so I'll use that."

Step 3: Explain the Insight

"If I XOR all indices with all elements, present numbers appear twice and cancel. The missing number appears only in the index sequence, so it survives."

Step 4: Handle the n Case

"I initialize with n because indices only go to n-1 but the range includes n."

Step 5: Write Clean Code

def missingNumber(nums):
    result = len(nums)
    for i, num in enumerate(nums):
        result ^= i ^ num
    return result

Step 6: Trace an Example

Walk through [3, 0, 1] → 2

Interviewer's Perspective

Interviewers want to see: (1) knowledge of multiple approaches, (2) ability to analyze trade-offs, (3) understanding of why XOR works, (4) handling of the n edge case, and (5) clean, correct code. Demonstrating all five signals a strong candidate.

Summary

Key Takeaways

•XOR with expected sequence reveals the missing element by canceling present ones
•Initialize with n to handle the case where the missing number is n itself
•Single-pass formula: result ^= i ^ nums[i] for all i
•Complexity: O(n) time, O(1) space — optimal for unsorted inputs
•No overflow risk: XOR operates bit-by-bit, unlike sum formulas
•Extends to variations: Multiple missing numbers, different ranges
•Real applications: Data integrity, sequential ID verification

Module Complete!

You've now mastered the fundamental XOR patterns for detection problems:

Self-Cancellation Property — The mathematical foundation
Find Single Number — The classic single unique element
Find Two Singles — Partitioning by distinguishing bit
Find Missing Number — XOR with expected sequence

These techniques form the core of XOR-based problem solving and appear frequently in interviews and competitive programming. Practice applying them to variations, and they'll become second nature.

Module Complete

Congratulations! You've completed Module 8: XOR Patterns — Finding Missing/Duplicate. You now have a deep, rigorous understanding of how XOR's mathematical properties enable elegant O(1) space solutions for detection problems that seem to require much more.