Loading learning content...
Imagine you're a postal worker in a large sorting facility, tasked with organizing thousands of letters by their ZIP codes. You could compare each pair of letters and gradually build a sorted pile—but that's not how postal services actually work. Instead, they use sorting bins: first separating letters by the first digit of the ZIP code, then by the second digit, and so on.
This intuitive approach—sorting by individual digits rather than comparing complete values—is the fundamental insight behind radix sort, one of the most elegant and powerful sorting algorithms ever devised. While comparison-based algorithms like merge sort and quicksort are bound by the O(n log n) lower bound, radix sort sidesteps this limitation entirely by exploiting the structure of the data rather than just comparing values.
By the end of this page, you will deeply understand: (1) How radix sort processes data digit-by-digit to achieve sorting, (2) The critical distinction between Least Significant Digit (LSD) and Most Significant Digit (MSD) approaches, (3) Why processing order matters and how it affects correctness, (4) The mathematical invariants that guarantee radix sort produces correctly sorted output, and (5) How to trace through radix sort execution on real examples.
To understand radix sort, we must first shift our perspective on what numbers actually are. When we write the number 4271, we typically think of it as a single atomic value—four thousand two hundred seventy-one. But this number is also a sequence of symbols: [4, 2, 7, 1], where each symbol comes from a finite alphabet (digits 0-9).
The key insight: If we can sort numbers by looking at one digit position at a time—and do so in a way that respects previous sorting decisions—we can sort the entire collection without ever directly comparing two complete numbers.
This is fundamentally different from comparison-based sorting:
| Aspect | Comparison-Based (e.g., Quicksort) | Digit-Based (Radix Sort) |
|---|---|---|
| Core operation | Compare two complete values | Group by individual digits |
| Information per step | Binary (less than or greater) | k-way (digit value 0 to k-1) |
| Lower bound | Ω(n log n) comparisons | No comparison lower bound applies |
| Key insight | Total ordering from pairwise comparisons | Positional notation encodes magnitude |
| Applicable to | Any comparable elements | Elements with positional structure |
Why does positional notation matter?
In our decimal system, the position of a digit determines its contribution to the value:
4271 = 4×10³ + 2×10² + 7×10¹ + 1×10⁰
= 4000 + 200 + 70 + 1
Critically, the leftmost digits contribute more to the value than rightmost digits. This means:
Radix sort exploits this positional structure systematically, processing digits in a specific order to ensure correctness.
The term radix refers to the base of a numeral system. For decimal numbers, the radix is 10 (digits 0-9). For binary, the radix is 2 (bits 0-1). For hexadecimal, the radix is 16 (0-9 and A-F). Radix sort works with any radix, and the choice of radix significantly impacts performance—a trade-off we'll explore in detail.
Given that numbers are sequences of digits, a natural question arises: in what order should we process the digits? There are two fundamental approaches, each with distinct characteristics:
Remarkably, both approaches can produce correct sorted output—but they work in fundamentally different ways and have different properties.
The stability imperative for LSD:
LSD radix sort has a crucial requirement that MSD does not: each pass must use a stable sorting algorithm. A sorting algorithm is stable if it preserves the relative order of elements with equal keys.
Why does LSD require stability? Consider what happens after multiple passes:
Without stability, earlier sorting decisions would be destroyed by later passes, producing incorrect results.
Using an unstable sort (like standard quicksort) as the subroutine for LSD radix sort will produce INCORRECT results. The algorithm's correctness depends entirely on each pass preserving the relative order established by previous passes. Counting sort is the typical choice because it is naturally stable and has O(n + k) complexity.
LSD radix sort is the more common variant, particularly for fixed-length integer sorting. Its elegance lies in its simplicity: perform d passes (one for each digit position), using a stable sort for each pass.
Algorithm Overview:
LSD_RADIX_SORT(array, d):
for digit_position = 0 to d-1: // 0 is rightmost
STABLE_SORT(array by digit at position digit_position)
return array
Let's trace through a complete example to build deep intuition.
Comprehensive Example:
Consider sorting the following 3-digit numbers:
Initial array: [329, 457, 657, 839, 436, 720, 355]
Pass 1: Sort by 1s digit (rightmost)
We extract the 1s digit of each number and group accordingly:
329 → digit 9
457 → digit 7
657 → digit 7
839 → digit 9
436 → digit 6
720 → digit 0
355 → digit 5
After stable sorting by this digit:
[720, 355, 436, 457, 657, 329, 839]
0 5 6 7 7 9 9
Note: 457 appears before 657 (both have digit 7), preserving their original relative order. Similarly, 329 appears before 839 (both have digit 9).
Pass 2: Sort by 10s digit
Now we sort by the 10s digit, maintaining the order from Pass 1:
720 → digit 2
355 → digit 5
436 → digit 3
457 → digit 5
657 → digit 5
329 → digit 2
839 → digit 3
After stable sorting by this digit:
[720, 329, 436, 839, 355, 457, 657]
2 2 3 3 5 5 5
Critical observation: Among numbers with 10s digit 2, 720 comes before 329. WHY? Because in Pass 1, 720 (ending in 0) was placed before 329 (ending in 9). The stable sort preserved this relative order.
Similarly, 355 < 457 < 657 because their 1s digits (5, 7, 7) were established in Pass 1, and among 457 and 657 (same 1s digit), their original order was preserved.
Pass 3: Sort by 100s digit (leftmost)
Final pass sorts by the most significant digit:
720 → digit 7
329 → digit 3
436 → digit 4
839 → digit 8
355 → digit 3
457 → digit 4
657 → digit 6
After stable sorting:
[329, 355, 436, 457, 657, 720, 839]
3 3 4 4 6 7 8
The array is now completely sorted!
Notice that 329 < 355 (same 100s digit, but 29 < 55 established by earlier passes) and 436 < 457 (same 100s digit, but 36 < 57). The cumulative effect of stable sorting at each position builds up the correct total ordering.
After k passes of LSD radix sort, the array is correctly sorted with respect to the k least significant digits. This invariant is maintained by stability: when sorting by digit k+1, elements with equal (k+1)th digits retain their order, which was correct for the k-digit suffix. After d passes, the entire d-digit numbers are correctly sorted.
MSD radix sort works in the opposite direction, processing from the most significant digit first. This approach is more similar to how a human might sort: first group by the leading digit, then within each group, sort by the next digit, and so on.
Algorithm Overview:
MSD_RADIX_SORT(array, digit_position, start, end):
if start >= end or digit_position > max_digits:
return
// Partition array[start:end] into buckets by current digit
buckets = PARTITION_BY_DIGIT(array, start, end, digit_position)
// Recursively sort each bucket
for each bucket in buckets:
MSD_RADIX_SORT(bucket, digit_position + 1, bucket.start, bucket.end)
Example: MSD Radix Sort in Action
Initial array: [329, 457, 657, 839, 436, 720, 355]
Pass 1: Partition by 100s digit (leftmost)
Digit 3: [329, 355]
Digit 4: [457, 436]
Digit 6: [657]
Digit 7: [720]
Digit 8: [839]
Pass 2: Recursively sort each bucket by 10s digit
Bucket [329, 355] → sorted by 10s digit:
Bucket [457, 436] → sorted by 10s digit:
Single-element buckets [657], [720], [839] are already sorted.
Final result after concatenation:
[329, 355, 436, 457, 657, 720, 839] ✓
| Characteristic | LSD Radix Sort | MSD Radix Sort |
|---|---|---|
| Processing order | Right to left (1s → 10s → 100s...) | Left to right (100s → 10s → 1s...) |
| Algorithm structure | Iterative: d sequential passes | Recursive: partition then recurse |
| Bucket handling | All elements in single pass | Separate recursion per bucket |
| Early termination | Always processes all d digits | Can stop when buckets are single elements |
| Stability requirement | Critical—correctness depends on it | Optional—recursion handles subproblems |
| Cache behavior | Consistent—same pattern each pass | Variable—depends on bucket sizes |
| Variable-length keys | Requires padding to max length | Natural handling—shorter keys end recursion early |
| Parallelization | Each pass is inherently sequential | Independent buckets can parallelize |
When to choose MSD over LSD:
Variable-length strings: MSD naturally handles strings of different lengths. "cat" and "category" can be sorted without padding because once we reach the end of "cat", it's placed before any longer string sharing that prefix.
Partially sorted output needed: MSD can provide lazy evaluation—you can get the smallest k elements without fully sorting the array.
High-entropy leading digits: When the most significant digits effectively separate the data into small buckets, MSD can terminate early.
Parallelizable workloads: Independent buckets after the first pass can be sorted in parallel.
MSD radix sort is particularly elegant for string sorting, where it's often called "multikey quicksort" or "three-way radix quicksort" when combined with three-way partitioning. For string-heavy applications like sorting dictionaries, genomic sequences, or URL databases, MSD variants often outperform generic comparison sorts.
Understanding why radix sort works—not just that it works—requires examining the mathematical invariants that guarantee correctness.
Theorem (LSD Radix Sort Correctness):
Given an array of numbers where each number has at most d digits, LSD radix sort using a stable sort at each digit position produces a correctly sorted array.
Proof by induction on the number of passes:
Base case (k=1): After one pass (sorting by the 1s digit), the array is correctly sorted if we only consider the 1s digit. All numbers ending in 0 precede those ending in 1, etc.
Inductive step: Assume after k passes, the array is correctly sorted with respect to the k least significant digits. Consider what happens in pass k+1 (sorting by digit position k+1):
Therefore, after pass k+1, the array is correctly sorted with respect to the (k+1) least significant digits. ∎
123456789
INVARIANT: After processing digits at positions 0, 1, 2, ..., k-1 (right to left), for all pairs of numbers (a, b) where a appears before b in the array: Either: 1. The k-digit suffix of a is less than the k-digit suffix of b, OR 2. The k-digit suffixes are equal (their relative order doesn't matter yet) CONSEQUENCE: When k = d (number of digits), the "suffix" is the entire number, so condition 1 implies a < b, meaning the array is sorted.Why does MSD work differently?
MSD correctness relies on a different principle: divide-and-conquer with lexicographic ordering.
After partitioning by the first digit, all numbers starting with 1 come before those starting with 2, etc. This is globally correct because the first digit has the highest weight.
Within each partition, the first digit is constant. The relative order of numbers within a partition depends only on their remaining digits (positions 2 through d).
Recursively sorting each partition solves independent subproblems, eventually reaching single elements or examining all digits.
The key insight: MSD is correct because higher-order digits dominate lower-order digits in determining value. Once we partition by digit k, numbers in different partitions have their relative order fixed forever.
LSD radix sort builds correctness from the bottom up—least significant decisions first, preserved by stability. MSD radix sort builds correctness from the top down—most significant decisions first, independence via recursion. Both exploit the positional structure of numeral representations, just in opposite directions.
Moving from algorithm description to practical implementation requires addressing several key details:
Extracting digits efficiently:
For base-10 numbers, extracting the digit at position p:
digit = (number / 10^p) % 10
For base-2^k (binary, octal, hexadecimal), use bit manipulation:
digit = (number >> (k * p)) & ((1 << k) - 1)
The second form is significantly faster because division and modulo by powers of 2 are simple bit operations.
1234567891011121314151617181920212223242526272829303132333435363738394041
def lsd_radix_sort(arr): """ LSD Radix Sort for 32-bit non-negative integers. Uses base 256 (8 bits per digit) for efficiency. This means 4 passes for 32-bit integers (32 / 8 = 4). """ if len(arr) <= 1: return arr RADIX = 256 # 2^8, process 8 bits at a time MASK = RADIX - 1 # 0xFF, used for extracting 8 bits # For 32-bit integers, we need 4 passes (0, 8, 16, 24 bit positions) for shift in range(0, 32, 8): # Count occurrences of each digit count = [0] * RADIX for num in arr: digit = (num >> shift) & MASK count[digit] += 1 # Convert counts to cumulative positions for i in range(1, RADIX): count[i] += count[i - 1] # Build output array (traverse backwards for stability) output = [0] * len(arr) for i in range(len(arr) - 1, -1, -1): digit = (arr[i] >> shift) & MASK count[digit] -= 1 output[count[digit]] = arr[i] arr = output # Use output as input for next pass return arr # Example usage with detailed tracenumbers = [329, 457, 657, 839, 436, 720, 355]print(f"Original: {numbers}")print(f"Sorted: {lsd_radix_sort(numbers)}")Why base 256?
Using base 256 (2⁸) instead of base 10 offers multiple advantages:
The trade-off is a larger count array (256 vs 10 elements), but this is negligible for modern systems.
| Radix | Bits per digit | Number of passes | Count array size | Trade-off |
|---|---|---|---|---|
| 2 | 1 | 32 | 2 | Too many passes—overhead dominates |
| 4 | 2 | 16 | 4 | Still too many passes |
| 16 | 4 | 8 | 16 | Reasonable for small datasets |
| 256 | 8 | 4 | 256 | Optimal for most cases ✓ |
| 65536 | 16 | 2 | 65536 | Count array too large for L1 cache |
Standard radix sort assumes non-negative integers. For signed integers, you must either: (1) Separate negatives, sort each group, then concatenate negatives (reversed) with positives, or (2) Flip the sign bit during sorting to treat the most significant bit correctly (negative numbers have sign bit 1, which would otherwise sort them as larger than positives).
A visual trace helps solidify understanding of how LSD radix sort progressively builds the sorted order. Let's trace through sorting [170, 045, 075, 090, 002, 024, 802, 066] with base 10:
12345678910111213141516171819202122232425262728293031323334353637383940414243444546474849505152535455
INITIAL ARRAY: [170, 045, 075, 090, 002, 024, 802, 066] ═══════════════════════════════════════════════════════════════════PASS 1: Sort by 1s digit (rightmost)═══════════════════════════════════════════════════════════════════Number: 170 045 075 090 002 024 802 0661s digit: 0 5 5 0 2 4 2 6 Buckets after distribution: Digit 0: [170, 090] Digit 2: [002, 802] Digit 4: [024] Digit 5: [045, 075] Digit 6: [066] After Pass 1: [170, 090, 002, 802, 024, 045, 075, 066] │ │ │ │ │ │ │ │ 0s 0s 2s 2s 4s 5s 5s 6s ═══════════════════════════════════════════════════════════════════PASS 2: Sort by 10s digit═══════════════════════════════════════════════════════════════════Input: [170, 090, 002, 802, 024, 045, 075, 066]10s digit: 7 9 0 0 2 4 7 6 Buckets after distribution: Digit 0: [002, 802] ← Note: 002 before 802 (stable from Pass 1!) Digit 2: [024] Digit 4: [045] Digit 6: [066] Digit 7: [170, 075] ← Note: 170 before 075 (stable from Pass 1!) Digit 9: [090] After Pass 2: [002, 802, 024, 045, 066, 170, 075, 090] │ │ │ │ │ │ │ │ 0x 0x 2x 4x 6x 7x 7x 9x ═══════════════════════════════════════════════════════════════════PASS 3: Sort by 100s digit (leftmost)═══════════════════════════════════════════════════════════════════Input: [002, 802, 024, 045, 066, 170, 075, 090]100s digit: 0 8 0 0 0 1 0 0 Buckets after distribution: Digit 0: [002, 024, 045, 066, 075, 090] ← All preserved in stable order! Digit 1: [170] Digit 8: [802] FINAL: [002, 024, 045, 066, 075, 090, 170, 802] ✓ SORTED! ═══════════════════════════════════════════════════════════════════KEY INSIGHT: The stable sort at each pass preserved the ordering fromprevious passes. After Pass 1, 002 was before 802 (2 < 2, but 002 camefirst originally). This order was maintained through Pass 2 and Pass 3because they have the same 10s digit (0) and same 100s digit (0/8).Each pass "locks in" correct relative order for numbers that share the current digit. By the final pass, ALL relative orders are correct because we've processed all digits. The stability property acts as a "memory" carrying forward the decisions made in earlier passes.
We've explored the fundamental principles of radix sort—a powerful technique that escapes comparison-based sorting's lower bounds by exploiting the structure of positional numeral representations.
What's next:
Now that we understand the fundamental mechanism of radix sort—processing digits one position at a time—we'll examine how radix sort uses counting sort as a subroutine. This stable, linear-time sorting algorithm is the engine that powers each pass of radix sort, and understanding it deeply reveals why radix sort achieves its remarkable performance characteristics.
You now understand how radix sort processes numbers digit-by-digit, the critical distinction between LSD and MSD approaches, and why stability is essential for LSD correctness. In the next page, we'll examine the counting sort subroutine that enables radix sort's linear-time performance.