Data Structures & AlgorithmsRadix Sort

Radix Sort — Digit-Based Sorting

LevelIntermediate

Duration50 mins

TopicRadix Sort

1 / 4

Sorting by Digits (LSD or MSD)

Beyond Comparison: A Different Way to Sort

Imagine you're a postal worker in a large sorting facility, tasked with organizing thousands of letters by their ZIP codes. You could compare each pair of letters and gradually build a sorted pile—but that's not how postal services actually work. Instead, they use sorting bins: first separating letters by the first digit of the ZIP code, then by the second digit, and so on.

This intuitive approach—sorting by individual digits rather than comparing complete values—is the fundamental insight behind radix sort, one of the most elegant and powerful sorting algorithms ever devised. While comparison-based algorithms like merge sort and quicksort are bound by the O(n log n) lower bound, radix sort sidesteps this limitation entirely by exploiting the structure of the data rather than just comparing values.

What You Will Learn

By the end of this page, you will deeply understand: (1) How radix sort processes data digit-by-digit to achieve sorting, (2) The critical distinction between Least Significant Digit (LSD) and Most Significant Digit (MSD) approaches, (3) Why processing order matters and how it affects correctness, (4) The mathematical invariants that guarantee radix sort produces correctly sorted output, and (5) How to trace through radix sort execution on real examples.

The Fundamental Insight: Numbers as Sequences

To understand radix sort, we must first shift our perspective on what numbers actually are. When we write the number 4271, we typically think of it as a single atomic value—four thousand two hundred seventy-one. But this number is also a sequence of symbols: [4, 2, 7, 1], where each symbol comes from a finite alphabet (digits 0-9).

The key insight: If we can sort numbers by looking at one digit position at a time—and do so in a way that respects previous sorting decisions—we can sort the entire collection without ever directly comparing two complete numbers.

This is fundamentally different from comparison-based sorting:

Paradigm Comparison: Comparison-Based vs Digit-Based Sorting
Aspect	Comparison-Based (e.g., Quicksort)	Digit-Based (Radix Sort)
Core operation	Compare two complete values	Group by individual digits
Information per step	Binary (less than or greater)	k-way (digit value 0 to k-1)
Lower bound	Ω(n log n) comparisons	No comparison lower bound applies
Key insight	Total ordering from pairwise comparisons	Positional notation encodes magnitude
Applicable to	Any comparable elements	Elements with positional structure

Why does positional notation matter?

In our decimal system, the position of a digit determines its contribution to the value:

4271 = 4×10³ + 2×10² + 7×10¹ + 1×10⁰
     = 4000  +  200  +  70   +  1

Critically, the leftmost digits contribute more to the value than rightmost digits. This means:

If two numbers differ in their leftmost digit, we know their relative order without examining other digits
If two numbers share the same leftmost digit, their relative order depends on subsequent digits

Radix sort exploits this positional structure systematically, processing digits in a specific order to ensure correctness.

The Radix Concept

The term radix refers to the base of a numeral system. For decimal numbers, the radix is 10 (digits 0-9). For binary, the radix is 2 (bits 0-1). For hexadecimal, the radix is 16 (0-9 and A-F). Radix sort works with any radix, and the choice of radix significantly impacts performance—a trade-off we'll explore in detail.

The Two Fundamental Approaches: LSD and MSD

Given that numbers are sequences of digits, a natural question arises: in what order should we process the digits? There are two fundamental approaches, each with distinct characteristics:

Least Significant Digit First (LSD Radix Sort): Start from the rightmost digit and work toward the leftmost
Most Significant Digit First (MSD Radix Sort): Start from the leftmost digit and work toward the rightmost

Remarkably, both approaches can produce correct sorted output—but they work in fundamentally different ways and have different properties.

LSD Radix Sort (Right to Left)

•Process order: Rightmost → Leftmost
•Sorting strategy: Sort entire array by each digit position, proceeding through all positions
•Stability requirement: MUST be stable at each pass
•Recursion: Not required—purely iterative
•Memory pattern: Consistent across all passes
•Variable-length strings: Requires padding

MSD Radix Sort (Left to Right)

•Process order: Leftmost → Rightmost
•Sorting strategy: Partition by first digit, then recursively sort each partition
•Stability requirement: Stable if implemented carefully
•Recursion: Typically recursive or stack-based
•Memory pattern: Can vary with partition sizes
•Variable-length strings: Natural fit

The stability imperative for LSD:

LSD radix sort has a crucial requirement that MSD does not: each pass must use a stable sorting algorithm. A sorting algorithm is stable if it preserves the relative order of elements with equal keys.

Why does LSD require stability? Consider what happens after multiple passes:

After sorting by the 1s digit, all numbers ending in 0 come first, then those ending in 1, etc.
When we then sort by the 10s digit, numbers with the same 10s digit must remain in their order from step 1
This ensures that 23 comes before 25 (same 10s digit, but 3 < 5 in 1s digit, established in step 1)

Without stability, earlier sorting decisions would be destroyed by later passes, producing incorrect results.

Critical: Stability in LSD Radix Sort

Using an unstable sort (like standard quicksort) as the subroutine for LSD radix sort will produce INCORRECT results. The algorithm's correctness depends entirely on each pass preserving the relative order established by previous passes. Counting sort is the typical choice because it is naturally stable and has O(n + k) complexity.

LSD Radix Sort: The Iterative Approach

LSD radix sort is the more common variant, particularly for fixed-length integer sorting. Its elegance lies in its simplicity: perform d passes (one for each digit position), using a stable sort for each pass.

Algorithm Overview:

LSD_RADIX_SORT(array, d):
    for digit_position = 0 to d-1:    // 0 is rightmost
        STABLE_SORT(array by digit at position digit_position)
    return array

Let's trace through a complete example to build deep intuition.

Comprehensive Example:

Consider sorting the following 3-digit numbers:

Initial array: [329, 457, 657, 839, 436, 720, 355]

Pass 1: Sort by 1s digit (rightmost)

We extract the 1s digit of each number and group accordingly:

329 → digit 9
457 → digit 7
657 → digit 7
839 → digit 9
436 → digit 6
720 → digit 0
355 → digit 5

After stable sorting by this digit:

[720, 355, 436, 457, 657, 329, 839]
  0    5    6    7    7    9    9

Note: 457 appears before 657 (both have digit 7), preserving their original relative order. Similarly, 329 appears before 839 (both have digit 9).

Pass 2: Sort by 10s digit

Now we sort by the 10s digit, maintaining the order from Pass 1:

720 → digit 2
355 → digit 5
436 → digit 3
457 → digit 5
657 → digit 5
329 → digit 2
839 → digit 3

After stable sorting by this digit:

[720, 329, 436, 839, 355, 457, 657]
  2    2    3    3    5    5    5

Critical observation: Among numbers with 10s digit 2, 720 comes before 329. WHY? Because in Pass 1, 720 (ending in 0) was placed before 329 (ending in 9). The stable sort preserved this relative order.

Similarly, 355 < 457 < 657 because their 1s digits (5, 7, 7) were established in Pass 1, and among 457 and 657 (same 1s digit), their original order was preserved.

Pass 3: Sort by 100s digit (leftmost)

Final pass sorts by the most significant digit:

720 → digit 7
329 → digit 3
436 → digit 4
839 → digit 8
355 → digit 3
457 → digit 4
657 → digit 6

After stable sorting:

[329, 355, 436, 457, 657, 720, 839]
  3    3    4    4    6    7    8

The array is now completely sorted!

Notice that 329 < 355 (same 100s digit, but 29 < 55 established by earlier passes) and 436 < 457 (same 100s digit, but 36 < 57). The cumulative effect of stable sorting at each position builds up the correct total ordering.

The Inductive Invariant

After k passes of LSD radix sort, the array is correctly sorted with respect to the k least significant digits. This invariant is maintained by stability: when sorting by digit k+1, elements with equal (k+1)th digits retain their order, which was correct for the k-digit suffix. After d passes, the entire d-digit numbers are correctly sorted.

MSD Radix Sort: The Recursive Approach

MSD radix sort works in the opposite direction, processing from the most significant digit first. This approach is more similar to how a human might sort: first group by the leading digit, then within each group, sort by the next digit, and so on.

Algorithm Overview:

MSD_RADIX_SORT(array, digit_position, start, end):
    if start >= end or digit_position > max_digits:
        return
    
    // Partition array[start:end] into buckets by current digit
    buckets = PARTITION_BY_DIGIT(array, start, end, digit_position)
    
    // Recursively sort each bucket
    for each bucket in buckets:
        MSD_RADIX_SORT(bucket, digit_position + 1, bucket.start, bucket.end)

Example: MSD Radix Sort in Action

Initial array: [329, 457, 657, 839, 436, 720, 355]

Pass 1: Partition by 100s digit (leftmost)

Digit 3: [329, 355]
Digit 4: [457, 436]
Digit 6: [657]
Digit 7: [720]
Digit 8: [839]

Pass 2: Recursively sort each bucket by 10s digit

Bucket [329, 355] → sorted by 10s digit:

329 has 10s digit 2
355 has 10s digit 5
Result: [329, 355] ✓

Bucket [457, 436] → sorted by 10s digit:

457 has 10s digit 5
436 has 10s digit 3
Result: [436, 457] ✓

Single-element buckets [657], [720], [839] are already sorted.

Final result after concatenation:

[329, 355, 436, 457, 657, 720, 839] ✓

LSD vs MSD Radix Sort: Detailed Comparison
Characteristic	LSD Radix Sort	MSD Radix Sort
Processing order	Right to left (1s → 10s → 100s...)	Left to right (100s → 10s → 1s...)
Algorithm structure	Iterative: d sequential passes	Recursive: partition then recurse
Bucket handling	All elements in single pass	Separate recursion per bucket
Early termination	Always processes all d digits	Can stop when buckets are single elements
Stability requirement	Critical—correctness depends on it	Optional—recursion handles subproblems
Cache behavior	Consistent—same pattern each pass	Variable—depends on bucket sizes
Variable-length keys	Requires padding to max length	Natural handling—shorter keys end recursion early
Parallelization	Each pass is inherently sequential	Independent buckets can parallelize

When to choose MSD over LSD:

Variable-length strings: MSD naturally handles strings of different lengths. "cat" and "category" can be sorted without padding because once we reach the end of "cat", it's placed before any longer string sharing that prefix.
Partially sorted output needed: MSD can provide lazy evaluation—you can get the smallest k elements without fully sorting the array.
High-entropy leading digits: When the most significant digits effectively separate the data into small buckets, MSD can terminate early.
Parallelizable workloads: Independent buckets after the first pass can be sorted in parallel.

MSD and String Sorting

MSD radix sort is particularly elegant for string sorting, where it's often called "multikey quicksort" or "three-way radix quicksort" when combined with three-way partitioning. For string-heavy applications like sorting dictionaries, genomic sequences, or URL databases, MSD variants often outperform generic comparison sorts.

The Mathematics of Correctness

Understanding why radix sort works—not just that it works—requires examining the mathematical invariants that guarantee correctness.

Theorem (LSD Radix Sort Correctness):

Given an array of numbers where each number has at most d digits, LSD radix sort using a stable sort at each digit position produces a correctly sorted array.

Proof by induction on the number of passes:

Base case (k=1): After one pass (sorting by the 1s digit), the array is correctly sorted if we only consider the 1s digit. All numbers ending in 0 precede those ending in 1, etc.

Inductive step: Assume after k passes, the array is correctly sorted with respect to the k least significant digits. Consider what happens in pass k+1 (sorting by digit position k+1):

Numbers with different digits at position k+1 are correctly ordered by that digit
Numbers with the same digit at position k+1 maintain their relative order from before this pass (by stability)
By the inductive hypothesis, that relative order was correct for the k-digit suffix

Therefore, after pass k+1, the array is correctly sorted with respect to the (k+1) least significant digits. ∎

invariant.md
1
2
3
4
5
6
7
8
9
INVARIANT: After processing digits at positions 0, 1, 2, ..., k-1 (right to left),
           for all pairs of numbers (a, b) where a appears before b in the array:
 
           Either:
           1. The k-digit suffix of a is less than the k-digit suffix of b, OR
           2. The k-digit suffixes are equal (their relative order doesn't matter yet)
 
CONSEQUENCE: When k = d (number of digits), the "suffix" is the entire number,
             so condition 1 implies a < b, meaning the array is sorted.

Why does MSD work differently?

MSD correctness relies on a different principle: divide-and-conquer with lexicographic ordering.

After partitioning by the first digit, all numbers starting with 1 come before those starting with 2, etc. This is globally correct because the first digit has the highest weight.
Within each partition, the first digit is constant. The relative order of numbers within a partition depends only on their remaining digits (positions 2 through d).
Recursively sorting each partition solves independent subproblems, eventually reaching single elements or examining all digits.

The key insight: MSD is correct because higher-order digits dominate lower-order digits in determining value. Once we partition by digit k, numbers in different partitions have their relative order fixed forever.

The Deep Connection

LSD radix sort builds correctness from the bottom up—least significant decisions first, preserved by stability. MSD radix sort builds correctness from the top down—most significant decisions first, independence via recursion. Both exploit the positional structure of numeral representations, just in opposite directions.

Practical Implementation Considerations

Moving from algorithm description to practical implementation requires addressing several key details:

Extracting digits efficiently:

For base-10 numbers, extracting the digit at position p:

digit = (number / 10^p) % 10

For base-2^k (binary, octal, hexadecimal), use bit manipulation:

digit = (number >> (k * p)) & ((1 << k) - 1)

The second form is significantly faster because division and modulo by powers of 2 are simple bit operations.

lsd_radix_sort.py
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
def lsd_radix_sort(arr):
    """
    LSD Radix Sort for 32-bit non-negative integers.
    Uses base 256 (8 bits per digit) for efficiency.
    
    This means 4 passes for 32-bit integers (32 / 8 = 4).
    """
    if len(arr) <= 1:
        return arr
    
    RADIX = 256  # 2^8, process 8 bits at a time
    MASK = RADIX - 1  # 0xFF, used for extracting 8 bits
    
    # For 32-bit integers, we need 4 passes (0, 8, 16, 24 bit positions)
    for shift in range(0, 32, 8):
        # Count occurrences of each digit
        count = [0] * RADIX
        for num in arr:
            digit = (num >> shift) & MASK
            count[digit] += 1
        
        # Convert counts to cumulative positions
        for i in range(1, RADIX):
            count[i] += count[i - 1]
        
        # Build output array (traverse backwards for stability)
        output = [0] * len(arr)
        for i in range(len(arr) - 1, -1, -1):
            digit = (arr[i] >> shift) & MASK
            count[digit] -= 1
            output[count[digit]] = arr[i]
        
        arr = output  # Use output as input for next pass
    
    return arr
 
 
# Example usage with detailed trace
numbers = [329, 457, 657, 839, 436, 720, 355]
print(f"Original: {numbers}")
print(f"Sorted:   {lsd_radix_sort(numbers)}")

Why base 256?

Using base 256 (2⁸) instead of base 10 offers multiple advantages:

Fewer passes: 32-bit integers require only 4 passes in base-256 vs 10 passes in base-10 (for 10-digit numbers)
Efficient extraction: Bit shifting and masking are single-cycle operations on modern CPUs
Cache-friendly counts: 256-element count arrays fit comfortably in L1 cache
Power-of-two alignment: Computer memory and arithmetic are optimized for powers of 2

The trade-off is a larger count array (256 vs 10 elements), but this is negligible for modern systems.

Radix Choice Impact on 32-bit Integer Sorting
Radix	Bits per digit	Number of passes	Count array size	Trade-off
2	1	32	2	Too many passes—overhead dominates
4	2	16	4	Still too many passes
16	4	8	16	Reasonable for small datasets
256	8	4	256	Optimal for most cases ✓
65536	16	2	65536	Count array too large for L1 cache

Handling Negative Numbers

Standard radix sort assumes non-negative integers. For signed integers, you must either: (1) Separate negatives, sort each group, then concatenate negatives (reversed) with positives, or (2) Flip the sign bit during sorting to treat the most significant bit correctly (negative numbers have sign bit 1, which would otherwise sort them as larger than positives).

Visualizing Radix Sort Execution

A visual trace helps solidify understanding of how LSD radix sort progressively builds the sorted order. Let's trace through sorting [170, 045, 075, 090, 002, 024, 802, 066] with base 10:

trace.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
INITIAL ARRAY: [170, 045, 075, 090, 002, 024, 802, 066]
 
═══════════════════════════════════════════════════════════════════
PASS 1: Sort by 1s digit (rightmost)
═══════════════════════════════════════════════════════════════════
Number:  170   045   075   090   002   024   802   066
1s digit: 0     5     5     0     2     4     2     6
 
Buckets after distribution:
  Digit 0: [170, 090]
  Digit 2: [002, 802]
  Digit 4: [024]
  Digit 5: [045, 075]
  Digit 6: [066]
 
After Pass 1: [170, 090, 002, 802, 024, 045, 075, 066]
               │    │    │    │    │    │    │    │
              0s   0s   2s   2s   4s   5s   5s   6s
              
═══════════════════════════════════════════════════════════════════
PASS 2: Sort by 10s digit
═══════════════════════════════════════════════════════════════════
Input:   [170, 090, 002, 802, 024, 045, 075, 066]
10s digit: 7     9     0     0     2     4     7     6
 
Buckets after distribution:
  Digit 0: [002, 802]  ← Note: 002 before 802 (stable from Pass 1!)
  Digit 2: [024]
  Digit 4: [045]
  Digit 6: [066]
  Digit 7: [170, 075]  ← Note: 170 before 075 (stable from Pass 1!)
  Digit 9: [090]
 
After Pass 2: [002, 802, 024, 045, 066, 170, 075, 090]
               │    │    │    │    │    │    │    │
              0x   0x   2x   4x   6x   7x   7x   9x
 
═══════════════════════════════════════════════════════════════════
PASS 3: Sort by 100s digit (leftmost)
═══════════════════════════════════════════════════════════════════
Input:    [002, 802, 024, 045, 066, 170, 075, 090]
100s digit: 0     8     0     0     0     1     0     0
 
Buckets after distribution:
  Digit 0: [002, 024, 045, 066, 075, 090]  ← All preserved in stable order!
  Digit 1: [170]
  Digit 8: [802]
 
FINAL:    [002, 024, 045, 066, 075, 090, 170, 802]  ✓ SORTED!
 
═══════════════════════════════════════════════════════════════════
KEY INSIGHT: The stable sort at each pass preserved the ordering from
previous passes. After Pass 1, 002 was before 802 (2 < 2, but 002 came
first originally). This order was maintained through Pass 2 and Pass 3
because they have the same 10s digit (0) and same 100s digit (0/8).

The Beauty of LSD Radix Sort

Each pass "locks in" correct relative order for numbers that share the current digit. By the final pass, ALL relative orders are correct because we've processed all digits. The stability property acts as a "memory" carrying forward the decisions made in earlier passes.

Summary: The Power of Digit-Based Sorting

We've explored the fundamental principles of radix sort—a powerful technique that escapes comparison-based sorting's lower bounds by exploiting the structure of positional numeral representations.

Key Takeaways

•Numbers are sequences of digits — This perspective enables processing one digit position at a time
•LSD processes right-to-left — Iterative approach requiring stability at each pass to preserve earlier ordering decisions
•MSD processes left-to-right — Recursive approach that partitions by most significant digit first, with independent subproblems
•Stability is critical for LSD — Without stable sorting at each pass, earlier ordering decisions would be destroyed
•Correctness comes from invariants — LSD: 'sorted by k rightmost digits after k passes'; MSD: 'correct relative order established by dominant digits'
•Radix choice affects performance — Base 256 is typically optimal for 32-bit integers, balancing passes vs cache efficiency

What's next:

Now that we understand the fundamental mechanism of radix sort—processing digits one position at a time—we'll examine how radix sort uses counting sort as a subroutine. This stable, linear-time sorting algorithm is the engine that powers each pass of radix sort, and understanding it deeply reveals why radix sort achieves its remarkable performance characteristics.

Page Complete

You now understand how radix sort processes numbers digit-by-digit, the critical distinction between LSD and MSD approaches, and why stability is essential for LSD correctness. In the next page, we'll examine the counting sort subroutine that enables radix sort's linear-time performance.

1 / 4

Loading learning content...

Data Structures & AlgorithmsRadix Sort

Radix Sort — Digit-Based Sorting

LevelIntermediate

Duration50 mins

TopicRadix Sort

1 / 4

Sorting by Digits (LSD or MSD)

Beyond Comparison: A Different Way to Sort

What You Will Learn

The Fundamental Insight: Numbers as Sequences

This is fundamentally different from comparison-based sorting:

Paradigm Comparison: Comparison-Based vs Digit-Based Sorting
Aspect	Comparison-Based (e.g., Quicksort)	Digit-Based (Radix Sort)
Core operation	Compare two complete values	Group by individual digits
Information per step	Binary (less than or greater)	k-way (digit value 0 to k-1)
Lower bound	Ω(n log n) comparisons	No comparison lower bound applies
Key insight	Total ordering from pairwise comparisons	Positional notation encodes magnitude
Applicable to	Any comparable elements	Elements with positional structure

Why does positional notation matter?

In our decimal system, the position of a digit determines its contribution to the value:

4271 = 4×10³ + 2×10² + 7×10¹ + 1×10⁰
     = 4000  +  200  +  70   +  1

Critically, the leftmost digits contribute more to the value than rightmost digits. This means:

If two numbers differ in their leftmost digit, we know their relative order without examining other digits
If two numbers share the same leftmost digit, their relative order depends on subsequent digits

Radix sort exploits this positional structure systematically, processing digits in a specific order to ensure correctness.

The Radix Concept

The Two Fundamental Approaches: LSD and MSD

Given that numbers are sequences of digits, a natural question arises: in what order should we process the digits? There are two fundamental approaches, each with distinct characteristics:

Least Significant Digit First (LSD Radix Sort): Start from the rightmost digit and work toward the leftmost
Most Significant Digit First (MSD Radix Sort): Start from the leftmost digit and work toward the rightmost

Remarkably, both approaches can produce correct sorted output—but they work in fundamentally different ways and have different properties.

LSD Radix Sort (Right to Left)

•Process order: Rightmost → Leftmost
•Sorting strategy: Sort entire array by each digit position, proceeding through all positions
•Stability requirement: MUST be stable at each pass
•Recursion: Not required—purely iterative
•Memory pattern: Consistent across all passes
•Variable-length strings: Requires padding

MSD Radix Sort (Left to Right)

•Process order: Leftmost → Rightmost
•Sorting strategy: Partition by first digit, then recursively sort each partition
•Stability requirement: Stable if implemented carefully
•Recursion: Typically recursive or stack-based
•Memory pattern: Can vary with partition sizes
•Variable-length strings: Natural fit

The stability imperative for LSD:

Why does LSD require stability? Consider what happens after multiple passes:

After sorting by the 1s digit, all numbers ending in 0 come first, then those ending in 1, etc.
When we then sort by the 10s digit, numbers with the same 10s digit must remain in their order from step 1
This ensures that 23 comes before 25 (same 10s digit, but 3 < 5 in 1s digit, established in step 1)

Without stability, earlier sorting decisions would be destroyed by later passes, producing incorrect results.

Critical: Stability in LSD Radix Sort

LSD Radix Sort: The Iterative Approach

Algorithm Overview:

LSD_RADIX_SORT(array, d):
    for digit_position = 0 to d-1:    // 0 is rightmost
        STABLE_SORT(array by digit at position digit_position)
    return array

Let's trace through a complete example to build deep intuition.

Comprehensive Example:

Consider sorting the following 3-digit numbers:

Initial array: [329, 457, 657, 839, 436, 720, 355]

Pass 1: Sort by 1s digit (rightmost)

We extract the 1s digit of each number and group accordingly:

329 → digit 9
457 → digit 7
657 → digit 7
839 → digit 9
436 → digit 6
720 → digit 0
355 → digit 5

After stable sorting by this digit:

[720, 355, 436, 457, 657, 329, 839]
  0    5    6    7    7    9    9

Note: 457 appears before 657 (both have digit 7), preserving their original relative order. Similarly, 329 appears before 839 (both have digit 9).

Pass 2: Sort by 10s digit

Now we sort by the 10s digit, maintaining the order from Pass 1:

720 → digit 2
355 → digit 5
436 → digit 3
457 → digit 5
657 → digit 5
329 → digit 2
839 → digit 3

After stable sorting by this digit:

[720, 329, 436, 839, 355, 457, 657]
  2    2    3    3    5    5    5

Similarly, 355 < 457 < 657 because their 1s digits (5, 7, 7) were established in Pass 1, and among 457 and 657 (same 1s digit), their original order was preserved.

Pass 3: Sort by 100s digit (leftmost)

Final pass sorts by the most significant digit:

720 → digit 7
329 → digit 3
436 → digit 4
839 → digit 8
355 → digit 3
457 → digit 4
657 → digit 6

After stable sorting:

[329, 355, 436, 457, 657, 720, 839]
  3    3    4    4    6    7    8

The array is now completely sorted!

The Inductive Invariant

MSD Radix Sort: The Recursive Approach

Algorithm Overview:

MSD_RADIX_SORT(array, digit_position, start, end):
    if start >= end or digit_position > max_digits:
        return
    
    // Partition array[start:end] into buckets by current digit
    buckets = PARTITION_BY_DIGIT(array, start, end, digit_position)
    
    // Recursively sort each bucket
    for each bucket in buckets:
        MSD_RADIX_SORT(bucket, digit_position + 1, bucket.start, bucket.end)

Example: MSD Radix Sort in Action

Initial array: [329, 457, 657, 839, 436, 720, 355]

Pass 1: Partition by 100s digit (leftmost)

Digit 3: [329, 355]
Digit 4: [457, 436]
Digit 6: [657]
Digit 7: [720]
Digit 8: [839]

Pass 2: Recursively sort each bucket by 10s digit

Bucket [329, 355] → sorted by 10s digit:

329 has 10s digit 2
355 has 10s digit 5
Result: [329, 355] ✓

Bucket [457, 436] → sorted by 10s digit:

457 has 10s digit 5
436 has 10s digit 3
Result: [436, 457] ✓

Single-element buckets [657], [720], [839] are already sorted.

Final result after concatenation:

[329, 355, 436, 457, 657, 720, 839] ✓

LSD vs MSD Radix Sort: Detailed Comparison
Characteristic	LSD Radix Sort	MSD Radix Sort
Processing order	Right to left (1s → 10s → 100s...)	Left to right (100s → 10s → 1s...)
Algorithm structure	Iterative: d sequential passes	Recursive: partition then recurse
Bucket handling	All elements in single pass	Separate recursion per bucket
Early termination	Always processes all d digits	Can stop when buckets are single elements
Stability requirement	Critical—correctness depends on it	Optional—recursion handles subproblems
Cache behavior	Consistent—same pattern each pass	Variable—depends on bucket sizes
Variable-length keys	Requires padding to max length	Natural handling—shorter keys end recursion early
Parallelization	Each pass is inherently sequential	Independent buckets can parallelize

When to choose MSD over LSD:

Variable-length strings: MSD naturally handles strings of different lengths. "cat" and "category" can be sorted without padding because once we reach the end of "cat", it's placed before any longer string sharing that prefix.
Partially sorted output needed: MSD can provide lazy evaluation—you can get the smallest k elements without fully sorting the array.
High-entropy leading digits: When the most significant digits effectively separate the data into small buckets, MSD can terminate early.
Parallelizable workloads: Independent buckets after the first pass can be sorted in parallel.

MSD and String Sorting

The Mathematics of Correctness

Understanding why radix sort works—not just that it works—requires examining the mathematical invariants that guarantee correctness.

Theorem (LSD Radix Sort Correctness):

Given an array of numbers where each number has at most d digits, LSD radix sort using a stable sort at each digit position produces a correctly sorted array.

Proof by induction on the number of passes:

Base case (k=1): After one pass (sorting by the 1s digit), the array is correctly sorted if we only consider the 1s digit. All numbers ending in 0 precede those ending in 1, etc.

Inductive step: Assume after k passes, the array is correctly sorted with respect to the k least significant digits. Consider what happens in pass k+1 (sorting by digit position k+1):

Numbers with different digits at position k+1 are correctly ordered by that digit
Numbers with the same digit at position k+1 maintain their relative order from before this pass (by stability)
By the inductive hypothesis, that relative order was correct for the k-digit suffix

Therefore, after pass k+1, the array is correctly sorted with respect to the (k+1) least significant digits. ∎

invariant.md
1
2
3
4
5
6
7
8
9
INVARIANT: After processing digits at positions 0, 1, 2, ..., k-1 (right to left),
           for all pairs of numbers (a, b) where a appears before b in the array:
 
           Either:
           1. The k-digit suffix of a is less than the k-digit suffix of b, OR
           2. The k-digit suffixes are equal (their relative order doesn't matter yet)
 
CONSEQUENCE: When k = d (number of digits), the "suffix" is the entire number,
             so condition 1 implies a < b, meaning the array is sorted.

Why does MSD work differently?

MSD correctness relies on a different principle: divide-and-conquer with lexicographic ordering.

After partitioning by the first digit, all numbers starting with 1 come before those starting with 2, etc. This is globally correct because the first digit has the highest weight.
Within each partition, the first digit is constant. The relative order of numbers within a partition depends only on their remaining digits (positions 2 through d).
Recursively sorting each partition solves independent subproblems, eventually reaching single elements or examining all digits.

The Deep Connection

Practical Implementation Considerations

Moving from algorithm description to practical implementation requires addressing several key details:

Extracting digits efficiently:

For base-10 numbers, extracting the digit at position p:

digit = (number / 10^p) % 10

For base-2^k (binary, octal, hexadecimal), use bit manipulation:

digit = (number >> (k * p)) & ((1 << k) - 1)

The second form is significantly faster because division and modulo by powers of 2 are simple bit operations.

lsd_radix_sort.py
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
def lsd_radix_sort(arr):
    """
    LSD Radix Sort for 32-bit non-negative integers.
    Uses base 256 (8 bits per digit) for efficiency.
    
    This means 4 passes for 32-bit integers (32 / 8 = 4).
    """
    if len(arr) <= 1:
        return arr
    
    RADIX = 256  # 2^8, process 8 bits at a time
    MASK = RADIX - 1  # 0xFF, used for extracting 8 bits
    
    # For 32-bit integers, we need 4 passes (0, 8, 16, 24 bit positions)
    for shift in range(0, 32, 8):
        # Count occurrences of each digit
        count = [0] * RADIX
        for num in arr:
            digit = (num >> shift) & MASK
            count[digit] += 1
        
        # Convert counts to cumulative positions
        for i in range(1, RADIX):
            count[i] += count[i - 1]
        
        # Build output array (traverse backwards for stability)
        output = [0] * len(arr)
        for i in range(len(arr) - 1, -1, -1):
            digit = (arr[i] >> shift) & MASK
            count[digit] -= 1
            output[count[digit]] = arr[i]
        
        arr = output  # Use output as input for next pass
    
    return arr
 
 
# Example usage with detailed trace
numbers = [329, 457, 657, 839, 436, 720, 355]
print(f"Original: {numbers}")
print(f"Sorted:   {lsd_radix_sort(numbers)}")

Why base 256?

Using base 256 (2⁸) instead of base 10 offers multiple advantages:

Fewer passes: 32-bit integers require only 4 passes in base-256 vs 10 passes in base-10 (for 10-digit numbers)
Efficient extraction: Bit shifting and masking are single-cycle operations on modern CPUs
Cache-friendly counts: 256-element count arrays fit comfortably in L1 cache
Power-of-two alignment: Computer memory and arithmetic are optimized for powers of 2

The trade-off is a larger count array (256 vs 10 elements), but this is negligible for modern systems.

Radix Choice Impact on 32-bit Integer Sorting
Radix	Bits per digit	Number of passes	Count array size	Trade-off
2	1	32	2	Too many passes—overhead dominates
4	2	16	4	Still too many passes
16	4	8	16	Reasonable for small datasets
256	8	4	256	Optimal for most cases ✓
65536	16	2	65536	Count array too large for L1 cache

Handling Negative Numbers

Visualizing Radix Sort Execution

A visual trace helps solidify understanding of how LSD radix sort progressively builds the sorted order. Let's trace through sorting [170, 045, 075, 090, 002, 024, 802, 066] with base 10:

trace.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
INITIAL ARRAY: [170, 045, 075, 090, 002, 024, 802, 066]
 
═══════════════════════════════════════════════════════════════════
PASS 1: Sort by 1s digit (rightmost)
═══════════════════════════════════════════════════════════════════
Number:  170   045   075   090   002   024   802   066
1s digit: 0     5     5     0     2     4     2     6
 
Buckets after distribution:
  Digit 0: [170, 090]
  Digit 2: [002, 802]
  Digit 4: [024]
  Digit 5: [045, 075]
  Digit 6: [066]
 
After Pass 1: [170, 090, 002, 802, 024, 045, 075, 066]
               │    │    │    │    │    │    │    │
              0s   0s   2s   2s   4s   5s   5s   6s
              
═══════════════════════════════════════════════════════════════════
PASS 2: Sort by 10s digit
═══════════════════════════════════════════════════════════════════
Input:   [170, 090, 002, 802, 024, 045, 075, 066]
10s digit: 7     9     0     0     2     4     7     6
 
Buckets after distribution:
  Digit 0: [002, 802]  ← Note: 002 before 802 (stable from Pass 1!)
  Digit 2: [024]
  Digit 4: [045]
  Digit 6: [066]
  Digit 7: [170, 075]  ← Note: 170 before 075 (stable from Pass 1!)
  Digit 9: [090]
 
After Pass 2: [002, 802, 024, 045, 066, 170, 075, 090]
               │    │    │    │    │    │    │    │
              0x   0x   2x   4x   6x   7x   7x   9x
 
═══════════════════════════════════════════════════════════════════
PASS 3: Sort by 100s digit (leftmost)
═══════════════════════════════════════════════════════════════════
Input:    [002, 802, 024, 045, 066, 170, 075, 090]
100s digit: 0     8     0     0     0     1     0     0
 
Buckets after distribution:
  Digit 0: [002, 024, 045, 066, 075, 090]  ← All preserved in stable order!
  Digit 1: [170]
  Digit 8: [802]
 
FINAL:    [002, 024, 045, 066, 075, 090, 170, 802]  ✓ SORTED!
 
═══════════════════════════════════════════════════════════════════
KEY INSIGHT: The stable sort at each pass preserved the ordering from
previous passes. After Pass 1, 002 was before 802 (2 < 2, but 002 came
first originally). This order was maintained through Pass 2 and Pass 3
because they have the same 10s digit (0) and same 100s digit (0/8).

The Beauty of LSD Radix Sort

Summary: The Power of Digit-Based Sorting

We've explored the fundamental principles of radix sort—a powerful technique that escapes comparison-based sorting's lower bounds by exploiting the structure of positional numeral representations.

Key Takeaways

•Numbers are sequences of digits — This perspective enables processing one digit position at a time
•LSD processes right-to-left — Iterative approach requiring stability at each pass to preserve earlier ordering decisions
•MSD processes left-to-right — Recursive approach that partitions by most significant digit first, with independent subproblems
•Stability is critical for LSD — Without stable sorting at each pass, earlier ordering decisions would be destroyed
•Correctness comes from invariants — LSD: 'sorted by k rightmost digits after k passes'; MSD: 'correct relative order established by dominant digits'
•Radix choice affects performance — Base 256 is typically optimal for 32-bit integers, balancing passes vs cache efficiency

What's next:

Page Complete

1 / 4