Every great algorithm has components that make it work. For radix sort, the crucial component is counting sort—a simple yet powerful algorithm that sorts elements in linear time when the range of values is limited. Understanding how counting sort operates as radix sort's subroutine illuminates both algorithms and reveals why their combination is so effective.
In the previous page, we established that radix sort performs multiple passes, each sorting by one digit position. We emphasized that stability is critical for LSD radix sort's correctness. Now we'll see why counting sort is the perfect choice for each pass: it's naturally stable, runs in O(n + k) time where k is the number of distinct digit values, and uses only simple array operations.
By the end of this page, you will deeply understand: (1) How counting sort achieves linear-time sorting for bounded-range data, (2) The three-phase structure of counting sort: count, accumulate, place, (3) Why counting sort is naturally stable and how to implement it correctly, (4) How radix sort invokes counting sort at each pass with different digit extraction, and (5) The symbiotic relationship between radix sort's structure and counting sort's properties.
Before diving into how counting sort serves radix sort, let's understand counting sort as a standalone algorithm.
The problem counting sort solves:
Given an array of n elements where each element is an integer in the range [0, k-1], sort the array in O(n + k) time.
Why can counting sort beat O(n log n)?
Comparison-based sorting algorithms like merge sort and quicksort have a lower bound of Ω(n log n) because they gain at most 1 bit of information per comparison. Counting sort sidesteps this bound because it doesn't use comparisons. Instead, it uses each element's actual value to determine its position directly.
This is analogous to the difference between finding a name by repeatedly asking "does it come before or after this one?" and looking it up directly at a known position: comparisons extract information one bit at a time, while direct addressing uses the value itself as an address.
The core insight:
If we know that a value x appears in the array, and we know exactly how many values are less than x, we know exactly where x should be placed in the sorted output.
For example, if x = 5 and there are exactly 7 values less than 5 in the array, then x belongs at index 7 (0-indexed). If there are multiple copies of 5, they belong at indices 7, 8, 9, ... The challenge is computing, for each value, how many values are smaller—which is exactly what counting accomplishes.
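To make this concrete, here is a tiny illustrative snippet (not part of the algorithm itself) that computes a value's sorted position directly from a count of smaller elements:

```python
arr = [5, 2, 8, 5, 1, 5]

# How many values are strictly less than 5?
smaller = sum(1 for v in arr if v < 5)
print(smaller)  # 2 → the copies of 5 belong at indices 2, 3, and 4
```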
Bounded range: The requirement that elements fall within a known, limited range [0, k-1]. This is essential because counting sort allocates an array of size k. If k is very large relative to n, counting sort becomes impractical.
Non-comparison sorting: Algorithms that determine order without pairwise comparisons. Besides counting sort, bucket sort and radix sort are non-comparison algorithms.
Counting sort operates in three distinct phases, each building on the previous. Understanding these phases is crucial for seeing how stability emerges and how the algorithm achieves linear time.
```python
def counting_sort(arr, k):
    """
    Counting sort for integers in range [0, k-1].

    Parameters:
        arr: List of integers to sort
        k:   Upper bound (exclusive) of the range, i.e., elements are in [0, k-1]

    Returns:
        New sorted list (stable)

    Time Complexity:  O(n + k)
    Space Complexity: O(n + k)
    """
    n = len(arr)
    if n == 0:
        return []

    # ═══════════════════════════════════════════════════════════════
    # PHASE 1: COUNT
    # Count occurrences of each value
    # ═══════════════════════════════════════════════════════════════
    count = [0] * k
    for val in arr:
        count[val] += 1

    # At this point, count[v] = number of elements with value v
    # Example: arr = [3, 1, 4, 1, 5], k = 6
    #   count = [0, 2, 0, 1, 1, 1]
    #            0  1  2  3  4  5   ← values

    # ═══════════════════════════════════════════════════════════════
    # PHASE 2: ACCUMULATE
    # Convert counts to cumulative positions
    # ═══════════════════════════════════════════════════════════════
    for i in range(1, k):
        count[i] += count[i - 1]

    # At this point, count[v] = number of elements ≤ v
    # Equivalently, count[v] is the index AFTER the last position for value v
    # Example: count = [0, 2, 2, 3, 4, 5]
    #   - 2 elements are ≤ 1
    #   - 3 elements are ≤ 3
    #   - 5 elements are ≤ 5 (total)

    # ═══════════════════════════════════════════════════════════════
    # PHASE 3: PLACE
    # Build output array by placing elements at correct positions
    # ═══════════════════════════════════════════════════════════════
    output = [0] * n

    # CRITICAL: Iterate BACKWARDS for stability
    for i in range(n - 1, -1, -1):
        val = arr[i]
        count[val] -= 1          # Decrement first to get 0-indexed position
        output[count[val]] = val

    # Why backwards? See detailed explanation below.
    return output
```

Detailed trace through counting sort:
Let's trace sorting [3, 1, 4, 1, 5, 9, 2, 6] with k = 10:
```
Input: [3, 1, 4, 1, 5, 9, 2, 6]

PHASE 1: COUNT
  count[0] = 0, count[1] = 2, count[2] = 1, count[3] = 1
  count[4] = 1, count[5] = 1, count[6] = 1, count[9] = 1

  count = [0, 2, 1, 1, 1, 1, 1, 0, 0, 1]
           0  1  2  3  4  5  6  7  8  9

PHASE 2: ACCUMULATE
  count = [0, 2, 3, 4, 5, 6, 7, 7, 7, 8]
           0  1  2  3  4  5  6  7  8  9

  Reading: 2 elements ≤ 1, 3 elements ≤ 2, 4 elements ≤ 3, etc.
  This means: value 1 should occupy indices 0-1, value 2 at index 2,
  value 3 at index 3, value 4 at index 4, value 5 at index 5, etc.

PHASE 3: PLACE (backwards iteration)
  i=7: arr[7]=6, count[6]=7→6, output[6]=6 → [_, _, _, _, _, _, 6, _]
  i=6: arr[6]=2, count[2]=3→2, output[2]=2 → [_, _, 2, _, _, _, 6, _]
  i=5: arr[5]=9, count[9]=8→7, output[7]=9 → [_, _, 2, _, _, _, 6, 9]
  i=4: arr[4]=5, count[5]=6→5, output[5]=5 → [_, _, 2, _, _, 5, 6, 9]
  i=3: arr[3]=1, count[1]=2→1, output[1]=1 → [_, 1, 2, _, _, 5, 6, 9]
  i=2: arr[2]=4, count[4]=5→4, output[4]=4 → [_, 1, 2, _, 4, 5, 6, 9]
  i=1: arr[1]=1, count[1]=1→0, output[0]=1 → [1, 1, 2, _, 4, 5, 6, 9]
  i=0: arr[0]=3, count[3]=4→3, output[3]=3 → [1, 1, 2, 3, 4, 5, 6, 9]

Final output: [1, 1, 2, 3, 4, 5, 6, 9] ✓
```
The backwards iteration in Phase 3 is crucial for stability—but why? This is a subtle point that deserves careful analysis.
Recall the stability requirement:
A stable sort preserves the relative order of elements with equal keys. If element A appears before element B in the input, and they have the same key, then A should appear before B in the output.
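As a quick way to see this requirement in action, consider a sketch using Python's built-in sorted(), which is stable; the (key, label) records are illustrative, not part of counting sort:

```python
# Records: (key, label). A1, A2, A3 share key 5 and appear in that input order.
records = [(5, 'A1'), (3, 'B'), (5, 'A2'), (1, 'C'), (5, 'A3')]

stable = sorted(records, key=lambda r: r[0])  # Python's sort is stable
print(stable)
# [(1, 'C'), (3, 'B'), (5, 'A1'), (5, 'A2'), (5, 'A3')]
#                       ^ equal keys keep their original relative order
```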
The mechanics of cumulative counting:
After Phase 2, count[v] represents the number of elements ≤ v, which is also one past the index where the last element with value v will be placed.
When we place an element with value v, we first decrement count[v] to get the actual (0-indexed) position, then write the element to output[count[v]].
Why backwards?
Consider elements with the same value v. When iterating backwards, the last occurrence of v in the input is encountered first and placed at the highest slot of v's block; each earlier occurrence is then placed one slot lower. Reading the output left to right therefore lists the occurrences of v in their original input order.
This means elements with equal keys maintain their original relative order!
```
Consider input: [A₁, B, A₂, C, A₃] where A₁, A₂, A₃ all have value 5
(The subscripts indicate original order, not part of the actual values)

After Phase 2, suppose count[5] = 4
(meaning 4 elements ≤ 5, so values 5 go in positions 1-3)

BACKWARDS iteration (correct - stable):
  i=4: Place A₃ at position count[5]-1 = 3, decrement → count[5] = 3
  i=3: Place C at its position...
  i=2: Place A₂ at position count[5]-1 = 2, decrement → count[5] = 2
  i=1: Place B at its position...
  i=0: Place A₁ at position count[5]-1 = 1, decrement → count[5] = 1

Result: [..., A₁, A₂, A₃, ...] ← STABLE! Original order preserved.

FORWARDS iteration (incorrect - unstable):
  i=0: Place A₁ at position count[5]-1 = 3, decrement → count[5] = 3
  i=1: Place B at its position...
  i=2: Place A₂ at position count[5]-1 = 2, decrement → count[5] = 2
  i=3: Place C at its position...
  i=4: Place A₃ at position count[5]-1 = 1, decrement → count[5] = 1

Result: [..., A₃, A₂, A₁, ...] ← UNSTABLE! Order reversed.
```

Many counting sort implementations fail to be stable because they iterate forwards instead of backwards. For standalone sorting of integers, this doesn't affect correctness—[1, 1, 1] is sorted regardless of which '1' came first. But when counting sort is used as a subroutine for radix sort, stability is ESSENTIAL. Each integer carries information from previous passes that must be preserved.
Alternative formulation for stability:
Some implementations count positions differently to enable forward iteration while maintaining stability. Instead of cumulative counts representing positions after each value's block, they represent positions before:
```python
# Alternative Phase 2: EXCLUSIVE prefix sum of counts
# After this loop, count[v] = number of elements strictly LESS than v,
# i.e., the position BEFORE (not including) value v's block.
total = 0
for i in range(k):
    count[i], total = total, total + count[i]

# Alternative Phase 3: Forward iteration
for i in range(n):
    val = arr[i]
    output[count[val]] = val
    count[val] += 1  # Move to next position for this value
```
Both formulations achieve stability; the key is consistency between how counts are computed and how elements are placed.
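Packaged as a complete function, the forward-iteration variant might look like this (a sketch equivalent to counting_sort above; the name counting_sort_forward is ours, not a standard one):

```python
def counting_sort_forward(arr, k):
    """Stable counting sort: exclusive prefix sums + forward placement."""
    n = len(arr)
    count = [0] * k
    for val in arr:
        count[val] += 1

    # Exclusive prefix sum: count[v] = number of elements strictly less than v,
    # i.e., the FIRST index of value v's block in the output.
    total = 0
    for i in range(k):
        count[i], total = total, total + count[i]

    output = [0] * n
    for val in arr:           # Forward iteration preserves input order
        output[count[val]] = val
        count[val] += 1       # Advance to the next free slot for this value
    return output

print(counting_sort_forward([3, 1, 4, 1, 5, 9, 2, 6], 10))
# [1, 1, 2, 3, 4, 5, 6, 9]
```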
Now we connect the pieces: how does radix sort use counting sort as its subroutine?
The key adaptation:
In standalone counting sort, we sort by the element's value directly. In radix sort's use of counting sort, we sort by the digit at a specific position. The element itself is preserved—only the key used for sorting changes.
```python
def get_digit(number, position, base):
    """
    Extract the digit at the given position from number.
    Position 0 is the rightmost (least significant) digit.

    For base 10:  get_digit(4271, 0, 10) = 1
                  get_digit(4271, 2, 10) = 2
    For base 256: get_digit(0x4271, 0, 256) = 0x71
                  get_digit(0x4271, 1, 256) = 0x42
    """
    return (number // (base ** position)) % base


def counting_sort_by_digit(arr, position, base):
    """
    Stable counting sort subroutine for radix sort.
    Sorts by the digit at the specified position.

    The elements are the original numbers; the KEY is the digit.
    This distinction is what makes radix sort work.
    """
    n = len(arr)
    output = [0] * n
    count = [0] * base

    # Phase 1: Count occurrences of each digit (not each number!)
    for num in arr:
        digit = get_digit(num, position, base)
        count[digit] += 1

    # Phase 2: Cumulative count
    for i in range(1, base):
        count[i] += count[i - 1]

    # Phase 3: Place elements by their digit, preserving the WHOLE number
    for i in range(n - 1, -1, -1):
        num = arr[i]
        digit = get_digit(num, position, base)
        count[digit] -= 1
        output[count[digit]] = num  # Place the ENTIRE number, not just its digit

    return output


def radix_sort(arr, base=10):
    """LSD radix sort using counting sort as its stable subroutine."""
    if len(arr) == 0:
        return arr

    # Determine the number of digits in the maximum element
    max_val = max(arr)
    num_digits = 0
    temp = max_val
    while temp > 0:
        num_digits += 1
        temp //= base

    # Handle the case where the maximum is 0
    if num_digits == 0:
        num_digits = 1

    # Apply counting sort once per digit position
    result = arr[:]
    for position in range(num_digits):
        print(f"Pass {position + 1}: Sorting by digit at position {position}")
        result = counting_sort_by_digit(result, position, base)
        print(f"  Result: {result}")

    return result


# Demonstration
numbers = [329, 457, 657, 839, 436, 720, 355]
print(f"Input: {numbers}")
print()
sorted_numbers = radix_sort(numbers, base=10)
print()
print(f"Sorted: {sorted_numbers}")
```

Execution trace of the integration:
```
Input: [329, 457, 657, 839, 436, 720, 355]

Pass 1: Sorting by digit at position 0 (1s place)
  Digits: [9, 7, 7, 9, 6, 0, 5]
  Counting sort groups by these digits:
    0: [720]
    5: [355]
    6: [436]
    7: [457, 657]  ← Note: 457 before 657 (original order preserved)
    9: [329, 839]  ← Note: 329 before 839 (original order preserved)
  Result: [720, 355, 436, 457, 657, 329, 839]

Pass 2: Sorting by digit at position 1 (10s place)
  Digits: [2, 5, 3, 5, 5, 2, 3]
  Counting sort groups by these digits (preserving order from Pass 1):
    2: [720, 329]       ← 720 was first in Pass 1 result
    3: [436, 839]       ← 436 was first in Pass 1 result
    5: [355, 457, 657]  ← Order from Pass 1 preserved
  Result: [720, 329, 436, 839, 355, 457, 657]

Pass 3: Sorting by digit at position 2 (100s place)
  Digits: [7, 3, 4, 8, 3, 4, 6]
  Counting sort groups by these digits (preserving order from Pass 2):
    3: [329, 355]  ← 329 before 355 because of earlier passes!
    4: [436, 457]  ← 436 before 457 because of earlier passes!
    6: [657]
    7: [720]
    8: [839]
  Result: [329, 355, 436, 457, 657, 720, 839] ✓ SORTED
```
At each pass, counting sort ONLY looks at the current digit—but it moves the ENTIRE number. The number 329 carries its 'history' (2 in tens place, 9 in ones place) through the algorithm. Stability ensures this history is respected: when two numbers share the same current digit, their relative order reflects all previous passes.
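One way to convince yourself of this integration is a randomized check against Python's built-in sort. The sketch below assumes counting_sort_by_digit from the listing above and repeats the pass loop without the diagnostic prints:

```python
import random

def radix_sort_quiet(arr, base=10):
    """radix_sort from above, minus the per-pass print statements."""
    if not arr:
        return arr
    result, position, exp = arr[:], 0, 1
    max_val = max(arr)
    while max_val // exp > 0:
        result = counting_sort_by_digit(result, position, base)
        exp *= base
        position += 1
    return result

for _ in range(100):
    data = [random.randrange(10_000) for _ in range(200)]
    assert radix_sort_quiet(data) == sorted(data)
print("radix_sort matches sorted() on 100 random inputs")
```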
Understanding the complexity of radix sort requires analyzing both the outer loop (radix sort itself) and the inner operations (counting sort at each pass).
Counting sort complexity analysis:
For each pass of counting sort on n elements with digit range [0, k-1]: Phase 1 (count) scans all n elements in O(n); Phase 2 (accumulate) scans the count array in O(k); Phase 3 (place) scans all n elements again in O(n).
Total per pass: O(n + k) time, O(n + k) space
Radix sort complexity analysis:
If we have d digit positions and use radix/base k, radix sort performs d passes of counting sort, each costing O(n + k), for a total of O(d(n + k)) time and O(n + k) space (the output and count arrays can be reused across passes).
The relationship between d and k:
For a maximum value M in the input, d = ⌊log_k M⌋ + 1: the number of base-k digits needed to write M. A larger base k means fewer passes but a bigger count array, so d and k trade off directly against each other.
Example calculations:
For 32-bit integers (M up to ~4 billion):
| Base k | Passes d | Count array size k | Time complexity |
|---|---|---|---|
| 2 | 32 | 2 | O(32n) ≈ O(n) |
| 10 | 10 | 10 | O(10n) ≈ O(n) |
| 256 | 4 | 256 | O(4n) ≈ O(n) |
| 65536 | 2 | 65536 | O(2n) ≈ O(n) |
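The digit counts in this table can be verified with integer arithmetic (avoiding floating-point log, which can be off by one at exact powers); digits_needed is a throwaway helper for illustration:

```python
def digits_needed(M, k):
    """Number of base-k digits required to write M (at least 1)."""
    d = 0
    while M > 0:
        d += 1
        M //= k
    return max(d, 1)

M = 2**32 - 1  # largest 32-bit unsigned value
for k in (2, 10, 256, 65536):
    print(f"base {k:>5}: {digits_needed(M, k)} passes")
# base     2: 32 passes
# base    10: 10 passes
# base   256: 4 passes
# base 65536: 2 passes
```

Pass count is only half the story; the base also determines how the count array interacts with CPU caches: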
| Base | Passes (d) | Count Array Size (k) | Cache Behavior | Practical Speed |
|---|---|---|---|---|
| 2 | 32 | 2 | Excellent (fits L1) | Poor—too many passes |
| 16 | 8 | 16 | Excellent (fits L1) | Moderate—reasonable balance |
| 256 | 4 | 256 (~1 KB) | Good (fits L1) | Excellent—optimal for most CPUs ✓ |
| 65536 | 2 | 65536 (~256 KB) | Poor (exceeds L1/L2) | Good for very large n, poor for small n |
Radix sort's O(d(n+k)) complexity is often called 'linear' because for fixed-width integers (like 32-bit), d is constant. But if you're sorting arbitrary-precision integers where d grows with input size, radix sort may not be linear. Always consider what d represents for your specific use case.
Real-world radix sort implementations incorporate several optimizations beyond the basic algorithm:
1. Skip unnecessary passes:
If all numbers fit in fewer digits than the maximum, skip the higher-order passes:
```python
def radix_sort_optimized(arr, base=256):
    if not arr:
        return arr

    max_val = max(arr)
    position = 0
    exp = 1
    result = arr[:]

    # Only iterate while there's a significant digit left to process
    while max_val // exp > 0:
        result = counting_sort_by_digit(result, position, base)
        exp *= base
        position += 1

    return result
```
2. In-place counting sort:
The standard counting sort requires O(n) extra space for the output array. An in-place variant exists, though it is more complex and typically gives up stability, which makes it better suited to standalone or MSD-style sorting than to LSD radix sort; in practice, the O(n) output buffer is usually acceptable.
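For illustration, here is a hedged sketch of one in-place approach (in the style of American flag sort). Note that it is not stable, so it cannot serve as the LSD subroutine:

```python
def counting_sort_in_place(arr, k):
    """Unstable in-place counting sort sketch: O(n + k) time, O(k) extra space."""
    count = [0] * k
    for val in arr:
        count[val] += 1

    # next_free[v] = first not-yet-finalized index in value v's block
    next_free = [0] * k
    for v in range(1, k):
        next_free[v] = next_free[v - 1] + count[v - 1]
    end = [next_free[v] + count[v] for v in range(k)]

    for v in range(k):
        while next_free[v] < end[v]:
            val = arr[next_free[v]]
            if val == v:
                next_free[v] += 1    # Already inside its own block
            else:
                # Swap the element into its block; each swap finalizes one element
                j = next_free[val]
                arr[next_free[v]], arr[j] = arr[j], arr[next_free[v]]
                next_free[val] += 1
    return arr

print(counting_sort_in_place([3, 1, 4, 1, 5, 9, 2, 6], 10))
# [1, 1, 2, 3, 4, 5, 6, 9]
```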
3. Combining base choices:
Some implementations use different bases for different passes, optimizing cache behavior. For example, use base 256 for lower-order bytes and base 65536 for higher-order portions when n is large enough to amortize the larger count array.
4. Hybrid sorting:
For small subarrays (especially in MSD radix sort's recursive calls), switch to insertion sort. The overhead of counting sort for very small arrays can exceed the benefit.
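A hedged sketch of what such a hybrid might look like, as an MSD radix sort on 32-bit non-negative integers (the function names and the cutoff of 32 are illustrative choices, not a standard API):

```python
import random

def insertion_sort_range(arr, lo, hi):
    """Sort arr[lo:hi] in place; very fast for tiny ranges."""
    for i in range(lo + 1, hi):
        cur, j = arr[i], i - 1
        while j >= lo and arr[j] > cur:
            arr[j + 1] = arr[j]
            j -= 1
        arr[j + 1] = cur

def msd_radix_sort(arr, lo=0, hi=None, shift=24, cutoff=32):
    """MSD radix sort (base 256) falling back to insertion sort below cutoff."""
    if hi is None:
        hi = len(arr)
    if hi - lo <= cutoff:                  # Small subarray: insertion sort wins
        insertion_sort_range(arr, lo, hi)
        return arr
    if shift < 0:                          # All bytes consumed: bucket is sorted
        return arr

    # Partition arr[lo:hi] into 256 buckets by the current byte
    count = [0] * 257
    for i in range(lo, hi):
        count[((arr[i] >> shift) & 0xFF) + 1] += 1
    for b in range(1, 257):
        count[b] += count[b - 1]           # count[b] = start offset of bucket b

    aux = arr[lo:hi]
    pos = count[:]                         # Next free slot per bucket
    for num in aux:
        b = (num >> shift) & 0xFF
        arr[lo + pos[b]] = num
        pos[b] += 1

    # Recurse into each bucket on the next (less significant) byte
    for b in range(256):
        msd_radix_sort(arr, lo + count[b], lo + count[b + 1], shift - 8, cutoff)
    return arr

data = [random.randrange(1 << 32) for _ in range(1000)]
assert msd_radix_sort(data[:]) == sorted(data)
```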
Putting these ideas together in an LSD implementation:

```python
def radix_sort_production(arr):
    """
    Production-quality LSD radix sort for 32-bit non-negative integers.
    Uses base 256 (8 bits per pass) for optimal performance.

    Key optimizations:
      1. Bit operations (shift/mask) instead of division/modulo
      2. Early termination once the max value's bits are exhausted
      3. Skips passes where every element shares the same digit
      4. Single allocation for the output buffer, swapped between passes
    """
    if len(arr) <= 1:
        return arr[:]

    n = len(arr)
    BITS_PER_PASS = 8
    RADIX = 1 << BITS_PER_PASS   # 256
    MASK = RADIX - 1             # 0xFF

    max_val = max(arr)
    output = [0] * n
    current = arr[:]
    shift = 0

    while (max_val >> shift) > 0:
        # Count phase
        count = [0] * RADIX
        for num in current:
            digit = (num >> shift) & MASK
            count[digit] += 1

        # If all elements share the same digit, skip this pass entirely
        non_zero_buckets = sum(1 for c in count if c > 0)
        if non_zero_buckets == 1:
            shift += BITS_PER_PASS
            continue

        # Accumulate phase
        for i in range(1, RADIX):
            count[i] += count[i - 1]

        # Place phase (backwards for stability)
        for i in range(n - 1, -1, -1):
            num = current[i]
            digit = (num >> shift) & MASK
            count[digit] -= 1
            output[count[digit]] = num

        # Swap arrays for the next pass (avoids a copy)
        current, output = output, current
        shift += BITS_PER_PASS

    return current
```

Using bit shifts and masks instead of division and modulo provides significant speedup. On most CPUs:

• shift operation: 1 clock cycle
• bitwise AND: 1 clock cycle
• integer division: 10-40 clock cycles
• modulo: 10-40 clock cycles
For millions of elements with multiple passes, this difference is substantial.
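In Python the interpreter overhead masks most of that gap, but it is still measurable; a quick illustrative micro-benchmark:

```python
import timeit

x = 123_456_789
t_shift = timeit.timeit(lambda: (x >> 8) & 0xFF, number=1_000_000)
t_div   = timeit.timeit(lambda: (x // 256) % 256, number=1_000_000)
print(f"shift+mask: {t_shift:.3f}s   div+mod: {t_div:.3f}s")
```

In a compiled implementation, where each pass is a tight loop of a few instructions, the substitution matters far more.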
We've established that radix sort needs a stable subroutine. Counting sort is the standard choice, but other stable sorts exist. Why is counting sort preferred?
Comparison with other stable sorting algorithms:
| Algorithm | Time per Pass | Space | Practical for Radix? | Why/Why Not |
|---|---|---|---|---|
| Counting Sort | O(n + k) | O(n + k) | ✓ Yes | Perfect match—O(n) for bounded k |
| Merge Sort | O(n log n) | O(n) | ✗ No | log n factor defeats purpose |
| Insertion Sort | O(n²) | O(1) | ✗ No | Quadratic time is unacceptable |
| Bubble Sort | O(n²) | O(1) | ✗ No | Even worse than insertion sort |
| TimSort | O(n log n) | O(n) | ✗ No | log n factor defeats purpose |
The critical observation:
If we used merge sort as the subroutine with d digit positions, each pass would cost O(n log n), for a total of O(d · n log n).
This is no better than just using merge sort directly! The whole point of radix sort is to achieve O(d(n+k)) which, for fixed d and k, is O(n).
Counting sort's O(n + k) complexity is essential because it keeps every pass linear: d passes cost O(d(n + k)) in total, and since the digit range k is fixed by the radix (e.g., 10 or 256), each pass is effectively O(n).
Could we use bucket sort?
Yes! Bucket sort is essentially counting sort with lists instead of counts, then concatenation. Some implementations prefer this view:
```python
def bucket_sort_by_digit(arr, position, base):
    buckets = [[] for _ in range(base)]
    for num in arr:
        digit = get_digit(num, position, base)
        buckets[digit].append(num)  # Appending preserves order: automatically stable!
    result = []
    for bucket in buckets:
        result.extend(bucket)
    return result
```
This is equivalent to counting sort but uses more memory (list overhead) and has worse cache behavior. For large n, counting sort's array-based approach is faster.
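A quick check that the two formulations agree (assuming the functions defined earlier on this page):

```python
nums = [329, 457, 657, 839, 436, 720, 355]
assert bucket_sort_by_digit(nums, 0, 10) == counting_sort_by_digit(nums, 0, 10)
print("bucket and counting subroutines produce identical (stable) output")
```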
Radix sort and counting sort form a perfect symbiosis:
• Counting sort needs bounded-range keys → digit values are bounded by the radix • Radix sort needs a stable O(n) subroutine → counting sort provides exactly this • Neither alone solves general integer sorting efficiently, but together they achieve O(d(n+k)) ≈ O(n)
We've explored how counting sort serves as the engine powering radix sort's remarkable performance. This symbiotic relationship exemplifies elegant algorithm design: two specialized algorithms combining to solve a broader problem than either could alone.
What's next:
Now that we understand both the overall radix sort algorithm and its counting sort subroutine, we're ready to analyze the complete time complexity: O(d × (n + k)). The next page will dissect this expression, showing how it compares to O(n log n) algorithms and when radix sort truly shines.
You now understand how counting sort operates, why its stability is essential for radix sort, and how the two algorithms work together to achieve near-linear time complexity. Next, we'll analyze when this complexity truly beats comparison-based alternatives.