Every great algorithm has components that make it work. For radix sort, the crucial component is counting sort—a simple yet powerful algorithm that sorts elements in linear time when the range of values is limited. Understanding how counting sort operates as radix sort's subroutine illuminates both algorithms and reveals why their combination is so effective.
In the previous page, we established that radix sort performs multiple passes, each sorting by one digit position. We emphasized that stability is critical for LSD radix sort's correctness. Now we'll see why counting sort is the perfect choice for each pass: it's naturally stable, runs in O(n + k) time where k is the number of distinct digit values, and uses only simple array operations.
By the end of this page, you will deeply understand: (1) How counting sort achieves linear-time sorting for bounded-range data, (2) The three-phase structure of counting sort: count, accumulate, place, (3) Why counting sort is naturally stable and how to implement it correctly, (4) How radix sort invokes counting sort at each pass with different digit extraction, and (5) The symbiotic relationship between radix sort's structure and counting sort's properties.
Before diving into how counting sort serves radix sort, let's understand counting sort as a standalone algorithm.
The problem counting sort solves:
Given an array of n elements where each element is an integer in the range [0, k-1], sort the array in O(n + k) time.
Why can counting sort beat O(n log n)?
Comparison-based sorting algorithms like merge sort and quicksort have a lower bound of Ω(n log n) because they gain at most 1 bit of information per comparison. Counting sort sidesteps this bound because it doesn't use comparisons. Instead, it uses each element's actual value to determine its position directly.
This is analogous to the difference between finding a name by repeatedly asking "does it come before or after this one?" and looking it up directly at a known position: comparisons extract information one bit at a time, while direct addressing uses the value itself as an address.
The core insight:
If we know that a value x appears in the array, and we know exactly how many values are less than x, we know exactly where x should be placed in the sorted output.
For example, if x = 5 and there are exactly 7 values less than 5 in the array, then x belongs at index 7 (0-indexed). If there are multiple copies of 5, they belong at indices 7, 8, 9, ... The challenge is computing, for each value, how many values are smaller—which is exactly what counting accomplishes.
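To make this concrete, here is a tiny illustrative snippet (not part of the algorithm itself) that computes a value's sorted position directly from a count of smaller elements:

```python
arr = [5, 2, 8, 5, 1, 5]

# How many values are strictly less than 5?
smaller = sum(1 for v in arr if v < 5)
print(smaller)  # 2 → the copies of 5 belong at indices 2, 3, and 4
```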
Bounded range: The requirement that elements fall within a known, limited range [0, k-1]. This is essential because counting sort allocates an array of size k. If k is very large relative to n, counting sort becomes impractical.
Non-comparison sorting: Algorithms that determine order without pairwise comparisons. Besides counting sort, bucket sort and radix sort are non-comparison algorithms.
Counting sort operates in three distinct phases, each building on the previous. Understanding these phases is crucial for seeing how stability emerges and how the algorithm achieves linear time.
```python
def counting_sort(arr, k):
    """
    Counting sort for integers in range [0, k-1].

    Parameters:
        arr: List of integers to sort
        k:   Upper bound (exclusive) of the range, i.e., elements are in [0, k-1]

    Returns:
        New sorted list (stable)

    Time Complexity:  O(n + k)
    Space Complexity: O(n + k)
    """
    n = len(arr)
    if n == 0:
        return []

    # ═══════════════════════════════════════════════════════════════
    # PHASE 1: COUNT
    # Count occurrences of each value
    # ═══════════════════════════════════════════════════════════════
    count = [0] * k
    for val in arr:
        count[val] += 1

    # At this point, count[v] = number of elements with value v
    # Example: arr = [3, 1, 4, 1, 5], k = 6
    #   count = [0, 2, 0, 1, 1, 1]
    #            0  1  2  3  4  5   ← values

    # ═══════════════════════════════════════════════════════════════
    # PHASE 2: ACCUMULATE
    # Convert counts to cumulative positions
    # ═══════════════════════════════════════════════════════════════
    for i in range(1, k):
        count[i] += count[i - 1]

    # At this point, count[v] = number of elements ≤ v
    # Equivalently, count[v] is the index AFTER the last position for value v
    # Example: count = [0, 2, 2, 3, 4, 5]
    #   - 2 elements are ≤ 1
    #   - 3 elements are ≤ 3
    #   - 5 elements are ≤ 5 (total)

    # ═══════════════════════════════════════════════════════════════
    # PHASE 3: PLACE
    # Build output array by placing elements at correct positions
    # ═══════════════════════════════════════════════════════════════
    output = [0] * n

    # CRITICAL: Iterate BACKWARDS for stability
    for i in range(n - 1, -1, -1):
        val = arr[i]
        count[val] -= 1          # Decrement first to get 0-indexed position
        output[count[val]] = val

    # Why backwards? See detailed explanation below.
    return output
```

Detailed trace through counting sort:
Let's trace sorting [3, 1, 4, 1, 5, 9, 2, 6] with k = 10:
```
Input: [3, 1, 4, 1, 5, 9, 2, 6]

PHASE 1: COUNT
  count[0] = 0, count[1] = 2, count[2] = 1, count[3] = 1
  count[4] = 1, count[5] = 1, count[6] = 1, count[9] = 1

  count = [0, 2, 1, 1, 1, 1, 1, 0, 0, 1]
           0  1  2  3  4  5  6  7  8  9

PHASE 2: ACCUMULATE
  count = [0, 2, 3, 4, 5, 6, 7, 7, 7, 8]
           0  1  2  3  4  5  6  7  8  9

  Reading: 2 elements ≤ 1, 3 elements ≤ 2, 4 elements ≤ 3, etc.
  This means: value 1 should occupy indices 0-1, value 2 at index 2,
  value 3 at index 3, value 4 at index 4, value 5 at index 5, etc.

PHASE 3: PLACE (backwards iteration)
  i=7: arr[7]=6, count[6]=7→6, output[6]=6 → [_, _, _, _, _, _, 6, _]
  i=6: arr[6]=2, count[2]=3→2, output[2]=2 → [_, _, 2, _, _, _, 6, _]
  i=5: arr[5]=9, count[9]=8→7, output[7]=9 → [_, _, 2, _, _, _, 6, 9]
  i=4: arr[4]=5, count[5]=6→5, output[5]=5 → [_, _, 2, _, _, 5, 6, 9]
  i=3: arr[3]=1, count[1]=2→1, output[1]=1 → [_, 1, 2, _, _, 5, 6, 9]
  i=2: arr[2]=4, count[4]=5→4, output[4]=4 → [_, 1, 2, _, 4, 5, 6, 9]
  i=1: arr[1]=1, count[1]=1→0, output[0]=1 → [1, 1, 2, _, 4, 5, 6, 9]
  i=0: arr[0]=3, count[3]=4→3, output[3]=3 → [1, 1, 2, 3, 4, 5, 6, 9]

Final output: [1, 1, 2, 3, 4, 5, 6, 9] ✓
```
The backwards iteration in Phase 3 is crucial for stability—but why? This is a subtle point that deserves careful analysis.
Recall the stability requirement:
A stable sort preserves the relative order of elements with equal keys. If element A appears before element B in the input, and they have the same key, then A should appear before B in the output.
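As a quick way to see this requirement in action, consider a sketch using Python's built-in sorted(), which is stable; the (key, label) records are illustrative, not part of counting sort:

```python
# Records: (key, label). A1, A2, A3 share key 5 and appear in that input order.
records = [(5, 'A1'), (3, 'B'), (5, 'A2'), (1, 'C'), (5, 'A3')]

stable = sorted(records, key=lambda r: r[0])  # Python's sort is stable
print(stable)
# [(1, 'C'), (3, 'B'), (5, 'A1'), (5, 'A2'), (5, 'A3')]
#                       ^ equal keys keep their original relative order
```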
The mechanics of cumulative counting:
After Phase 2, count[v] represents the number of elements ≤ v, which is also one past the index where the last element with value v will be placed.
When we place an element with value v, we first decrement count[v] to get the actual (0-indexed) position, then write the element to output[count[v]].
Why backwards?
Consider elements with the same value v. When iterating backwards, the last occurrence of v in the input is encountered first and placed at the highest slot of v's block; each earlier occurrence is then placed one slot lower. Reading the output left to right therefore lists the occurrences of v in their original input order.
This means elements with equal keys maintain their original relative order!
```
Consider input: [A₁, B, A₂, C, A₃] where A₁, A₂, A₃ all have value 5
(The subscripts indicate original order, not part of the actual values)

After Phase 2, suppose count[5] = 4
(meaning 4 elements ≤ 5, so values 5 go in positions 1-3)

BACKWARDS iteration (correct - stable):
  i=4: Place A₃ at position count[5]-1 = 3, decrement → count[5] = 3
  i=3: Place C at its position...
  i=2: Place A₂ at position count[5]-1 = 2, decrement → count[5] = 2
  i=1: Place B at its position...
  i=0: Place A₁ at position count[5]-1 = 1, decrement → count[5] = 1

Result: [..., A₁, A₂, A₃, ...] ← STABLE! Original order preserved.

FORWARDS iteration (incorrect - unstable):
  i=0: Place A₁ at position count[5]-1 = 3, decrement → count[5] = 3
  i=1: Place B at its position...
  i=2: Place A₂ at position count[5]-1 = 2, decrement → count[5] = 2
  i=3: Place C at its position...
  i=4: Place A₃ at position count[5]-1 = 1, decrement → count[5] = 1

Result: [..., A₃, A₂, A₁, ...] ← UNSTABLE! Order reversed.
```

Many counting sort implementations fail to be stable because they iterate forwards instead of backwards. For standalone sorting of integers, this doesn't affect correctness—[1, 1, 1] is sorted regardless of which '1' came first. But when counting sort is used as a subroutine for radix sort, stability is ESSENTIAL. Each integer carries information from previous passes that must be preserved.
Alternative formulation for stability:
Some implementations count positions differently to enable forward iteration while maintaining stability. Instead of cumulative counts representing positions after each value's block, they represent positions before:
```python
# Alternative Phase 2: EXCLUSIVE prefix sum of counts
# After this loop, count[v] = number of elements strictly LESS than v,
# i.e., the position BEFORE (not including) value v's block.
total = 0
for i in range(k):
    count[i], total = total, total + count[i]

# Alternative Phase 3: Forward iteration
for i in range(n):
    val = arr[i]
    output[count[val]] = val
    count[val] += 1  # Move to next position for this value
```
Both formulations achieve stability; the key is consistency between how counts are computed and how elements are placed.
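Packaged as a complete function, the forward-iteration variant might look like this (a sketch equivalent to counting_sort above; the name counting_sort_forward is ours, not a standard one):

```python
def counting_sort_forward(arr, k):
    """Stable counting sort: exclusive prefix sums + forward placement."""
    n = len(arr)
    count = [0] * k
    for val in arr:
        count[val] += 1

    # Exclusive prefix sum: count[v] = number of elements strictly less than v,
    # i.e., the FIRST index of value v's block in the output.
    total = 0
    for i in range(k):
        count[i], total = total, total + count[i]

    output = [0] * n
    for val in arr:           # Forward iteration preserves input order
        output[count[val]] = val
        count[val] += 1       # Advance to the next free slot for this value
    return output

print(counting_sort_forward([3, 1, 4, 1, 5, 9, 2, 6], 10))
# [1, 1, 2, 3, 4, 5, 6, 9]
```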
Now we connect the pieces: how does radix sort use counting sort as its subroutine?
The key adaptation:
In standalone counting sort, we sort by the element's value directly. In radix sort's use of counting sort, we sort by the digit at a specific position. The element itself is preserved—only the key used for sorting changes.
```python
def get_digit(number, position, base):
    """
    Extract the digit at the given position from number.
    Position 0 is the rightmost (least significant) digit.

    For base 10:  get_digit(4271, 0, 10) = 1
                  get_digit(4271, 2, 10) = 2
    For base 256: get_digit(0x4271, 0, 256) = 0x71
                  get_digit(0x4271, 1, 256) = 0x42
    """
    return (number // (base ** position)) % base


def counting_sort_by_digit(arr, position, base):
    """
    Stable counting sort subroutine for radix sort.
    Sorts by the digit at the specified position.

    The elements are the original numbers; the KEY is the digit.
    This distinction is what makes radix sort work.
    """
    n = len(arr)
    output = [0] * n
    count = [0] * base

    # Phase 1: Count occurrences of each digit (not each number!)
    for num in arr:
        digit = get_digit(num, position, base)
        count[digit] += 1

    # Phase 2: Cumulative count
    for i in range(1, base):
        count[i] += count[i - 1]

    # Phase 3: Place elements by their digit, preserving the WHOLE number
    for i in range(n - 1, -1, -1):
        num = arr[i]
        digit = get_digit(num, position, base)
        count[digit] -= 1
        output[count[digit]] = num  # Place the ENTIRE number, not just its digit

    return output


def radix_sort(arr, base=10):
    """LSD radix sort using counting sort as its stable subroutine."""
    if len(arr) == 0:
        return arr

    # Determine the number of digits in the maximum element
    max_val = max(arr)
    num_digits = 0
    temp = max_val
    while temp > 0:
        num_digits += 1
        temp //= base

    # Handle the case where the maximum is 0
    if num_digits == 0:
        num_digits = 1

    # Apply counting sort once per digit position
    result = arr[:]
    for position in range(num_digits):
        print(f"Pass {position + 1}: Sorting by digit at position {position}")
        result = counting_sort_by_digit(result, position, base)
        print(f"  Result: {result}")

    return result


# Demonstration
numbers = [329, 457, 657, 839, 436, 720, 355]
print(f"Input: {numbers}")
print()
sorted_numbers = radix_sort(numbers, base=10)
print()
print(f"Sorted: {sorted_numbers}")
```

Execution trace of the integration:
```
Input: [329, 457, 657, 839, 436, 720, 355]

Pass 1: Sorting by digit at position 0 (1s place)
  Digits: [9, 7, 7, 9, 6, 0, 5]
  Counting sort groups by these digits:
    0: [720]
    5: [355]
    6: [436]
    7: [457, 657]  ← Note: 457 before 657 (original order preserved)
    9: [329, 839]  ← Note: 329 before 839 (original order preserved)
  Result: [720, 355, 436, 457, 657, 329, 839]

Pass 2: Sorting by digit at position 1 (10s place)
  Digits: [2, 5, 3, 5, 5, 2, 3]
  Counting sort groups by these digits (preserving order from Pass 1):
    2: [720, 329]       ← 720 was first in Pass 1 result
    3: [436, 839]       ← 436 was first in Pass 1 result
    5: [355, 457, 657]  ← Order from Pass 1 preserved
  Result: [720, 329, 436, 839, 355, 457, 657]

Pass 3: Sorting by digit at position 2 (100s place)
  Digits: [7, 3, 4, 8, 3, 4, 6]
  Counting sort groups by these digits (preserving order from Pass 2):
    3: [329, 355]  ← 329 before 355 because of earlier passes!
    4: [436, 457]  ← 436 before 457 because of earlier passes!
    6: [657]
    7: [720]
    8: [839]
  Result: [329, 355, 436, 457, 657, 720, 839] ✓ SORTED
```
At each pass, counting sort ONLY looks at the current digit—but it moves the ENTIRE number. The number 329 carries its 'history' (2 in tens place, 9 in ones place) through the algorithm. Stability ensures this history is respected: when two numbers share the same current digit, their relative order reflects all previous passes.
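One way to convince yourself of this integration is a randomized check against Python's built-in sort. The sketch below assumes counting_sort_by_digit from the listing above and repeats the pass loop without the diagnostic prints:

```python
import random

def radix_sort_quiet(arr, base=10):
    """radix_sort from above, minus the per-pass print statements."""
    if not arr:
        return arr
    result, position, exp = arr[:], 0, 1
    max_val = max(arr)
    while max_val // exp > 0:
        result = counting_sort_by_digit(result, position, base)
        exp *= base
        position += 1
    return result

for _ in range(100):
    data = [random.randrange(10_000) for _ in range(200)]
    assert radix_sort_quiet(data) == sorted(data)
print("radix_sort matches sorted() on 100 random inputs")
```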
Understanding the complexity of radix sort requires analyzing both the outer loop (radix sort itself) and the inner operations (counting sort at each pass).
Counting sort complexity analysis:
For each pass of counting sort on n elements with digit range [0, k-1]: Phase 1 (count) scans all n elements in O(n); Phase 2 (accumulate) scans the count array in O(k); Phase 3 (place) scans all n elements again in O(n).
Total per pass: O(n + k) time, O(n + k) space
Radix sort complexity analysis:
If we have d digit positions and use radix/base k, radix sort performs d passes of counting sort, each costing O(n + k), for a total of O(d(n + k)) time and O(n + k) space (the output and count arrays can be reused across passes).
The relationship between d and k:
For a maximum value M in the input, d = ⌊log_k M⌋ + 1: the number of base-k digits needed to write M. A larger base k means fewer passes but a bigger count array, so d and k trade off directly against each other.
Example calculations:
For 32-bit integers (M up to ~4 billion):
| Base k | Passes d | Count array size k | Time complexity |
|---|---|---|---|
| 2 | 32 | 2 | O(32n) ≈ O(n) |
| 10 | 10 | 10 | O(10n) ≈ O(n) |
| 256 | 4 | 256 | O(4n) ≈ O(n) |
| 65536 | 2 | 65536 | O(2n) ≈ O(n) |
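The digit counts in this table can be verified with integer arithmetic (avoiding floating-point log, which can be off by one at exact powers); digits_needed is a throwaway helper for illustration:

```python
def digits_needed(M, k):
    """Number of base-k digits required to write M (at least 1)."""
    d = 0
    while M > 0:
        d += 1
        M //= k
    return max(d, 1)

M = 2**32 - 1  # largest 32-bit unsigned value
for k in (2, 10, 256, 65536):
    print(f"base {k:>5}: {digits_needed(M, k)} passes")
# base     2: 32 passes
# base    10: 10 passes
# base   256: 4 passes
# base 65536: 2 passes
```

Pass count is only half the story; the base also determines how the count array interacts with CPU caches: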
| Base | Passes (d) | Count Array Size (k) | Cache Behavior | Practical Speed |
|---|---|---|---|---|
| 2 | 32 | 2 | Excellent (fits L1) | Poor—too many passes |
| 16 | 8 | 16 | Excellent (fits L1) | Moderate—reasonable balance |
| 256 | 4 | 256 (~1 KB) | Good (fits L1) | Excellent—optimal for most CPUs ✓ |
| 65536 | 2 | 65536 (~256 KB) | Poor (exceeds L1/L2) | Good for very large n, poor for small n |
Radix sort's O(d(n+k)) complexity is often called 'linear' because for fixed-width integers (like 32-bit), d is constant. But if you're sorting arbitrary-precision integers where d grows with input size, radix sort may not be linear. Always consider what d represents for your specific use case.
Real-world radix sort implementations incorporate several optimizations beyond the basic algorithm:
1. Skip unnecessary passes:
If all numbers fit in fewer digits than the maximum, skip the higher-order passes:
```python
def radix_sort_optimized(arr, base=256):
    if not arr:
        return arr

    max_val = max(arr)
    position = 0
    exp = 1
    result = arr[:]

    # Only iterate while there's a significant digit left to process
    while max_val // exp > 0:
        result = counting_sort_by_digit(result, position, base)
        exp *= base
        position += 1

    return result
```
2. In-place counting sort:
The standard counting sort requires O(n) extra space for the output array. An in-place variant exists, though it is more complex and typically gives up stability, which makes it better suited to standalone or MSD-style sorting than to LSD radix sort; in practice, the O(n) output buffer is usually acceptable.
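For illustration, here is a hedged sketch of one in-place approach (in the style of American flag sort). Note that it is not stable, so it cannot serve as the LSD subroutine:

```python
def counting_sort_in_place(arr, k):
    """Unstable in-place counting sort sketch: O(n + k) time, O(k) extra space."""
    count = [0] * k
    for val in arr:
        count[val] += 1

    # next_free[v] = first not-yet-finalized index in value v's block
    next_free = [0] * k
    for v in range(1, k):
        next_free[v] = next_free[v - 1] + count[v - 1]
    end = [next_free[v] + count[v] for v in range(k)]

    for v in range(k):
        while next_free[v] < end[v]:
            val = arr[next_free[v]]
            if val == v:
                next_free[v] += 1    # Already inside its own block
            else:
                # Swap the element into its block; each swap finalizes one element
                j = next_free[val]
                arr[next_free[v]], arr[j] = arr[j], arr[next_free[v]]
                next_free[val] += 1
    return arr

print(counting_sort_in_place([3, 1, 4, 1, 5, 9, 2, 6], 10))
# [1, 1, 2, 3, 4, 5, 6, 9]
```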
3. Combining base choices:
Some implementations use different bases for different passes, optimizing cache behavior. For example, use base 256 for lower-order bytes and base 65536 for higher-order portions when n is large enough to amortize the larger count array.
4. Hybrid sorting:
For small subarrays (especially in MSD radix sort's recursive calls), switch to insertion sort. The overhead of counting sort for very small arrays can exceed the benefit.
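A hedged sketch of what such a hybrid might look like, as an MSD radix sort on 32-bit non-negative integers (the function names and the cutoff of 32 are illustrative choices, not a standard API):

```python
import random

def insertion_sort_range(arr, lo, hi):
    """Sort arr[lo:hi] in place; very fast for tiny ranges."""
    for i in range(lo + 1, hi):
        cur, j = arr[i], i - 1
        while j >= lo and arr[j] > cur:
            arr[j + 1] = arr[j]
            j -= 1
        arr[j + 1] = cur

def msd_radix_sort(arr, lo=0, hi=None, shift=24, cutoff=32):
    """MSD radix sort (base 256) falling back to insertion sort below cutoff."""
    if hi is None:
        hi = len(arr)
    if hi - lo <= cutoff:                  # Small subarray: insertion sort wins
        insertion_sort_range(arr, lo, hi)
        return arr
    if shift < 0:                          # All bytes consumed: bucket is sorted
        return arr

    # Partition arr[lo:hi] into 256 buckets by the current byte
    count = [0] * 257
    for i in range(lo, hi):
        count[((arr[i] >> shift) & 0xFF) + 1] += 1
    for b in range(1, 257):
        count[b] += count[b - 1]           # count[b] = start offset of bucket b

    aux = arr[lo:hi]
    pos = count[:]                         # Next free slot per bucket
    for num in aux:
        b = (num >> shift) & 0xFF
        arr[lo + pos[b]] = num
        pos[b] += 1

    # Recurse into each bucket on the next (less significant) byte
    for b in range(256):
        msd_radix_sort(arr, lo + count[b], lo + count[b + 1], shift - 8, cutoff)
    return arr

data = [random.randrange(1 << 32) for _ in range(1000)]
assert msd_radix_sort(data[:]) == sorted(data)
```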
Putting these ideas together in an LSD implementation:

```python
def radix_sort_production(arr):
    """
    Production-quality LSD radix sort for 32-bit non-negative integers.
    Uses base 256 (8 bits per pass) for optimal performance.

    Key optimizations:
      1. Bit operations (shift/mask) instead of division/modulo
      2. Early termination once the max value's bits are exhausted
      3. Skips passes where every element shares the same digit
      4. Single allocation for the output buffer, swapped between passes
    """
    if len(arr) <= 1:
        return arr[:]

    n = len(arr)
    BITS_PER_PASS = 8
    RADIX = 1 << BITS_PER_PASS   # 256
    MASK = RADIX - 1             # 0xFF

    max_val = max(arr)
    output = [0] * n
    current = arr[:]
    shift = 0

    while (max_val >> shift) > 0:
        # Count phase
        count = [0] * RADIX
        for num in current:
            digit = (num >> shift) & MASK
            count[digit] += 1

        # If all elements share the same digit, skip this pass entirely
        non_zero_buckets = sum(1 for c in count if c > 0)
        if non_zero_buckets == 1:
            shift += BITS_PER_PASS
            continue

        # Accumulate phase
        for i in range(1, RADIX):
            count[i] += count[i - 1]

        # Place phase (backwards for stability)
        for i in range(n - 1, -1, -1):
            num = current[i]
            digit = (num >> shift) & MASK
            count[digit] -= 1
            output[count[digit]] = num

        # Swap arrays for the next pass (avoids a copy)
        current, output = output, current
        shift += BITS_PER_PASS

    return current
```

Using bit shifts and masks instead of division and modulo provides significant speedup. On most CPUs:

• shift operation: 1 clock cycle
• bitwise AND: 1 clock cycle
• integer division: 10-40 clock cycles
• modulo: 10-40 clock cycles
For millions of elements with multiple passes, this difference is substantial.
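In Python the interpreter overhead masks most of that gap, but it is still measurable; a quick illustrative micro-benchmark:

```python
import timeit

x = 123_456_789
t_shift = timeit.timeit(lambda: (x >> 8) & 0xFF, number=1_000_000)
t_div   = timeit.timeit(lambda: (x // 256) % 256, number=1_000_000)
print(f"shift+mask: {t_shift:.3f}s   div+mod: {t_div:.3f}s")
```

In a compiled implementation, where each pass is a tight loop of a few instructions, the substitution matters far more.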
We've established that radix sort needs a stable subroutine. Counting sort is the standard choice, but other stable sorts exist. Why is counting sort preferred?
Comparison with other stable sorting algorithms:
| Algorithm | Time per Pass | Space | Practical for Radix? | Why/Why Not |
|---|---|---|---|---|
| Counting Sort | O(n + k) | O(n + k) | ✓ Yes | Perfect match—O(n) for bounded k |
| Merge Sort | O(n log n) | O(n) | ✗ No | log n factor defeats purpose |
| Insertion Sort | O(n²) | O(1) | ✗ No | Quadratic time is unacceptable |
| Bubble Sort | O(n²) | O(1) | ✗ No | Even worse than insertion sort |
| TimSort | O(n log n) | O(n) | ✗ No | log n factor defeats purpose |
The critical observation:
If we used merge sort as the subroutine with d digit positions, each pass would cost O(n log n), for a total of O(d · n log n).
This is no better than just using merge sort directly! The whole point of radix sort is to achieve O(d(n+k)) which, for fixed d and k, is O(n).
Counting sort's O(n + k) complexity is essential because it keeps every pass linear: d passes cost O(d(n + k)) in total, and since the digit range k is fixed by the radix (e.g., 10 or 256), each pass is effectively O(n).
Could we use bucket sort?
Yes! Bucket sort is essentially counting sort with lists instead of counts, then concatenation. Some implementations prefer this view:
```python
def bucket_sort_by_digit(arr, position, base):
    buckets = [[] for _ in range(base)]
    for num in arr:
        digit = get_digit(num, position, base)
        buckets[digit].append(num)  # Appending preserves order: automatically stable!
    result = []
    for bucket in buckets:
        result.extend(bucket)
    return result
```
This is equivalent to counting sort but uses more memory (list overhead) and has worse cache behavior. For large n, counting sort's array-based approach is faster.
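A quick check that the two formulations agree (assuming the functions defined earlier on this page):

```python
nums = [329, 457, 657, 839, 436, 720, 355]
assert bucket_sort_by_digit(nums, 0, 10) == counting_sort_by_digit(nums, 0, 10)
print("bucket and counting subroutines produce identical (stable) output")
```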
Radix sort and counting sort form a perfect symbiosis:
• Counting sort needs bounded-range keys → digit values are bounded by the radix • Radix sort needs a stable O(n) subroutine → counting sort provides exactly this • Neither alone solves general integer sorting efficiently, but together they achieve O(d(n+k)) ≈ O(n)
We've explored how counting sort serves as the engine powering radix sort's remarkable performance. This symbiotic relationship exemplifies elegant algorithm design: two specialized algorithms combining to solve a broader problem than either could alone.
What's next:
Now that we understand both the overall radix sort algorithm and its counting sort subroutine, we're ready to analyze the complete time complexity: O(d × (n + k)). The next page will dissect this expression, showing how it compares to O(n log n) algorithms and when radix sort truly shines.
You now understand how counting sort operates, why its stability is essential for radix sort, and how the two algorithms work together to achieve near-linear time complexity. Next, we'll analyze when this complexity truly beats comparison-based alternatives.