Radix sort's time complexity, O(d × (n + k)), looks deceptively simple. Three variables—d, n, and k—combine to determine performance. But what do these variables mean in practice? When does this complexity beat O(n log n)? When does it disappoint?
Understanding radix sort's complexity requires moving beyond mechanical formula manipulation into genuine insight about when and why this algorithm excels. By the end of this page, you'll be able to confidently predict radix sort's performance for any given scenario and make informed decisions about algorithm selection.
Specifically, you will understand: (1) the precise meaning of each variable in O(d × (n + k)), (2) how d, n, and k relate to input characteristics, (3) how O(d(n+k)) compares with O(n log n), (4) when radix sort provides genuine asymptotic advantages, (5) hidden constants and practical performance considerations, and (6) space complexity trade-offs.
Let's carefully examine each component of radix sort's time complexity:
n — Number of elements
This is the most familiar variable: the size of the input array. Every sorting algorithm has n in its complexity because we must at least read all n elements.
d — Number of digit positions (passes)
This represents how many passes radix sort makes through the data. For integers, d is the number of digits in the largest value when written in the chosen base: 3 passes for values up to 999 in base 10, or 4 passes for 32-bit values in base 256.
k — Radix (base) / Alphabet size
The number of distinct values each digit can take: 10 for decimal digits, 2 for binary digits, 256 when processing one byte at a time.
The k appears because each counting-sort pass spends O(k) time processing the count array, in addition to the O(n) work on the elements themselves.
| Variable | Meaning | Typical Values | How It's Determined |
|---|---|---|---|
| n | Array size | 10¹ to 10⁹ | Input size—given |
| d | Number of digits/passes | 1 to 64 | ⌈log_k(max_value + 1)⌉ |
| k | Radix/base | 2 to 65536 | Algorithm design choice |
Why O(d × (n + k)) and not O(d × n × k)?
Each pass of counting sort does the following: an O(n) scan to tally how many elements have each digit value, an O(k) prefix-sum over the count array, and an O(n) scan to place elements into the output array.
Total per pass: O(n + k), not O(n × k)
The k comes from the accumulation step, which is independent of n. We don't multiply n by k because we're not doing k operations per element—we're doing k operations total for the count array processing.
With d passes, total time is:
T(n, d, k) = d × O(n + k) = O(d × (n + k))
The (n + k) is additive because n and k contribute independently to each pass. If n >> k (many elements, small radix), the n term dominates. If k >> n (few elements, large radix), the k term dominates. In most practical cases, n >> k, so O(d(n+k)) ≈ O(dn).
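To make the additive structure concrete, here is a minimal sketch of one counting-sort pass with the O(n) and O(k) phases marked. The helper `digit_of` (a function that extracts the digit being sorted on) is a hypothetical parameter introduced just for this illustration.

```python
def counting_sort_pass(arr, k, digit_of):
    """One stable counting-sort pass on digit values in the range [0, k)."""
    n = len(arr)
    count = [0] * k
    output = [0] * n

    # O(n): tally each element's current digit
    for x in arr:
        count[digit_of(x)] += 1

    # O(k): prefix sums over the count array -- independent of n
    for i in range(1, k):
        count[i] += count[i - 1]

    # O(n): place elements right-to-left to preserve stability
    for x in reversed(arr):
        d = digit_of(x)
        count[d] -= 1
        output[count[d]] = x

    return output  # total work per pass: O(n + k), not O(n * k)

# Example: one pass on the ones digit in base 10
print(counting_sort_pass([170, 45, 75, 90, 802, 24, 2, 66], 10, lambda x: x % 10))
```

The two O(n) scans and the single O(k) scan never nest, which is exactly why the terms add rather than multiply.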
The number of passes d isn't arbitrary—it's determined by the maximum value M in the array and the chosen radix k:
d = ⌈log_k(M + 1)⌉
This formula captures how many base-k digits are needed to represent M. Let's see this in action:
| Maximum Value M | Base k | d = ⌈log_k(M+1)⌉ | Explanation |
|---|---|---|---|
| 999 | 10 | 3 | 3 decimal digits: 0-999 |
| 1,000,000 | 10 | 7 | 7 decimal digits: 0-9,999,999 |
| 255 (2⁸-1) | 256 | 1 | 1 byte |
| 65,535 (2¹⁶-1) | 256 | 2 | 2 bytes |
| 4,294,967,295 (2³²-1) | 256 | 4 | 4 bytes (32-bit) |
| 2⁶⁴-1 | 256 | 8 | 8 bytes (64-bit) |
| 100 | 2 | 7 | 7 binary digits: 1100100 |
The trade-off between d and k:
Increasing k decreases d (fewer passes) but increases the per-pass overhead: the count array grows to size k, adding O(k) work per pass and, beyond a point, no longer fitting in cache.
Mathematical relationship:
For a fixed maximum value M:
d(k) = ⌈log_k(M + 1)⌉ = ⌈ln(M + 1) / ln(k)⌉
This is a decreasing function of k. As k → M, d → 1, but the count array becomes size M, which might be unacceptable.
```python
import math

def analyze_radix_choice(max_value, n):
    """
    Analyze time complexity for different radix choices.

    Shows how the d × (n + k) expression varies with radix k
    for a given maximum value and array size.
    """
    print(f"Analysis for max_value = {max_value:,}, n = {n:,}")
    print("=" * 70)
    print(f"{'Radix k':<12} {'Digits d':<12} {'d × (n+k)':<20} {'Cache-friendly?'}")
    print("-" * 70)

    for k in [2, 4, 10, 16, 64, 256, 1024, 4096, 65536]:
        d = math.ceil(math.log(max_value + 1, k)) if max_value > 0 else 1
        complexity = d * (n + k)

        # Rough cache assessment: L1 cache is typically 32-64 KB
        # Count array of k integers = 4k or 8k bytes
        cache_friendly = "Yes" if k * 8 < 32768 else "No"

        print(f"{k:<12} {d:<12} {complexity:<20,} {cache_friendly}")
    print()

# Example: Sorting 10 million 32-bit integers
analyze_radix_choice(2**32 - 1, 10_000_000)

# Example: Sorting 1 million values up to 1000
analyze_radix_choice(1000, 1_000_000)

# Example: Sorting 100 values up to 10 million
analyze_radix_choice(10_000_000, 100)
```

For most applications, base 256 (processing one byte at a time) hits the sweet spot:
• 4 passes for 32-bit integers (d = 4)
• Count array of 256 integers ≈ 1-2 KB (fits in L1 cache)
• Bit manipulation (shift/mask) instead of division/modulo
This explains why base 256 is the standard choice in optimized radix sort implementations.
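As a small illustration (the specific values below are made up for the example), extracting a base-256 digit needs only a shift and a mask, whereas a general base-k digit requires integer division and modulo:

```python
x = 0xDEADBEEF  # arbitrary example value

# Base 256: digit i is simply byte i, extracted with a shift and a mask (cheap)
i = 2
byte_digit = (x >> (8 * i)) & 0xFF      # 0xAD

# General base k: digit i requires integer division and modulo (slower)
k = 1000
general_digit = (x // k**i) % k

print(hex(byte_digit), general_digit)
```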
The central question: When does O(d(n+k)) beat O(n log n)?
Setting up the comparison:
For radix sort to be asymptotically faster than merge sort or quicksort, we need:
d × (n + k) < c × n log n
where c is a constant factor.
Assuming k << n (small radix relative to array size), this simplifies to:
d × n < c × n log n
d < c × log n
This tells us something profound: Radix sort wins when the number of digits d is less than log n (the logarithm of the array size).
Case analysis:
Case 1: Fixed-width integers (e.g., 32-bit)
With base 256, d = 4 for any 32-bit integer.
Radix sort: O(4n) = O(n)
Merge sort: O(n log n)
For n > 2⁴ = 16, radix sort's 4 passes beat the log₂(n) levels of comparisons a merge sort performs. For n = 1 million: 4 passes vs. roughly 20 levels (log₂(10⁶) ≈ 20).
Verdict: Radix sort wins decisively for large arrays of fixed-width integers.
Case 2: Variable-precision integers
If integers can grow arbitrarily large with n (e.g., n numbers where the maximum is n^c for some constant c):
d = O(log(n^c)) = O(c log n) = O(log n)
Radix sort: O(log n × n) = O(n log n)
Verdict: No asymptotic advantage—tied with comparison sorts.
Case 3: Integers bounded by n
If maximum value M ≤ n:
d = O(log_k(n))
With k = n^ε for small ε: d = O(1/ε) = O(1)
Radix sort: O(n)
Verdict: Radix sort can achieve true O(n) with careful radix selection.
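A quick sketch of this effect, under the assumption that values are bounded by n (the helper and the specific ε values below are illustrative, not from the original text): as k grows toward n, d collapses to a small constant while each pass stays O(n + k) = O(n).

```python
import math

def passes_needed(max_value, k):
    """Number of base-k digits needed to represent max_value."""
    return max(1, math.ceil(math.log(max_value + 1, k)))

n = 1_000_000
M = n  # hypothetical scenario: values bounded by n

for eps in (0.25, 0.5, 1.0):
    k = int(n ** eps)          # radix chosen as n^eps
    d = passes_needed(M, k)    # stays a small constant (on the order of 1/eps)
    total = d * (n + k)        # d × (n + k) remains O(n)
    print(f"eps={eps}: k={k:,}, d={d}, d*(n+k)={total:,}")
```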
| Scenario | Radix Sort Complexity | Merge Sort Complexity | Winner |
|---|---|---|---|
| 32-bit integers, n = 10⁶ | O(4n) ≈ 4 million ops | O(n log n) ≈ 20 million ops | Radix (5× faster) |
| 64-bit integers, n = 10⁶ | O(8n) ≈ 8 million ops | O(n log n) ≈ 20 million ops | Radix (2.5× faster) |
| 32-bit integers, n = 100 | O(4n) ≈ 400 ops | O(n log n) ≈ 700 ops | Radix (slight edge) |
| Arbitrary precision, digits ∝ log n | O(n log n) | O(n log n) | Tie (constants matter) |
| Small values (max = 1000), n = 10⁶ | O(2n) ≈ 2 million ops | O(n log n) ≈ 20 million ops | Radix (10× faster) |
The comparison assumes d is independent of n. If the maximum value is bounded by a fixed word size (e.g., sorting n random 64-bit integers, where max ≈ 2⁶⁴ no matter how large n gets), d is fixed and radix sort wins. But if the maximum grows with n (say max = n², as when sorting squared values), then d = O(log n) and the asymptotic advantage disappears.
Asymptotic analysis hides constant factors that significantly impact real-world performance. Let's examine these constants for radix sort:
Constant factors in radix sort:
Per-pass overhead: Each counting sort pass involves two full scans of the data (one to count, one to place) plus a scan of the count array, roughly 2n + k memory operations per pass.
Memory access patterns: The counting scan reads elements sequentially, but the placement step writes to scattered positions in the output array, which can cause cache misses for large n.
Constant factors in comparison sorts:
Merge sort: About n log₂ n comparisons plus a comparable number of element moves; its merging passes access memory sequentially, which is cache-friendly.
Quicksort: Excellent cache locality from in-place partitioning and very cheap per-element work, which is why tuned quicksorts are hard to beat in practice despite performing O(n log n) comparisons.
Practical comparison:
For 32-bit integers on modern CPUs, base-256 radix sort performs 4 passes of roughly 2n memory operations each, about 8n operations in total.
For n = 10⁶, that is roughly 8 million radix operations versus n log₂ n ≈ 20 million comparisons for a comparison sort.
But each radix 'operation' involves multiple memory accesses, while each quicksort comparison is a single integer comparison. The winner depends on cache behavior, memory bandwidth, array size, and the quality of each implementation.
```python
import random
import time

def radix_sort_256(arr):
    """Radix sort with base 256."""
    if len(arr) <= 1:
        return arr[:]

    RADIX = 256
    MASK = 255
    n = len(arr)
    output = [0] * n
    current = arr[:]

    for shift in range(0, 32, 8):
        count = [0] * RADIX
        for num in current:
            count[(num >> shift) & MASK] += 1
        for i in range(1, RADIX):
            count[i] += count[i-1]
        for i in range(n-1, -1, -1):
            digit = (current[i] >> shift) & MASK
            count[digit] -= 1
            output[count[digit]] = current[i]
        current, output = output, current

    return current

def benchmark(sizes):
    """Benchmark radix sort vs built-in sort."""
    print(f"{'Size n':<12} {'Radix Sort (s)':<18} {'Built-in Sort (s)':<18} {'Ratio'}")
    print("-" * 60)

    for n in sizes:
        # Generate random 32-bit integers
        data = [random.randint(0, 2**32 - 1) for _ in range(n)]

        # Benchmark radix sort
        data_copy = data[:]
        start = time.perf_counter()
        radix_sort_256(data_copy)
        radix_time = time.perf_counter() - start

        # Benchmark built-in sort (Timsort)
        data_copy = data[:]
        start = time.perf_counter()
        data_copy.sort()
        builtin_time = time.perf_counter() - start

        ratio = radix_time / builtin_time if builtin_time > 0 else float('inf')
        print(f"{n:<12} {radix_time:<18.6f} {builtin_time:<18.6f} {ratio:.2f}x")

# Run benchmark
# Note: In practice, radix sort often wins for large arrays of integers
# but loses to optimized library sorts for smaller arrays due to overhead
benchmark([1000, 10000, 100000, 1000000])
```

Real-world performance varies significantly across platforms:
• CPU architecture: Newer CPUs with better branch prediction favor comparison sorts; high memory bandwidth favors radix sort
• Language/runtime: Low-level implementations (C, Rust) see bigger radix sort wins; interpreted languages have higher overhead
• Data characteristics: Partially sorted data favors adaptive algorithms (Timsort); random data is neutral
Radix sort's space complexity is an important consideration, especially for memory-constrained environments.
Space breakdown for LSD radix sort: an output array of n elements (needed for stable placement) plus a count array of k entries that is reused on every pass.
Total auxiliary space: O(n + k)
Since k is typically much smaller than n (e.g., k = 256), this simplifies to O(n) auxiliary space—the same as merge sort.
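As a rough back-of-the-envelope illustration (the 8-bytes-per-slot figure is an assumption, not a measurement), here is how that auxiliary space breaks down for one million elements with base 256:

```python
n = 1_000_000       # number of elements
k = 256             # radix: size of the count array
bytes_per_slot = 8  # assumed cost of one array slot (one machine word)

output_array = n * bytes_per_slot  # O(n): output buffer for stable placement
count_array = k * bytes_per_slot   # O(k): count array, reused each pass

print(f"output array: {output_array / 1e6:.1f} MB")  # ~8 MB
print(f"count array:  {count_array / 1e3:.1f} KB")   # ~2 KB
# The O(n) output buffer dominates; the O(k) count array is negligible.
```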
| Algorithm | Auxiliary Space | In-Place? | Notes |
|---|---|---|---|
| LSD Radix Sort | O(n + k) | No | Needs output array for stability |
| MSD Radix Sort | O(n + k + d) or O(n + k) | No/Partial | Recursion depth d; can be in-place with care |
| Merge Sort | O(n) | No | Needs temporary array for merging |
| Quicksort | O(log n) | Yes | Only recursion stack space |
| Heap Sort | O(1) | Yes | Truly in-place |
| Counting Sort | O(n + k) | No | Same as radix—it's the subroutine! |
Can radix sort be made in-place?
Technically, yes, but with significant complexity and performance costs:
American Flag Sort (in-place MSD radix sort): permutes elements directly into their buckets within the original array, avoiding the O(n) output buffer, at the cost of stability and more complex index bookkeeping.
In-place counting sort is possible but loses stability or requires multiple passes.
Practical recommendation:
For most applications, accept the O(n) space cost. If memory is critical and in-place sorting is essential, use heapsort or a carefully tuned quicksort instead.
For large arrays, memory bandwidth often matters more than total memory usage. Radix sort's sequential reads in the count phase are efficient, but the random writes in the place phase can thrash the cache. This is why some implementations use multi-pass counting (count all digits first, then place all at once) or careful memory prefetching.
Unlike quicksort (with its O(n²) worst case) or insertion sort (with its O(n) best case), radix sort's complexity is remarkably consistent.
Best case: O(d × (n + k))
Occurs when... well, always! The algorithm performs the same operations regardless of input order. There's no data-dependent branching that skips work.
Average case: O(d × (n + k))
Same as best case. The distribution of values doesn't affect the number of operations—every element is processed at every digit position.
Worst case: O(d × (n + k))
Same again! There's no adversarial input that causes degraded performance. This predictability is a significant advantage in real-time systems.
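One way to see this consistency is to instrument a base-256 LSD radix sort with an operation counter and feed it differently ordered inputs; the counts come out identical. The sketch below is for illustration and reuses the structure of the implementation shown earlier:

```python
import random

def radix_sort_256_counted(arr):
    """Base-256 LSD radix sort that also counts its basic operations."""
    ops = 0
    n = len(arr)
    output = [0] * n
    current = arr[:]
    for shift in range(0, 32, 8):
        count = [0] * 256
        for num in current:                      # counting phase: n ops
            count[(num >> shift) & 0xFF] += 1
            ops += 1
        for i in range(1, 256):                  # prefix sums: k ops
            count[i] += count[i - 1]
            ops += 1
        for i in range(n - 1, -1, -1):           # placement phase: n ops
            digit = (current[i] >> shift) & 0xFF
            count[digit] -= 1
            output[count[digit]] = current[i]
            ops += 1
        current, output = output, current
    return current, ops

data = [random.randint(0, 2**32 - 1) for _ in range(10_000)]
for name, arr in [("random", data),
                  ("sorted", sorted(data)),
                  ("reversed", sorted(data, reverse=True))]:
    _, ops = radix_sort_256_counted(arr)
    print(f"{name:<9} operations: {ops:,}")      # identical for all three orderings
```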
| Algorithm | Best Case | Average Case | Worst Case | Predictable? |
|---|---|---|---|---|
| Radix Sort | O(d(n+k)) | O(d(n+k)) | O(d(n+k)) | ✓ Yes |
| Merge Sort | O(n log n) | O(n log n) | O(n log n) | ✓ Yes |
| Heapsort | O(n log n) | O(n log n) | O(n log n) | ✓ Yes |
| Quicksort | O(n log n) | O(n log n) | O(n²) | ✗ No |
| Insertion Sort | O(n) | O(n²) | O(n²) | ✗ No |
| Timsort | O(n) | O(n log n) | O(n log n) | Partially |
However, while the asymptotic operation count is identical in every case, real-world performance can still vary based on:
Value distribution: If most values share common prefixes (like phone numbers in the same area code), some passes may find most elements in one bucket. This doesn't reduce operations but may affect cache behavior.
Memory alignment: How data aligns with cache lines affects actual speed.
Branch prediction: Though minimal, there are some branches (loop conditions, etc.) that can be affected by data patterns.
The key insight: Radix sort's predictable performance makes it ideal for real-time and latency-sensitive systems, where a guaranteed worst case matters more than a fast average case.
LSD radix sort (with stable counting sort) maintains stability in all cases. There's no input that causes it to become unstable. This deterministic behavior—both in complexity and correctness properties—makes radix sort highly reliable.
Amortized analysis:
Radix sort doesn't have an amortized analysis story like dynamic arrays or splay trees: each sort operation takes the same time, and that consistency carries over when radix sort is invoked repeatedly as a component of a larger system.
Parallel complexity:
Radix sort has interesting parallelization properties:
LSD Radix Sort (parallel):
With P processors, parallel time: O(d × (n/P + k + log P))
The log P factor comes from combining counts across processors.
```python
# Conceptual parallel radix sort (pseudocode-style Python)
# Real implementation would use multiprocessing or threading

def parallel_radix_sort(arr, num_threads=4):
    """
    Parallel LSD radix sort conceptual structure.

    Key idea: Each thread counts its portion, then we combine counts.
    """
    n = len(arr)
    chunk_size = n // num_threads
    RADIX = 256

    for shift in range(0, 32, 8):
        # PARALLEL PHASE 1: Each thread counts its chunk
        local_counts = []
        for t in range(num_threads):
            start = t * chunk_size
            end = start + chunk_size if t < num_threads - 1 else n
            count = [0] * RADIX
            for i in range(start, end):
                digit = (arr[i] >> shift) & 0xFF
                count[digit] += 1
            local_counts.append(count)

        # SEQUENTIAL PHASE: Combine counts (O(k × P) work)
        global_count = [0] * RADIX
        for count in local_counts:
            for i in range(RADIX):
                global_count[i] += count[i]

        # Compute prefix sums
        for i in range(1, RADIX):
            global_count[i] += global_count[i - 1]

        # PARALLEL PHASE 2: Place elements
        # This requires careful coordination to avoid race conditions
        # One approach: compute starting offsets for each thread's elements
        # per bucket, then each thread writes to non-overlapping regions
        # ... (complex coordination logic here)

    return arr

# Time complexity analysis:
# - Sequential: O(d × (n + k))
# - Parallel with P threads: O(d × (n/P + k + log P))
#
# Speedup is limited by:
# 1. The sequential accumulate phase (Amdahl's law)
# 2. Memory bandwidth (all threads compete for bus)
# 3. Synchronization overhead for the place phase
```

MSD Radix Sort (parallel):
MSD has better parallelization potential: after the first pass splits the input into k buckets, each bucket can be sorted independently by a different processor with no further coordination.
With P processors and an average bucket size of n/k, the buckets can be distributed across processors so that the remaining passes proceed with essentially no cross-processor communication.
This makes MSD radix sort attractive for parallel systems, especially with good data distribution.
Radix sort is exceptionally well-suited for GPU acceleration. Modern GPU radix sort implementations can sort billions of integers per second, far exceeding CPU performance. The regular, predictable memory access patterns (with careful design) map well to GPU architectures. Libraries like CUB (CUDA) provide highly optimized GPU radix sort.
We've thoroughly analyzed radix sort's complexity, moving from mechanical formula manipulation to genuine understanding of when and why this algorithm excels.
What's next:
With a solid understanding of radix sort's complexity, we're ready to explore when radix sort excels—the specific scenarios where this algorithm is the clear best choice, and equally importantly, when to use something else.
You now have a deep understanding of radix sort's O(d × (n + k)) complexity, how it compares to O(n log n) algorithms, and the factors that influence real-world performance. Next, we'll synthesize this knowledge into practical guidelines for when to choose radix sort.