Unlike algorithms with fixed complexity regardless of input (like Selection Sort's unwavering O(n²)), Insertion Sort's performance varies dramatically based on the input's initial order. It can be as fast as O(n) or as slow as O(n²), with everything in between possible.
This variability isn't a weakness—it's Insertion Sort's adaptive superpower. Understanding when each case occurs, and precisely how to derive these bounds, is essential for knowing when to deploy Insertion Sort and when to choose alternatives.
This page derives Insertion Sort's time complexity rigorously: best case O(n), worst case O(n²), and average case O(n²). You'll understand exactly which inputs trigger which behavior, the mathematical proofs behind each bound, and the practical implications of this complexity profile.
Before analyzing complexity, we must define what operations we're counting. For sorting algorithms, we typically focus on two fundamental operations:
- Comparisons: the A[j] > key check that determines whether to shift. This is the dominant operation in comparison-based sorting.
- Shifts: the A[j+1] = A[j] operation that moves elements right. Also includes the final A[j+1] = key placement.

The outer loop:
for i in range(1, n): # Runs n-1 times
The outer loop executes exactly n-1 times regardless of input—we must process each element from index 1 to n-1. This contributes O(n) to the complexity.
The inner loop:
while j >= 0 and A[j] > key: # Runs variable number of times
A[j+1] = A[j]
j -= 1
The inner loop's iteration count varies based on key's relative value and the composition of the sorted region. The total complexity therefore hinges on the sum of inner-loop iterations across all outer-loop iterations.
The outer loop contributes a constant O(n) factor. The algorithm's characteristic behavior comes from the inner loop. If the inner loop averages O(1) iterations per outer iteration, total is O(n). If it averages O(n) iterations, total is O(n²).
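To keep the two counted operations visible in one place, here is a minimal reference implementation (an illustrative sketch consistent with the fragments above) with the comparison and shift lines annotated:

```python
def insertion_sort(A):
    """Sort A in place; comments mark the two operations we count."""
    n = len(A)
    for i in range(1, n):              # outer loop: always n-1 iterations
        key = A[i]
        j = i - 1
        while j >= 0 and A[j] > key:   # COMPARISON: A[j] > key
            A[j + 1] = A[j]            # SHIFT: move A[j] one slot right
            j -= 1
        A[j + 1] = key                 # final placement of key
    return A
```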
Best Case Input: An already-sorted array, e.g., [1, 2, 3, 4, 5]. Best Case Complexity: O(n) comparisons, 0 shifts.
Why sorted arrays are fast:
Consider inserting element at index i into the sorted subarray A[0..i-1]:
1. Set key = A[i].
2. Compare key with A[i-1] (the largest element in the sorted region).
3. Since A[i-1] ≤ key, the inner loop condition A[j] > key is immediately false.
4. Place key at A[i] (where it already was).

For a sorted array, A[i] ≥ A[i-1] for all i, so every insertion terminates after a single comparison.
Formal derivation:
Let C(n) be the number of comparisons for a sorted array of size n. Each of the n-1 insertions performs exactly one comparison (which fails immediately) and zero shifts.
Total comparisons: $$C(n) = \sum_{i=1}^{n-1} 1 = n - 1 = O(n)$$
Total shifts: $$S(n) = 0$$
Total time: O(n) comparisons + O(n) loop overhead = O(n)
Trace for the sorted array [1, 2, 3, 4, 5]:

| i | key | Compare A[i-1] > key? | Inner Loop Runs? | Shifts |
|---|---|---|---|---|
| 1 | 2 | 1 > 2? No | No | 0 |
| 2 | 3 | 2 > 3? No | No | 0 |
| 3 | 4 | 3 > 4? No | No | 0 |
| 4 | 5 | 4 > 5? No | No | 0 |
Total: 4 comparisons, 0 shifts for n=5. This is O(n) behavior.
Key insight: The best case occurs when no element needs to be shifted—each element is already at least as large as all elements before it. This is precisely the definition of a sorted array.
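As a quick sanity check of the O(n) bound, the sketch below defines a hypothetical comparisons_and_shifts helper (an instrumented variant of the code above, not part of the original pseudocode) and runs it on a sorted array, which reports exactly n-1 comparisons and 0 shifts:

```python
def comparisons_and_shifts(A):
    """Instrumented Insertion Sort: sorts A in place, returns (comparisons, shifts)."""
    comps = shifts = 0
    for i in range(1, len(A)):
        key, j = A[i], i - 1
        while j >= 0:
            comps += 1                 # one comparison of A[j] against key
            if A[j] <= key:
                break                  # insertion point found
            A[j + 1] = A[j]            # shift A[j] one position right
            shifts += 1
            j -= 1
        A[j + 1] = key
    return comps, shifts

print(comparisons_and_shifts([1, 2, 3, 4, 5]))   # (4, 0): n-1 comparisons, no shifts
```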
Worst Case Input: A reverse-sorted array, e.g., [5, 4, 3, 2, 1]. Worst Case Complexity: O(n²) comparisons, O(n²) shifts.
Why reverse-sorted arrays are slow:
Consider inserting element at index i into the sorted subarray A[0..i-1]:
- A[i] is smaller than all elements in A[0..i-1].
- key ends up at position 0 every time.

For the element at index i, we perform i comparisons and i shifts.
Formal derivation:
Let C(n) be comparisons, S(n) be shifts for a reverse-sorted array of size n.
For iteration i (inserting the element at index i), key must travel past all i elements of the sorted region, costing i comparisons and i shifts.
Total comparisons: $$C(n) = \sum_{i=1}^{n-1} i = 1 + 2 + 3 + \cdots + (n-1) = \frac{(n-1)n}{2} = \frac{n^2 - n}{2} = O(n^2)$$
Total shifts: $$S(n) = \sum_{i=1}^{n-1} i = \frac{(n-1)n}{2} = O(n^2)$$
Total time: O(n²) comparisons + O(n²) shifts = O(n²)
Trace for the reverse-sorted array [5, 4, 3, 2, 1]:

| i | key | Comparisons | Shifts | Result |
|---|---|---|---|---|
| 1 | 4 | 1 (with 5) | 1 | [4, 5, 3, 2, 1] |
| 2 | 3 | 2 (with 5, 4) | 2 | [3, 4, 5, 2, 1] |
| 3 | 2 | 3 (with 5, 4, 3) | 3 | [2, 3, 4, 5, 1] |
| 4 | 1 | 4 (with 5, 4, 3, 2) | 4 | [1, 2, 3, 4, 5] |
Total: 1+2+3+4 = 10 comparisons and 10 shifts for n=5.
Using the formula: (5-1)×5/2 = 10. ✓
The pattern: Each new element must travel all the way to position 0, pushing every previously sorted element one position right.
For n=1,000, worst case requires ~500,000 operations. For n=1,000,000, it's ~500,000,000,000 operations. This is why Insertion Sort is impractical for large, arbitrarily-ordered datasets.
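Reusing the comparisons_and_shifts sketch from the best-case check above (assumed to be in scope), a reverse-sorted array of five elements reproduces the 10 comparisons and 10 shifts from the trace, and the closed form n(n-1)/2 gives the large-n figures quoted above:

```python
# Assumes comparisons_and_shifts() from the earlier best-case sketch is in scope.
print(comparisons_and_shifts([5, 4, 3, 2, 1]))   # (10, 10) for n = 5

for n in (1_000, 1_000_000):
    print(n, n * (n - 1) // 2)   # 499500 and 499999500000 (~5 * 10**11)
```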
Average Case Input: A uniformly random permutation of elements. Average Case Complexity: O(n²) comparisons and shifts, specifically ~n²/4.
The probabilistic setup:
For average case analysis, we assume:

- All n! input permutations are equally likely (a uniformly random permutation).
- All elements are distinct.
The key insight: On average, each element A[i] is larger than half the elements in A[0..i-1] and smaller than the other half. So it needs to move approximately to the middle—shifting about i/2 elements.
Formal derivation:
When inserting A[i] into A[0..i-1], the key can end up at any of the i+1 positions (0, 1, ..., i) with equal probability 1/(i+1).
Expected comparisons for inserting the element at index i: if key lands at final position p > 0, we perform i - p + 1 comparisons; if it lands at position 0, we perform only i comparisons, because the loop stops when j goes negative rather than on a failed comparison.
More precisely, the expected number of comparisons for element i is: $$E[C_i] = \frac{1}{i+1}\left(1 + 2 + 3 + \cdots + i + i\right) = \frac{1}{i+1}\left(\frac{i(i+1)}{2} + i\right) = \frac{i}{2} + 1 - \frac{1}{i+1}$$
Approximately: E[Cᵢ] ≈ i/2
Total expected comparisons: $$E[C(n)] = \sum_{i=1}^{n-1} \frac{i}{2} \approx \frac{1}{2} \cdot \frac{(n-1)n}{2} = \frac{n^2 - n}{4} \approx \frac{n^2}{4}$$
Expected shifts are similar: approximately n²/4.
Total average time: O(n²), specifically ~n²/4 operations.
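To check the ~n²/4 estimate empirically, the sketch below (again assuming the hypothetical comparisons_and_shifts helper from earlier is in scope) averages the shift count over many random permutations and compares it to n(n-1)/4, the exact expected number of inversions:

```python
import random

# Assumes comparisons_and_shifts() from the earlier best-case sketch is in scope.
n, trials = 200, 500
mean_shifts = sum(
    comparisons_and_shifts(random.sample(range(n), n))[1] for _ in range(trials)
) / trials
print(f"empirical mean shifts: {mean_shifts:.0f}")
print(f"exact expectation n(n-1)/4: {n * (n - 1) / 4:.0f}")   # 9950 for n = 200
print(f"approximation n^2/4: {n * n / 4:.0f}")                # 10000
```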
Comparison counts for each case:

| Case | n=10 | n=100 | n=1000 |
|---|---|---|---|
| Best (sorted) | 9 | 99 | 999 |
| Average (random) | ~22 | ~2,500 | ~250,000 |
| Worst (reverse) | 45 | 4,950 | 499,500 |
Practical observation: The average case is about half the worst case, but both are O(n²). For random inputs, Insertion Sort's average case is not meaningfully better than its worst case in Big-O terms.
A more elegant way to understand Insertion Sort's complexity uses inversions—the fundamental measure of array disorder we introduced earlier.
Theorem: The number of shifts in Insertion Sort exactly equals the number of inversions in the input array. Therefore, Time(Insertion Sort) = Θ(n + inversions).
Proof:
Each shift moves element A[j] from position j to position j+1, making room for key. This happens precisely when A[j] > key, where A[j] originally preceded key in the input.
Before the shift: A[j] appears before key in the array, and A[j] > key. This is an inversion.
After the shift and final placement: key is placed before A[j] in the array, and key < A[j]. The inversion is resolved.
Each shift removes exactly one inversion, and each inversion is removed by exactly one shift.
Therefore: Number of shifts = Number of inversions = I(A)
Complexity: $$T(n) = n - 1 \text{ (outer loop iterations)} + I(A) \text{ (shifts)} = \Theta(n + I(A))$$
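The theorem is easy to verify directly: count inversions by brute force and compare with the shift count reported by the instrumented sort (the comparisons_and_shifts helper sketched earlier is assumed to be in scope):

```python
import random

def count_inversions(A):
    """Brute-force O(n^2) count of pairs (i, j) with i < j and A[i] > A[j]."""
    return sum(A[i] > A[j] for i in range(len(A)) for j in range(i + 1, len(A)))

for _ in range(5):
    arr = random.sample(range(50), 50)
    inversions = count_inversions(arr)
    _, shifts = comparisons_and_shifts(arr.copy())   # copy: the sort mutates its input
    print(inversions, shifts, inversions == shifts)  # last column is always True
```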
| Input Type | Number of Inversions | Time Complexity |
|---|---|---|
| Sorted array | 0 | Θ(n + 0) = Θ(n) |
| Reverse sorted | n(n-1)/2 | Θ(n + n²/2) = Θ(n²) |
| Random permutation | ~n²/4 expected | Θ(n + n²/4) = Θ(n²) |
| Array with k unsorted elements | O(kn) | Θ(n + kn) = O(kn) |
| Each element ≤ c positions from sorted | O(cn) | Θ(n + cn) = O(n) if c is constant |
The power of inversion analysis:
This analysis precisely captures Insertion Sort's adaptivity:
It generalizes all cases: Best, worst, and average are just specific inversion counts.
It predicts performance for ANY input: Given an array, count inversions to estimate Insertion Sort's runtime.
It identifies 'nearly sorted' precisely: An array is 'nearly sorted' if I(A) = O(n), making Insertion Sort linear.
It enables hybrid algorithm design: Use Insertion Sort when inversions are expected to be low; switch to O(n log n) algorithms otherwise.
If you expect each element to be at most k positions away from its sorted position (and k is a small constant), Insertion Sort runs in O(n). This is common after incremental updates to an already-sorted dataset.
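The sketch below builds such an input by shuffling a sorted array within consecutive blocks of size k+1 (an illustrative construction that guarantees every element is at most k positions from its sorted slot) and shows the operation counts growing roughly linearly in n, far below n²/4:

```python
import random

# Assumes comparisons_and_shifts() from the earlier best-case sketch is in scope.
def nearly_sorted(n, k):
    """Sorted array of size n, shuffled within blocks of size k+1, so no
    element ends up more than k positions from its sorted slot."""
    A = list(range(n))
    for start in range(0, n, k + 1):
        block = A[start:start + k + 1]
        random.shuffle(block)
        A[start:start + k + 1] = block
    return A

k = 3
for n in (1_000, 10_000, 100_000):
    comps, shifts = comparisons_and_shifts(nearly_sorted(n, k))
    print(n, comps, shifts)   # both counts grow ~linearly in n for fixed k
```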
While time complexity varies, Insertion Sort's space complexity is constant: O(1) auxiliary space, making it an in-place sorting algorithm.
Why O(1) matters:
Embedded systems: Limited memory environments can use Insertion Sort without allocating extra buffers.
Large elements: When sorting large objects, O(1) extra space means we don't double our memory footprint (unlike Merge Sort's O(n) auxiliary array).
Cache efficiency: Operating in-place improves data locality, keeping frequently accessed elements in CPU cache.
No memory allocation overhead: No dynamic allocation means no allocation failures, no fragmentation, and predictable performance.
Contrast with other algorithms: Merge Sort allocates an O(n) auxiliary array, and a typical recursive Quicksort uses O(log n) stack space, while Heapsort, like Insertion Sort, needs only O(1) extra memory.
An in-place algorithm uses O(1) auxiliary space (or O(log n) for implicit recursion stack). Insertion Sort is strictly in-place—it needs only constant extra memory regardless of input size.
Both Insertion Sort and Selection Sort are O(n²) in the worst case. However, their complexity profiles differ in important ways.
| Metric | Insertion Sort | Selection Sort |
|---|---|---|
| Best Case Time | O(n) — sorted input | O(n²) — always |
| Worst Case Time | O(n²) — reverse sorted | O(n²) — always |
| Average Case Time | O(n²) — random input | O(n²) — always |
| Comparisons (best) | n-1 | n(n-1)/2 |
| Comparisons (worst) | n(n-1)/2 | n(n-1)/2 |
| Swaps/Shifts (best) | 0 | n-1 (or 0 with optimization) |
| Swaps/Shifts (worst) | n(n-1)/2 | n-1 |
| Adaptive? | Yes — adapts to input order | No — fixed work regardless |
Key differences explained:
Selection Sort's consistency: Selection Sort always does exactly n(n-1)/2 comparisons because it always scans the entire remaining portion to find the minimum. Input order doesn't matter—even a sorted array requires those comparisons.
Insertion Sort's adaptivity: Insertion Sort can terminate early on each insertion. If the element is already in place (or close), few comparisons are needed. This adaptivity is Insertion Sort's crucial advantage.
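To see the contrast concretely, the sketch below instruments a plain Selection Sort (illustrative code, not taken from the table above) and shows that it performs n(n-1)/2 comparisons on sorted, reverse-sorted, and random inputs alike:

```python
import random

def selection_sort_comparisons(A):
    """Selection Sort that sorts A in place and returns its comparison count."""
    comps = 0
    n = len(A)
    for i in range(n - 1):
        min_idx = i
        for j in range(i + 1, n):
            comps += 1                          # every remaining element is examined
            if A[j] < A[min_idx]:
                min_idx = j
        A[i], A[min_idx] = A[min_idx], A[i]
    return comps

n = 100
for arr in (list(range(n)), list(range(n, 0, -1)), random.sample(range(n), n)):
    print(selection_sort_comparisons(arr))      # always 4950 == n(n-1)/2
```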
When each wins:
Selection Sort wins when swaps are very expensive and the array is random or reverse-sorted. Selection Sort does only O(n) swaps regardless of input.
Insertion Sort wins for nearly-sorted data, small arrays, online processing, or when stability is required. Its O(n) best case and ~n²/4 average-case operation count (vs ~n²/2 comparisons for Selection Sort) make it faster in practice for common cases.
For general-purpose sorting of small arrays, Insertion Sort is preferred over Selection Sort. Its adaptivity means it handles common cases efficiently, and its stability is often required. Selection Sort's only advantage—minimal swaps—rarely matters in modern systems.
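One common way to exploit that preference is to fall back to Insertion Sort for small subarrays inside an O(n log n) algorithm. The sketch below shows the idea with a merge sort and an illustrative cutoff of 16; the cutoff value and function names are assumptions, not any specific library's implementation:

```python
def insertion_sort_range(A, lo, hi):
    """Sort A[lo:hi] in place with Insertion Sort."""
    for i in range(lo + 1, hi):
        key, j = A[i], i - 1
        while j >= lo and A[j] > key:
            A[j + 1] = A[j]
            j -= 1
        A[j + 1] = key

def hybrid_merge_sort(A, lo=0, hi=None, cutoff=16):
    """Merge sort that hands small subarrays to Insertion Sort."""
    if hi is None:
        hi = len(A)
    if hi - lo <= cutoff:
        insertion_sort_range(A, lo, hi)   # small subarray: quadratic cost is negligible
        return
    mid = (lo + hi) // 2
    hybrid_merge_sort(A, lo, mid, cutoff)
    hybrid_merge_sort(A, mid, hi, cutoff)
    merged, i, j = [], lo, mid            # stable merge of A[lo:mid] and A[mid:hi]
    while i < mid and j < hi:
        if A[i] <= A[j]:
            merged.append(A[i])
            i += 1
        else:
            merged.append(A[j])
            j += 1
    merged.extend(A[i:mid])
    merged.extend(A[j:hi])
    A[lo:hi] = merged

data = [5, 3, 8, 1, 9, 2, 7, 4, 6, 0] * 10
hybrid_merge_sort(data)
print(data == sorted([5, 3, 8, 1, 9, 2, 7, 4, 6, 0] * 10))   # True
```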
Theory is validated by experiment. Let's verify our complexity analysis with actual measurements.
```python
import random
import time

def insertion_sort_counted(arr):
    """Insertion Sort that counts comparisons and shifts."""
    comparisons = 0
    shifts = 0
    n = len(arr)

    for i in range(1, n):
        key = arr[i]
        j = i - 1
        while j >= 0:
            comparisons += 1
            if arr[j] > key:
                arr[j + 1] = arr[j]
                shifts += 1
                j -= 1
            else:
                break
        arr[j + 1] = key

    return comparisons, shifts

# Test different input types
def benchmark(size):
    print(f"\n--- Array Size: {size} ---")

    # Best case: sorted
    sorted_arr = list(range(size))
    c, s = insertion_sort_counted(sorted_arr.copy())
    print(f"Sorted: {c:>10} comparisons, {s:>10} shifts")
    print(f" Expected: {size-1:>10} comparisons, {0:>10} shifts")

    # Worst case: reverse sorted
    reverse_arr = list(range(size, 0, -1))
    c, s = insertion_sort_counted(reverse_arr.copy())
    expected = (size - 1) * size // 2
    print(f"Reversed: {c:>10} comparisons, {s:>10} shifts")
    print(f" Expected: {expected:>10} comparisons, {expected:>10} shifts")

    # Average case: random
    random_arr = list(range(size))
    random.shuffle(random_arr)
    c, s = insertion_sort_counted(random_arr.copy())
    expected_avg = expected // 2
    print(f"Random: {c:>10} comparisons, {s:>10} shifts")
    print(f" Expected: ~{expected_avg:>8} comparisons, ~{expected_avg:>8} shifts")

for size in [10, 100, 1000]:
    benchmark(size)
```

Expected output pattern:
--- Array Size: 1000 ---
Sorted: 999 comparisons, 0 shifts
Expected: 999 comparisons, 0 shifts
Reversed: 499500 comparisons, 499500 shifts
Expected: 499500 comparisons, 499500 shifts
Random: ~250000 comparisons, ~250000 shifts
Expected: ~249750 comparisons, ~249750 shifts
The empirical results match our theoretical predictions, as summarized below.
| Metric | Best Case | Average Case | Worst Case |
|---|---|---|---|
| Time Complexity | O(n) | O(n²) | O(n²) |
| Comparisons | n - 1 | ~n²/4 | n(n-1)/2 |
| Shifts | 0 | ~n²/4 | n(n-1)/2 |
| Input Type | Sorted array | Random permutation | Reverse-sorted array |
| Inversions | 0 | ~n²/4 | n(n-1)/2 |
| Property | Value | Notes |
|---|---|---|
| Space Complexity | O(1) | Only key variable and loop indices |
| In-Place | Yes | Modifies array directly |
| Stable | Yes | Equal elements maintain relative order |
| Online | Yes | Elements can be sorted as they arrive |
| Adaptive | Yes | Runtime adapts to input disorder |
Among common quadratic sorting algorithms, Insertion Sort stands out for its O(n) best case. This makes it uniquely suited for nearly-sorted data and hybrid algorithms that use it as a finishing step.
We've rigorously analyzed Insertion Sort's time and space complexity. Here are the essential takeaways:

- Best case O(n): already-sorted input costs n-1 comparisons and 0 shifts.
- Worst case O(n²): reverse-sorted input costs n(n-1)/2 comparisons and n(n-1)/2 shifts.
- Average case O(n²): a random permutation costs roughly n²/4 comparisons and shifts.
- In general, runtime is Θ(n + inversions), which is what makes the algorithm adaptive.
- Space is O(1): Insertion Sort is in-place, stable, and online.
What's next:
We've analyzed complexity mathematically. The final page explores the practical implications: when does Insertion Sort's complexity profile make it the right choice? How do nearly-sorted arrays arise in practice? And how do production-grade sorting libraries leverage Insertion Sort's unique strengths?
You now understand Insertion Sort's complexity from multiple angles: operation counting, case analysis, and inversion theory. The O(n) best case is Insertion Sort's defining characteristic—the key to understanding when it outperforms asymptotically faster algorithms.