How similar are two rankings? If Alice ranks movies as [A, B, C, D] and Bob ranks them [D, C, B, A], they have completely opposite preferences. But if Carol ranks them [A, B, D, C], she's mostly aligned with Alice with just one swap needed. How do we quantify this difference mathematically?
The answer lies in counting inversions—a D&C problem that reveals the degree of disorder in a sequence. An inversion is a pair of elements that are "out of order" relative to their desired ordering. This seemingly simple concept has profound applications: from measuring similarity between rankings to analyzing how "far" a sequence is from being sorted.
The naive approach takes O(n²) time. But with a beautiful insight connecting inversions to merge sort, we can count all inversions in O(n log n) time.
By the end of this page, you will understand what inversions are, why they matter, and how to count them efficiently using a modified merge sort. You'll see how the merge operation naturally counts cross-inversions, achieving O(n log n) complexity. This technique is a cornerstone of D&C problem-solving.
Formal Definition:
Given an array A of n distinct elements, an inversion is a pair of indices (i, j) such that i < j and A[i] > A[j].
In other words, an inversion is a pair of elements that are "out of order" relative to ascending order.
Example:
Consider the array [2, 4, 1, 3, 5]:
| Pair (i, j) | Elements | i < j? | A[i] > A[j]? | Inversion? |
|---|---|---|---|---|
| (0, 1) | 2, 4 | ✓ | ✗ | No |
| (0, 2) | 2, 1 | ✓ | ✓ | Yes |
| (1, 2) | 4, 1 | ✓ | ✓ | Yes |
| (1, 3) | 4, 3 | ✓ | ✓ | Yes |
| (2, 3) | 1, 3 | ✓ | ✗ | No |
| ... | ... | ... | ... | ... |
Total inversions: (0,2), (1,2), (1,3) = 3 inversions
The number of inversions tells us how "far" an array is from being sorted. A sorted array has 0 inversions. A reverse-sorted array has the maximum: n(n-1)/2 inversions (every pair is inverted). The inversion count is essentially a measure of disorder—the work needed to sort the array using adjacent swaps.
Bounds on Inversions:
- Minimum: 0 (a sorted array)
- Maximum: n(n-1)/2 (a reverse-sorted array, where every pair is inverted)
For n = 5, the count ranges from 0 to 5×4/2 = 10.
Connection to Bubble Sort:
Fascinatingly, the number of inversions equals the number of swaps that bubble sort performs. Each adjacent swap in bubble sort removes exactly one inversion. This isn't coincidental—it's a deep relationship between disorder measures and sorting algorithms.
| Array | Inversions | Description |
|---|---|---|
| [1, 2, 3, 4, 5] | 0 | Sorted (no inversions) |
| [2, 1, 3, 4, 5] | 1 | One swap from sorted |
| [2, 4, 1, 3, 5] | 3 | Moderately disordered |
| [5, 4, 3, 2, 1] | 10 | Reverse sorted (maximum) |
| [3, 1, 2, 5, 4] | 3 | Moderately disordered |
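The bubble-sort connection is easy to check empirically. This sketch (function names are illustrative) counts bubble sort's adjacent swaps and compares them against a brute-force inversion count:

```python
def bubble_sort_swaps(arr):
    """Sort a copy with bubble sort; return the number of adjacent swaps."""
    a = list(arr)
    swaps = 0
    for end in range(len(a) - 1, 0, -1):
        for i in range(end):
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
                swaps += 1
    return swaps

def inversion_count(arr):
    """Brute-force inversion count, for comparison."""
    n = len(arr)
    return sum(1 for i in range(n) for j in range(i + 1, n) if arr[i] > arr[j])

# Each adjacent swap removes exactly one inversion, so the counts agree
for arr in [[1, 2, 3, 4, 5], [2, 4, 1, 3, 5], [5, 4, 3, 2, 1]]:
    assert bubble_sort_swaps(arr) == inversion_count(arr)
```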
Counting inversions isn't just a theoretical exercise—it has practical applications across multiple domains. Understanding these applications motivates the algorithm.
Collaborative Filtering and Recommendation Systems:
When recommending movies, music, or products, systems often compare users' preference rankings. If User A and User B have similar rankings (few inversions between their orderings), they likely have similar tastes.
The Kendall Tau distance between two rankings is defined as the number of pairwise inversions between them. It's a standard metric in statistics and machine learning for measuring rank correlation.
The Kendall tau rank correlation coefficient is: τ = 1 - 4×(inversions) / n(n-1). It ranges from -1 (completely reversed) to +1 (identical rankings). When inversions = 0, τ = 1 (perfect agreement). When inversions = n(n-1)/2, τ = -1 (perfect disagreement).
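As a sketch of how τ is computed in practice, the hypothetical helper below relabels one ranking by the other's positions and counts inversions by brute force (fine for small n; the O(n log n) counter developed later in this page would replace the inner loop):

```python
def kendall_tau(rank_a, rank_b):
    """Kendall tau rank correlation between two rankings of the same items.

    Relabeling each item by its position in rank_a turns rank_b into a
    sequence whose inversions are exactly the discordant pairs.
    """
    pos = {item: i for i, item in enumerate(rank_a)}
    seq = [pos[item] for item in rank_b]
    n = len(seq)
    inv = sum(1 for i in range(n) for j in range(i + 1, n) if seq[i] > seq[j])
    return 1 - 4 * inv / (n * (n - 1))

print(kendall_tau(list("ABCD"), list("ABCD")))  # 1.0  (identical)
print(kendall_tau(list("ABCD"), list("DCBA")))  # -1.0 (fully reversed)
print(kendall_tau(list("ABCD"), list("ABDC")))  # one swap: 1 - 4/12 ≈ 0.667
```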
The straightforward approach checks all possible pairs (i, j) where i < j and counts those where A[i] > A[j]. With C(n,2) = n(n-1)/2 pairs, this takes O(n²) time.
```python
from typing import List, Tuple


def count_inversions_brute_force(arr: List[int]) -> Tuple[int, List[Tuple[int, int]]]:
    """
    Count inversions using brute force.

    An inversion is a pair (i, j) where i < j but arr[i] > arr[j].

    Time Complexity: O(n²)
    Space Complexity: O(k) where k is the number of inversions (to store pairs)

    Returns:
        Tuple of (count, list of inversion pairs)
    """
    n = len(arr)
    inversions = []
    count = 0
    for i in range(n):
        for j in range(i + 1, n):
            if arr[i] > arr[j]:
                inversions.append((i, j))
                count += 1
    return count, inversions


def count_inversions_brute_force_optimized(arr: List[int]) -> int:
    """
    Count inversions without storing pairs (space optimized).

    Time Complexity: O(n²)
    Space Complexity: O(1)
    """
    n = len(arr)
    count = 0
    for i in range(n):
        for j in range(i + 1, n):
            if arr[i] > arr[j]:
                count += 1
    return count


# Demonstration
if __name__ == "__main__":
    test_cases = [
        [1, 2, 3, 4, 5],  # Sorted: 0 inversions
        [2, 1, 3, 4, 5],  # One swap: 1 inversion
        [2, 4, 1, 3, 5],  # 3 inversions
        [5, 4, 3, 2, 1],  # Reverse: 10 inversions (max for n=5)
    ]
    for arr in test_cases:
        count, pairs = count_inversions_brute_force(arr)
        print(f"Array: {arr}")
        print(f"  Inversions: {count}")
        print(f"  Pairs: {pairs}")
        print()
```

Why O(n²) Is Unacceptable:
For recommendation systems comparing millions of user rankings or genomic analysis with thousands of genes, O(n²) becomes prohibitive:
| n | Pairs Checked | Time @ 10M pairs/sec |
|---|---|---|
| 1,000 | ~500,000 | 0.05 seconds |
| 10,000 | ~50 million | 5 seconds |
| 100,000 | ~5 billion | ~8 minutes |
| 1,000,000 | ~500 billion | ~14 hours |
When Netflix compares user preferences or researchers analyze genomic rearrangements, they need fast inversion counting. The O(n log n) D&C solution makes this practical for millions of elements.
The key insight is that inversions can be efficiently counted during merge sort. When we merge two sorted halves, we naturally encounter and count all inversions that span the boundary.
Categorizing Inversions:
For any inversion (i, j) with i < j and A[i] > A[j], exactly one of the following is true:
- Left inversion: both i and j fall in the left half
- Right inversion: both i and j fall in the right half
- Split inversion: i falls in the left half and j in the right half
This is an exhaustive partition—every inversion falls into exactly one category. This is the foundation of our D&C approach.
Left inversions are counted recursively on the left half. Right inversions are counted recursively on the right half. Split inversions—the cross-boundary ones—are counted during the merge operation. Merge sort's merge step naturally reveals all split inversions!
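The three-way partition can be checked by brute force on a small example. In this sketch (names are illustrative), `mid = len(arr) // 2` mirrors the split point the recursion will use:

```python
def categorize_inversions(arr):
    """Partition all inversions of arr into left, right, and split,
    relative to the midpoint mid = len(arr) // 2."""
    mid = len(arr) // 2
    left_inv, right_inv, split_inv = [], [], []
    n = len(arr)
    for i in range(n):
        for j in range(i + 1, n):
            if arr[i] > arr[j]:
                if j < mid:            # both indices in the left half
                    left_inv.append((i, j))
                elif i >= mid:         # both indices in the right half
                    right_inv.append((i, j))
                else:                  # i in left, j in right: split
                    split_inv.append((i, j))
    return left_inv, right_inv, split_inv

# [5, 3, 7 | 4, 2, 6]: 1 left, 1 right, 6 split inversions (8 total)
l, r, s = categorize_inversions([5, 3, 7, 4, 2, 6])
print(len(l), len(r), len(s))  # 1 1 6
```

Every pair lands in exactly one bucket, which is why the three counts can be computed independently and summed.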
How Merge Counts Split Inversions:
During merge, we combine two sorted subarrays. When we take an element from the right subarray before exhausting the left subarray, every remaining element in the left subarray forms a split inversion with this element.
Example: Merging [3, 5, 7] and [2, 4, 8]
Left: [3, 5, 7] (pointer i starts at 3)    Right: [2, 4, 8] (pointer j starts at 2)
Step 1: Compare 3, 2. Take 2 (from right).
2 < 3, so 2 has inversions with ALL remaining left elements: (3,2), (5,2), (7,2)
Count += 3 (elements remaining in left = 3)
Step 2: Compare 3, 4. Take 3 (from left).
Taking from left means no new inversions counted.
Step 3: Compare 5, 4. Take 4 (from right).
4 < 5, so inversions with remaining left: (5,4), (7,4)
Count += 2
Step 4: Compare 5, 8. Take 5 (from left). No inversions.
Step 5: Compare 7, 8. Take 7 (from left). No inversions.
Step 6: Left exhausted; take remaining 8 from right. No inversions.
Total split inversions: 3 + 2 = 5
Merged: [2, 3, 4, 5, 7, 8]
The algorithm sorts the array as a side effect. This is necessary—the merge step only correctly counts split inversions when both halves are sorted. Each element in the left half that remains when we pick from the right is guaranteed to be greater than the picked element. This is precisely the inversion condition!
Let's implement the complete algorithm, which is essentially merge sort with inversion counting integrated into the merge step.
```python
from typing import List, Tuple


def count_inversions_dc(arr: List[int]) -> int:
    """
    Count inversions using Divide and Conquer (merge sort based).

    Time Complexity: O(n log n)
    Space Complexity: O(n)

    Args:
        arr: List of comparable elements

    Returns:
        Number of inversions in the array
    """
    if len(arr) <= 1:
        return 0
    # Work on a copy to avoid modifying original
    temp = arr.copy()
    return _merge_count(temp, 0, len(temp) - 1)


def _merge_count(arr: List[int], left: int, right: int) -> int:
    """
    Recursively count inversions and sort the array.
    Returns the number of inversions in arr[left:right+1].
    """
    if left >= right:
        return 0

    mid = (left + right) // 2

    # Count inversions in left half
    left_inversions = _merge_count(arr, left, mid)

    # Count inversions in right half
    right_inversions = _merge_count(arr, mid + 1, right)

    # Count split inversions during merge
    split_inversions = _merge_and_count(arr, left, mid, right)

    return left_inversions + right_inversions + split_inversions


def _merge_and_count(arr: List[int], left: int, mid: int, right: int) -> int:
    """
    Merge two sorted subarrays and count split inversions.

    Left subarray: arr[left:mid+1] (already sorted)
    Right subarray: arr[mid+1:right+1] (already sorted)

    Key insight: When we take an element from the right subarray,
    all remaining elements in the left subarray form inversions with it.
    """
    # Copy both halves to temporary arrays
    left_arr = arr[left:mid + 1]
    right_arr = arr[mid + 1:right + 1]

    inversions = 0
    i = 0      # Index for left_arr
    j = 0      # Index for right_arr
    k = left   # Index for merged result
    n_left = len(left_arr)
    n_right = len(right_arr)

    while i < n_left and j < n_right:
        if left_arr[i] <= right_arr[j]:
            # Taking from left: no new inversions
            arr[k] = left_arr[i]
            i += 1
        else:
            # Taking from right: all remaining left elements are inversions
            # Because left_arr[i:] are all > right_arr[j]
            arr[k] = right_arr[j]
            inversions += (n_left - i)  # Count remaining left elements
            j += 1
        k += 1

    # Copy remaining elements (no new inversions in these steps)
    while i < n_left:
        arr[k] = left_arr[i]
        i += 1
        k += 1
    while j < n_right:
        arr[k] = right_arr[j]
        j += 1
        k += 1

    return inversions


def count_inversions_with_details(arr: List[int]) -> Tuple[int, List[int]]:
    """
    Count inversions and return the sorted array.
    Useful for debugging and understanding the algorithm.
    """
    working_copy = arr.copy()
    count = _merge_count(working_copy, 0, len(working_copy) - 1)
    return count, working_copy


# ============ DEMONSTRATION ============
if __name__ == "__main__":
    test_cases = [
        [1, 2, 3, 4, 5],     # 0 inversions
        [2, 1, 3, 4, 5],     # 1 inversion: (2,1)
        [2, 4, 1, 3, 5],     # 3 inversions
        [5, 4, 3, 2, 1],     # 10 inversions (maximum)
        [1, 3, 5, 2, 4, 6],  # 3 inversions: (3,2), (5,2), (5,4)
        [8, 4, 2, 1],        # 6 inversions
    ]

    print("Inversion Counting using Divide and Conquer")
    print("=" * 50)

    for arr in test_cases:
        count = count_inversions_dc(arr)
        _, sorted_arr = count_inversions_with_details(arr)
        print(f"Array: {arr}")
        print(f"  Inversions: {count}")
        print(f"  Sorted: {sorted_arr}")

        # Verify with brute force
        n = len(arr)
        bf_count = sum(1 for i in range(n) for j in range(i+1, n) if arr[i] > arr[j])
        assert count == bf_count, f"Mismatch: DC={count}, BF={bf_count}"
        print()

    print("All tests passed! ✓")
```

The magic happens in `inversions += (n_left - i)`.
When we pick from the right array, all remaining elements in the left array (from index i to the end, which is n_left - i elements) are greater than the picked element. Each of these forms a split inversion with the picked element.
The algorithm's complexity directly mirrors merge sort because it is merge sort with constant additional work per comparison.
Recurrence Relation:
Let T(n) be the time to count inversions in an array of n elements:
$$T(n) = 2T(n/2) + O(n)$$
Where:
- 2T(n/2): recursively counting inversions in the two halves
- O(n): the merge-and-count step, a single linear scan over both halves
Applying Master Theorem:
This is Case 2 (a = 2, b = 2, f(n) = Θ(n) = Θ(n^log₂2)):
$$T(n) = Θ(n \log n)$$
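One way to see the bound concretely: instrument the merge with a comparison counter and check that it never exceeds n log₂ n (a sketch with illustrative names):

```python
import math

def merge_count_ops(a):
    """Merge-sort-based inversion count that also tallies comparisons,
    to check the T(n) = 2T(n/2) + O(n) recurrence empirically."""
    ops = 0

    def rec(a):
        nonlocal ops
        if len(a) <= 1:
            return a, 0
        mid = len(a) // 2
        left, cl = rec(a[:mid])
        right, cr = rec(a[mid:])
        merged, i, j, split = [], 0, 0, 0
        while i < len(left) and j < len(right):
            ops += 1  # one comparison per merge step
            if left[i] <= right[j]:
                merged.append(left[i]); i += 1
            else:
                split += len(left) - i  # remaining left elements are inversions
                merged.append(right[j]); j += 1
        merged.extend(left[i:]); merged.extend(right[j:])
        return merged, cl + cr + split

    _, inv = rec(list(a))
    return inv, ops

n = 1024
inv, ops = merge_count_ops(list(range(n, 0, -1)))  # worst case: reverse sorted
print(inv, ops, n * math.log2(n))  # comparisons stay below n log2 n
```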
| Approach | Time | Space | Practical for n=1M? |
|---|---|---|---|
| Brute Force | O(n²) | O(1) | ✗ ~14 hours |
| D&C (Merge Sort) | O(n log n) | O(n) | ✓ ~20 seconds |
| Improvement Factor | ~n / log n | — | ~50,000× faster |
Space Complexity:
The O(n) space usage is standard for merge sort: each merge copies the two halves into temporary arrays. Unlike the closest pair problem, there's no need to maintain multiple sorted views, so the space analysis is simpler.
From O(n²) to O(n log n) means:
- 1K elements: 500K → 10K ops
- 1M elements: 500B → 20M ops
This is the difference between hours and seconds.
Why This Works:
The key insight is that counting split inversions during merge is O(n) for the merge, not O(n) per element. The line inversions += (n_left - i) counts multiple inversions in O(1) time by leveraging the sorted property of both halves.
Without sorting, we'd need to compare each left element with each right element: O(n²) per level. With sorting, we get all that information from a single linear scan: O(n) per level.
This is the power of Divide and Conquer: structuring the problem so that combining solutions is cheap.
The inversion counting concept extends to various related problems and has connections to other algorithmic techniques.
| Problem | Description | Approach |
|---|---|---|
| Count Smaller Elements After Self | For each element, count smaller elements to its right | Modified merge sort tracking indices |
| Count of Range Sum | Count subarrays with sum in [lower, upper] | Merge sort on prefix sums |
| Reverse Pairs | Count (i, j) where i < j and arr[i] > 2*arr[j] | Merge sort with modified condition |
| Local Inversions vs Global | Compare adjacent inversions to all inversions | Can use O(n) insight for special cases |
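As one example from the table, Reverse Pairs reuses the merge-sort skeleton; because `arr[i] > 2*arr[j]` is a different condition from the merge comparison, it needs a separate counting pass before the merge (a sketch, not the only way to structure it):

```python
def reverse_pairs(arr):
    """Count pairs (i, j) with i < j and arr[i] > 2 * arr[j], in O(n log n)."""
    def sort_count(a):
        if len(a) <= 1:
            return a, 0
        mid = len(a) // 2
        left, cl = sort_count(a[:mid])
        right, cr = sort_count(a[mid:])
        count = cl + cr
        # Counting pass: for each left[i] (ascending), advance j while the
        # condition holds; j never moves backward, so this pass is O(n)
        j = 0
        for i in range(len(left)):
            while j < len(right) and left[i] > 2 * right[j]:
                j += 1
            count += j
        # Standard merge (condition here is plain <=, not the 2x condition)
        merged, i, j = [], 0, 0
        while i < len(left) and j < len(right):
            if left[i] <= right[j]:
                merged.append(left[i]); i += 1
            else:
                merged.append(right[j]); j += 1
        merged.extend(left[i:]); merged.extend(right[j:])
        return merged, count
    return sort_count(arr)[1]

print(reverse_pairs([1, 3, 2, 3, 1]))  # 2
```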
Inversions can also be counted using a Binary Indexed Tree (BIT). Process elements from right to left, querying the BIT for elements smaller than current (inversions with elements to the right). This is also O(n log n) and sometimes more intuitive for related problems.
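A minimal sketch of the BIT approach (coordinate compression maps values to ranks; names are illustrative):

```python
def count_inversions_bit(arr):
    """Count inversions with a Binary Indexed Tree in O(n log n)."""
    # Coordinate compression: map values to ranks 1..m
    rank = {v: r + 1 for r, v in enumerate(sorted(set(arr)))}
    m = len(rank)
    tree = [0] * (m + 1)

    def update(i):
        while i <= m:
            tree[i] += 1
            i += i & (-i)

    def query(i):  # prefix sum of counts for ranks 1..i
        s = 0
        while i > 0:
            s += tree[i]
            i -= i & (-i)
        return s

    inversions = 0
    for v in reversed(arr):               # scan right to left
        inversions += query(rank[v] - 1)  # strictly smaller values already seen
        update(rank[v])
    return inversions

print(count_inversions_bit([2, 4, 1, 3, 5]))  # 3
```

Each element asks "how many smaller values lie to my right?"—exactly the per-element view of inversions used in the Count Smaller After Self problem below.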
"""Related Problem: Count of Smaller Numbers After Self For each element, count how many elements to its right are smaller.This is essentially counting inversions per element. Example:Input: [5, 2, 6, 1]Output: [2, 1, 1, 0] Explanation:- 5: elements 2, 1 are smaller and to the right → 2- 2: element 1 is smaller and to the right → 1- 6: element 1 is smaller and to the right → 1- 1: no elements to the right → 0""" from typing import List, Tuple def count_smaller(nums: List[int]) -> List[int]: """ Count smaller elements after self using modified merge sort. We track original indices to accumulate counts for each position. """ n = len(nums) if n == 0: return [] # Pair each number with its original index indexed = [(nums[i], i) for i in range(n)] result = [0] * n def merge_count(arr: List[Tuple[int, int]], temp: List[Tuple[int, int]], left: int, right: int) -> None: if left >= right: return mid = (left + right) // 2 merge_count(arr, temp, left, mid) merge_count(arr, temp, mid + 1, right) # Merge and count i, j, k = left, mid + 1, left right_count = 0 # Count of right elements merged so far while i <= mid and j <= right: if arr[i][0] > arr[j][0]: # Element from right is smaller temp[k] = arr[j] right_count += 1 j += 1 else: # Element from left temp[k] = arr[i] # All right elements merged so far are smaller than arr[i] result[arr[i][1]] += right_count i += 1 k += 1 while i <= mid: temp[k] = arr[i] result[arr[i][1]] += right_count i += 1 k += 1 while j <= right: temp[k] = arr[j] j += 1 k += 1 arr[left:right + 1] = temp[left:right + 1] temp = [(0, 0)] * n merge_count(indexed, temp, 0, n - 1) return result # Testnums = [5, 2, 6, 1]result = count_smaller(nums)print(f"Input: {nums}")print(f"Output: {result}")# Expected: [2, 1, 1, 0]Counting inversions demonstrates a powerful D&C pattern: augmenting a known algorithm (merge sort) to solve a seemingly unrelated problem. Let's consolidate the key insights:
You now understand one of the most elegant D&C augmentations: counting inversions via merge sort. This technique exemplifies how D&C problems often involve finding the right decomposition where the combine step does double duty—both solving the original problem and computing additional information. Next, we'll explore Karatsuba multiplication, which shows D&C applied to arithmetic.