We've now explored all components of Kruskal's algorithm: edge-based construction, sorting by weight, and Union-Find for cycle detection. It's time to assemble the complete picture and rigorously analyze the algorithm's time and space complexity.

Understanding why Kruskal's algorithm is O(E log E) isn't just academic: it informs practical decisions about when to use Kruskal's versus Prim's algorithm, how to optimize implementations for specific graph types, and how performance scales as your data grows.

This page synthesizes everything we've learned into a comprehensive complexity analysis, complete with comparisons to alternative approaches.
By the end of this page, you will understand the rigorous derivation of O(E log E) complexity, see how each algorithm step contributes to the total, compare Kruskal's and Prim's performance characteristics, and know exactly when to choose each algorithm.
Let's dissect Kruskal's algorithm into its fundamental operations and analyze each one precisely.

Algorithm Steps Recap:

1. Edge Extraction: Collect all edges into a list
2. Edge Sorting: Sort edges by weight
3. Union-Find Initialization: Create V singleton sets
4. Edge Processing: For each edge, check for a cycle and potentially add it

Let V = number of vertices, E = number of edges.
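The four steps above can be sketched as one compact implementation, with the cost of each step noted inline. This is a minimal reference version (the `UnionFind` helper mirrors the path-compression plus union-by-rank structure developed earlier in this module), not a production library.

```python
from typing import List, Tuple

Edge = Tuple[int, int, float]  # (u, v, weight)

class UnionFind:
    def __init__(self, n: int):
        self.parent = list(range(n))
        self.rank = [0] * n

    def find(self, x: int) -> int:
        if self.parent[x] != x:
            self.parent[x] = self.find(self.parent[x])  # path compression
        return self.parent[x]

    def union(self, x: int, y: int) -> bool:
        px, py = self.find(x), self.find(y)
        if px == py:
            return False  # already in the same component
        if self.rank[px] < self.rank[py]:
            px, py = py, px
        self.parent[py] = px  # union by rank
        if self.rank[px] == self.rank[py]:
            self.rank[px] += 1
        return True

def kruskal(num_vertices: int, edges: List[Edge]) -> Tuple[List[Edge], float]:
    # Step 2: sort edges by weight -- O(E log E), the bottleneck
    sorted_edges = sorted(edges, key=lambda e: e[2])
    # Step 3: initialize V singleton sets -- O(V)
    uf = UnionFind(num_vertices)
    mst, total = [], 0.0
    # Step 4: process edges -- O(E * alpha(V)), effectively linear
    for u, v, w in sorted_edges:
        if uf.union(u, v):  # False means this edge would create a cycle
            mst.append((u, v, w))
            total += w
            if len(mst) == num_vertices - 1:
                break  # early termination: MST is complete
    return mst, total
```

(Step 1, edge extraction, is assumed done: the edges arrive as a list.)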
```
═══════════════════════════════════════════════════════════════════════════
KRUSKAL'S ALGORITHM: STEP-BY-STEP COMPLEXITY ANALYSIS
═══════════════════════════════════════════════════════════════════════════

Given: Graph G = (V, E) with |V| = V vertices, |E| = E edges

───────────────────────────────────────────────────────────────────────────
STEP 1: Edge Extraction
───────────────────────────────────────────────────────────────────────────

Time complexity depends on graph representation:

  Edge List:        O(1)  - edges already in list form
  Adjacency List:   O(E)  - iterate through all adjacency entries
  Adjacency Matrix: O(V²) - scan entire matrix

For analysis, assume O(E) (adjacency list is common)

───────────────────────────────────────────────────────────────────────────
STEP 2: Edge Sorting
───────────────────────────────────────────────────────────────────────────

Using comparison-based sorting (e.g., merge sort, quicksort):

  Time:  O(E log E)
  Space: O(E) for merge sort, O(log E) for quicksort

This is typically the BOTTLENECK.

Alternative scenarios:
  - Pre-sorted edges:       O(E) with adaptive sort
  - Integer weights [0, W]: O(E + W) with counting sort
  - Weights in [0, 1]:      O(E) expected with bucket sort

───────────────────────────────────────────────────────────────────────────
STEP 3: Union-Find Initialization
───────────────────────────────────────────────────────────────────────────

Create V singleton sets:

  Time:  O(V)
  Space: O(V) for parent and rank arrays

───────────────────────────────────────────────────────────────────────────
STEP 4: Edge Processing Loop
───────────────────────────────────────────────────────────────────────────

Process up to E edges:
  - For each edge (u, v):
    - Call find(u) and find(v): 2 × O(α(V))
    - If different components, call union(u, v): O(α(V))

  Total find operations:  2E (two per edge)
  Total union operations: at most V-1 (one per MST edge)

  Total time: O(E × α(V) + V × α(V)) = O(E × α(V))

Since α(V) ≤ 4 for all practical V:
  O(E × α(V)) ≈ O(E)  (effectively linear!)

───────────────────────────────────────────────────────────────────────────
TOTAL COMPLEXITY
───────────────────────────────────────────────────────────────────────────

Time:
    O(E)          Edge extraction (from adjacency list)
  + O(E log E)    Edge sorting                ← DOMINATES
  + O(V)          Union-Find initialization
  + O(E × α(V))   Edge processing
  ─────────────────────────────────────────────────────────────
  = O(E log E + E × α(V))
  = O(E log E)    since log E >> α(V) for large E

Space:
  O(E)             for edge list (input representation)
  O(V)             for Union-Find (parent + rank arrays)
  O(E) or O(log E) for sorting (depends on algorithm)
───────────────────────────────────────────────────────────────────────────
Total: O(V + E) additional space beyond input
```

| Step | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Edge extraction | O(E) | O(E) | From adjacency list |
| Edge sorting | O(E log E) | O(E) or O(log E) | Dominates total time |
| Union-Find init | O(V) | O(V) | Parent + rank arrays |
| Edge processing | O(E × α(V)) | O(1) | Nearly linear |
| TOTAL | O(E log E) | O(V + E) | Sorting dominates |
You'll often see Kruskal's complexity stated as either O(E log E) or O(E log V). Let's prove these are equivalent.

The Relationship Between E and V:

For a simple graph (no multi-edges or self-loops):
- Minimum edges for a connected graph: E ≥ V - 1 (at least a spanning tree)
- Maximum edges (complete graph): E ≤ V(V-1)/2 = O(V²)

Therefore: V - 1 ≤ E ≤ V(V-1)/2

Proving the Equivalence:

Upper bound: log E ≤ log V² = 2 log V = O(log V)

Lower bound for connected graphs: log E ≥ log(V-1) = Ω(log V)

Therefore: log E = Θ(log V) for connected graphs

Conclusion: O(E log E) = O(E log V)

Both expressions are correct. O(E log E) is more precise for sparse graphs, while O(E log V) emphasizes the relationship to vertex count.
```
═══════════════════════════════════════════════════════════════════════════
THEOREM: O(E log E) = O(E log V) for simple connected graphs
═══════════════════════════════════════════════════════════════════════════

GIVEN:
  - Simple graph: no multi-edges, no self-loops
  - Connected:    E ≥ V - 1 (at least a spanning tree)
  - Maximum:      E ≤ V(V-1)/2 (complete graph)

PROOF:

Part 1: Show log E = O(log V)
─────────────────────────────────────────────────────────────────
  E ≤ V(V-1)/2 < V²

  Taking log of both sides:
    log E < log V²
    log E < 2 log V

  Therefore: log E = O(log V)  ✓

Part 2: Show log V = O(log E) for connected graphs
─────────────────────────────────────────────────────────────────
  E ≥ V - 1  (a connected graph has at least V-1 edges)

  For V ≥ 2: V - 1 ≥ V/2, so E ≥ V/2

  Taking log:
    log E ≥ log(V/2) = log V - 1

  Therefore: log V ≤ log E + 1 = O(log E)  ✓

Part 3: Combine results
─────────────────────────────────────────────────────────────────
  log E = O(log V)   from Part 1
  log V = O(log E)   from Part 2

  Therefore: log E = Θ(log V)
  And thus:  E log E = Θ(E log V)

  So O(E log E) = O(E log V)  ✓

═══════════════════════════════════════════════════════════════════════════
PRACTICAL IMPLICATIONS
═══════════════════════════════════════════════════════════════════════════

For sparse graphs (E ≈ V):
  E log E ≈ V log V

For medium density (E ≈ V log V):
  E log E ≈ V log V × log(V log V)
          ≈ V log² V × (1 + log log V / log V)
          ≈ V log² V

For dense graphs (E ≈ V²):
  E log E ≈ V² × log V² ≈ 2V² log V

The scaling behavior differs based on density:
  Sparse: O(V log V)
  Medium: O(V log² V)
  Dense:  O(V² log V)
```

Use O(E log E) when discussing sparse graphs or comparing directly with edge-focused algorithms. Use O(E log V) when emphasizing the relationship to vertex count or comparing with vertex-focused algorithms like Prim's. Both are correct and equivalent.
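The two bounds in the proof are easy to sanity-check numerically. This short sketch (the helper name is illustrative) verifies both inequalities for a sparse and a dense graph:

```python
import math

def check_log_bounds(V: int, E: int) -> None:
    # For a simple connected graph: V - 1 <= E <= V(V-1)/2
    assert V - 1 <= E <= V * (V - 1) // 2
    log_E, log_V = math.log2(E), math.log2(V)
    # Part 1: log E < 2 log V      (since E < V²)
    assert log_E < 2 * log_V
    # Part 2: log V <= log E + 1   (since E >= V/2 for V >= 2)
    assert log_V <= log_E + 1
    print(f"V={V:>5}, E={E:>7}: log E = {log_E:.2f}, log V = {log_V:.2f}")

check_log_bounds(1000, 999)    # sparse: a spanning tree
check_log_bounds(1000, 499500) # dense: the complete graph K_1000
```

For the tree, log E and log V are nearly equal; for the complete graph, log E is still under 2 log V, exactly as the theorem requires.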
While time complexity often receives more attention, space complexity is equally important for practical implementations, especially when dealing with very large graphs.

Input Space:

The graph itself requires space to represent:
- Edge list: O(E), each edge stored once
- Adjacency list: O(V + E), vertex array plus edge entries
- Adjacency matrix: O(V²), even if sparse

Algorithm Working Space:

1. Edge Array for Sorting: O(E)
   - May be able to sort in-place if the input is modifiable
   - Some implementations create a copy: additional O(E)

2. Sorting Overhead:
   - Merge sort: O(E) auxiliary space
   - Quick sort: O(log E) stack space
   - Heap sort: O(1) auxiliary, but often slower

3. Union-Find Structure:
   - Parent array: O(V)
   - Rank array: O(V)
   - Total: O(V)

4. MST Result:
   - V-1 edges: O(V)

Total Working Space: O(V + E) or O(V) beyond the input, depending on whether the edge array is copied.
```python
from typing import List, Tuple, Iterator
import heapq

Edge = Tuple[int, int, float]

class SpaceOptimizedUnionFind:
    """
    Union-Find using only a single array that encodes both parent and size.

    If value >= 0: it's the parent index
    If value < 0:  it's a root, and |value| is the size of the tree

    Space: O(V) with just one array instead of two
    """
    __slots__ = ['data']

    def __init__(self, n: int):
        # Negative values indicate roots; |value| = size
        self.data = [-1] * n  # Each vertex is the root of a size-1 tree

    def find(self, x: int) -> int:
        if self.data[x] < 0:
            return x  # Negative means root
        self.data[x] = self.find(self.data[x])  # Path compression
        return self.data[x]

    def union(self, x: int, y: int) -> bool:
        root_x = self.find(x)
        root_y = self.find(y)
        if root_x == root_y:
            return False
        # Union by size: smaller tree under larger tree
        if self.data[root_x] > self.data[root_y]:
            # root_y has the larger size (more negative)
            self.data[root_y] += self.data[root_x]  # Add sizes
            self.data[root_x] = root_y
        else:
            self.data[root_x] += self.data[root_y]
            self.data[root_y] = root_x
        return True

def kruskal_space_optimized(
    num_vertices: int,
    edges: List[Edge]
) -> Tuple[List[Edge], float]:
    """
    Space-optimized Kruskal's implementation.

    Optimizations:
    1. Sort edges in-place (modifies input)
    2. Single-array Union-Find
    3. Early termination

    Space: O(V) beyond input (just Union-Find)
    """
    # In-place sort to avoid an O(E) copy
    edges.sort(key=lambda e: e[2])

    uf = SpaceOptimizedUnionFind(num_vertices)
    mst_edges: List[Edge] = []
    mst_weight = 0.0

    for u, v, w in edges:
        if uf.union(u, v):
            mst_edges.append((u, v, w))
            mst_weight += w
            if len(mst_edges) == num_vertices - 1:
                break

    return mst_edges, mst_weight

def kruskal_streaming(
    num_vertices: int,
    edge_generator: Iterator[Edge]
) -> Tuple[List[Edge], float]:
    """
    Kruskal's for streaming edges using a heap.

    Useful when edges are generated on-the-fly and you
    don't want to hold a separate sorted copy in memory.

    Space: O(V) for Union-Find + O(E) for the heap in the worst case.
    """
    # Collect streaming edges into a min-heap
    heap = []
    uf = SpaceOptimizedUnionFind(num_vertices)
    mst_edges = []
    mst_weight = 0.0
    edges_needed = num_vertices - 1

    for u, v, w in edge_generator:
        heapq.heappush(heap, (w, u, v))

    # Pop edges in weight order until the MST is complete;
    # if the MST is found early, the remaining heap is never touched
    while heap and len(mst_edges) < edges_needed:
        w, u, v = heapq.heappop(heap)
        if uf.union(u, v):
            mst_edges.append((u, v, w))
            mst_weight += w

    return mst_edges, mst_weight

# ═══════════════════════════════════════════════════════════════════
# SPACE COMPARISON
# ═══════════════════════════════════════════════════════════════════

def space_comparison():
    """Compare space usage of the different implementations."""
    V = 10_000
    E = 500_000  # Moderately dense graph

    print(f"Graph: V = {V:,}, E = {E:,}")
    print()
    print("SPACE REQUIREMENTS:")
    print("-" * 50)
    print(f"Edge list input:         {E * 3 * 8 / 1e6:.1f} MB")
    print(f"  (E edges × 3 fields × 8 bytes)")
    print()
    print("Standard Kruskal's:")
    print(f"  Sorted edge copy:      {E * 3 * 8 / 1e6:.1f} MB")
    print(f"  Union-Find (2 arrays): {V * 2 * 8 / 1e6:.3f} MB")
    print(f"  Merge sort auxiliary:  {E * 3 * 8 / 1e6:.1f} MB")
    print(f"  Total additional:      {(E * 6 * 8 + V * 2 * 8) / 1e6:.1f} MB")
    print()
    print("Space-Optimized Kruskal's:")
    print(f"  In-place sort:         0 MB (modifies input)")
    print(f"  Union-Find (1 array):  {V * 8 / 1e6:.3f} MB")
    print(f"  Total additional:      {V * 8 / 1e6:.3f} MB")
    print()
    print(f"Space savings: {(E * 6 * 8) / (V * 8):.0f}x less memory!")
```

For graphs with millions of edges, the difference between O(E) and O(V) additional space can be gigabytes.
The space-optimized implementation uses in-place sorting and a single-array Union-Find to minimize memory overhead.
Both Kruskal's and Prim's algorithms produce minimum spanning trees, but they have different complexity profiles based on graph characteristics and implementation choices.

Prim's Algorithm Complexity:

With a binary heap (priority queue):
- Time: O((V + E) log V) = O(E log V) for connected graphs
- Space: O(V) for distances + O(V) for the heap

With a Fibonacci heap:
- Time: O(E + V log V)
- Space: O(V)

Kruskal's Algorithm Complexity:

- Time: O(E log E) = O(E log V)
- Space: O(V + E)

Comparing the Time Complexities:

Both are O(E log V) with standard implementations! But the constants and practical performance differ.
| Aspect | Kruskal's | Prim's (Binary Heap) | Prim's (Fibonacci Heap) |
|---|---|---|---|
| Time complexity | O(E log E) | O(E log V) | O(E + V log V) |
| Dominant operation | Edge sorting | Priority queue ops | Decrease-key ops |
| Space complexity | O(V + E) | O(V) | O(V) |
| Sparse graphs (E≈V) | O(V log V) | O(V log V) | O(V log V) |
| Dense graphs (E≈V²) | O(V² log V) | O(V² log V) | O(V² + V log V) = O(V²) |
| Data structure | Union-Find | Binary heap | Fibonacci heap |
| Starting point | None needed | Any vertex | Any vertex |
| Disconnected graphs | Finds MSF naturally | Only one component | Only one component |
```
═══════════════════════════════════════════════════════════════════════════
ALGORITHM SELECTION GUIDE
═══════════════════════════════════════════════════════════════════════════

CHOOSE KRUSKAL'S WHEN:
──────────────────────────────────────────────────────────────────────────

1. Graph is sparse (E ≈ O(V) or E ≈ O(V log V))
   - Sorting E edges is fast when E is small
   - Union-Find overhead is minimal

2. Edges are given as a list (not an adjacency structure)
   - No extraction overhead
   - Natural fit for the algorithm

3. Edges are already sorted or nearly sorted
   - Timsort achieves O(E) on nearly-sorted data
   - Eliminates the sorting bottleneck

4. Graph may be disconnected
   - Kruskal's naturally produces a minimum spanning forest
   - No special handling needed

5. Simple implementation is desired
   - Union-Find is simpler than priority queues in some languages
   - Fewer edge cases to handle

6. Multiple MSTs are needed (edge-weight ties broken differently)
   - Easy to generate all MSTs by handling ties

CHOOSE PRIM'S WHEN:
──────────────────────────────────────────────────────────────────────────

1. Graph is dense (E ≈ O(V²))
   - Sorting V² edges is O(V² log V)
   - Prim's with an adjacency matrix is O(V²), no log factor!
   - This is a significant speedup for dense graphs

2. Graph is stored as an adjacency list/matrix
   - No edge extraction needed
   - Frontier expansion is natural

3. You need the MST rooted at a specific vertex
   - Prim's naturally grows from a chosen root

4. Fibonacci heaps are available and E >> V
   - O(E + V log V) beats O(E log E) when E is large

5. Memory is constrained
   - Prim's needs only O(V) beyond the graph
   - Kruskal's needs O(E) for the edge list

6. Graph is implicit (edges generated on demand)
   - Prim's explores neighbors lazily
   - Kruskal's needs all edges upfront for sorting

═══════════════════════════════════════════════════════════════════════════
SPECIAL CASES
═══════════════════════════════════════════════════════════════════════════

Very sparse (E = V - 1, tree input):
  Both: O(V log V) - essentially equivalent

Very dense (E = V(V-1)/2, complete graph):
  Kruskal's:                 O(V² log V) - sorting dominates
  Prim's (adjacency matrix): O(V²) - no heap, just array scans
  Winner: Prim's by a log V factor!

Geometric graphs (MST of points in the plane):
  Specialized algorithms exist that beat both:
  Delaunay triangulation + MST = O(V log V)

Dynamic graphs (edges added/removed):
  Neither is ideal; consider specialized dynamic MST structures
```

For complete or nearly complete graphs (E ≈ V²), Prim's algorithm with an adjacency matrix and simple array-based distance tracking achieves O(V²) time, beating Kruskal's O(V² log V) by a logarithmic factor. This can be significant: for V = 10,000, log₂ V ≈ 13.3, roughly a 13× speedup.
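For reference, here is a minimal sketch (illustrative, not the lesson's own code) of the O(V²) adjacency-matrix Prim's variant mentioned above: each of the V iterations does a plain O(V) array scan instead of heap operations, which is exactly why the log factor disappears.

```python
from typing import List

INF = float('inf')

def prim_adjacency_matrix(adj: List[List[float]]) -> float:
    """MST weight via Prim's on an adjacency matrix (INF = no edge).

    Time: O(V²) -- no priority queue, just repeated array scans.
    Best suited to dense graphs, where E ≈ V².
    """
    V = len(adj)
    in_mst = [False] * V
    dist = [INF] * V   # cheapest edge connecting each vertex to the tree
    dist[0] = 0.0      # grow the tree from vertex 0
    total = 0.0

    for _ in range(V):
        # O(V) scan for the closest vertex not yet in the tree
        u = min((v for v in range(V) if not in_mst[v]), key=lambda v: dist[v])
        in_mst[u] = True
        total += dist[u]
        # O(V) relaxation of edges out of u
        for v in range(V):
            if not in_mst[v] and adj[u][v] < dist[v]:
                dist[v] = adj[u][v]

    return total
```

For example, on the triangle with weights 0-1: 1, 1-2: 2, 0-2: 4, this returns 3.0 (edges 0-1 and 1-2).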
Theoretical complexity tells part of the story, but real-world performance depends on constants, cache behavior, and implementation details. Let's examine actual benchmarks.

Factors Affecting Real Performance:

1. Cache Efficiency:
   - Sorting accesses edges sequentially (good cache behavior)
   - Union-Find has random access patterns (cache misses)
   - Priority queues have moderate cache behavior

2. Operation Costs:
   - Comparison: ~1 CPU cycle
   - Memory access (cache hit): ~4 cycles
   - Memory access (cache miss): ~100+ cycles
   - Division/modulo: ~20-40 cycles

3. Constant Factors:
   - Timsort: ~1.5× comparisons vs raw merge sort
   - Path compression: ~2-3 pointer chases on average
   - Heap operations: ~log₂(n) comparisons per op
```python
import time
import random
from typing import List, Tuple

Edge = Tuple[int, int, float]

def generate_random_graph(
    num_vertices: int,
    num_edges: int
) -> List[Edge]:
    """Generate a random connected graph."""
    edges = []
    # First, ensure connectivity with a random spanning tree
    vertices = list(range(num_vertices))
    random.shuffle(vertices)
    for i in range(1, num_vertices):
        u = vertices[i]
        v = vertices[random.randint(0, i - 1)]
        weight = random.random() * 100
        edges.append((min(u, v), max(u, v), weight))

    # Add remaining edges
    edge_set = set((e[0], e[1]) for e in edges)
    while len(edges) < num_edges:
        u = random.randint(0, num_vertices - 1)
        v = random.randint(0, num_vertices - 1)
        if u != v:
            key = (min(u, v), max(u, v))
            if key not in edge_set:
                edge_set.add(key)
                edges.append((key[0], key[1], random.random() * 100))
    return edges

class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n

    def find(self, x):
        if self.parent[x] != x:
            self.parent[x] = self.find(self.parent[x])
        return self.parent[x]

    def union(self, x, y):
        px, py = self.find(x), self.find(y)
        if px == py:
            return False
        if self.rank[px] < self.rank[py]:
            px, py = py, px
        self.parent[py] = px
        if self.rank[px] == self.rank[py]:
            self.rank[px] += 1
        return True

def benchmark_kruskal(V: int, E: int, trials: int = 3) -> dict:
    """
    Benchmark Kruskal's algorithm on random graphs.
    Returns a timing breakdown.
    """
    results = {
        'sort_time': 0,
        'union_find_time': 0,
        'total_time': 0
    }

    for _ in range(trials):
        edges = generate_random_graph(V, E)

        # Time sorting
        start = time.perf_counter()
        sorted_edges = sorted(edges, key=lambda e: e[2])
        sort_time = time.perf_counter() - start

        # Time Union-Find processing
        start = time.perf_counter()
        uf = UnionFind(V)
        mst_count = 0
        for u, v, w in sorted_edges:
            if uf.union(u, v):
                mst_count += 1
                if mst_count == V - 1:
                    break
        uf_time = time.perf_counter() - start

        results['sort_time'] += sort_time
        results['union_find_time'] += uf_time
        results['total_time'] += sort_time + uf_time

    # Average over trials
    for k in results:
        results[k] /= trials
    return results

def run_benchmarks():
    """Run benchmarks on various graph sizes."""
    print("═" * 70)
    print("KRUSKAL'S ALGORITHM BENCHMARK")
    print("═" * 70)
    print()

    test_cases = [
        # (V, E, description)
        (1000, 5000, "Sparse (E = 5V)"),
        (1000, 50000, "Medium (E = 50V)"),
        (1000, 200000, "Dense (E = 200V)"),
        (10000, 50000, "Large sparse"),
        (10000, 500000, "Large medium"),
        (10000, 2000000, "Large dense"),
    ]

    print(f"{'Graph':<20} {'Sort':>12} {'Union-Find':>12} {'Total':>12} {'Sort %':>8}")
    print("-" * 70)
    for V, E, desc in test_cases:
        result = benchmark_kruskal(V, E)
        sort_pct = 100 * result['sort_time'] / result['total_time']
        print(f"{desc:<20} "
              f"{result['sort_time']*1000:>10.2f}ms "
              f"{result['union_find_time']*1000:>10.2f}ms "
              f"{result['total_time']*1000:>10.2f}ms "
              f"{sort_pct:>7.1f}%")

    print()
    print("OBSERVATIONS:")
    print("-" * 70)
    print("• Sorting consistently takes 80-95% of total time")
    print("• Union-Find is nearly O(1) per operation in practice")
    print("• For large dense graphs, sorting dominates overwhelmingly")

if __name__ == "__main__":
    run_benchmarks()
```

Typical Benchmark Results:

| Graph | Sort Time | Union-Find Time | Sort % |
|---|---|---|---|
| 1K vertices, 5K edges (sparse) | 0.8ms | 0.2ms | 80% |
| 1K vertices, 200K edges (dense) | 35ms | 4ms | 90% |
| 10K vertices, 2M edges (large dense) | 800ms | 70ms | 92% |

Insights:

1. Sorting consistently dominates (80-95% of time)
2. Union-Find is effectively constant per operation
3. The O(E log E) bound is tight and accurate
4. Cache effects are significant: sorted edge access is sequential and fast
While Kruskal's algorithm is already efficient, several techniques can improve performance in specific scenarios.

1. Lazy Edge Extraction (Heap-Based)

Instead of sorting all edges upfront, use a min-heap to extract edges in sorted order on demand. If the MST is found early, we avoid sorting edges we never examine.

- Build heap: O(E)
- Extract k edges: O(k log E)
- Total for finding the MST: O(E + (V-1) log E) if few edges are examined

Best for: Very dense graphs where V << E

2. Early Termination

Always check whether the MST is complete (V-1 edges added) and stop early. This is especially important for dense graphs, where you might only need to examine 1% of the edges.

3. Integer Weight Optimization

If weights are integers in a small range, counting sort or radix sort achieves O(E) time, eliminating the log factor entirely.

4. Parallel Sorting

For very large graphs, use parallel sorting algorithms. With P processors, sorting can be O(E log E / P + E), providing near-linear speedup.

5. Filter-Kruskal Variant

For very dense graphs, first filter out edges that "probably" won't be in the MST using sampling, then run Kruskal's on the filtered set. Expected time: O(E + V log² V).
```python
from typing import List, Optional, Tuple
from collections import defaultdict
import heapq
import random

Edge = Tuple[int, int, float]

class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n

    def find(self, x):
        if self.parent[x] != x:
            self.parent[x] = self.find(self.parent[x])
        return self.parent[x]

    def union(self, x, y):
        px, py = self.find(x), self.find(y)
        if px == py:
            return False
        if self.rank[px] < self.rank[py]:
            px, py = py, px
        self.parent[py] = px
        if self.rank[px] == self.rank[py]:
            self.rank[px] += 1
        return True

def kruskal_standard(num_vertices, edges):
    """Standard Kruskal's for comparison."""
    sorted_edges = sorted(edges, key=lambda e: e[2])
    uf = UnionFind(num_vertices)
    mst = []
    weight = 0.0
    for u, v, w in sorted_edges:
        if uf.union(u, v):
            mst.append((u, v, w))
            weight += w
            if len(mst) == num_vertices - 1:
                break
    return mst, weight

# ═══════════════════════════════════════════════════════════════════
# OPTIMIZATION 1: Heap-based lazy extraction
# ═══════════════════════════════════════════════════════════════════

def kruskal_lazy_heap(
    num_vertices: int,
    edges: List[Edge]
) -> Tuple[List[Edge], float]:
    """
    Kruskal's with lazy edge extraction using a heap.

    Time: O(E + k log E) where k = number of edges examined.
    For dense graphs where k << E, this can be much faster!
    """
    # Build heap in O(E)
    heap = [(w, u, v) for u, v, w in edges]
    heapq.heapify(heap)

    uf = UnionFind(num_vertices)
    mst = []
    weight = 0.0

    while heap and len(mst) < num_vertices - 1:
        w, u, v = heapq.heappop(heap)  # O(log E) per extraction
        if uf.union(u, v):
            mst.append((u, v, w))
            weight += w

    return mst, weight

# ═══════════════════════════════════════════════════════════════════
# OPTIMIZATION 2: Integer weight counting sort
# ═══════════════════════════════════════════════════════════════════

def kruskal_counting_sort(
    num_vertices: int,
    edges: List[Edge],
    max_weight: int
) -> Tuple[List[Edge], float]:
    """
    Kruskal's with counting sort for integer weights.

    Time: O(E + max_weight) for sorting + O(E × α(V)) for processing.
    When max_weight = O(E), the total is O(E) - no log factor!
    """
    # Bucket edges by weight
    buckets = defaultdict(list)
    for u, v, w in edges:
        buckets[int(w)].append((u, v, w))

    uf = UnionFind(num_vertices)
    mst = []
    weight = 0.0

    # Process buckets in order (0 to max_weight)
    for w in range(max_weight + 1):
        for edge in buckets[w]:
            u, v, edge_w = edge
            if uf.union(u, v):
                mst.append(edge)
                weight += edge_w
                if len(mst) == num_vertices - 1:
                    return mst, weight

    return mst, weight

# ═══════════════════════════════════════════════════════════════════
# OPTIMIZATION 3: Filter-Kruskal for very dense graphs
# ═══════════════════════════════════════════════════════════════════

def kruskal_filter(
    num_vertices: int,
    edges: List[Edge],
    sample_size: Optional[int] = None
) -> Tuple[List[Edge], float]:
    """
    Filter-Kruskal: filter edges before full sorting.

    Idea: Sample edges to estimate an MST weight threshold.
    Edges much heavier than the threshold are unlikely to be in the MST.

    Expected time: O(E + V log² V) for very dense graphs.
    """
    if sample_size is None:
        sample_size = min(len(edges), num_vertices * 10)

    if len(edges) <= sample_size:
        # Not enough edges to benefit from filtering
        return kruskal_standard(num_vertices, edges)

    # Sample edges and find an approximate MST
    sample = random.sample(edges, sample_size)
    sample_mst, sample_weight = kruskal_standard(num_vertices, sample)

    if len(sample_mst) < num_vertices - 1:
        # Sample wasn't connected; fall back to the full algorithm
        return kruskal_standard(num_vertices, edges)

    # Threshold: 2× the maximum edge weight in the sample MST
    threshold = max(e[2] for e in sample_mst) * 2

    # Filter edges
    filtered = [e for e in edges if e[2] <= threshold]

    # Run Kruskal's on the filtered edges
    result_mst, result_weight = kruskal_standard(num_vertices, filtered)

    if len(result_mst) < num_vertices - 1:
        # Filtering was too aggressive; use the full edge set
        remaining = [e for e in edges if e[2] > threshold]
        all_edges = filtered + remaining
        return kruskal_standard(num_vertices, all_edges)

    return result_mst, result_weight
```

| Technique | Best For | Time Improvement | Trade-off |
|---|---|---|---|
| Lazy heap | Dense graphs, E >> V | O(E + V log E) vs O(E log E) | Heap overhead per extraction |
| Counting sort | Integer weights, small range | O(E) vs O(E log E) | Memory for buckets: O(max_weight) |
| Early termination | All graphs | Up to 99% for complete graphs | None (always use this) |
| Filter-Kruskal | Very dense, E ≈ V² | O(E + V log² V) expected | Probabilistic, may need fallback |
| Parallel sort | Large E, multi-core | ~P× speedup, P = cores | Thread overhead, diminishing returns |
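The early-termination claim is easy to check empirically. This small, self-contained sketch (illustrative, with a seeded RNG for reproducibility) counts how many sorted edges Kruskal's actually examines on a random complete graph before the MST is done:

```python
import itertools
import random

class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n

    def find(self, x):
        if self.parent[x] != x:
            self.parent[x] = self.find(self.parent[x])  # path compression
        return self.parent[x]

    def union(self, x, y):
        px, py = self.find(x), self.find(y)
        if px == py:
            return False
        if self.rank[px] < self.rank[py]:
            px, py = py, px
        self.parent[py] = px
        if self.rank[px] == self.rank[py]:
            self.rank[px] += 1
        return True

random.seed(42)  # deterministic demo
V = 100
# Complete graph on V vertices with random weights, pre-sorted by weight
edges = sorted(
    ((u, v, random.random()) for u, v in itertools.combinations(range(V), 2)),
    key=lambda e: e[2],
)

uf = UnionFind(V)
examined = mst_size = 0
for u, v, w in edges:
    examined += 1
    if uf.union(u, v):
        mst_size += 1
        if mst_size == V - 1:
            break  # early termination: MST complete

print(f"Examined {examined} of {len(edges)} edges")
```

On a complete graph the loop typically stops after a small fraction of the 4,950 edges, which is exactly why the early-exit check is worth its one-line cost.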
Let's examine how Kruskal's algorithm performs on actual graph types you might encounter.

Road Networks:
- V = millions of intersections
- E = a few × V (most intersections connect to 3-4 roads)
- Very sparse: E ≈ 3V
- Kruskal's time: O(V log V), excellent!

Social Networks:
- V = millions of users
- E = billions of friendships
- Medium density: E ≈ 100V to 1000V
- Kruskal's time: roughly 100-1000 × V log V, still manageable

Complete Distance Matrices:
- E.g., traveling salesman with all pairwise distances
- E = V(V-1)/2 = O(V²)
- Kruskal's time: O(V² log V), consider Prim's instead

Grid Graphs (Image Processing):
- V = width × height pixels
- E ≈ 2V (each pixel connects to its neighbors)
- Very sparse, regular structure
- Kruskal's time: O(V log V), excellent!

Communication Networks:
- V = routers/switches
- E = physical links (typically 2-10 per node)
- Very sparse
- Kruskal's time: O(V log V), very fast!
```
═══════════════════════════════════════════════════════════════════════════
KRUSKAL'S ALGORITHM SCALING FOR REAL-WORLD GRAPHS
═══════════════════════════════════════════════════════════════════════════

ROAD NETWORK (e.g., US road system)
───────────────────────────────────────────────────────────────────────────
V = 24 million intersections
E = 58 million road segments (≈ 2.4V, very sparse)

Time breakdown:
  Sort 58M edges: ~5-10 seconds
  Union-Find:     ~1 second
  Total:          ~10 seconds on modern hardware

Memory:
  Edge list:  ~1.5 GB (58M × 24 bytes)
  Union-Find: ~400 MB (24M × 16 bytes)

SOCIAL NETWORK (e.g., Facebook-scale)
───────────────────────────────────────────────────────────────────────────
V = 3 billion users
E = 200 billion friendships (≈ 70V, medium density)

Time breakdown:
  Sort 200B edges: ~60-90 minutes (distributed sorting needed)
  Union-Find:      ~10 minutes
  Total:           ~1-2 hours (needs parallelization)

Challenge: Data doesn't fit in a single machine's memory

COMPLETE GRAPH (e.g., TSP distances)
───────────────────────────────────────────────────────────────────────────
V = 10,000 cities
E = 50 million edges (≈ V²/2)

Time breakdown:
  Sort 50M edges: ~10 seconds
  Union-Find:     ~1 second
  Total:          ~11 seconds

Comparison with Prim's (adjacency matrix):
  Prim's:    ~1 second (O(V²) = O(100 million) ops)
  Kruskal's: ~11 seconds
  Winner: Prim's by 10× for dense graphs!

GRID GRAPH (e.g., 4K image)
───────────────────────────────────────────────────────────────────────────
V = 3840 × 2160 = 8.3 million pixels
E = 16.5 million edges (each pixel to 2 neighbors on average)

Time breakdown:
  Sort 16.5M edges: ~1.5 seconds
  Union-Find:       ~0.5 seconds
  Total:            ~2 seconds

Very fast for typical image processing applications!

═══════════════════════════════════════════════════════════════════════════
SUMMARY: WHEN KRUSKAL'S EXCELS
═══════════════════════════════════════════════════════════════════════════

Kruskal's shines when:
  ✓ E = O(V) to O(V log V) - sparse graphs
  ✓ Edges already available as a list
  ✓ Algorithm simplicity valued over raw speed
  ✓ Graph may be disconnected (need MSF)

Consider alternatives when:
  ✗ E = Θ(V²) - complete or near-complete graphs
  ✗ Graph is too large for memory (streaming edges)
  ✗ Incremental/dynamic MST updates needed
```

If E < V log V, Kruskal's is likely the best choice. If E > V log V, consider Prim's with a priority queue. If E ≈ V², strongly consider Prim's with an adjacency matrix (no priority queue needed).
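That rule of thumb can be written down as a tiny helper. This is a sketch with illustrative thresholds (the function name and the "within 2× of complete" cutoff are assumptions for the demo, not canonical values):

```python
import math

def suggest_mst_algorithm(V: int, E: int) -> str:
    """Rule-of-thumb MST algorithm chooser based on edge density."""
    if E < V * math.log2(V):
        return "Kruskal's (sparse: sorting E edges is cheap)"
    if E >= V * (V - 1) // 4:  # within ~2x of a complete graph
        return "Prim's with adjacency matrix (dense: O(V^2), no log factor)"
    return "Prim's with a binary heap (medium density)"

# Road network: very sparse -> Kruskal's
print(suggest_mst_algorithm(24_000_000, 58_000_000))
# TSP distance matrix: essentially complete -> adjacency-matrix Prim's
print(suggest_mst_algorithm(10_000, 50_000_000))
```

The exact cutoffs matter less than the shape of the decision: density relative to V log V and to V² is what drives the choice.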
We've completed our rigorous analysis of Kruskal's algorithm complexity. Let's consolidate the essential insights:
| Property | Value | Notes |
|---|---|---|
| Time complexity | O(E log E) | Dominated by sorting |
| Space complexity | O(V + E) | Can be O(V) with in-place sort |
| Best case | O(E) | Pre-sorted edges or integer weights |
| Worst case | O(E log E) | Random edge weights |
| Sorting fraction | 80-95% | Of total runtime |
| Union-Find per op | O(α(n)) ≈ O(1) | Practically constant |
| Sparse graph time | O(V log V) | When E = O(V) |
| Dense graph time | O(V² log V) | When E = O(V²) |
Congratulations! You've mastered Kruskal's algorithm—from the edge-centric philosophy through sorting, Union-Find cycle detection, and rigorous complexity analysis. You now have the deep understanding needed to implement, optimize, and correctly apply Kruskal's algorithm to solve minimum spanning tree problems.