In the previous page, we established that Kruskal's algorithm processes edges in order of increasing weight, adding each edge if it doesn't create a cycle. But this greedy approach hinges on a critical preprocessing step: sorting all edges by weight.

This sorting step isn't merely a detail—it's the dominant factor in Kruskal's algorithm's time complexity. Understanding why we sort, how we sort, and what alternatives exist is essential for mastering this algorithm and applying it effectively in practice.

Consider this: if your graph has 1 million edges, sorting them requires approximately 20 million comparison operations (E log E). The subsequent processing—checking for cycles and merging components—takes only about 2-3 million operations with proper Union-Find. The sorting step does 80-90% of the work.
By the end of this page, you will understand why edge sorting is essential for Kruskal's greedy approach, analyze how different sorting algorithms affect overall performance, explore practical considerations for different graph representations, and discover alternative approaches for special cases.
The correctness of Kruskal's algorithm fundamentally depends on processing edges from lightest to heaviest. Let's understand why this ordering is non-negotiable.

The Greedy Choice Property

Kruskal's algorithm is a greedy algorithm—it makes locally optimal choices (selecting the cheapest available safe edge) that lead to a globally optimal solution (the minimum spanning tree). For this greedy approach to work correctly, we need a specific processing order.

Theorem: In Kruskal's algorithm, processing edges in non-decreasing weight order ensures that:
1. Every edge we add is guaranteed to be in some MST (by the Cut Property)
2. Every edge we reject is guaranteed to not be in any MST (by the Cycle Property)

Proof Sketch:

When we consider edge e = (u, v), one of two things is true:

Case 1: u and v are in different components.
- Consider the cut that separates u's component from the rest of the graph
- Edge e crosses this cut
- All lighter edges were processed earlier; any lighter edge crossing this cut would have been added then (its endpoints were in different components), merging the two sides of the cut—a contradiction, so no lighter edge crosses it
- Therefore, e is the minimum-weight edge crossing this cut
- By the Cut Property, e belongs to some MST ✓

Case 2: u and v are in the same component.
- There exists a path P from u to v using edges already in our forest
- Adding e would create a cycle C = P ∪ {e}
- All edges in P were added earlier, so they all weigh ≤ weight(e)
- Since e has maximum weight in cycle C, by the Cycle Property, e is not in any MST ✓
```
Consider this graph:

        1
    A───────B
    │\     /│
  2 │ \   / │ 3
    │  \ /  │
    │   X   │      diagonals: AD = 1, BC = 1
    │  / \  │
    │ /   \ │
    │/     \│
    C───────D
        1

Edges:
  AB: weight 1    AC: weight 2    AD: weight 1
  BC: weight 1    BD: weight 3    CD: weight 1

True MST: AD(1) + AB(1) + BC(1)  or  AD(1) + AB(1) + CD(1)
Minimum weight: 3

═══════════════════════════════════════════════════════════════════
CORRECT: Processing in sorted order
═══════════════════════════════════════════════════════════════════
Sorted: AD(1), AB(1), BC(1), CD(1), AC(2), BD(3)

1. AD(1): A≠D, add.   Components: {A,D}, {B}, {C}
2. AB(1): A≠B, add.   Components: {A,B,D}, {C}
3. BC(1): B≠C, add.   Components: {A,B,C,D}
4. CD(1): C=D (same component), REJECT
   ...remaining edges also rejected...

MST weight: 1+1+1 = 3 ✓ (optimal!)

═══════════════════════════════════════════════════════════════════
INCORRECT: Processing in arbitrary order (unsorted)
═══════════════════════════════════════════════════════════════════
Arbitrary order: AC(2), BD(3), AD(1), AB(1), BC(1), CD(1)

1. AC(2): A≠C, add.   Components: {A,C}, {B}, {D}
2. BD(3): B≠D, add.   Components: {A,C}, {B,D}
3. AD(1): A≠D, add.   Components: {A,B,C,D}
4. AB(1): A=B (same component), REJECT
   ...remaining edges also rejected...

Spanning tree weight: 2+3+1 = 6 ✗ (NOT optimal!)
```

The greedy approach only works with sorted edges! 'Take the cheapest safe edge' only produces optimal results when we globally order edges by weight first. Processing edges in any other order—such as the order they're discovered during traversal or the order they appear in an input file—will likely produce suboptimal spanning trees.
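To make the contrast concrete, here is a minimal sketch (not the full implementation developed later on this page) that replays both traces above. The vertex numbering A=0, B=1, C=2, D=3 and the helper name `spanning_tree_weight` are illustrative choices, not part of the original example:

```python
# A minimal sketch: greedily add edges in a given order, skipping any
# edge whose endpoints are already connected (Union-Find is covered in
# depth on the next page).
def spanning_tree_weight(num_vertices, edges_in_order):
    parent = list(range(num_vertices))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # Path halving
            x = parent[x]
        return x

    total = 0
    for u, v, w in edges_in_order:
        ru, rv = find(u), find(v)
        if ru != rv:          # Different components: safe to add
            parent[ru] = rv
            total += w
    return total

# Vertices: A=0, B=1, C=2, D=3
edges = [(0, 2, 2), (0, 3, 1), (0, 1, 1), (1, 2, 1), (2, 3, 1), (1, 3, 3)]

sorted_order = sorted(edges, key=lambda e: e[2])
arbitrary_order = [(0, 2, 2), (1, 3, 3), (0, 3, 1),
                   (0, 1, 1), (1, 2, 1), (2, 3, 1)]

print(spanning_tree_weight(4, sorted_order))     # 3  (optimal)
print(spanning_tree_weight(4, arbitrary_order))  # 6  (suboptimal)
```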
The choice of sorting algorithm directly impacts Kruskal's overall runtime. Let's analyze the options systematically.

Comparison-Based Sorting: The Standard Approach

For arbitrary edge weights (floating-point, very large integers, etc.), comparison-based sorting is the default. The lower bound for comparison-based sorting is Ω(E log E), and we have several algorithms achieving this:

Merge Sort:
- Time: O(E log E) worst-case
- Space: O(E) additional
- Stable: Yes
- Pros: Guaranteed O(E log E), stable, good cache performance with proper implementation

Quick Sort:
- Time: O(E log E) average, O(E²) worst-case
- Space: O(log E) stack space
- Stable: No (standard version)
- Pros: Excellent average performance, in-place, good cache locality

Heap Sort:
- Time: O(E log E) worst-case
- Space: O(1) additional
- Stable: No
- Pros: In-place, guaranteed O(E log E), no extra memory

Introsort (Hybrid):
- Time: O(E log E) worst-case
- Space: O(log E)
- Stable: No
- Pros: Combines quicksort's average-case performance with heapsort's worst-case guarantee
- This is what C++ std::sort uses; Python and Java instead default to Timsort, a merge-sort-based hybrid
| Algorithm | Time Complexity | Space | Stable | Best For |
|---|---|---|---|---|
| Merge Sort | O(E log E) | O(E) | Yes | When stability matters, guaranteed performance |
| Quick Sort | O(E log E) avg | O(log E) | No | Average-case performance-critical applications |
| Heap Sort | O(E log E) | O(1) | No | Memory-constrained environments |
| Introsort | O(E log E) | O(log E) | No | Library implementations, general use |
| Tim Sort | O(E log E) | O(E) | Yes | Nearly-sorted data, Python/Java default |
| Radix Sort | O(E × k) | O(E) | Yes | Integer weights with bounded range |
| Counting Sort | O(E + W) | O(W) | Yes | Small weight range, dense distribution |
Non-Comparison Sorting: When Weights Have Special Structure

If edge weights have special properties, we can beat O(E log E):

Counting Sort (when weight range is small):

If all weights are integers in range [0, W], counting sort achieves O(E + W) time. For practical cases where W ≈ E or W is polynomial in E, this is O(E), making the overall Kruskal's algorithm dominated by the Union-Find operations rather than sorting!

Radix Sort (when weights are integers with bounded digits):

For k-digit integers (or integers up to E^k for some constant k), radix sort achieves O(E × k) = O(E) time, as sketched below. Again, this shifts the complexity bottleneck.

Bucket Sort (when weights are uniformly distributed):

If weights are uniformly distributed in a known range, bucket sort can achieve O(E) expected time.
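As a concrete illustration of the radix-sort case, here is a hedged sketch of an LSD (least-significant-digit) radix sort over edge weights. It assumes non-negative integer weights that fit in `passes` bytes; the function name and base-256 digit size are illustrative choices:

```python
# Sketch: LSD radix sort on integer edge weights (base 256). Each pass is
# a stable counting sort on one byte, so total time is O(E × k) for
# k-byte weights. Assumes non-negative integer weights.
from typing import List, Tuple

IntEdge = Tuple[int, int, int]  # (u, v, integer weight)

def radix_sort_edges(edges: List[IntEdge], passes: int = 4) -> List[IntEdge]:
    for p in range(passes):               # One pass per byte, least significant first
        shift = 8 * p
        counts = [0] * 256
        for _, _, w in edges:
            counts[(w >> shift) & 0xFF] += 1
        for i in range(1, 256):           # Prefix sums give output positions
            counts[i] += counts[i - 1]
        output = [None] * len(edges)
        for edge in reversed(edges):      # Reverse scan preserves stability
            digit = (edge[2] >> shift) & 0xFF
            counts[digit] -= 1
            output[counts[digit]] = edge
        edges = output
    return edges
```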
```python
import heapq
from typing import Iterator, List, Tuple

Edge = Tuple[int, int, float]  # (u, v, weight)

# ═══════════════════════════════════════════════════════════════════
# APPROACH 1: Standard Library Sort (Recommended for most cases)
# ═══════════════════════════════════════════════════════════════════

def sort_edges_standard(edges: List[Edge]) -> List[Edge]:
    """
    Uses Python's Timsort via sorted().
    Time: O(E log E)
    Space: O(E) for the new list
    """
    return sorted(edges, key=lambda e: e[2])

# ═══════════════════════════════════════════════════════════════════
# APPROACH 2: In-Place Sort (When memory is constrained)
# ═══════════════════════════════════════════════════════════════════

def sort_edges_inplace(edges: List[Edge]) -> None:
    """
    Sorts edges in-place using Python's sort().
    Time: O(E log E)
    Space: no new list is allocated (Timsort may still use
           up to O(E) scratch space internally)
    """
    edges.sort(key=lambda e: e[2])

# ═══════════════════════════════════════════════════════════════════
# APPROACH 3: Counting Sort (When weights are small integers)
# ═══════════════════════════════════════════════════════════════════

def sort_edges_counting(edges: List[Edge], max_weight: int) -> List[Edge]:
    """
    Counting sort for integer weights in [0, max_weight].
    Time: O(E + max_weight)
    Space: O(E + max_weight)

    Use when max_weight is O(E) or smaller for O(E) sorting!
    """
    # Count occurrences of each weight
    counts = [0] * (max_weight + 1)
    for _, _, w in edges:
        counts[int(w)] += 1

    # Convert to cumulative counts (positions)
    for i in range(1, len(counts)):
        counts[i] += counts[i - 1]

    # Build sorted output
    output = [None] * len(edges)
    for edge in reversed(edges):  # Reverse for stability
        w = int(edge[2])
        counts[w] -= 1
        output[counts[w]] = edge

    return output

# ═══════════════════════════════════════════════════════════════════
# APPROACH 4: Partial Sort via Heap (When you might terminate early)
# ═══════════════════════════════════════════════════════════════════

def edges_by_weight_lazy(edges: List[Edge]) -> Iterator[Edge]:
    """
    Yields edges in sorted order using a min-heap.
    Only builds the heap O(E), then pops are O(log E) each.

    If you don't need all edges (e.g., the graph is dense and the
    MST completes early), this can be faster than full sorting!

    Building heap: O(E)
    Each pop: O(log E)
    Getting k edges: O(E + k log E)

    Beats O(E log E) when k << E.
    """
    # Build min-heap: O(E)
    heap = [(w, u, v) for u, v, w in edges]
    heapq.heapify(heap)

    # Yield in sorted order
    while heap:
        w, u, v = heapq.heappop(heap)
        yield (u, v, w)

# ═══════════════════════════════════════════════════════════════════
# DEMO: Comparing approaches
# ═══════════════════════════════════════════════════════════════════
import time
import random

def benchmark_sorting():
    """Compare sorting approaches on random edges."""
    # Generate random edges
    E = 100000
    edges = [(random.randint(0, 1000), random.randint(0, 1000),
              random.randint(1, 1000))
             for _ in range(E)]

    # Approach 1: Standard sort
    start = time.time()
    sorted1 = sort_edges_standard(edges.copy())
    time1 = time.time() - start
    print(f"Standard sort: {time1:.4f}s")

    # Approach 2: Counting sort (weights are 1-1000)
    start = time.time()
    sorted2 = sort_edges_counting(edges.copy(), 1000)
    time2 = time.time() - start
    print(f"Counting sort: {time2:.4f}s")

    # Approach 3: Lazy heap (extract only the first 200 edges,
    # as Kruskal's would if the MST completed that quickly)
    start = time.time()
    lazy_gen = edges_by_weight_lazy(edges.copy())
    sorted3 = [next(lazy_gen) for _ in range(200)]
    time3 = time.time() - start
    print(f"Lazy heap (200 edges): {time3:.4f}s")

    # Verify correctness: all approaches agree on the minimum weight
    assert sorted1[0][2] == sorted2[0][2] == sorted3[0][2]
    print("All approaches produce correct minimum!")
```

Kruskal's algorithm has time complexity O(E log E), which is often stated as "dominated by sorting." Let's rigorously analyze this claim and understand when it's accurate.

Component-by-Component Analysis:

Step 1: Build Edge List
- If edges are already in a list: O(1)
- If extracting from adjacency list: O(E)
- If extracting from adjacency matrix: O(V²)

Step 2: Sort Edges
- Comparison-based sorting: O(E log E)
- This is the bottleneck for most practical scenarios

Step 3: Initialize Union-Find
- Create V singleton sets: O(V)

Step 4: Process Sorted Edges
- For each edge (up to E edges):
  - Find operations: O(α(V)) each with path compression
  - Union operations: O(α(V)) each with union by rank
- Total: O(E × α(V)) where α is the inverse Ackermann function
- Since α(V) ≤ 4 for all practical V (up to 10^80!), this is effectively O(E)

Total: O(E log E + E × α(V)) = O(E log E)
```
═══════════════════════════════════════════════════════════════════
KRUSKAL'S ALGORITHM COMPLEXITY BREAKDOWN
═══════════════════════════════════════════════════════════════════

For a graph with V vertices and E edges:

┌───────────────────────────┬─────────────────────────────────────┐
│ STEP                      │ TIME COMPLEXITY                     │
├───────────────────────────┼─────────────────────────────────────┤
│ Extract edge list         │ O(E) or O(V²) from adj. matrix      │
│ Sort edges                │ O(E log E)  ← DOMINATES             │
│ Initialize Union-Find     │ O(V)                                │
│ Process E edges           │ O(E × α(V)) ≈ O(E)                  │
│  - E × Find operations    │ O(E × α(V))                         │
│  - ≤ V-1 Union operations │ O(V × α(V))                         │
└───────────────────────────┴─────────────────────────────────────┘

Total: O(E log E)

═══════════════════════════════════════════════════════════════════
WHY SORTING DOMINATES
═══════════════════════════════════════════════════════════════════

For a graph with V = 10,000 vertices and E = 100,000 edges:

Sorting:    E log E  = 100,000 × 17 = 1,700,000 comparisons
Union-Find: E × α(V) ≈ 100,000 × 4  =   400,000 operations

Sorting does ~80% of the work!

For a dense graph with V = 1,000 and E ≈ V² = 1,000,000:

Sorting:    E log E  = 1,000,000 × 20 = 20,000,000 comparisons
Union-Find: E × α(V) ≈ 1,000,000 × 3  =  3,000,000 operations

Sorting does ~87% of the work!

═══════════════════════════════════════════════════════════════════
WHEN SORTING DOESN'T DOMINATE
═══════════════════════════════════════════════════════════════════

1. Pre-sorted edges:  Sorting is O(E) with good algorithms (Timsort)
2. Integer weights:   Radix/counting sort gives O(E)
3. Lazy evaluation:   Heap gives O(E + k log E) for k edges needed

In these cases, Union-Find becomes the bottleneck: O(E × α(V))
```

Why O(E log E) = O(E log V):

You'll sometimes see Kruskal's complexity stated as O(E log V) instead of O(E log E). These are equivalent because:

For a simple graph: E ≤ V(V-1)/2 = O(V²)

Therefore: log E ≤ log V² = 2 log V = O(log V)

And for a connected graph: E ≥ V-1, so log E ≥ log(V-1) = Θ(log V)

So log E = Θ(log V), making O(E log E) = O(E log V).

The distinction only matters for graphs so sparse they are disconnected (E << V), where log E can be genuinely smaller than log V; for connected graphs the two bounds are interchangeable.
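A quick numeric check of this equivalence, as a small illustrative script under the stated assumption that the graph is simple and connected:

```python
# Verify that log E stays within a factor of 2 of log V for simple
# connected graphs, where V-1 <= E <= V(V-1)/2.
import math

for V in (100, 10_000, 1_000_000):
    E_sparse = V - 1            # Tree: fewest edges a connected graph can have
    E_dense = V * (V - 1) // 2  # Complete graph: most edges a simple graph can have
    print(V,
          round(math.log2(E_sparse) / math.log2(V), 2),   # ≈ 1
          round(math.log2(E_dense) / math.log2(V), 2))    # < 2
```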
When optimizing Kruskal's algorithm for specific use cases, focus on the sorting step first. Using a faster sorting algorithm, exploiting structure in edge weights, or avoiding full sorting (via lazy extraction) can provide significant speedups.
The way your graph is represented affects how efficiently you can extract edges for Kruskal's algorithm. Let's examine the common representations.

Edge List Representation:

This is the ideal representation for Kruskal's algorithm. Edges are already collected—just sort and process.

```python
edges = [(0, 1, 4), (0, 2, 3), (1, 2, 1), ...]
```

- Extraction time: O(1) — edges already in list form
- Memory: O(E)
- Ideal for Kruskal's

Adjacency List Representation:

Common for general graph algorithms. We need to extract edges into a sortable list.

```python
adj = {
    0: [(1, 4), (2, 3)],
    1: [(0, 4), (2, 1)],
    ...
}
```

- Extraction time: O(E) — iterate through all adjacency lists
- Care needed: For undirected graphs, each edge appears twice. Either extract once or deduplicate.

Adjacency Matrix Representation:

Used for dense graphs. Least efficient for Kruskal's extraction.

```
matrix[i][j] = weight of edge (i, j), or INF if no edge
```

- Extraction time: O(V²) — must scan entire matrix
- For dense graphs where E = Θ(V²), this matches O(E)
- For sparse graphs, this is wasteful
```python
import math
from typing import Dict, List, Tuple

Edge = Tuple[int, int, float]

# ═══════════════════════════════════════════════════════════════════
# FROM EDGE LIST
# ═══════════════════════════════════════════════════════════════════

def extract_from_edge_list(edge_list: List[Edge]) -> List[Edge]:
    """
    Trivial case: edges already in the format we need.
    Time: O(1) to reference, O(E) to copy
    """
    return edge_list.copy()  # Or just use directly if modification is safe

# ═══════════════════════════════════════════════════════════════════
# FROM ADJACENCY LIST (Undirected Graph)
# ═══════════════════════════════════════════════════════════════════

def extract_from_adjacency_list_undirected(
        adj: Dict[int, List[Tuple[int, float]]]) -> List[Edge]:
    """
    Extract edges from adjacency list representation.

    For undirected graphs, each edge (u,v) appears as:
    - v in adj[u] with weight w
    - u in adj[v] with weight w
    We extract once by enforcing u < v.

    Time: O(E)
    """
    edges = []
    for u in adj:
        for v, weight in adj[u]:
            if u < v:  # Only take each edge once
                edges.append((u, v, weight))
    return edges

def extract_from_adjacency_list_directed(
        adj: Dict[int, List[Tuple[int, float]]]) -> List[Edge]:
    """
    For directed graphs, every adjacency entry is a unique edge.
    Time: O(E)
    """
    edges = []
    for u in adj:
        for v, weight in adj[u]:
            edges.append((u, v, weight))
    return edges

# ═══════════════════════════════════════════════════════════════════
# FROM ADJACENCY MATRIX
# ═══════════════════════════════════════════════════════════════════

def extract_from_adjacency_matrix(
        matrix: List[List[float]], V: int,
        undirected: bool = True) -> List[Edge]:
    """
    Extract edges from V×V adjacency matrix.
    matrix[i][j] = weight of edge i→j, or math.inf / None if no edge.

    Time: O(V²) — must examine entire matrix
    For sparse graphs, this is inefficient!
    For dense graphs where E = Θ(V²), this is acceptable.
    """
    edges = []
    if undirected:
        # Only examine upper triangle to avoid duplicates
        for i in range(V):
            for j in range(i + 1, V):
                weight = matrix[i][j]
                if weight is not None and weight != math.inf:
                    edges.append((i, j, weight))
    else:
        # Examine all entries for directed graph
        for i in range(V):
            for j in range(V):
                if i != j:  # Exclude self-loops
                    weight = matrix[i][j]
                    if weight is not None and weight != math.inf:
                        edges.append((i, j, weight))
    return edges

# ═══════════════════════════════════════════════════════════════════
# DEMONSTRATION
# ═══════════════════════════════════════════════════════════════════

def demo():
    # Example graph (complete graph on 4 vertices):
    #     0 --4-- 1
    #     |\     /|
    #     3 2   1 5     diagonals: 0-3 = 2, 1-2 = 1
    #     |  \ /  |
    #     2 --6-- 3

    # As edge list
    edge_list = [
        (0, 1, 4), (0, 2, 3), (0, 3, 2),
        (1, 2, 1), (1, 3, 5), (2, 3, 6)
    ]

    # As adjacency list
    adj_list = {
        0: [(1, 4), (2, 3), (3, 2)],
        1: [(0, 4), (2, 1), (3, 5)],
        2: [(0, 3), (1, 1), (3, 6)],
        3: [(0, 2), (1, 5), (2, 6)]
    }

    # As adjacency matrix (0 on the diagonal = no self-loop)
    adj_matrix = [
        [0, 4, 3, 2],
        [4, 0, 1, 5],
        [3, 1, 0, 6],
        [2, 5, 6, 0]
    ]

    # Extract and verify
    from_list = extract_from_edge_list(edge_list)
    from_adj = extract_from_adjacency_list_undirected(adj_list)
    from_matrix = extract_from_adjacency_matrix(adj_matrix, 4)

    print(f"From edge list:  {len(from_list)} edges")
    print(f"From adj list:   {len(from_adj)} edges")
    print(f"From adj matrix: {len(from_matrix)} edges")

    # All should have 6 edges
    assert len(from_list) == len(from_adj) == len(from_matrix) == 6
```

If you know you'll be running Kruskal's algorithm, consider maintaining an edge list alongside your primary graph representation. The O(E) space overhead is usually worthwhile for O(1) extraction versus O(V²) matrix scanning.
An important optimization for Kruskal's algorithm is lazy sorting—not sorting the entire edge list upfront, but instead extracting edges in sorted order as needed using a min-heap.

The Key Insight:

Kruskal's algorithm typically adds V-1 edges to the MST. In a sparse graph where E ≈ V, we process almost all edges anyway. But in a denser graph where E >> V, we might only need the smallest 10-20% of edges before completing the MST. Full sorting does unnecessary work!

Heap-Based Approach:

1. Build a min-heap from all edges: O(E) time
2. Extract minimum edge: O(log E) per extraction
3. Total for k extractions: O(E + k log E)

When Lazy Sorting Wins:

- If we need only k edges (where k << E), lazy sorting takes O(E + k log E)
- Full sorting takes O(E log E)
- Lazy wins when: E + k log E < E log E
- Rearranging: k < E − E/log E, which holds whenever k is even slightly smaller than E

But practically, the break-even point depends on constants. Lazy sorting has overhead from heap operations, while full sorting benefits from cache-efficient implementations.
```python
import heapq
from typing import List, Tuple

Edge = Tuple[int, int, float]

class UnionFind:
    """Standard Union-Find with path compression and union by rank."""

    def __init__(self, n: int):
        self.parent = list(range(n))
        self.rank = [0] * n

    def find(self, x: int) -> int:
        if self.parent[x] != x:
            self.parent[x] = self.find(self.parent[x])  # Path compression
        return self.parent[x]

    def union(self, x: int, y: int) -> bool:
        """Returns True if x and y were in different components."""
        px, py = self.find(x), self.find(y)
        if px == py:
            return False
        # Union by rank
        if self.rank[px] < self.rank[py]:
            px, py = py, px
        self.parent[py] = px
        if self.rank[px] == self.rank[py]:
            self.rank[px] += 1
        return True

def kruskal_lazy(vertices: int, edges: List[Edge]) -> Tuple[List[Edge], float]:
    """
    Kruskal's algorithm with lazy edge extraction via min-heap.

    Time Complexity:
    - Heapify: O(E)
    - At most E heap pops: O(E log E) worst case
    - But typically only O(V) edges needed: O(E + V log E)

    For dense graphs (E = Θ(V²)), this can be faster than full sorting!
    """
    # Build min-heap: (weight, u, v)
    # We rearrange to put weight first for heap ordering
    heap = [(w, u, v) for u, v, w in edges]
    heapq.heapify(heap)  # O(E)

    uf = UnionFind(vertices)
    mst_edges = []
    mst_weight = 0.0
    edges_needed = vertices - 1
    edges_examined = 0

    while heap and len(mst_edges) < edges_needed:
        w, u, v = heapq.heappop(heap)  # O(log E)
        edges_examined += 1
        if uf.union(u, v):
            mst_edges.append((u, v, w))
            mst_weight += w

    # Statistics
    if len(mst_edges) == edges_needed:
        print(f"MST found! Examined {edges_examined}/{len(edges)} edges "
              f"({100*edges_examined/len(edges):.1f}%)")
    else:
        print("Graph is disconnected - no spanning tree exists")

    return mst_edges, mst_weight

def kruskal_standard(vertices: int, edges: List[Edge]) -> Tuple[List[Edge], float]:
    """Standard Kruskal's with full sorting, for comparison."""
    sorted_edges = sorted(edges, key=lambda e: e[2])  # O(E log E)

    uf = UnionFind(vertices)
    mst_edges = []
    mst_weight = 0.0

    for u, v, w in sorted_edges:
        if uf.union(u, v):
            mst_edges.append((u, v, w))
            mst_weight += w
            if len(mst_edges) == vertices - 1:
                break

    return mst_edges, mst_weight

# ═══════════════════════════════════════════════════════════════════
# BENCHMARK: When does lazy sorting help?
# ═══════════════════════════════════════════════════════════════════
import time
import random

def benchmark_lazy_vs_standard():
    """Compare lazy heap approach vs full sorting."""
    # Dense graph: V vertices, ~25% of all possible edges
    # (~124,875 edges expected out of 499,500 possible)
    V = 1000
    edges = []
    for i in range(V):
        for j in range(i + 1, V):
            if random.random() < 0.25:  # 25% edge density
                edges.append((i, j, random.random() * 100))

    print(f"Graph: {V} vertices, {len(edges)} edges")
    print(f"Expected edges in MST: {V - 1}")
    print()

    # Standard approach (full sort)
    start = time.time()
    mst1, weight1 = kruskal_standard(V, edges.copy())
    time_standard = time.time() - start
    print(f"Standard (full sort): {time_standard:.4f}s")

    # Lazy approach (heap)
    start = time.time()
    mst2, weight2 = kruskal_lazy(V, edges.copy())
    time_lazy = time.time() - start
    print(f"Lazy (heap):          {time_lazy:.4f}s")
    print()
    print(f"Speedup from lazy: {time_standard/time_lazy:.2f}x")
    print(f"MST weights match: {abs(weight1 - weight2) < 0.001}")

if __name__ == "__main__":
    benchmark_lazy_vs_standard()
```

Use heap-based lazy extraction when: (1) the graph is dense (E >> V), so only a fraction of edges will be examined; (2) memory is tight and you can't afford to store a sorted copy; (3) edges arrive in a stream and you need to process them incrementally. Use full sorting when: the graph is sparse, sorting is highly optimized in your environment, or you need all edges in sorted order for other purposes.
In many practical applications, edges aren't arbitrary—they have structure that can be exploited to speed up or skip the sorting step entirely.

Scenario 1: Edges Already Sorted

If edges come from a source that naturally orders them by weight (e.g., a sorted database table, output from another algorithm), we can skip sorting entirely!

Adaptive sorting algorithms like Timsort detect nearly-sorted input and run in O(E) time instead of O(E log E). Python's sorted() and .sort() use Timsort.

Scenario 2: Integer Weights with Small Range

If edge weights are integers in range [0, W], use counting sort for O(E + W) time.

Example applications:
- Road networks with distance in discrete units (miles, km)
- Network latency bucketed into categories
- Grid graphs with unit weights

Scenario 3: Weights from a Known Distribution

If weights are uniformly distributed in [0, 1], bucket sort can achieve O(E) expected time (a sketch follows the code below).

Scenario 4: Geometric Graphs

In Euclidean MST problems (MST of points in the plane), spatial data structures can help generate edges in approximately sorted order, or prune the edge set significantly before sorting.
```python
from collections import defaultdict
from typing import List, Tuple

Edge = Tuple[int, int, float]

# Union-Find implementation (same as earlier on this page)
class UnionFind:
    def __init__(self, n: int):
        self.parent = list(range(n))
        self.rank = [0] * n

    def find(self, x: int) -> int:
        if self.parent[x] != x:
            self.parent[x] = self.find(self.parent[x])
        return self.parent[x]

    def union(self, x: int, y: int) -> bool:
        px, py = self.find(x), self.find(y)
        if px == py:
            return False
        if self.rank[px] < self.rank[py]:
            px, py = py, px
        self.parent[py] = px
        if self.rank[px] == self.rank[py]:
            self.rank[px] += 1
        return True

# ═══════════════════════════════════════════════════════════════════
# CASE 1: Integer weights with small range - use Counting Sort
# ═══════════════════════════════════════════════════════════════════

def kruskal_integer_weights(
        vertices: int, edges: List[Edge],
        max_weight: int) -> Tuple[List[Edge], int]:
    """
    Optimized Kruskal's for integer weights in [0, max_weight].

    Instead of O(E log E) comparison sort, uses O(E + W) counting sort.
    When W = O(E), this makes sorting O(E), and the overall algorithm
    is dominated by Union-Find: O(E × α(V)) ≈ O(E).

    Args:
        vertices: Number of vertices
        edges: List of (u, v, weight) with integer weights
        max_weight: Maximum possible weight
    """
    # Counting sort: bucket edges by weight
    buckets = defaultdict(list)
    for u, v, w in edges:
        buckets[int(w)].append((u, v, w))

    # Process edges in sorted order (iterate through buckets)
    uf = UnionFind(vertices)
    mst_edges = []
    mst_weight = 0

    for weight in range(max_weight + 1):
        if weight in buckets:
            for u, v, w in buckets[weight]:
                if uf.union(u, v):
                    mst_edges.append((u, v, w))
                    mst_weight += w
                    if len(mst_edges) == vertices - 1:
                        return mst_edges, mst_weight

    return mst_edges, mst_weight

# ═══════════════════════════════════════════════════════════════════
# CASE 2: Pre-sorted edges - skip sorting entirely
# ═══════════════════════════════════════════════════════════════════

def kruskal_presorted(
        vertices: int, sorted_edges: List[Edge]) -> Tuple[List[Edge], float]:
    """
    Kruskal's when edges are already sorted by weight.
    Time: O(E × α(V)) ≈ O(E)
    The sorting bottleneck is eliminated!
    """
    uf = UnionFind(vertices)
    mst_edges = []
    mst_weight = 0.0

    for u, v, w in sorted_edges:  # Already in sorted order
        if uf.union(u, v):
            mst_edges.append((u, v, w))
            mst_weight += w
            if len(mst_edges) == vertices - 1:
                break

    return mst_edges, mst_weight

# ═══════════════════════════════════════════════════════════════════
# CASE 3: Grid graph - unit weights
# ═══════════════════════════════════════════════════════════════════

def kruskal_grid_unit_weights(rows: int, cols: int) -> int:
    """
    MST of a grid graph where all edges have weight 1.

    Since all weights are equal, no sorting is needed! Just find any
    spanning tree.

    For an r×c grid:
    - Vertices: r × c
    - MST needs: r × c - 1 edges
    - All edges have weight 1
    - MST weight: r × c - 1

    Time: O(r × c)
    """
    vertices = rows * cols
    # Any spanning tree works! Just traverse all cells.
    # Each cell except the first adds exactly 1 edge.
    return vertices - 1  # MST weight

def kruskal_grid_random_weights(
        rows: int, cols: int,
        horizontal_weights: List[List[float]],
        vertical_weights: List[List[float]]) -> Tuple[List[Edge], float]:
    """
    MST of a grid graph with arbitrary edge weights.

    Grid structure:
    - Horizontal edges: (i,j) -- (i,j+1) for j in 0..cols-2
    - Vertical edges:   (i,j) -- (i+1,j) for i in 0..rows-2

    horizontal_weights[i][j] = weight of edge (i,j)--(i,j+1)
    vertical_weights[i][j]   = weight of edge (i,j)--(i+1,j)
    """
    def vertex_id(r: int, c: int) -> int:
        return r * cols + c

    edges = []

    # Collect horizontal edges
    for i in range(rows):
        for j in range(cols - 1):
            edges.append((vertex_id(i, j), vertex_id(i, j + 1),
                          horizontal_weights[i][j]))

    # Collect vertical edges
    for i in range(rows - 1):
        for j in range(cols):
            edges.append((vertex_id(i, j), vertex_id(i + 1, j),
                          vertical_weights[i][j]))

    # Standard Kruskal's
    sorted_edges = sorted(edges, key=lambda e: e[2])
    uf = UnionFind(rows * cols)
    mst_edges = []
    mst_weight = 0.0

    for u, v, w in sorted_edges:
        if uf.union(u, v):
            mst_edges.append((u, v, w))
            mst_weight += w
            if len(mst_edges) == rows * cols - 1:
                break

    return mst_edges, mst_weight
```

Before running Kruskal's algorithm, always ask: Do I know anything about the edge weights? Are they pre-sorted? Are they integers in a small range? Are they all equal? Do they come from a geometric source? Exploiting structure can reduce complexity from O(E log E) to O(E) or better.
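Scenario 3 (uniformly distributed weights) isn't covered by the code above, so here is a hedged sketch of bucket sort under the assumption that weights lie in [0, 1); the function name is illustrative:

```python
# Sketch: bucket sort for weights assumed uniformly distributed in
# [0, 1). With E buckets, each bucket holds O(1) edges in expectation,
# so total expected time is O(E).
from typing import List, Tuple

Edge = Tuple[int, int, float]

def bucket_sort_edges(edges: List[Edge]) -> List[Edge]:
    n = len(edges)
    buckets: List[List[Edge]] = [[] for _ in range(n)]
    for edge in edges:
        # Weight w in [0, 1) maps to bucket floor(w * n)
        idx = min(int(edge[2] * n), n - 1)
        buckets[idx].append(edge)
    result = []
    for bucket in buckets:
        bucket.sort(key=lambda e: e[2])  # Tiny buckets: O(1) expected each
        result.extend(bucket)
    return result
```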
Based on our analysis, here are practical guidelines for implementing the sorting phase effectively.

Guideline 1: Use Standard Library Sorting

Unless you have a specific reason to optimize, use your language's built-in sort. It's heavily optimized, handles edge cases, and is well-tested.

```python
# Python
edges.sort(key=lambda e: e[2])                    # In-place
sorted_edges = sorted(edges, key=lambda e: e[2])  # New list
```

```javascript
// JavaScript
edges.sort((a, b) => a[2] - b[2]);
```

```cpp
// C++
std::sort(edges.begin(), edges.end(),
          [](const Edge& a, const Edge& b) { return a.weight < b.weight; });
```

Guideline 2: Early Termination

Always check if you've found V-1 edges and terminate early. In dense graphs, this saves significant work.
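In code, Guideline 2 is the `break` already used by `kruskal_standard` earlier on this page; as a standalone sketch (reusing the `UnionFind` class defined above, with an illustrative function name):

```python
# Sketch of early termination: stop scanning as soon as V-1 edges are
# accepted. Assumes the UnionFind class from earlier is in scope.
from typing import List, Tuple

Edge = Tuple[int, int, float]

def kruskal_early_exit(vertices: int, sorted_edges: List[Edge]) -> List[Edge]:
    uf = UnionFind(vertices)
    mst_edges = []
    for u, v, w in sorted_edges:
        if uf.union(u, v):
            mst_edges.append((u, v, w))
            if len(mst_edges) == vertices - 1:
                break  # Remaining heavier edges cannot change the MST
    return mst_edges
```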
| Graph Type | Edge Weights | Recommended Strategy |
|---|---|---|
| Sparse (E ≈ V) | Any | Standard library sort |
| Medium (E ≈ V log V) | Any | Standard library sort |
| Dense (E ≈ V²) | Float/arbitrary | Lazy heap extraction |
| Dense (E ≈ V²) | Small integers | Counting sort + iteration |
| Any | Pre-sorted | No sorting needed |
| Any | All equal | No sorting needed (any spanning tree) |
| Geometric (Euclidean) | Distances | Spatial data structures + filtering |
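One way to operationalize this table is a small dispatcher; this is an illustrative sketch, and the function name, thresholds, and returned labels are assumptions, not a standard API:

```python
# Sketch: pick a sorting strategy from the table above based on graph
# properties. Thresholds and labels are illustrative.
def choose_sort_strategy(V: int, E: int,
                         weights_are_small_ints: bool = False,
                         presorted: bool = False,
                         all_equal: bool = False) -> str:
    if all_equal:
        return "no sort: any spanning tree is minimum"
    if presorted:
        return "no sort: process edges as given"
    if weights_are_small_ints:
        return "counting sort + iteration: O(E + W)"
    if E > V * V // 4:  # Dense: MST likely completes after a fraction of edges
        return "lazy heap extraction: O(E + k log E)"
    return "standard library sort: O(E log E)"
```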
We've thoroughly analyzed the sorting phase of Kruskal's algorithm—the step that typically determines overall performance. Let's consolidate the key insights:

- Sorted order is what makes the greedy choice safe: the Cut and Cycle Properties only apply when edges are processed from lightest to heaviest.
- Comparison-based sorting costs O(E log E) and dominates the O(E × α(V)) Union-Find work, so the sort is the bottleneck in most practical runs.
- Special structure (pre-sorted input, small integer weights, uniform distributions, equal weights) can reduce the sorting cost to O(E) or eliminate it entirely.
- Lazy heap extraction defers sorting work and pays off on dense graphs, where the MST completes after examining a small fraction of the edges.
What's Next:

With sorted edges in hand, we need a way to determine whether adding an edge would create a cycle. The next page explores the Union-Find data structure—the elegant solution that makes Kruskal's cycle detection nearly constant time per operation.
You now understand the critical role of edge sorting in Kruskal's algorithm. From standard comparison-based sorts to exploiting special structure in edge weights, you have the knowledge to implement and optimize this phase effectively.