In the previous page, we established that Kruskal's algorithm processes edges in order of increasing weight, adding each edge if it doesn't create a cycle. But this greedy approach hinges on a critical preprocessing step: sorting all edges by weight.

This sorting step isn't merely a detail—it's the dominant factor in Kruskal's algorithm's time complexity. Understanding why we sort, how we sort, and what alternatives exist is essential for mastering this algorithm and applying it effectively in practice.

Consider this: if your graph has 1 million edges, sorting them requires approximately 20 million comparison operations (E log E). The subsequent processing—checking for cycles and merging components—takes only about 2-3 million operations with proper Union-Find. The sorting step does 80-90% of the work.
By the end of this page, you will understand why edge sorting is essential for Kruskal's greedy approach, analyze how different sorting algorithms affect overall performance, explore practical considerations for different graph representations, and discover alternative approaches for special cases.
The correctness of Kruskal's algorithm fundamentally depends on processing edges from lightest to heaviest. Let's understand why this ordering is non-negotiable.

The Greedy Choice Property

Kruskal's algorithm is a greedy algorithm—it makes locally optimal choices (selecting the cheapest available safe edge) that lead to a globally optimal solution (the minimum spanning tree). For this greedy approach to work correctly, we need a specific processing order.

Theorem: In Kruskal's algorithm, processing edges in non-decreasing weight order ensures that:
1. Every edge we add is guaranteed to be in some MST (by the Cut Property)
2. Every edge we reject is guaranteed to not be in any MST (by the Cycle Property)

Proof Sketch:

When we consider edge e = (u, v), one of two things is true:

Case 1: u and v are in different components.
- Consider the cut that separates u's component from the rest of the graph
- Edge e crosses this cut
- All lighter edges were processed earlier; any lighter edge crossing this cut would have been added then (its endpoints were in different components), merging the two sides of the cut—a contradiction, so no lighter edge crosses it
- Therefore, e is the minimum-weight edge crossing this cut
- By the Cut Property, e belongs to some MST ✓

Case 2: u and v are in the same component.
- There exists a path P from u to v using edges already in our forest
- Adding e would create a cycle C = P ∪ {e}
- All edges in P were added earlier, so they all weigh ≤ weight(e)
- Since e has maximum weight in cycle C, by the Cycle Property, e is not in any MST ✓
```
Consider this graph:

        1
    A───────B
    │\     /│
  2 │ \   / │ 3
    │  \ /  │
    │   X   │      diagonals: AD = 1, BC = 1
    │  / \  │
    │ /   \ │
    │/     \│
    C───────D
        1

Edges:
  AB: weight 1    AC: weight 2    AD: weight 1
  BC: weight 1    BD: weight 3    CD: weight 1

True MST: AD(1) + AB(1) + BC(1)  or  AD(1) + AB(1) + CD(1)
Minimum weight: 3

═══════════════════════════════════════════════════════════════════
CORRECT: Processing in sorted order
═══════════════════════════════════════════════════════════════════
Sorted: AD(1), AB(1), BC(1), CD(1), AC(2), BD(3)

1. AD(1): A≠D, add.   Components: {A,D}, {B}, {C}
2. AB(1): A≠B, add.   Components: {A,B,D}, {C}
3. BC(1): B≠C, add.   Components: {A,B,C,D}
4. CD(1): C=D (same component), REJECT
   ...remaining edges also rejected...

MST weight: 1+1+1 = 3 ✓ (optimal!)

═══════════════════════════════════════════════════════════════════
INCORRECT: Processing in arbitrary order (unsorted)
═══════════════════════════════════════════════════════════════════
Arbitrary order: AC(2), BD(3), AD(1), AB(1), BC(1), CD(1)

1. AC(2): A≠C, add.   Components: {A,C}, {B}, {D}
2. BD(3): B≠D, add.   Components: {A,C}, {B,D}
3. AD(1): A≠D, add.   Components: {A,B,C,D}
4. AB(1): A=B (same component), REJECT
   ...remaining edges also rejected...

Spanning tree weight: 2+3+1 = 6 ✗ (NOT optimal!)
```

The greedy approach only works with sorted edges! 'Take the cheapest safe edge' only produces optimal results when we globally order edges by weight first. Processing edges in any other order—such as the order they're discovered during traversal or the order they appear in an input file—will likely produce suboptimal spanning trees.
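To make the contrast concrete, here is a minimal sketch (not the full implementation developed later on this page) that replays both traces above. The vertex numbering A=0, B=1, C=2, D=3 and the helper name `spanning_tree_weight` are illustrative choices, not part of the original example:

```python
# A minimal sketch: greedily add edges in a given order, skipping any
# edge whose endpoints are already connected (Union-Find is covered in
# depth on the next page).
def spanning_tree_weight(num_vertices, edges_in_order):
    parent = list(range(num_vertices))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # Path halving
            x = parent[x]
        return x

    total = 0
    for u, v, w in edges_in_order:
        ru, rv = find(u), find(v)
        if ru != rv:          # Different components: safe to add
            parent[ru] = rv
            total += w
    return total

# Vertices: A=0, B=1, C=2, D=3
edges = [(0, 2, 2), (0, 3, 1), (0, 1, 1), (1, 2, 1), (2, 3, 1), (1, 3, 3)]

sorted_order = sorted(edges, key=lambda e: e[2])
arbitrary_order = [(0, 2, 2), (1, 3, 3), (0, 3, 1),
                   (0, 1, 1), (1, 2, 1), (2, 3, 1)]

print(spanning_tree_weight(4, sorted_order))     # 3  (optimal)
print(spanning_tree_weight(4, arbitrary_order))  # 6  (suboptimal)
```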
The choice of sorting algorithm directly impacts Kruskal's overall runtime. Let's analyze the options systematically.

Comparison-Based Sorting: The Standard Approach

For arbitrary edge weights (floating-point, very large integers, etc.), comparison-based sorting is the default. The lower bound for comparison-based sorting is Ω(E log E), and we have several algorithms achieving this:

Merge Sort:
- Time: O(E log E) worst-case
- Space: O(E) additional
- Stable: Yes
- Pros: Guaranteed O(E log E), stable, good cache performance with proper implementation

Quick Sort:
- Time: O(E log E) average, O(E²) worst-case
- Space: O(log E) stack space
- Stable: No (standard version)
- Pros: Excellent average performance, in-place, good cache locality

Heap Sort:
- Time: O(E log E) worst-case
- Space: O(1) additional
- Stable: No
- Pros: In-place, guaranteed O(E log E), no extra memory

Introsort (Hybrid):
- Time: O(E log E) worst-case
- Space: O(log E)
- Stable: No
- Pros: Combines quicksort's average-case performance with heapsort's worst-case guarantee
- This is what C++ std::sort uses; Python and Java instead default to Timsort, a merge-sort-based hybrid
| Algorithm | Time Complexity | Space | Stable | Best For |
|---|---|---|---|---|
| Merge Sort | O(E log E) | O(E) | Yes | When stability matters, guaranteed performance |
| Quick Sort | O(E log E) avg | O(log E) | No | Average-case performance-critical applications |
| Heap Sort | O(E log E) | O(1) | No | Memory-constrained environments |
| Introsort | O(E log E) | O(log E) | No | Library implementations, general use |
| Tim Sort | O(E log E) | O(E) | Yes | Nearly-sorted data, Python/Java default |
| Radix Sort | O(E × k) | O(E) | Yes | Integer weights with bounded range |
| Counting Sort | O(E + W) | O(W) | Yes | Small weight range, dense distribution |
Non-Comparison Sorting: When Weights Have Special Structure

If edge weights have special properties, we can beat O(E log E):

Counting Sort (when weight range is small):

If all weights are integers in range [0, W], counting sort achieves O(E + W) time. For practical cases where W ≈ E or W is polynomial in E, this is O(E), making the overall Kruskal's algorithm dominated by the Union-Find operations rather than sorting!

Radix Sort (when weights are integers with bounded digits):

For k-digit integers (or integers up to E^k for some constant k), radix sort achieves O(E × k) = O(E) time, as sketched below. Again, this shifts the complexity bottleneck.

Bucket Sort (when weights are uniformly distributed):

If weights are uniformly distributed in a known range, bucket sort can achieve O(E) expected time.
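As a concrete illustration of the radix-sort case, here is a hedged sketch of an LSD (least-significant-digit) radix sort over edge weights. It assumes non-negative integer weights that fit in `passes` bytes; the function name and base-256 digit size are illustrative choices:

```python
# Sketch: LSD radix sort on integer edge weights (base 256). Each pass is
# a stable counting sort on one byte, so total time is O(E × k) for
# k-byte weights. Assumes non-negative integer weights.
from typing import List, Tuple

IntEdge = Tuple[int, int, int]  # (u, v, integer weight)

def radix_sort_edges(edges: List[IntEdge], passes: int = 4) -> List[IntEdge]:
    for p in range(passes):               # One pass per byte, least significant first
        shift = 8 * p
        counts = [0] * 256
        for _, _, w in edges:
            counts[(w >> shift) & 0xFF] += 1
        for i in range(1, 256):           # Prefix sums give output positions
            counts[i] += counts[i - 1]
        output = [None] * len(edges)
        for edge in reversed(edges):      # Reverse scan preserves stability
            digit = (edge[2] >> shift) & 0xFF
            counts[digit] -= 1
            output[counts[digit]] = edge
        edges = output
    return edges
```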
```python
import heapq
from typing import Iterator, List, Tuple

Edge = Tuple[int, int, float]  # (u, v, weight)

# ═══════════════════════════════════════════════════════════════════
# APPROACH 1: Standard Library Sort (Recommended for most cases)
# ═══════════════════════════════════════════════════════════════════

def sort_edges_standard(edges: List[Edge]) -> List[Edge]:
    """
    Uses Python's Timsort via sorted().
    Time: O(E log E)
    Space: O(E) for the new list
    """
    return sorted(edges, key=lambda e: e[2])

# ═══════════════════════════════════════════════════════════════════
# APPROACH 2: In-Place Sort (When memory is constrained)
# ═══════════════════════════════════════════════════════════════════

def sort_edges_inplace(edges: List[Edge]) -> None:
    """
    Sorts edges in-place using Python's sort().
    Time: O(E log E)
    Space: no new list is allocated (Timsort may still use
           up to O(E) scratch space internally)
    """
    edges.sort(key=lambda e: e[2])

# ═══════════════════════════════════════════════════════════════════
# APPROACH 3: Counting Sort (When weights are small integers)
# ═══════════════════════════════════════════════════════════════════

def sort_edges_counting(edges: List[Edge], max_weight: int) -> List[Edge]:
    """
    Counting sort for integer weights in [0, max_weight].
    Time: O(E + max_weight)
    Space: O(E + max_weight)

    Use when max_weight is O(E) or smaller for O(E) sorting!
    """
    # Count occurrences of each weight
    counts = [0] * (max_weight + 1)
    for _, _, w in edges:
        counts[int(w)] += 1

    # Convert to cumulative counts (positions)
    for i in range(1, len(counts)):
        counts[i] += counts[i - 1]

    # Build sorted output
    output = [None] * len(edges)
    for edge in reversed(edges):  # Reverse for stability
        w = int(edge[2])
        counts[w] -= 1
        output[counts[w]] = edge

    return output

# ═══════════════════════════════════════════════════════════════════
# APPROACH 4: Partial Sort via Heap (When you might terminate early)
# ═══════════════════════════════════════════════════════════════════

def edges_by_weight_lazy(edges: List[Edge]) -> Iterator[Edge]:
    """
    Yields edges in sorted order using a min-heap.
    Only builds the heap O(E), then pops are O(log E) each.

    If you don't need all edges (e.g., the graph is dense and the
    MST completes early), this can be faster than full sorting!

    Building heap: O(E)
    Each pop: O(log E)
    Getting k edges: O(E + k log E)

    Beats O(E log E) when k << E.
    """
    # Build min-heap: O(E)
    heap = [(w, u, v) for u, v, w in edges]
    heapq.heapify(heap)

    # Yield in sorted order
    while heap:
        w, u, v = heapq.heappop(heap)
        yield (u, v, w)

# ═══════════════════════════════════════════════════════════════════
# DEMO: Comparing approaches
# ═══════════════════════════════════════════════════════════════════
import time
import random

def benchmark_sorting():
    """Compare sorting approaches on random edges."""
    # Generate random edges
    E = 100000
    edges = [(random.randint(0, 1000), random.randint(0, 1000),
              random.randint(1, 1000))
             for _ in range(E)]

    # Approach 1: Standard sort
    start = time.time()
    sorted1 = sort_edges_standard(edges.copy())
    time1 = time.time() - start
    print(f"Standard sort: {time1:.4f}s")

    # Approach 2: Counting sort (weights are 1-1000)
    start = time.time()
    sorted2 = sort_edges_counting(edges.copy(), 1000)
    time2 = time.time() - start
    print(f"Counting sort: {time2:.4f}s")

    # Approach 3: Lazy heap (extract only the first 200 edges,
    # as Kruskal's would if the MST completed that quickly)
    start = time.time()
    lazy_gen = edges_by_weight_lazy(edges.copy())
    sorted3 = [next(lazy_gen) for _ in range(200)]
    time3 = time.time() - start
    print(f"Lazy heap (200 edges): {time3:.4f}s")

    # Verify correctness: all approaches agree on the minimum weight
    assert sorted1[0][2] == sorted2[0][2] == sorted3[0][2]
    print("All approaches produce correct minimum!")
```

Kruskal's algorithm has time complexity O(E log E), which is often stated as "dominated by sorting." Let's rigorously analyze this claim and understand when it's accurate.

Component-by-Component Analysis:

Step 1: Build Edge List
- If edges are already in a list: O(1)
- If extracting from adjacency list: O(E)
- If extracting from adjacency matrix: O(V²)

Step 2: Sort Edges
- Comparison-based sorting: O(E log E)
- This is the bottleneck for most practical scenarios

Step 3: Initialize Union-Find
- Create V singleton sets: O(V)

Step 4: Process Sorted Edges
- For each edge (up to E edges):
  - Find operations: O(α(V)) each with path compression
  - Union operations: O(α(V)) each with union by rank
- Total: O(E × α(V)) where α is the inverse Ackermann function
- Since α(V) ≤ 4 for all practical V (up to 10^80!), this is effectively O(E)

Total: O(E log E + E × α(V)) = O(E log E)
```
═══════════════════════════════════════════════════════════════════
KRUSKAL'S ALGORITHM COMPLEXITY BREAKDOWN
═══════════════════════════════════════════════════════════════════

For a graph with V vertices and E edges:

┌───────────────────────────┬─────────────────────────────────────┐
│ STEP                      │ TIME COMPLEXITY                     │
├───────────────────────────┼─────────────────────────────────────┤
│ Extract edge list         │ O(E) or O(V²) from adj. matrix      │
│ Sort edges                │ O(E log E)  ← DOMINATES             │
│ Initialize Union-Find     │ O(V)                                │
│ Process E edges           │ O(E × α(V)) ≈ O(E)                  │
│  - E × Find operations    │ O(E × α(V))                         │
│  - ≤ V-1 Union operations │ O(V × α(V))                         │
└───────────────────────────┴─────────────────────────────────────┘

Total: O(E log E)

═══════════════════════════════════════════════════════════════════
WHY SORTING DOMINATES
═══════════════════════════════════════════════════════════════════

For a graph with V = 10,000 vertices and E = 100,000 edges:

Sorting:    E log E  = 100,000 × 17 = 1,700,000 comparisons
Union-Find: E × α(V) ≈ 100,000 × 4  =   400,000 operations

Sorting does ~80% of the work!

For a dense graph with V = 1,000 and E ≈ V² = 1,000,000:

Sorting:    E log E  = 1,000,000 × 20 = 20,000,000 comparisons
Union-Find: E × α(V) ≈ 1,000,000 × 3  =  3,000,000 operations

Sorting does ~87% of the work!

═══════════════════════════════════════════════════════════════════
WHEN SORTING DOESN'T DOMINATE
═══════════════════════════════════════════════════════════════════

1. Pre-sorted edges:  Sorting is O(E) with good algorithms (Timsort)
2. Integer weights:   Radix/counting sort gives O(E)
3. Lazy evaluation:   Heap gives O(E + k log E) for k edges needed

In these cases, Union-Find becomes the bottleneck: O(E × α(V))
```

Why O(E log E) = O(E log V):

You'll sometimes see Kruskal's complexity stated as O(E log V) instead of O(E log E). These are equivalent because:

For a simple graph: E ≤ V(V-1)/2 = O(V²)

Therefore: log E ≤ log V² = 2 log V = O(log V)

And for a connected graph: E ≥ V-1, so log E ≥ log(V-1) = Θ(log V)

So log E = Θ(log V), making O(E log E) = O(E log V).

The distinction only matters for graphs so sparse they are disconnected (E << V), where log E can be genuinely smaller than log V; for connected graphs the two bounds are interchangeable.
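A quick numeric check of this equivalence, as a small illustrative script under the stated assumption that the graph is simple and connected:

```python
# Verify that log E stays within a factor of 2 of log V for simple
# connected graphs, where V-1 <= E <= V(V-1)/2.
import math

for V in (100, 10_000, 1_000_000):
    E_sparse = V - 1            # Tree: fewest edges a connected graph can have
    E_dense = V * (V - 1) // 2  # Complete graph: most edges a simple graph can have
    print(V,
          round(math.log2(E_sparse) / math.log2(V), 2),   # ≈ 1
          round(math.log2(E_dense) / math.log2(V), 2))    # < 2
```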
When optimizing Kruskal's algorithm for specific use cases, focus on the sorting step first. Using a faster sorting algorithm, exploiting structure in edge weights, or avoiding full sorting (via lazy extraction) can provide significant speedups.
The way your graph is represented affects how efficiently you can extract edges for Kruskal's algorithm. Let's examine the common representations.

Edge List Representation:

This is the ideal representation for Kruskal's algorithm. Edges are already collected—just sort and process.

```python
edges = [(0, 1, 4), (0, 2, 3), (1, 2, 1), ...]
```

- Extraction time: O(1) — edges already in list form
- Memory: O(E)
- Ideal for Kruskal's

Adjacency List Representation:

Common for general graph algorithms. We need to extract edges into a sortable list.

```python
adj = {
    0: [(1, 4), (2, 3)],
    1: [(0, 4), (2, 1)],
    ...
}
```

- Extraction time: O(E) — iterate through all adjacency lists
- Care needed: For undirected graphs, each edge appears twice. Either extract once or deduplicate.

Adjacency Matrix Representation:

Used for dense graphs. Least efficient for Kruskal's extraction.

```
matrix[i][j] = weight of edge (i, j), or INF if no edge
```

- Extraction time: O(V²) — must scan entire matrix
- For dense graphs where E = Θ(V²), this matches O(E)
- For sparse graphs, this is wasteful
```python
import math
from typing import Dict, List, Tuple

Edge = Tuple[int, int, float]

# ═══════════════════════════════════════════════════════════════════
# FROM EDGE LIST
# ═══════════════════════════════════════════════════════════════════

def extract_from_edge_list(edge_list: List[Edge]) -> List[Edge]:
    """
    Trivial case: edges already in the format we need.
    Time: O(1) to reference, O(E) to copy
    """
    return edge_list.copy()  # Or just use directly if modification is safe

# ═══════════════════════════════════════════════════════════════════
# FROM ADJACENCY LIST (Undirected Graph)
# ═══════════════════════════════════════════════════════════════════

def extract_from_adjacency_list_undirected(
        adj: Dict[int, List[Tuple[int, float]]]) -> List[Edge]:
    """
    Extract edges from adjacency list representation.

    For undirected graphs, each edge (u,v) appears as:
    - v in adj[u] with weight w
    - u in adj[v] with weight w
    We extract once by enforcing u < v.

    Time: O(E)
    """
    edges = []
    for u in adj:
        for v, weight in adj[u]:
            if u < v:  # Only take each edge once
                edges.append((u, v, weight))
    return edges

def extract_from_adjacency_list_directed(
        adj: Dict[int, List[Tuple[int, float]]]) -> List[Edge]:
    """
    For directed graphs, every adjacency entry is a unique edge.
    Time: O(E)
    """
    edges = []
    for u in adj:
        for v, weight in adj[u]:
            edges.append((u, v, weight))
    return edges

# ═══════════════════════════════════════════════════════════════════
# FROM ADJACENCY MATRIX
# ═══════════════════════════════════════════════════════════════════

def extract_from_adjacency_matrix(
        matrix: List[List[float]], V: int,
        undirected: bool = True) -> List[Edge]:
    """
    Extract edges from V×V adjacency matrix.
    matrix[i][j] = weight of edge i→j, or math.inf / None if no edge.

    Time: O(V²) — must examine entire matrix
    For sparse graphs, this is inefficient!
    For dense graphs where E = Θ(V²), this is acceptable.
    """
    edges = []
    if undirected:
        # Only examine upper triangle to avoid duplicates
        for i in range(V):
            for j in range(i + 1, V):
                weight = matrix[i][j]
                if weight is not None and weight != math.inf:
                    edges.append((i, j, weight))
    else:
        # Examine all entries for directed graph
        for i in range(V):
            for j in range(V):
                if i != j:  # Exclude self-loops
                    weight = matrix[i][j]
                    if weight is not None and weight != math.inf:
                        edges.append((i, j, weight))
    return edges

# ═══════════════════════════════════════════════════════════════════
# DEMONSTRATION
# ═══════════════════════════════════════════════════════════════════

def demo():
    # Example graph (complete graph on 4 vertices):
    #     0 --4-- 1
    #     |\     /|
    #     3 2   1 5     diagonals: 0-3 = 2, 1-2 = 1
    #     |  \ /  |
    #     2 --6-- 3

    # As edge list
    edge_list = [
        (0, 1, 4), (0, 2, 3), (0, 3, 2),
        (1, 2, 1), (1, 3, 5), (2, 3, 6)
    ]

    # As adjacency list
    adj_list = {
        0: [(1, 4), (2, 3), (3, 2)],
        1: [(0, 4), (2, 1), (3, 5)],
        2: [(0, 3), (1, 1), (3, 6)],
        3: [(0, 2), (1, 5), (2, 6)]
    }

    # As adjacency matrix (0 on the diagonal = no self-loop)
    adj_matrix = [
        [0, 4, 3, 2],
        [4, 0, 1, 5],
        [3, 1, 0, 6],
        [2, 5, 6, 0]
    ]

    # Extract and verify
    from_list = extract_from_edge_list(edge_list)
    from_adj = extract_from_adjacency_list_undirected(adj_list)
    from_matrix = extract_from_adjacency_matrix(adj_matrix, 4)

    print(f"From edge list:  {len(from_list)} edges")
    print(f"From adj list:   {len(from_adj)} edges")
    print(f"From adj matrix: {len(from_matrix)} edges")

    # All should have 6 edges
    assert len(from_list) == len(from_adj) == len(from_matrix) == 6
```

If you know you'll be running Kruskal's algorithm, consider maintaining an edge list alongside your primary graph representation. The O(E) space overhead is usually worthwhile for O(1) extraction versus O(V²) matrix scanning.
An important optimization for Kruskal's algorithm is lazy sorting—not sorting the entire edge list upfront, but instead extracting edges in sorted order as needed using a min-heap.

The Key Insight:

Kruskal's algorithm typically adds V-1 edges to the MST. In a sparse graph where E ≈ V, we process almost all edges anyway. But in a denser graph where E >> V, we might only need the smallest 10-20% of edges before completing the MST. Full sorting does unnecessary work!

Heap-Based Approach:

1. Build a min-heap from all edges: O(E) time
2. Extract minimum edge: O(log E) per extraction
3. Total for k extractions: O(E + k log E)

When Lazy Sorting Wins:

- If we need only k edges (where k << E), lazy sorting takes O(E + k log E)
- Full sorting takes O(E log E)
- Lazy wins when: E + k log E < E log E
- Rearranging: k < E − E/log E, which holds whenever k is even slightly smaller than E

But practically, the break-even point depends on constants. Lazy sorting has overhead from heap operations, while full sorting benefits from cache-efficient implementations.
```python
import heapq
from typing import List, Tuple

Edge = Tuple[int, int, float]

class UnionFind:
    """Standard Union-Find with path compression and union by rank."""

    def __init__(self, n: int):
        self.parent = list(range(n))
        self.rank = [0] * n

    def find(self, x: int) -> int:
        if self.parent[x] != x:
            self.parent[x] = self.find(self.parent[x])  # Path compression
        return self.parent[x]

    def union(self, x: int, y: int) -> bool:
        """Returns True if x and y were in different components."""
        px, py = self.find(x), self.find(y)
        if px == py:
            return False
        # Union by rank
        if self.rank[px] < self.rank[py]:
            px, py = py, px
        self.parent[py] = px
        if self.rank[px] == self.rank[py]:
            self.rank[px] += 1
        return True

def kruskal_lazy(vertices: int, edges: List[Edge]) -> Tuple[List[Edge], float]:
    """
    Kruskal's algorithm with lazy edge extraction via min-heap.

    Time Complexity:
    - Heapify: O(E)
    - At most E heap pops: O(E log E) worst case
    - But typically only O(V) edges needed: O(E + V log E)

    For dense graphs (E = Θ(V²)), this can be faster than full sorting!
    """
    # Build min-heap: (weight, u, v)
    # We rearrange to put weight first for heap ordering
    heap = [(w, u, v) for u, v, w in edges]
    heapq.heapify(heap)  # O(E)

    uf = UnionFind(vertices)
    mst_edges = []
    mst_weight = 0.0
    edges_needed = vertices - 1
    edges_examined = 0

    while heap and len(mst_edges) < edges_needed:
        w, u, v = heapq.heappop(heap)  # O(log E)
        edges_examined += 1
        if uf.union(u, v):
            mst_edges.append((u, v, w))
            mst_weight += w

    # Statistics
    if len(mst_edges) == edges_needed:
        print(f"MST found! Examined {edges_examined}/{len(edges)} edges "
              f"({100*edges_examined/len(edges):.1f}%)")
    else:
        print("Graph is disconnected - no spanning tree exists")

    return mst_edges, mst_weight

def kruskal_standard(vertices: int, edges: List[Edge]) -> Tuple[List[Edge], float]:
    """Standard Kruskal's with full sorting, for comparison."""
    sorted_edges = sorted(edges, key=lambda e: e[2])  # O(E log E)

    uf = UnionFind(vertices)
    mst_edges = []
    mst_weight = 0.0

    for u, v, w in sorted_edges:
        if uf.union(u, v):
            mst_edges.append((u, v, w))
            mst_weight += w
            if len(mst_edges) == vertices - 1:
                break

    return mst_edges, mst_weight

# ═══════════════════════════════════════════════════════════════════
# BENCHMARK: When does lazy sorting help?
# ═══════════════════════════════════════════════════════════════════
import time
import random

def benchmark_lazy_vs_standard():
    """Compare lazy heap approach vs full sorting."""
    # Dense graph: V vertices, ~25% of all possible edges
    # (~124,875 edges expected out of 499,500 possible)
    V = 1000
    edges = []
    for i in range(V):
        for j in range(i + 1, V):
            if random.random() < 0.25:  # 25% edge density
                edges.append((i, j, random.random() * 100))

    print(f"Graph: {V} vertices, {len(edges)} edges")
    print(f"Expected edges in MST: {V - 1}")
    print()

    # Standard approach (full sort)
    start = time.time()
    mst1, weight1 = kruskal_standard(V, edges.copy())
    time_standard = time.time() - start
    print(f"Standard (full sort): {time_standard:.4f}s")

    # Lazy approach (heap)
    start = time.time()
    mst2, weight2 = kruskal_lazy(V, edges.copy())
    time_lazy = time.time() - start
    print(f"Lazy (heap):          {time_lazy:.4f}s")
    print()
    print(f"Speedup from lazy: {time_standard/time_lazy:.2f}x")
    print(f"MST weights match: {abs(weight1 - weight2) < 0.001}")

if __name__ == "__main__":
    benchmark_lazy_vs_standard()
```

Use heap-based lazy extraction when: (1) the graph is dense (E >> V), so only a fraction of edges will be examined; (2) memory is tight and you can't afford to store a sorted copy; (3) edges arrive in a stream and you need to process them incrementally. Use full sorting when: the graph is sparse, sorting is highly optimized in your environment, or you need all edges in sorted order for other purposes.
In many practical applications, edges aren't arbitrary—they have structure that can be exploited to speed up or skip the sorting step entirely.

Scenario 1: Edges Already Sorted

If edges come from a source that naturally orders them by weight (e.g., a sorted database table, output from another algorithm), we can skip sorting entirely!

Adaptive sorting algorithms like Timsort detect nearly-sorted input and run in O(E) time instead of O(E log E). Python's sorted() and .sort() use Timsort.

Scenario 2: Integer Weights with Small Range

If edge weights are integers in range [0, W], use counting sort for O(E + W) time.

Example applications:
- Road networks with distance in discrete units (miles, km)
- Network latency bucketed into categories
- Grid graphs with unit weights

Scenario 3: Weights from a Known Distribution

If weights are uniformly distributed in [0, 1], bucket sort can achieve O(E) expected time (a sketch follows the code below).

Scenario 4: Geometric Graphs

In Euclidean MST problems (MST of points in the plane), spatial data structures can help generate edges in approximately sorted order, or prune the edge set significantly before sorting.
```python
from collections import defaultdict
from typing import List, Tuple

Edge = Tuple[int, int, float]

# Union-Find implementation (same as earlier on this page)
class UnionFind:
    def __init__(self, n: int):
        self.parent = list(range(n))
        self.rank = [0] * n

    def find(self, x: int) -> int:
        if self.parent[x] != x:
            self.parent[x] = self.find(self.parent[x])
        return self.parent[x]

    def union(self, x: int, y: int) -> bool:
        px, py = self.find(x), self.find(y)
        if px == py:
            return False
        if self.rank[px] < self.rank[py]:
            px, py = py, px
        self.parent[py] = px
        if self.rank[px] == self.rank[py]:
            self.rank[px] += 1
        return True

# ═══════════════════════════════════════════════════════════════════
# CASE 1: Integer weights with small range - use Counting Sort
# ═══════════════════════════════════════════════════════════════════

def kruskal_integer_weights(
        vertices: int, edges: List[Edge],
        max_weight: int) -> Tuple[List[Edge], int]:
    """
    Optimized Kruskal's for integer weights in [0, max_weight].

    Instead of O(E log E) comparison sort, uses O(E + W) counting sort.
    When W = O(E), this makes sorting O(E), and the overall algorithm
    is dominated by Union-Find: O(E × α(V)) ≈ O(E).

    Args:
        vertices: Number of vertices
        edges: List of (u, v, weight) with integer weights
        max_weight: Maximum possible weight
    """
    # Counting sort: bucket edges by weight
    buckets = defaultdict(list)
    for u, v, w in edges:
        buckets[int(w)].append((u, v, w))

    # Process edges in sorted order (iterate through buckets)
    uf = UnionFind(vertices)
    mst_edges = []
    mst_weight = 0

    for weight in range(max_weight + 1):
        if weight in buckets:
            for u, v, w in buckets[weight]:
                if uf.union(u, v):
                    mst_edges.append((u, v, w))
                    mst_weight += w
                    if len(mst_edges) == vertices - 1:
                        return mst_edges, mst_weight

    return mst_edges, mst_weight

# ═══════════════════════════════════════════════════════════════════
# CASE 2: Pre-sorted edges - skip sorting entirely
# ═══════════════════════════════════════════════════════════════════

def kruskal_presorted(
        vertices: int, sorted_edges: List[Edge]) -> Tuple[List[Edge], float]:
    """
    Kruskal's when edges are already sorted by weight.
    Time: O(E × α(V)) ≈ O(E)
    The sorting bottleneck is eliminated!
    """
    uf = UnionFind(vertices)
    mst_edges = []
    mst_weight = 0.0

    for u, v, w in sorted_edges:  # Already in sorted order
        if uf.union(u, v):
            mst_edges.append((u, v, w))
            mst_weight += w
            if len(mst_edges) == vertices - 1:
                break

    return mst_edges, mst_weight

# ═══════════════════════════════════════════════════════════════════
# CASE 3: Grid graph - unit weights
# ═══════════════════════════════════════════════════════════════════

def kruskal_grid_unit_weights(rows: int, cols: int) -> int:
    """
    MST of a grid graph where all edges have weight 1.

    Since all weights are equal, no sorting is needed! Just find any
    spanning tree.

    For an r×c grid:
    - Vertices: r × c
    - MST needs: r × c - 1 edges
    - All edges have weight 1
    - MST weight: r × c - 1

    Time: O(r × c)
    """
    vertices = rows * cols
    # Any spanning tree works! Just traverse all cells.
    # Each cell except the first adds exactly 1 edge.
    return vertices - 1  # MST weight

def kruskal_grid_random_weights(
        rows: int, cols: int,
        horizontal_weights: List[List[float]],
        vertical_weights: List[List[float]]) -> Tuple[List[Edge], float]:
    """
    MST of a grid graph with arbitrary edge weights.

    Grid structure:
    - Horizontal edges: (i,j) -- (i,j+1) for j in 0..cols-2
    - Vertical edges:   (i,j) -- (i+1,j) for i in 0..rows-2

    horizontal_weights[i][j] = weight of edge (i,j)--(i,j+1)
    vertical_weights[i][j]   = weight of edge (i,j)--(i+1,j)
    """
    def vertex_id(r: int, c: int) -> int:
        return r * cols + c

    edges = []

    # Collect horizontal edges
    for i in range(rows):
        for j in range(cols - 1):
            edges.append((vertex_id(i, j), vertex_id(i, j + 1),
                          horizontal_weights[i][j]))

    # Collect vertical edges
    for i in range(rows - 1):
        for j in range(cols):
            edges.append((vertex_id(i, j), vertex_id(i + 1, j),
                          vertical_weights[i][j]))

    # Standard Kruskal's
    sorted_edges = sorted(edges, key=lambda e: e[2])
    uf = UnionFind(rows * cols)
    mst_edges = []
    mst_weight = 0.0

    for u, v, w in sorted_edges:
        if uf.union(u, v):
            mst_edges.append((u, v, w))
            mst_weight += w
            if len(mst_edges) == rows * cols - 1:
                break

    return mst_edges, mst_weight
```

Before running Kruskal's algorithm, always ask: Do I know anything about the edge weights? Are they pre-sorted? Are they integers in a small range? Are they all equal? Do they come from a geometric source? Exploiting structure can reduce complexity from O(E log E) to O(E) or better.
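Scenario 3 (uniformly distributed weights) isn't covered by the code above, so here is a hedged sketch of bucket sort under the assumption that weights lie in [0, 1); the function name is illustrative:

```python
# Sketch: bucket sort for weights assumed uniformly distributed in
# [0, 1). With E buckets, each bucket holds O(1) edges in expectation,
# so total expected time is O(E).
from typing import List, Tuple

Edge = Tuple[int, int, float]

def bucket_sort_edges(edges: List[Edge]) -> List[Edge]:
    n = len(edges)
    buckets: List[List[Edge]] = [[] for _ in range(n)]
    for edge in edges:
        # Weight w in [0, 1) maps to bucket floor(w * n)
        idx = min(int(edge[2] * n), n - 1)
        buckets[idx].append(edge)
    result = []
    for bucket in buckets:
        bucket.sort(key=lambda e: e[2])  # Tiny buckets: O(1) expected each
        result.extend(bucket)
    return result
```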
Based on our analysis, here are practical guidelines for implementing the sorting phase effectively.

Guideline 1: Use Standard Library Sorting

Unless you have a specific reason to optimize, use your language's built-in sort. It's heavily optimized, handles edge cases, and is well-tested.

```python
# Python
edges.sort(key=lambda e: e[2])                    # In-place
sorted_edges = sorted(edges, key=lambda e: e[2])  # New list
```

```javascript
// JavaScript
edges.sort((a, b) => a[2] - b[2]);
```

```cpp
// C++
std::sort(edges.begin(), edges.end(),
          [](const Edge& a, const Edge& b) { return a.weight < b.weight; });
```

Guideline 2: Early Termination

Always check if you've found V-1 edges and terminate early. In dense graphs, this saves significant work.
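In code, Guideline 2 is the `break` already used by `kruskal_standard` earlier on this page; as a standalone sketch (reusing the `UnionFind` class defined above, with an illustrative function name):

```python
# Sketch of early termination: stop scanning as soon as V-1 edges are
# accepted. Assumes the UnionFind class from earlier is in scope.
from typing import List, Tuple

Edge = Tuple[int, int, float]

def kruskal_early_exit(vertices: int, sorted_edges: List[Edge]) -> List[Edge]:
    uf = UnionFind(vertices)
    mst_edges = []
    for u, v, w in sorted_edges:
        if uf.union(u, v):
            mst_edges.append((u, v, w))
            if len(mst_edges) == vertices - 1:
                break  # Remaining heavier edges cannot change the MST
    return mst_edges
```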
| Graph Type | Edge Weights | Recommended Strategy |
|---|---|---|
| Sparse (E ≈ V) | Any | Standard library sort |
| Medium (E ≈ V log V) | Any | Standard library sort |
| Dense (E ≈ V²) | Float/arbitrary | Lazy heap extraction |
| Dense (E ≈ V²) | Small integers | Counting sort + iteration |
| Any | Pre-sorted | No sorting needed |
| Any | All equal | No sorting needed (any spanning tree) |
| Geometric (Euclidean) | Distances | Spatial data structures + filtering |
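One way to operationalize this table is a small dispatcher; this is an illustrative sketch, and the function name, thresholds, and returned labels are assumptions, not a standard API:

```python
# Sketch: pick a sorting strategy from the table above based on graph
# properties. Thresholds and labels are illustrative.
def choose_sort_strategy(V: int, E: int,
                         weights_are_small_ints: bool = False,
                         presorted: bool = False,
                         all_equal: bool = False) -> str:
    if all_equal:
        return "no sort: any spanning tree is minimum"
    if presorted:
        return "no sort: process edges as given"
    if weights_are_small_ints:
        return "counting sort + iteration: O(E + W)"
    if E > V * V // 4:  # Dense: MST likely completes after a fraction of edges
        return "lazy heap extraction: O(E + k log E)"
    return "standard library sort: O(E log E)"
```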
We've thoroughly analyzed the sorting phase of Kruskal's algorithm—the step that typically determines overall performance. Let's consolidate the key insights:

- Sorted order is what makes the greedy choice safe: the Cut and Cycle Properties only apply when edges are processed from lightest to heaviest.
- Comparison-based sorting costs O(E log E) and dominates the O(E × α(V)) Union-Find work, so the sort is the bottleneck in most practical runs.
- Special structure (pre-sorted input, small integer weights, uniform distributions, equal weights) can reduce the sorting cost to O(E) or eliminate it entirely.
- Lazy heap extraction defers sorting work and pays off on dense graphs, where the MST completes after examining a small fraction of the edges.
What's Next:

With sorted edges in hand, we need a way to determine whether adding an edge would create a cycle. The next page explores the Union-Find data structure—the elegant solution that makes Kruskal's cycle detection nearly constant time per operation.
You now understand the critical role of edge sorting in Kruskal's algorithm. From standard comparison-based sorts to exploiting special structure in edge weights, you have the knowledge to implement and optimize this phase effectively.