Imagine labels as colored ink dropped onto specific points in a network. Over time, the ink diffuses through connections, coloring nearby nodes. Points strongly connected to red-labeled nodes become red; those connected to blue become blue. Points between clusters might become purple—a mix reflecting their ambiguous position.
This intuition underlies label propagation, a family of graph-based semi-supervised methods. Instead of training a classifier, we construct a similarity graph over all data (labeled and unlabeled) and propagate known labels through the graph structure until equilibrium.
By the end of this page, you will understand graph construction for semi-supervised learning, the label propagation algorithm, its closed-form solution, connections to random walks and harmonic functions, and practical implementation considerations.
Label propagation requires a similarity graph $G = (V, E, W)$ where nodes are all samples and edge weights encode similarity.
Common Graph Construction Methods:
1. k-Nearest Neighbors (kNN): $$w_{ij} = \begin{cases} \exp(-\|x_i - x_j\|^2 / \sigma^2) & \text{if } j \in \text{kNN}(i) \\ 0 & \text{otherwise} \end{cases}$$
2. ε-Neighborhood: $$w_{ij} = \begin{cases} \exp(-\|x_i - x_j\|^2 / \sigma^2) & \text{if } \|x_i - x_j\| < \epsilon \\ 0 & \text{otherwise} \end{cases}$$
3. Fully Connected (RBF): $$w_{ij} = \exp(-\|x_i - x_j\|^2 / \sigma^2)$$
```python
import numpy as np
from scipy.spatial.distance import cdist


def build_similarity_graph(X, method='knn', k=10, sigma=1.0, epsilon=0.5):
    """
    Construct similarity graph for label propagation.

    Args:
        X: Feature matrix (n_samples, n_features)
        method: 'knn', 'epsilon', or 'rbf'
        k: Number of neighbors for kNN
        sigma: RBF kernel bandwidth
        epsilon: Radius for epsilon-neighborhood

    Returns:
        W: Symmetric affinity matrix (n, n)
    """
    n = len(X)
    distances = cdist(X, X, metric='euclidean')

    if method == 'knn':
        # k-nearest neighbors graph
        W = np.zeros((n, n))
        for i in range(n):
            neighbors = np.argsort(distances[i])[1:k+1]  # Exclude self
            W[i, neighbors] = np.exp(-distances[i, neighbors]**2 / sigma**2)
        # Symmetrize: mutual kNN or union
        W = (W + W.T) / 2

    elif method == 'epsilon':
        # Epsilon-neighborhood graph
        W = np.exp(-distances**2 / sigma**2)
        W[distances >= epsilon] = 0
        np.fill_diagonal(W, 0)

    elif method == 'rbf':
        # Fully connected RBF
        W = np.exp(-distances**2 / sigma**2)
        np.fill_diagonal(W, 0)

    return W


def compute_graph_laplacian(W, normalized=True):
    """
    Compute graph Laplacian from affinity matrix.

    L = D - W                      (unnormalized)
    L = I - D^{-1/2} W D^{-1/2}    (symmetric normalized)
    """
    D = np.diag(W.sum(axis=1))
    if normalized:
        D_inv_sqrt = np.diag(1.0 / np.sqrt(np.diag(D) + 1e-10))
        L = np.eye(len(W)) - D_inv_sqrt @ W @ D_inv_sqrt
    else:
        L = D - W
    return L
```

The bandwidth σ is critical. Too small, and only very close points connect, so the graph fragments. Too large, and everything connects, so the cluster structure is lost. A common heuristic: set σ to the mean distance to the k-th nearest neighbor across all points.
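That heuristic is easy to compute from the same pairwise distances. A minimal sketch (the helper name `estimate_sigma` is not part of the code above, just an illustration):

```python
def estimate_sigma(X, k=10):
    """Heuristic bandwidth: mean distance to the k-th nearest neighbor."""
    distances = cdist(X, X, metric='euclidean')
    # After sorting each row, column 0 is the self-distance (0), so column k is the k-th neighbor
    kth_dist = np.sort(distances, axis=1)[:, k]
    return kth_dist.mean()

# Example usage (assuming X is a feature matrix):
# sigma = estimate_sigma(X, k=10)
# W = build_similarity_graph(X, method='knn', k=10, sigma=sigma)
```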
The core idea: each unlabeled node's label is determined by a weighted average of its neighbors' labels, iterated until convergence.
Label Propagation (Zhu & Ghahramani, 2002):
Let $Y \in \mathbb{R}^{n \times c}$ be the label matrix where $Y_{ij}$ is the probability of sample $i$ having class $j$. Initialize labeled samples with one-hot vectors and unlabeled with uniform.
Propagation Rule: $$Y^{(t+1)} = D^{-1} W Y^{(t)}$$
where $D = \text{diag}(W \mathbf{1})$ is the degree matrix.
Clamping: After each iteration, reset labeled nodes to their true labels: $$Y^{(t+1)}_L = Y_L^{\text{true}}$$
```python
import numpy as np


def label_propagation(W, y_labeled, labeled_indices, n_classes, max_iter=1000, tol=1e-6):
    """
    Label Propagation with label clamping.

    Args:
        W: Affinity matrix (n, n)
        y_labeled: Labels for labeled samples
        labeled_indices: Indices of labeled samples
        n_classes: Number of classes
        max_iter: Maximum iterations
        tol: Convergence tolerance

    Returns:
        Y: Soft label matrix (n, n_classes)
    """
    n = W.shape[0]

    # Transition matrix: row-normalized W
    D_inv = np.diag(1.0 / (W.sum(axis=1) + 1e-10))
    T = D_inv @ W

    # Initialize label matrix
    Y = np.ones((n, n_classes)) / n_classes  # Uniform prior

    # Clamp labeled nodes
    for i, label in zip(labeled_indices, y_labeled):
        Y[i] = 0
        Y[i, label] = 1
    Y_labeled_fixed = Y[labeled_indices].copy()

    for iteration in range(max_iter):
        Y_prev = Y.copy()

        # Propagate
        Y = T @ Y

        # Clamp: reset labeled nodes
        Y[labeled_indices] = Y_labeled_fixed

        # Check convergence
        diff = np.abs(Y - Y_prev).max()
        if diff < tol:
            print(f"Converged at iteration {iteration}")
            break

    return Y


def predict_labels(Y, labeled_indices):
    """Extract predictions for unlabeled samples."""
    unlabeled = [i for i in range(len(Y)) if i not in labeled_indices]
    predictions = Y[unlabeled].argmax(axis=1)
    confidences = Y[unlabeled].max(axis=1)
    return unlabeled, predictions, confidences
```

Convergence and Closed-Form Solution:
Rather than iterating, we can solve directly. Partition nodes into labeled ($L$) and unlabeled ($U$). The equilibrium satisfies:
$$Y_U = (I - T_{UU})^{-1} T_{UL} Y_L$$
where $T_{UU}$ is the transition probability within unlabeled nodes and $T_{UL}$ from unlabeled to labeled. This shows labels propagate as a linear combination of labeled examples, weighted by graph proximity.
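To make this closed form concrete, here is a minimal sketch that partitions the transition matrix and solves the linear system directly (using a solve rather than an explicit inverse for numerical stability; variable names follow the iterative code above):

```python
def label_propagation_closed_form(W, Y_L, labeled_indices):
    """
    Solve Y_U = (I - T_UU)^{-1} T_UL Y_L directly.

    W: affinity matrix (n, n)
    Y_L: one-hot label matrix for the labeled nodes (n_L, n_classes)
    labeled_indices: indices of the labeled nodes
    """
    n = W.shape[0]
    unlabeled_indices = np.setdiff1d(np.arange(n), labeled_indices)

    # Row-normalized transition matrix T = D^{-1} W
    T = W / (W.sum(axis=1, keepdims=True) + 1e-10)

    T_UU = T[np.ix_(unlabeled_indices, unlabeled_indices)]
    T_UL = T[np.ix_(unlabeled_indices, labeled_indices)]

    # (I - T_UU) Y_U = T_UL Y_L, solved as a linear system
    Y_U = np.linalg.solve(np.eye(len(unlabeled_indices)) - T_UU, T_UL @ Y_L)
    return Y_U, unlabeled_indices
```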
Label propagation has elegant interpretations from multiple theoretical perspectives:
1. Random Walk Interpretation:
The soft label $Y_{ij}$ equals the probability that a random walk starting from node $i$ first reaches a labeled node of class $j$: $$Y_{ij} = P(\text{absorb at class } j | \text{start at } i)$$
Labeled nodes are absorbing states—once reached, the walk terminates.
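This view can be checked numerically with a small Monte Carlo sketch: simulate walks from an unlabeled node on the row-normalized transition matrix `T` until a labeled node absorbs them, and count which class does the absorbing (assumes every node is connected to at least one labeled node; variable names mirror the code above):

```python
def absorption_probabilities(T, labeled_indices, y_labeled, start, n_classes, n_walks=5000):
    """Estimate P(absorb at class j | start at node) by simulating random walks."""
    rng = np.random.default_rng(0)
    labeled_set = set(labeled_indices)
    label_of = dict(zip(labeled_indices, y_labeled))
    counts = np.zeros(n_classes)
    for _ in range(n_walks):
        node = start
        while node not in labeled_set:
            p = T[node] / T[node].sum()      # transition probabilities out of this node
            node = rng.choice(len(T), p=p)   # step to a random neighbor
        counts[label_of[node]] += 1
    return counts / n_walks  # approximates the propagated row Y[start]
```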
2. Harmonic Function:
Label propagation finds a harmonic function on the graph—one that satisfies: $$f(i) = \sum_j w_{ij} f(j) / \sum_j w_{ij} \quad \forall i \in U$$
A harmonic function equals its weighted average over neighbors. This is the graph analog of solving Laplace's equation with Dirichlet boundary conditions.
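As a quick numerical sanity check (a sketch, assuming `Y_full` is the full soft-label matrix with labeled rows clamped and unlabeled rows filled in by propagation):

```python
def check_harmonic(W, Y_full, unlabeled_indices, atol=1e-6):
    """Verify f(i) = sum_j w_ij f(j) / sum_j w_ij for every unlabeled node i."""
    for i in unlabeled_indices:
        neighbor_avg = W[i] @ Y_full / W[i].sum()
        assert np.allclose(Y_full[i], neighbor_avg, atol=atol)
```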
3. Energy Minimization:
The solution minimizes the quadratic energy: $$E(f) = \frac{1}{2} \sum_{i,j} w_{ij}(f_i - f_j)^2 = f^T L f$$
subject to $f_L = y_L$. This encourages smoothness: connected nodes should have similar labels.
All these interpretations encode the same assumption: labels should vary smoothly across the graph. Points strongly connected to class A should themselves be class A. Label propagation is optimal when the class boundary aligns with sparse graph connectivity.
Label spreading (Zhou et al., 2004) modifies the objective to be more robust to label noise by allowing labeled nodes to change:
$$\min_F \sum_{i,j} w_{ij} \left\| \frac{f_i}{\sqrt{d_i}} - \frac{f_j}{\sqrt{d_j}} \right\|^2 + \mu \sum_i \|f_i - y_i\|^2$$
The first term encourages smoothness (normalized by degree). The second penalizes deviation from given labels, but allows some movement.
Closed-form Solution: $$F^* = (I - \alpha S)^{-1} Y$$
where $S = D^{-1/2} W D^{-1/2}$ is the normalized affinity and $\alpha = 1/(1+\mu)$ controls label influence vs. smoothness.
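A from-scratch sketch of this closed form (assuming `W` is an affinity matrix with zero diagonal and `Y` is the initial label matrix with one-hot rows for labeled nodes and zero rows for unlabeled ones; Zhou et al. include a constant $(1-\alpha)$ prefactor, which does not change the argmax):

```python
def label_spreading_closed_form(W, Y, alpha=0.9):
    """Label spreading: F* = (I - alpha * S)^{-1} Y with S = D^{-1/2} W D^{-1/2}."""
    d = W.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d + 1e-10))
    S = D_inv_sqrt @ W @ D_inv_sqrt          # symmetrically normalized affinity
    F = np.linalg.solve(np.eye(len(W)) - alpha * S, Y)
    return F                                 # soft labels; predict with F.argmax(axis=1)
```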
```python
from sklearn.semi_supervised import LabelSpreading, LabelPropagation
import numpy as np


# Using sklearn's implementations
def compare_propagation_methods(X, y_partial, n_labeled):
    """
    Compare label propagation vs label spreading.

    y_partial: labels with -1 for unlabeled
    """
    # Label Propagation (hard clamping)
    lp = LabelPropagation(kernel='rbf', gamma=20, max_iter=1000)
    lp.fit(X, y_partial)

    # Label Spreading (soft clamping, alpha controls)
    ls = LabelSpreading(kernel='rbf', gamma=20, alpha=0.2, max_iter=1000)
    ls.fit(X, y_partial)

    return {
        'propagation': {
            'predictions': lp.transduction_,
            'distributions': lp.label_distributions_
        },
        'spreading': {
            'predictions': ls.transduction_,
            'distributions': ls.label_distributions_
        }
    }
```

| Aspect | Propagation | Spreading |
|---|---|---|
| Labeled nodes | Clamped (fixed) | Soft (can change) |
| Label noise | Vulnerable | Robust |
| Normalization | Row-normalized (D⁻¹W) | Symmetric (D⁻¹/²WD⁻¹/²) |
| Hyperparameter | None (beyond graph) | α (clamp strength) |
Scalability:
The closed-form solution requires inverting an $n_U \times n_U$ matrix, which costs $O(n_U^3)$ and is infeasible for large datasets. Practical remedies avoid the explicit inverse: build a sparse kNN graph and run the iterative propagation update, where each step is just a sparse matrix product, or solve the linear system $(I - T_{UU}) Y_U = T_{UL} Y_L$ with a sparse iterative solver.
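A sketch of the sparse, iterative approach (using scikit-learn's `kneighbors_graph` and SciPy sparse matrices; the RBF bandwidth is fixed at 1 here for brevity):

```python
import numpy as np
from scipy.sparse import diags
from sklearn.neighbors import kneighbors_graph


def sparse_label_propagation(X, Y_init, labeled_indices, k=10, max_iter=200, tol=1e-6):
    """Iterative propagation on a sparse kNN graph: no n x n inverse is ever formed."""
    # Sparse kNN graph with RBF weights on the stored distances (sigma = 1 assumed)
    W = kneighbors_graph(X, n_neighbors=k, mode='distance', include_self=False)
    W.data = np.exp(-W.data**2)
    W = 0.5 * (W + W.T)                       # symmetrize

    # Row-normalized sparse transition matrix
    inv_deg = 1.0 / (np.asarray(W.sum(axis=1)).ravel() + 1e-10)
    T = diags(inv_deg) @ W

    Y = Y_init.copy()
    Y_fixed = Y[labeled_indices].copy()
    for _ in range(max_iter):
        Y_new = T @ Y                         # sparse matrix product: O(nnz) per iteration
        Y_new[labeled_indices] = Y_fixed      # clamp labeled nodes
        converged = np.abs(Y_new - Y).max() < tol
        Y = Y_new
        if converged:
            break
    return Y
```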
Inductive Extension:
Label propagation is transductive: it cannot predict on new samples that were not in the graph. For induction, two common options are to assign a new point the similarity-weighted average of its nearest graph neighbors' soft labels, or to train a standard inductive classifier on the propagated labels.
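A sketch of the weighted-neighbor option (hypothetical helper; assumes `X_train` holds the graph nodes, `Y` their propagated soft labels, and `sigma` matches the bandwidth used to build the graph):

```python
def predict_new_point(x_new, X_train, Y, sigma=1.0, k=10):
    """Inductive prediction: similarity-weighted average of the k nearest nodes' soft labels."""
    dists = np.linalg.norm(X_train - x_new, axis=1)
    nearest = np.argsort(dists)[:k]
    weights = np.exp(-dists[nearest]**2 / sigma**2)
    soft_label = weights @ Y[nearest] / (weights.sum() + 1e-10)
    return soft_label.argmax(), soft_label
```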
What's Next:
Label propagation introduced the power of graph structure for semi-supervised learning. Next, we'll explore a broader family of graph-based methods including manifold regularization, graph neural networks for SSL, and methods that learn graphs rather than constructing them from distances.
You now understand label propagation: graph construction, the iterative algorithm, theoretical foundations, and the label spreading variant. These graph-based concepts form the foundation for modern graph neural network approaches to semi-supervised learning.