Imagine labels as colored ink dropped onto specific points in a network. Over time, the ink diffuses through connections, coloring nearby nodes. Points strongly connected to red-labeled nodes become red; those connected to blue become blue. Points between clusters might become purple—a mix reflecting their ambiguous position.
This intuition underlies label propagation, a family of graph-based semi-supervised methods. Instead of training a classifier, we construct a similarity graph over all data (labeled and unlabeled) and propagate known labels through the graph structure until equilibrium.
By the end of this page, you will understand graph construction for semi-supervised learning, the label propagation algorithm, its closed-form solution, connections to random walks and harmonic functions, and practical implementation considerations.
Label propagation requires a similarity graph $G = (V, E, W)$ where nodes are all samples and edge weights encode similarity.
Common Graph Construction Methods:
1. k-Nearest Neighbors (kNN): $$w_{ij} = \begin{cases} \exp(-\|x_i - x_j\|^2 / \sigma^2) & \text{if } j \in \text{kNN}(i) \\ 0 & \text{otherwise} \end{cases}$$
2. ε-Neighborhood: $$w_{ij} = \begin{cases} \exp(-\|x_i - x_j\|^2 / \sigma^2) & \text{if } \|x_i - x_j\| < \epsilon \\ 0 & \text{otherwise} \end{cases}$$
3. Fully Connected (RBF): $$w_{ij} = \exp(-\|x_i - x_j\|^2 / \sigma^2)$$
```python
import numpy as np
from scipy.spatial.distance import cdist


def build_similarity_graph(X, method='knn', k=10, sigma=1.0, epsilon=0.5):
    """
    Construct similarity graph for label propagation.

    Args:
        X: Feature matrix (n_samples, n_features)
        method: 'knn', 'epsilon', or 'rbf'
        k: Number of neighbors for kNN
        sigma: RBF kernel bandwidth
        epsilon: Radius for epsilon-neighborhood

    Returns:
        W: Symmetric affinity matrix (n, n)
    """
    n = len(X)
    distances = cdist(X, X, metric='euclidean')

    if method == 'knn':
        # k-nearest neighbors graph
        W = np.zeros((n, n))
        for i in range(n):
            neighbors = np.argsort(distances[i])[1:k+1]  # Exclude self
            W[i, neighbors] = np.exp(-distances[i, neighbors]**2 / sigma**2)
        # Symmetrize: mutual kNN or union
        W = (W + W.T) / 2

    elif method == 'epsilon':
        # Epsilon-neighborhood graph
        W = np.exp(-distances**2 / sigma**2)
        W[distances >= epsilon] = 0
        np.fill_diagonal(W, 0)

    elif method == 'rbf':
        # Fully connected RBF
        W = np.exp(-distances**2 / sigma**2)
        np.fill_diagonal(W, 0)

    return W


def compute_graph_laplacian(W, normalized=True):
    """
    Compute graph Laplacian from affinity matrix.

    L = D - W                      (unnormalized)
    L = I - D^{-1/2} W D^{-1/2}    (symmetric normalized)
    """
    D = np.diag(W.sum(axis=1))
    if normalized:
        D_inv_sqrt = np.diag(1.0 / np.sqrt(np.diag(D) + 1e-10))
        L = np.eye(len(W)) - D_inv_sqrt @ W @ D_inv_sqrt
    else:
        L = D - W
    return L
```

The bandwidth σ is critical. Too small, and only very close points connect, so the graph fragments. Too large, and everything connects, so the cluster structure is lost. A common heuristic: set σ to the mean distance to the k-th nearest neighbor across all points.
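That heuristic is easy to compute from the same pairwise distances. A minimal sketch (the helper name `estimate_sigma` is not part of the code above, just an illustration):

```python
def estimate_sigma(X, k=10):
    """Heuristic bandwidth: mean distance to the k-th nearest neighbor."""
    distances = cdist(X, X, metric='euclidean')
    # After sorting each row, column 0 is the self-distance (0), so column k is the k-th neighbor
    kth_dist = np.sort(distances, axis=1)[:, k]
    return kth_dist.mean()

# Example usage (assuming X is a feature matrix):
# sigma = estimate_sigma(X, k=10)
# W = build_similarity_graph(X, method='knn', k=10, sigma=sigma)
```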
The core idea: each unlabeled node's label is determined by a weighted average of its neighbors' labels, iterated until convergence.
Label Propagation (Zhu & Ghahramani, 2002):
Let $Y \in \mathbb{R}^{n \times c}$ be the label matrix where $Y_{ij}$ is the probability of sample $i$ having class $j$. Initialize labeled samples with one-hot vectors and unlabeled with uniform.
Propagation Rule: $$Y^{(t+1)} = D^{-1} W Y^{(t)}$$
where $D = \text{diag}(W \mathbf{1})$ is the degree matrix.
Clamping: After each iteration, reset labeled nodes to their true labels: $$Y^{(t+1)}_L = Y_L^{\text{true}}$$
```python
import numpy as np


def label_propagation(W, y_labeled, labeled_indices, n_classes, max_iter=1000, tol=1e-6):
    """
    Label Propagation with label clamping.

    Args:
        W: Affinity matrix (n, n)
        y_labeled: Labels for labeled samples
        labeled_indices: Indices of labeled samples
        n_classes: Number of classes
        max_iter: Maximum iterations
        tol: Convergence tolerance

    Returns:
        Y: Soft label matrix (n, n_classes)
    """
    n = W.shape[0]

    # Transition matrix: row-normalized W
    D_inv = np.diag(1.0 / (W.sum(axis=1) + 1e-10))
    T = D_inv @ W

    # Initialize label matrix
    Y = np.ones((n, n_classes)) / n_classes  # Uniform prior

    # Clamp labeled nodes
    for i, label in zip(labeled_indices, y_labeled):
        Y[i] = 0
        Y[i, label] = 1
    Y_labeled_fixed = Y[labeled_indices].copy()

    for iteration in range(max_iter):
        Y_prev = Y.copy()

        # Propagate
        Y = T @ Y

        # Clamp: reset labeled nodes
        Y[labeled_indices] = Y_labeled_fixed

        # Check convergence
        diff = np.abs(Y - Y_prev).max()
        if diff < tol:
            print(f"Converged at iteration {iteration}")
            break

    return Y


def predict_labels(Y, labeled_indices):
    """Extract predictions for unlabeled samples."""
    unlabeled = [i for i in range(len(Y)) if i not in labeled_indices]
    predictions = Y[unlabeled].argmax(axis=1)
    confidences = Y[unlabeled].max(axis=1)
    return unlabeled, predictions, confidences
```

Convergence and Closed-Form Solution:
Rather than iterating, we can solve directly. Partition nodes into labeled ($L$) and unlabeled ($U$). The equilibrium satisfies:
$$Y_U = (I - T_{UU})^{-1} T_{UL} Y_L$$
where $T_{UU}$ is the transition probability within unlabeled nodes and $T_{UL}$ from unlabeled to labeled. This shows labels propagate as a linear combination of labeled examples, weighted by graph proximity.
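To make this closed form concrete, here is a minimal sketch that partitions the transition matrix and solves the linear system directly (using a solve rather than an explicit inverse for numerical stability; variable names follow the iterative code above):

```python
def label_propagation_closed_form(W, Y_L, labeled_indices):
    """
    Solve Y_U = (I - T_UU)^{-1} T_UL Y_L directly.

    W: affinity matrix (n, n)
    Y_L: one-hot label matrix for the labeled nodes (n_L, n_classes)
    labeled_indices: indices of the labeled nodes
    """
    n = W.shape[0]
    unlabeled_indices = np.setdiff1d(np.arange(n), labeled_indices)

    # Row-normalized transition matrix T = D^{-1} W
    T = W / (W.sum(axis=1, keepdims=True) + 1e-10)

    T_UU = T[np.ix_(unlabeled_indices, unlabeled_indices)]
    T_UL = T[np.ix_(unlabeled_indices, labeled_indices)]

    # (I - T_UU) Y_U = T_UL Y_L, solved as a linear system
    Y_U = np.linalg.solve(np.eye(len(unlabeled_indices)) - T_UU, T_UL @ Y_L)
    return Y_U, unlabeled_indices
```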
Label propagation has elegant interpretations from multiple theoretical perspectives:
1. Random Walk Interpretation:
The soft label $Y_{ij}$ equals the probability that a random walk starting from node $i$ first reaches a labeled node of class $j$: $$Y_{ij} = P(\text{absorb at class } j | \text{start at } i)$$
Labeled nodes are absorbing states—once reached, the walk terminates.
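This view can be checked numerically with a small Monte Carlo sketch: simulate walks from an unlabeled node on the row-normalized transition matrix `T` until a labeled node absorbs them, and count which class does the absorbing (assumes every node is connected to at least one labeled node; variable names mirror the code above):

```python
def absorption_probabilities(T, labeled_indices, y_labeled, start, n_classes, n_walks=5000):
    """Estimate P(absorb at class j | start at node) by simulating random walks."""
    rng = np.random.default_rng(0)
    labeled_set = set(labeled_indices)
    label_of = dict(zip(labeled_indices, y_labeled))
    counts = np.zeros(n_classes)
    for _ in range(n_walks):
        node = start
        while node not in labeled_set:
            p = T[node] / T[node].sum()      # transition probabilities out of this node
            node = rng.choice(len(T), p=p)   # step to a random neighbor
        counts[label_of[node]] += 1
    return counts / n_walks  # approximates the propagated row Y[start]
```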
2. Harmonic Function:
Label propagation finds a harmonic function on the graph—one that satisfies: $$f(i) = \sum_j w_{ij} f(j) / \sum_j w_{ij} \quad \forall i \in U$$
A harmonic function equals its weighted average over neighbors. This is the graph analog of solving Laplace's equation with Dirichlet boundary conditions.
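As a quick numerical sanity check (a sketch, assuming `Y_full` is the full soft-label matrix with labeled rows clamped and unlabeled rows filled in by propagation):

```python
def check_harmonic(W, Y_full, unlabeled_indices, atol=1e-6):
    """Verify f(i) = sum_j w_ij f(j) / sum_j w_ij for every unlabeled node i."""
    for i in unlabeled_indices:
        neighbor_avg = W[i] @ Y_full / W[i].sum()
        assert np.allclose(Y_full[i], neighbor_avg, atol=atol)
```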
3. Energy Minimization:
The solution minimizes the quadratic energy: $$E(f) = \frac{1}{2} \sum_{i,j} w_{ij}(f_i - f_j)^2 = f^T L f$$
subject to $f_L = y_L$. This encourages smoothness: connected nodes should have similar labels.
All these interpretations encode the same assumption: labels should vary smoothly across the graph. Points strongly connected to class A should themselves be class A. Label propagation is optimal when the class boundary aligns with sparse graph connectivity.
Label spreading (Zhou et al., 2004) modifies the objective to be more robust to label noise by allowing labeled nodes to change:
$$\min_F \sum_{i,j} w_{ij} \left\| \frac{f_i}{\sqrt{d_i}} - \frac{f_j}{\sqrt{d_j}} \right\|^2 + \mu \sum_i \|f_i - y_i\|^2$$
The first term encourages smoothness (normalized by degree). The second penalizes deviation from given labels, but allows some movement.
Closed-form Solution: $$F^* = (I - \alpha S)^{-1} Y$$
where $S = D^{-1/2} W D^{-1/2}$ is the normalized affinity and $\alpha = 1/(1+\mu)$ controls label influence vs. smoothness.
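A from-scratch sketch of this closed form (assuming `W` is an affinity matrix with zero diagonal and `Y` is the initial label matrix with one-hot rows for labeled nodes and zero rows for unlabeled ones; Zhou et al. include a constant $(1-\alpha)$ prefactor, which does not change the argmax):

```python
def label_spreading_closed_form(W, Y, alpha=0.9):
    """Label spreading: F* = (I - alpha * S)^{-1} Y with S = D^{-1/2} W D^{-1/2}."""
    d = W.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d + 1e-10))
    S = D_inv_sqrt @ W @ D_inv_sqrt          # symmetrically normalized affinity
    F = np.linalg.solve(np.eye(len(W)) - alpha * S, Y)
    return F                                 # soft labels; predict with F.argmax(axis=1)
```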
```python
from sklearn.semi_supervised import LabelSpreading, LabelPropagation
import numpy as np


# Using sklearn's implementations
def compare_propagation_methods(X, y_partial, n_labeled):
    """
    Compare label propagation vs label spreading.

    y_partial: labels with -1 for unlabeled
    """
    # Label Propagation (hard clamping)
    lp = LabelPropagation(kernel='rbf', gamma=20, max_iter=1000)
    lp.fit(X, y_partial)

    # Label Spreading (soft clamping, alpha controls)
    ls = LabelSpreading(kernel='rbf', gamma=20, alpha=0.2, max_iter=1000)
    ls.fit(X, y_partial)

    return {
        'propagation': {
            'predictions': lp.transduction_,
            'distributions': lp.label_distributions_
        },
        'spreading': {
            'predictions': ls.transduction_,
            'distributions': ls.label_distributions_
        }
    }
```

| Aspect | Propagation | Spreading |
|---|---|---|
| Labeled nodes | Clamped (fixed) | Soft (can change) |
| Label noise | Vulnerable | Robust |
| Normalization | Row-normalized (D⁻¹W) | Symmetric (D⁻¹/²WD⁻¹/²) |
| Hyperparameter | None (beyond graph) | α (clamp strength) |
Scalability:
The closed-form solution requires inverting an $n_U \times n_U$ matrix, which costs $O(n_U^3)$ and is infeasible for large datasets. Practical remedies avoid the explicit inverse: build a sparse kNN graph and run the iterative propagation update, where each step is just a sparse matrix product, or solve the linear system $(I - T_{UU}) Y_U = T_{UL} Y_L$ with a sparse iterative solver.
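A sketch of the sparse, iterative approach (using scikit-learn's `kneighbors_graph` and SciPy sparse matrices; the RBF bandwidth is fixed at 1 here for brevity):

```python
import numpy as np
from scipy.sparse import diags
from sklearn.neighbors import kneighbors_graph


def sparse_label_propagation(X, Y_init, labeled_indices, k=10, max_iter=200, tol=1e-6):
    """Iterative propagation on a sparse kNN graph: no n x n inverse is ever formed."""
    # Sparse kNN graph with RBF weights on the stored distances (sigma = 1 assumed)
    W = kneighbors_graph(X, n_neighbors=k, mode='distance', include_self=False)
    W.data = np.exp(-W.data**2)
    W = 0.5 * (W + W.T)                       # symmetrize

    # Row-normalized sparse transition matrix
    inv_deg = 1.0 / (np.asarray(W.sum(axis=1)).ravel() + 1e-10)
    T = diags(inv_deg) @ W

    Y = Y_init.copy()
    Y_fixed = Y[labeled_indices].copy()
    for _ in range(max_iter):
        Y_new = T @ Y                         # sparse matrix product: O(nnz) per iteration
        Y_new[labeled_indices] = Y_fixed      # clamp labeled nodes
        converged = np.abs(Y_new - Y).max() < tol
        Y = Y_new
        if converged:
            break
    return Y
```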
Inductive Extension:
Label propagation is transductive: it cannot predict on new samples that were not in the graph. For induction, two common options are to assign a new point the similarity-weighted average of its nearest graph neighbors' soft labels, or to train a standard inductive classifier on the propagated labels.
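A sketch of the weighted-neighbor option (hypothetical helper; assumes `X_train` holds the graph nodes, `Y` their propagated soft labels, and `sigma` matches the bandwidth used to build the graph):

```python
def predict_new_point(x_new, X_train, Y, sigma=1.0, k=10):
    """Inductive prediction: similarity-weighted average of the k nearest nodes' soft labels."""
    dists = np.linalg.norm(X_train - x_new, axis=1)
    nearest = np.argsort(dists)[:k]
    weights = np.exp(-dists[nearest]**2 / sigma**2)
    soft_label = weights @ Y[nearest] / (weights.sum() + 1e-10)
    return soft_label.argmax(), soft_label
```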
What's Next:
Label propagation introduced the power of graph structure for semi-supervised learning. Next, we'll explore a broader family of graph-based methods including manifold regularization, graph neural networks for SSL, and methods that learn graphs rather than constructing them from distances.
You now understand label propagation: graph construction, the iterative algorithm, theoretical foundations, and the label spreading variant. These graph-based concepts form the foundation for modern graph neural network approaches to semi-supervised learning.