In Bayesian networks, we speak the language of conditional probabilities—each factor is a legitimate probability distribution answering questions like "What is the probability of X given its parents?" But Markov Random Fields speak a different language: the language of compatibility and affinity.
Potential functions don't answer probabilistic questions directly. Instead, they answer questions like: "How compatible are these variable values?" or "How favorable is this configuration?" This subtle but profound shift enables MRFs to model symmetric relationships where no variable 'causes' another, and provides deep connections to energy-based formulations from statistical physics.
Understanding potential functions is essential for working with MRFs—they are the mathematical machinery that translates graph structure into probability distributions.
By the end of this page, you will master the mathematical definition of potential functions, understand common parameterizations (log-linear models, table potentials, Gibbs potentials), learn how to construct potentials for different types of relationships, and grasp the critical distinction between potentials and probabilities.
A potential function (also called a factor, compatibility function, or clique potential) is a non-negative function defined over a subset of variables in an MRF.
Let $C \subseteq V$ be a clique in an undirected graph $G = (V, E)$. A potential function over $C$ is a function $\psi_C: \text{dom}(\mathbf{X}_C) \rightarrow \mathbb{R}_{\geq 0}$ that maps each configuration of the variables in $C$ to a non-negative real number.
Key Properties: a potential must be non-negative, it is not required to sum to one, and only its relative magnitudes matter, since rescaling a potential by any positive constant leaves the resulting distribution unchanged.
The Joint Distribution:
Given a set of potential functions $\{\psi_C\}_{C \in \mathcal{C}}$ defined over cliques, the joint distribution is:
$$P(\mathbf{X} = \mathbf{x}) = \frac{1}{Z} \prod_{C \in \mathcal{C}} \psi_C(\mathbf{x}_C)$$
where $Z = \sum_{\mathbf{x}'} \prod_{C \in \mathcal{C}} \psi_C(\mathbf{x}'_C)$ is the partition function ensuring the distribution sums to 1.
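As a small worked instance of this formula, consider a three-node chain $A - B - C$ whose maximal cliques are $\{A, B\}$ and $\{B, C\}$:

$$P(a, b, c) = \frac{1}{Z} \, \psi_{AB}(a, b) \, \psi_{BC}(b, c), \qquad Z = \sum_{a', b', c'} \psi_{AB}(a', b') \, \psi_{BC}(b', c')$$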
```python
import numpy as np
from typing import Dict, Tuple, List, Callable, Any
from abc import ABC, abstractmethod


class PotentialFunction(ABC):
    """
    Abstract base class for potential functions in MRFs.

    A potential function maps configurations of variables to non-negative
    real numbers, measuring the "compatibility" or "favorability" of each
    configuration.
    """

    def __init__(self, scope: Tuple[int, ...]):
        """
        Initialize a potential function.

        Args:
            scope: Tuple of variable indices this potential depends on
        """
        self.scope = scope
        self.arity = len(scope)

    @abstractmethod
    def __call__(self, assignment: Dict[int, Any]) -> float:
        """
        Evaluate the potential for a given assignment.

        Args:
            assignment: Dictionary mapping variable indices to values

        Returns:
            Non-negative potential value
        """
        pass

    def get_scope_values(self, assignment: Dict[int, Any]) -> Tuple:
        """Extract values for variables in this potential's scope."""
        return tuple(assignment[var] for var in self.scope)


class TablePotential(PotentialFunction):
    """
    A potential function represented as an explicit table of values.

    This is the most general representation, but memory-intensive for
    large domains or high-arity potentials.
    """

    def __init__(self, scope: Tuple[int, ...],
                 domain_sizes: Tuple[int, ...],
                 values: np.ndarray = None):
        """
        Initialize a table potential.

        Args:
            scope: Variable indices
            domain_sizes: Size of domain for each variable in scope
            values: Optional initial values (default: all ones)
        """
        super().__init__(scope)
        self.domain_sizes = domain_sizes
        if values is None:
            self.table = np.ones(domain_sizes)
        else:
            assert values.shape == domain_sizes
            assert np.all(values >= 0), "Potentials must be non-negative"
            self.table = values.copy()

    def __call__(self, assignment: Dict[int, Any]) -> float:
        """Look up potential value in the table."""
        index = self.get_scope_values(assignment)
        return float(self.table[index])

    def set_value(self, config: Tuple[int, ...], value: float):
        """Set the potential value for a specific configuration."""
        assert value >= 0, "Potential values must be non-negative"
        self.table[config] = value

    def normalize(self) -> 'TablePotential':
        """Return a normalized copy (sums to 1)."""
        normalized = TablePotential(self.scope, self.domain_sizes,
                                    self.table / self.table.sum())
        return normalized

    def __repr__(self):
        return f"TablePotential(scope={self.scope}, shape={self.domain_sizes})"


# Example: Create a simple pairwise potential
# Two binary variables (X0, X1) that prefer to have the same value

# Scope: variables 0 and 1
# Domain: each variable has domain {0, 1}
pairwise_potential = TablePotential(
    scope=(0, 1),
    domain_sizes=(2, 2)
)

# Set values: high compatibility for matching, low for mismatching
pairwise_potential.set_value((0, 0), 5.0)  # Both 0: high compatibility
pairwise_potential.set_value((0, 1), 1.0)  # Different: low compatibility
pairwise_potential.set_value((1, 0), 1.0)  # Different: low compatibility
pairwise_potential.set_value((1, 1), 5.0)  # Both 1: high compatibility

print("Pairwise 'Agreement' Potential:")
print("=" * 40)
print(f"ψ(X₀=0, X₁=0) = {pairwise_potential({0: 0, 1: 0})}")
print(f"ψ(X₀=0, X₁=1) = {pairwise_potential({0: 0, 1: 1})}")
print(f"ψ(X₀=1, X₁=0) = {pairwise_potential({0: 1, 1: 0})}")
print(f"ψ(X₀=1, X₁=1) = {pairwise_potential({0: 1, 1: 1})}")
print(f"\nInterpretation: Matching values are 5x more compatible than mismatches")
```

One of the most common sources of confusion for newcomers to MRFs is the distinction between potential functions and probability distributions. Let's clarify this critical difference.
| Property | Potential Function $\psi$ | Probability Distribution $P$ |
|---|---|---|
| Range | $[0, \infty)$ | $[0, 1]$ |
| Normalization | Not required | Must sum/integrate to 1 |
| Interpretation | Compatibility score | Likelihood of occurrence |
| Scale invariance | Can be multiplied by any positive constant | Fixed by normalization |
| Combination | Multiply to combine | Multiply, then renormalize |
| Zero values | Forbid a configuration (force zero probability) | Mean the outcome cannot occur |
| Computing from | Usually defined directly | Computed from potentials via $Z$ |
Do NOT interpret potential values as probabilities! A potential of $\psi(x) = 100$ does not mean $P(x) = 100$ (which is impossible). Potentials are only meaningful relative to each other. The actual probability depends on the partition function $Z$, which normalizes the product of all potentials.
Why Use Potentials Instead of Probabilities?
This design choice has several important advantages:
Flexibility: Potentials can encode arbitrary symmetric relationships without satisfying normalization constraints
Modularity: You can define and combine potentials independently, deferring normalization to the end
Physics connection: Many natural phenomena are described by energy functions, which relate to potentials via $\psi = e^{-E}$
Learning convenience: During learning, you can optimize unnormalized scores and only compute $Z$ when needed
Avoiding underflow: Products of many small probabilities underflow; potentials can use any convenient scale
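To make the underflow point concrete, here is a minimal sketch (the numbers are purely illustrative) comparing a naive product of many small factors with the equivalent sum of logs:

```python
import numpy as np

# 500 factors, each a small probability-like value around 1e-3
values = np.full(500, 1e-3)

# Naive product underflows to exactly 0.0 in float64
naive_product = np.prod(values)

# Working in log space keeps the computation stable:
# the log of a product is the sum of the logs
log_score = np.sum(np.log(values))

print(f"Naive product: {naive_product}")   # 0.0 (underflow)
print(f"Sum of logs:   {log_score:.2f}")   # ≈ -3453.88
# Ratios of log-scores between configurations remain perfectly usable
# even when the raw products are far below floating-point range.
```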
The Price We Pay:
The flexibility of potentials comes with a cost: computing the normalization constant $Z$ is typically intractable. This makes exact inference in MRFs harder than in many Bayesian networks, motivating the development of approximate inference algorithms.
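A rough sketch of why brute force fails: summing over every joint configuration of $n$ binary variables requires $2^n$ terms.

```python
# Brute-force computation of Z enumerates every joint configuration.
# For n binary variables that is 2**n terms.
for n in [10, 30, 100]:
    print(f"n = {n:>3} binary variables -> 2**n = {2**n:.3e} configurations")

# Even a modest 100 x 100 binary image has 2**10000 configurations,
# an integer with over 3000 decimal digits -- hopeless to enumerate.
print(f"2**10000 has {len(str(2**10000))} decimal digits")
```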
```python
import numpy as np
from typing import List


def demonstrate_potential_vs_probability():
    """
    Demonstrate the difference between potentials and probabilities.
    """
    # Consider a simple MRF with two binary variables
    # and pairwise + singleton potentials

    # Singleton potentials (unary): bias toward certain values
    psi_0 = {0: 2.0, 1: 3.0}  # Variable 0 prefers value 1
    psi_1 = {0: 4.0, 1: 1.0}  # Variable 1 prefers value 0

    # Pairwise potential: preference for agreement
    psi_01 = {
        (0, 0): 5.0,
        (0, 1): 1.0,
        (1, 0): 1.0,
        (1, 1): 5.0,
    }

    # Compute unnormalized "scores" for each configuration
    print("Computing Unnormalized Scores:")
    print("=" * 50)

    configs = [(0, 0), (0, 1), (1, 0), (1, 1)]
    scores = {}

    for x0, x1 in configs:
        # Product of all potentials
        score = psi_0[x0] * psi_1[x1] * psi_01[(x0, x1)]
        scores[(x0, x1)] = score
        print(f"Config (X₀={x0}, X₁={x1}): "
              f"{psi_0[x0]} × {psi_1[x1]} × {psi_01[(x0, x1)]} = {score}")

    # Compute partition function
    Z = sum(scores.values())
    print(f"\nPartition Function Z = {Z}")

    # Compute actual probabilities
    print("\nActual Probabilities:")
    print("=" * 50)

    for config, score in scores.items():
        prob = score / Z
        print(f"P(X₀={config[0]}, X₁={config[1]}) = {score}/{Z} = {prob:.4f}")

    # Key insight: probabilities sum to 1
    print(f"\nSum of probabilities: {sum(s/Z for s in scores.values()):.6f}")

    # Demonstrate scale invariance of potentials
    print("\n" + "=" * 50)
    print("Scale Invariance Demonstration:")
    print("=" * 50)

    # Multiply all potentials by 100 - probabilities stay the same!
    scale = 100
    scaled_scores = {k: v * scale**3 for k, v in scores.items()}  # 3 potentials
    scaled_Z = sum(scaled_scores.values())

    print(f"After scaling all potentials by {scale}:")
    for config, score in scaled_scores.items():
        prob = score / scaled_Z
        print(f"P(X₀={config[0]}, X₁={config[1]}) = {prob:.4f}")

    print("\nProbabilities unchanged - only relative potentials matter!")


demonstrate_potential_vs_probability()
```

While potentials can be arbitrary non-negative functions, several parameterizations are particularly common and useful in practice. Each offers different tradeoffs between expressivity, interpretability, and computational efficiency.
The most widely used parameterization writes potentials in log-linear (or exponential family) form: $\psi_C(\mathbf{x}_C) = \exp\left(\sum_k \theta_k f_k(\mathbf{x}_C)\right)$, where $f_k$ are feature functions and $\theta_k$ are parameters. This form has deep connections to maximum entropy, convex optimization, and statistical learning.
1. Table Potentials (Tabular Form)
The most general representation explicitly stores one value per configuration; for a clique of $k$ variables, each with $d$ possible values, this requires a table of $d^k$ entries.
2. Log-Linear Potentials
Potentials in exponential form based on feature functions:
$$\psi_C(\mathbf{x}_C) = \exp\left(\sum_k \theta_k f_k(\mathbf{x}_C)\right)$$
3. Gaussian Potentials
For continuous variables, quadratic forms in the exponent:
$$\psi_C(\mathbf{x}_C) = \exp\left(-\frac{1}{2}\mathbf{x}_C^T \Lambda_C \mathbf{x}_C + \mathbf{h}_C^T \mathbf{x}_C\right)$$
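There is no Gaussian example in the code listings on this page, so here is a minimal sketch of how such a potential could be implemented on top of the `PotentialFunction` base class from the first listing; the `GaussianPotential` name and its parameter names are illustrative assumptions, not part of the original code.

```python
class GaussianPotential(PotentialFunction):
    """
    Gaussian (quadratic) potential in information form:

        ψ(x_C) = exp(-0.5 * xᵀ Λ x + hᵀ x)

    where Λ is a symmetric matrix over the clique's variables and h a vector.
    """

    def __init__(self, scope: Tuple[int, ...],
                 precision: np.ndarray, linear: np.ndarray):
        super().__init__(scope)
        self.precision = np.asarray(precision, dtype=float)  # Λ
        self.linear = np.asarray(linear, dtype=float)        # h

    def __call__(self, assignment: Dict[int, Any]) -> float:
        x = np.array([assignment[v] for v in self.scope], dtype=float)
        exponent = -0.5 * x @ self.precision @ x + self.linear @ x
        return float(np.exp(exponent))


# A pairwise potential that favors x0 ≈ x1 (negative off-diagonal coupling)
gauss = GaussianPotential(scope=(0, 1),
                          precision=[[1.0, -0.8], [-0.8, 1.0]],
                          linear=[0.0, 0.0])
print(gauss({0: 1.0, 1: 1.0}))    # similar values -> larger potential
print(gauss({0: 1.0, 1: -1.0}))   # dissimilar values -> smaller potential
```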
```python
import numpy as np
from typing import Callable, List


class LogLinearPotential(PotentialFunction):
    """
    Log-linear (exponential family) potential function.

    ψ(x) = exp(Σ θ_k f_k(x))

    This is the most common parameterization in practice, offering:
    - Compact representation via features
    - Guaranteed positivity (exp is always positive)
    - Convex learning objectives
    - Natural connection to maximum entropy
    """

    def __init__(self, scope: Tuple[int, ...],
                 features: List[Callable],
                 weights: np.ndarray = None):
        """
        Initialize a log-linear potential.

        Args:
            scope: Variable indices
            features: List of feature functions f_k(assignment) -> float
            weights: Parameter weights (default: zeros)
        """
        super().__init__(scope)
        self.features = features
        self.num_features = len(features)
        if weights is None:
            self.weights = np.zeros(self.num_features)
        else:
            assert len(weights) == self.num_features
            self.weights = np.array(weights)

    def compute_feature_vector(self, assignment: Dict[int, Any]) -> np.ndarray:
        """Compute the feature vector for an assignment."""
        return np.array([f(assignment) for f in self.features])

    def __call__(self, assignment: Dict[int, Any]) -> float:
        """Compute exp(θ · f(x))."""
        features = self.compute_feature_vector(assignment)
        log_potential = np.dot(self.weights, features)
        return np.exp(log_potential)

    def log_potential(self, assignment: Dict[int, Any]) -> float:
        """Compute log ψ(x) = θ · f(x) directly (avoids exp)."""
        features = self.compute_feature_vector(assignment)
        return np.dot(self.weights, features)


class IsingPotential(PotentialFunction):
    """
    Ising model pairwise potential.

    The Ising model is a classic MRF from statistical physics:
    ψ(x_i, x_j) = exp(J * x_i * x_j + h_i * x_i + h_j * x_j)

    For binary variables x ∈ {-1, +1}:
    - J > 0 (ferromagnetic): prefers alignment
    - J < 0 (antiferromagnetic): prefers anti-alignment
    - h: external field (bias toward +1 or -1)
    """

    def __init__(self, var_i: int, var_j: int,
                 coupling: float = 1.0,
                 field_i: float = 0.0,
                 field_j: float = 0.0):
        """
        Initialize Ising potential.

        Args:
            var_i, var_j: Variable indices
            coupling: Coupling strength J
            field_i, field_j: External fields
        """
        super().__init__((var_i, var_j))
        self.coupling = coupling
        self.field_i = field_i
        self.field_j = field_j

    def __call__(self, assignment: Dict[int, Any]) -> float:
        """Compute Ising potential."""
        xi = assignment[self.scope[0]]  # Should be -1 or +1
        xj = assignment[self.scope[1]]
        energy = -(self.coupling * xi * xj +
                   self.field_i * xi +
                   self.field_j * xj)
        return np.exp(-energy)  # ψ = exp(-E)


class PottsPotential(PotentialFunction):
    """
    Potts model pairwise potential.

    Generalization of Ising to multi-valued discrete variables:
    ψ(x_i, x_j) = exp(J * δ(x_i, x_j))

    where δ(x_i, x_j) = 1 if x_i == x_j, else 0.

    This encourages neighboring variables to take the same value,
    commonly used in image segmentation.
    """

    def __init__(self, var_i: int, var_j: int, coupling: float = 1.0):
        """
        Initialize Potts potential.

        Args:
            var_i, var_j: Variable indices
            coupling: Coupling strength (higher = stronger preference for agreement)
        """
        super().__init__((var_i, var_j))
        self.coupling = coupling

    def __call__(self, assignment: Dict[int, Any]) -> float:
        """Compute Potts potential."""
        xi = assignment[self.scope[0]]
        xj = assignment[self.scope[1]]
        delta = 1.0 if xi == xj else 0.0
        return np.exp(self.coupling * delta)


# Demonstration of different potential types
print("Log-Linear Potential Example:")
print("=" * 50)

# Define feature functions for a pairwise potential
def feature_same(assignment):
    """1 if variables have same value, 0 otherwise."""
    return 1.0 if assignment[0] == assignment[1] else 0.0

def feature_both_one(assignment):
    """1 if both variables are 1."""
    return 1.0 if assignment[0] == 1 and assignment[1] == 1 else 0.0

log_linear = LogLinearPotential(
    scope=(0, 1),
    features=[feature_same, feature_both_one],
    weights=[2.0, 1.0]  # Strong preference for same, extra bonus for both=1
)

for x0 in [0, 1]:
    for x1 in [0, 1]:
        assignment = {0: x0, 1: x1}
        psi = log_linear(assignment)
        log_psi = log_linear.log_potential(assignment)
        print(f"ψ({x0}, {x1}) = exp({log_psi:.2f}) = {psi:.4f}")

print("\nIsing Potential Example (Ferromagnetic):")
print("=" * 50)

ising = IsingPotential(0, 1, coupling=1.0)  # J=1: ferromagnetic

for xi in [-1, 1]:
    for xj in [-1, 1]:
        assignment = {0: xi, 1: xj}
        psi = ising(assignment)
        print(f"ψ(x₀={xi:+d}, x₁={xj:+d}) = {psi:.4f}")

print("\nPotts Potential Example (3-state):")
print("=" * 50)

potts = PottsPotential(0, 1, coupling=2.0)

for xi in range(3):
    for xj in range(3):
        assignment = {0: xi, 1: xj}
        psi = potts(assignment)
        print(f"ψ(x₀={xi}, x₁={xj}) = {psi:.4f}")
```

Designing appropriate potential functions is both an art and a science. The potential should capture the domain knowledge about how variables interact. Let's explore common patterns for different types of relationships.
```python
class SmoothnessPotential(PotentialFunction):
    """
    Potential that encourages smooth/similar values between variables.

    For continuous: ψ(x_i, x_j) = exp(-λ|x_i - x_j|²)
    For discrete:   ψ(x_i, x_j) = exp(-λ|x_i - x_j|)

    Used extensively in image denoising, depth estimation, etc.
    """

    def __init__(self, var_i: int, var_j: int,
                 smoothness_weight: float = 1.0,
                 distance_type: str = "squared"):
        super().__init__((var_i, var_j))
        self.weight = smoothness_weight
        self.distance_type = distance_type

    def __call__(self, assignment: Dict[int, Any]) -> float:
        xi = assignment[self.scope[0]]
        xj = assignment[self.scope[1]]
        diff = abs(xi - xj)

        if self.distance_type == "squared":
            distance = diff ** 2
        elif self.distance_type == "absolute":
            distance = diff
        elif self.distance_type == "truncated":
            distance = min(diff, 2.0)  # Cap penalty
        else:
            distance = diff

        return np.exp(-self.weight * distance)


class ContrastPotential(PotentialFunction):
    """
    Potential that encourages different values (anti-smoothness).

    Useful for graph coloring, scheduling (avoid conflicts), etc.
    """

    def __init__(self, var_i: int, var_j: int, penalty: float = 10.0):
        super().__init__((var_i, var_j))
        self.penalty = penalty

    def __call__(self, assignment: Dict[int, Any]) -> float:
        xi = assignment[self.scope[0]]
        xj = assignment[self.scope[1]]
        if xi == xj:
            return np.exp(-self.penalty)  # Strong penalty for same value
        else:
            return 1.0  # No penalty for different values


class HardConstraintPotential(PotentialFunction):
    """
    Potential encoding hard constraints.

    Returns 1.0 for allowed configurations, 0.0 for forbidden.
    Zero probability for forbidden configurations.
    """

    def __init__(self, scope: Tuple[int, ...],
                 is_allowed: Callable[[Dict[int, Any]], bool]):
        super().__init__(scope)
        self.is_allowed = is_allowed

    def __call__(self, assignment: Dict[int, Any]) -> float:
        return 1.0 if self.is_allowed(assignment) else 0.0


class DataTermPotential(PotentialFunction):
    """
    Unary potential encoding how well a variable value matches observations.

    Common in image segmentation:
    ψ(x_i) = exp(-λ * ||I_i - μ_{x_i}||²)

    where I_i is observed pixel value and μ_{x_i} is mean for label x_i.
    """

    def __init__(self, var: int, observed_value: float,
                 label_means: Dict[int, float], weight: float = 1.0):
        super().__init__((var,))
        self.observed = observed_value
        self.means = label_means
        self.weight = weight

    def __call__(self, assignment: Dict[int, Any]) -> float:
        label = assignment[self.scope[0]]
        expected = self.means[label]
        error = (self.observed - expected) ** 2
        return np.exp(-self.weight * error)


# Example: Build an image segmentation MRF
def build_segmentation_mrf_potentials(
        image: np.ndarray,
        fg_mean: float = 200,
        bg_mean: float = 50,
        smoothness: float = 2.0) -> Tuple[List[PotentialFunction], dict]:
    """
    Build potentials for binary image segmentation.

    Args:
        image: 2D grayscale image
        fg_mean: Expected foreground pixel value
        bg_mean: Expected background pixel value
        smoothness: Weight for pairwise smoothness term

    Returns:
        List of potential functions and metadata
    """
    height, width = image.shape
    potentials = []

    label_means = {0: bg_mean, 1: fg_mean}  # 0=background, 1=foreground

    # Data terms (unary potentials)
    for row in range(height):
        for col in range(width):
            pixel_idx = row * width + col
            pixel_value = image[row, col]
            data_potential = DataTermPotential(
                var=pixel_idx,
                observed_value=pixel_value,
                label_means=label_means,
                weight=0.01
            )
            potentials.append(data_potential)

    # Smoothness terms (pairwise potentials)
    for row in range(height):
        for col in range(width):
            pixel_idx = row * width + col

            # Right neighbor
            if col < width - 1:
                right_idx = pixel_idx + 1
                potentials.append(
                    PottsPotential(pixel_idx, right_idx, coupling=smoothness)
                )

            # Down neighbor
            if row < height - 1:
                down_idx = pixel_idx + width
                potentials.append(
                    PottsPotential(pixel_idx, down_idx, coupling=smoothness)
                )

    metadata = {
        'num_variables': height * width,
        'num_unary': height * width,
        'num_pairwise': len(potentials) - height * width,
        'labels': {0: 'background', 1: 'foreground'}
    }

    return potentials, metadata


# Demonstrate with a small synthetic image
print("Image Segmentation MRF Construction:")
print("=" * 50)

# Create a 3x3 synthetic image
synthetic_image = np.array([
    [50, 60, 55],      # Dark top row (background)
    [180, 200, 190],   # Bright middle row (foreground)
    [45, 55, 50],      # Dark bottom row (background)
])

potentials, meta = build_segmentation_mrf_potentials(
    synthetic_image, fg_mean=190, bg_mean=55, smoothness=2.0
)

print(f"Number of variables: {meta['num_variables']}")
print(f"Number of unary potentials: {meta['num_unary']}")
print(f"Number of pairwise potentials: {meta['num_pairwise']}")
print(f"Total potentials: {len(potentials)}")
```

There's an illuminating alternative view of MRFs that comes from statistical physics: the energy-based perspective. Instead of thinking about compatibility scores that we want to maximize, we think about energy costs that we want to minimize.
In statistical physics, systems tend toward low-energy configurations. The probability of a configuration is given by the Gibbs (or Boltzmann) distribution: $P(\mathbf{x}) = \frac{1}{Z} \exp(-E(\mathbf{x})/T)$, where $E(\mathbf{x})$ is the energy and $T$ is temperature. Setting $T = 1$, we get the standard MRF form with $\psi(\mathbf{x}) = \exp(-E(\mathbf{x}))$.
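To see what the temperature does, here is a small sketch (with illustrative energy values) of the Boltzmann weights at several temperatures:

```python
import numpy as np

def boltzmann(energies, T):
    """P(x) ∝ exp(-E(x)/T), normalized over the listed configurations."""
    unnorm = np.exp(-np.array(energies) / T)
    return unnorm / unnorm.sum()

energies = [0.0, 1.0, 2.0]            # three hypothetical configurations
for T in [0.25, 1.0, 4.0]:
    probs = boltzmann(energies, T)
    print(f"T = {T:4.2f}: " + ", ".join(f"{p:.3f}" for p in probs))

# Low T concentrates mass on the minimum-energy configuration;
# high T flattens the distribution toward uniform.
```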
The Potential-Energy Relationship:
Potentials and energies are related by a simple transformation:
$$\psi_C(\mathbf{x}_C) = \exp(-E_C(\mathbf{x}_C))$$ $$E_C(\mathbf{x}_C) = -\log \psi_C(\mathbf{x}_C)$$
This relationship means that high-compatibility configurations have low energy, low-compatibility configurations have high energy, and the most probable configuration is exactly the minimum-energy one.
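In code, the conversion is a one-liner in each direction; a quick sketch using the "agreement" values from the first example (5.0 for a match, 1.0 for a mismatch):

```python
import numpy as np

# Potentials from the pairwise "agreement" example
potentials = np.array([[5.0, 1.0],
                       [1.0, 5.0]])

# Energy is the negative log potential ...
energies = -np.log(potentials)        # matching pairs get energy -log 5 ≈ -1.61

# ... and exponentiating the negative energy recovers the potential exactly
recovered = np.exp(-energies)
print(np.allclose(recovered, potentials))   # True
print(energies)
```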
Total Energy:
Since potentials multiply and logarithms convert products to sums:
$$E(\mathbf{x}) = \sum_{C \in \mathcal{C}} E_C(\mathbf{x}_C) = -\sum_{C \in \mathcal{C}} \log \psi_C(\mathbf{x}_C)$$
This additive structure of energy is often more intuitive than the multiplicative structure of potentials.
```python
class EnergyBasedMRF:
    """
    MRF formulated in terms of energy functions instead of potentials.

    P(x) = (1/Z) exp(-E(x))

    where E(x) = Σ E_c(x_c) is the sum of local energy terms.
    """

    def __init__(self):
        self.energy_terms: List[Callable] = []

    def add_energy_term(self, energy_func: Callable):
        """Add an energy term to the model."""
        self.energy_terms.append(energy_func)

    def total_energy(self, assignment: Dict[int, Any]) -> float:
        """Compute total energy (sum of all terms)."""
        return sum(e(assignment) for e in self.energy_terms)

    def unnormalized_probability(self, assignment: Dict[int, Any]) -> float:
        """Compute exp(-E(x)) - unnormalized probability."""
        return np.exp(-self.total_energy(assignment))

    def find_map_brute_force(self,
                             variable_domains: Dict[int, List]
                             ) -> Tuple[Dict[int, Any], float]:
        """
        Find MAP (Maximum A Posteriori) assignment by brute force.

        MAP = argmin E(x) = argmax P(x)

        For small problems only!
        """
        from itertools import product

        variables = sorted(variable_domains.keys())
        domains = [variable_domains[v] for v in variables]

        best_assignment = None
        best_energy = float('inf')

        for values in product(*domains):
            assignment = dict(zip(variables, values))
            energy = self.total_energy(assignment)
            if energy < best_energy:
                best_energy = energy
                best_assignment = assignment

        return best_assignment, best_energy


def create_ising_model_energy(adjacency_list: Dict[int, List[int]],
                              couplings: Dict[Tuple[int, int], float],
                              fields: Dict[int, float]) -> EnergyBasedMRF:
    """
    Create an Ising model in energy form.

    E(x) = -Σ_{i,j} J_{ij} x_i x_j - Σ_i h_i x_i

    Args:
        adjacency_list: Graph structure
        couplings: J_{ij} values for each edge
        fields: h_i values for each node
    """
    model = EnergyBasedMRF()

    # Pairwise energy terms
    added_edges = set()
    for i, neighbors in adjacency_list.items():
        for j in neighbors:
            edge = tuple(sorted([i, j]))
            if edge not in added_edges:
                J = couplings.get(edge, 0.0)

                def pairwise_energy(assignment, i=i, j=j, J=J):
                    return -J * assignment[i] * assignment[j]

                model.add_energy_term(pairwise_energy)
                added_edges.add(edge)

    # Unary energy terms (external field)
    for i, h in fields.items():
        def field_energy(assignment, i=i, h=h):
            return -h * assignment[i]

        model.add_energy_term(field_energy)

    return model


# Example: Small 2D Ising model
print("Energy-Based Ising Model:")
print("=" * 50)

# 2x2 grid of spins
# 0 - 1
# |   |
# 2 - 3

adjacency = {
    0: [1, 2],
    1: [0, 3],
    2: [0, 3],
    3: [1, 2]
}

# Ferromagnetic coupling (J > 0)
couplings = {
    (0, 1): 1.0,
    (0, 2): 1.0,
    (1, 3): 1.0,
    (2, 3): 1.0,
}

# Small external field favoring +1
fields = {0: 0.1, 1: 0.1, 2: 0.1, 3: 0.1}

ising_model = create_ising_model_energy(adjacency, couplings, fields)

# Evaluate some configurations
configs = [
    {0: -1, 1: -1, 2: -1, 3: -1},  # All down
    {0: +1, 1: +1, 2: +1, 3: +1},  # All up
    {0: +1, 1: -1, 2: -1, 3: +1},  # Checkerboard
    {0: +1, 1: +1, 2: -1, 3: -1},  # Half and half
]

for config in configs:
    E = ising_model.total_energy(config)
    P_unnorm = ising_model.unnormalized_probability(config)
    spin_str = "".join("+" if config[i] > 0 else "-" for i in range(4))
    print(f"Config [{spin_str}]: E = {E:+.2f}, exp(-E) = {P_unnorm:.4f}")

# Find MAP assignment
map_assignment, map_energy = ising_model.find_map_brute_force(
    {i: [-1, +1] for i in range(4)}
)
spin_str = "".join("+" if map_assignment[i] > 0 else "-" for i in range(4))
print(f"\nMAP assignment: [{spin_str}] with energy {map_energy:+.2f}")
```

So far, we've focused primarily on unary and pairwise potentials. However, many real-world relationships involve three or more variables simultaneously; these are called higher-order potentials.
Higher-order potentials introduce significant computational challenges. A potential over $k$ binary variables has $2^k$ possible values to store and enumerate. Inference algorithms become substantially more complex. Despite this, higher-order potentials are crucial for capturing rich dependencies that pairwise models cannot express.
Why Higher-Order Potentials Matter:
Richer Expressivity: Some constraints fundamentally involve multiple variables
Better Modeling Accuracy: Pairwise models can sometimes approximate higher-order effects, but often poorly
Natural Problem Structure: Many problems (e.g., XOR constraints, parity checks) are inherently higher-order
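As a concrete illustration of the last point, a parity check can be written directly as a higher-order potential; a minimal sketch built on the `PotentialFunction` base class used throughout this page (the `ParityPotential` name is illustrative, not part of the original listings):

```python
class ParityPotential(PotentialFunction):
    """Hard parity constraint: allows a configuration only if the number
    of ON variables in the scope has the required parity (an XOR check)."""

    def __init__(self, scope: Tuple[int, ...], even: bool = True):
        super().__init__(scope)
        self.even = even

    def __call__(self, assignment: Dict[int, Any]) -> float:
        ones = sum(assignment[v] for v in self.scope)
        satisfied = (ones % 2 == 0) == self.even
        return 1.0 if satisfied else 0.0   # zero potential forbids the config


# No pairwise decomposition reproduces this: the constraint is inherently
# a function of all three variables at once.
xor_check = ParityPotential(scope=(0, 1, 2), even=True)
print(xor_check({0: 1, 1: 1, 2: 0}))   # 1.0 (two ones: even parity, allowed)
print(xor_check({0: 1, 1: 0, 2: 0}))   # 0.0 (one one: odd parity, forbidden)
```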
Common Higher-Order Patterns:
```python
class CardinalityPotential(PotentialFunction):
    """
    Higher-order potential that depends only on the count of ON variables.

    ψ(x_1, ..., x_k) = f(Σ_i x_i)

    This is more efficient than general higher-order potentials because
    we only need to store k+1 values (for counts 0, 1, ..., k).

    Applications:
    - "At least k variables must be ON"
    - "At most k variables can be ON"
    - "Exactly k variables should be ON"
    """

    def __init__(self, scope: Tuple[int, ...],
                 cardinality_values: List[float]):
        """
        Initialize cardinality potential.

        Args:
            scope: Variable indices
            cardinality_values: Potential value for each count (0, 1, ..., k)
        """
        super().__init__(scope)
        k = len(scope)
        assert len(cardinality_values) == k + 1
        self.values = np.array(cardinality_values)

    def __call__(self, assignment: Dict[int, Any]) -> float:
        count = sum(assignment[v] for v in self.scope)
        return float(self.values[count])


class AtMostKPotential(CardinalityPotential):
    """At most K variables can be ON (1)."""

    def __init__(self, scope: Tuple[int, ...], k: int,
                 violation_penalty: float = 0.0):
        n = len(scope)
        values = [1.0 if i <= k else violation_penalty for i in range(n + 1)]
        super().__init__(scope, values)
        self.k = k


class ExactlyKPotential(CardinalityPotential):
    """Exactly K variables should be ON (1)."""

    def __init__(self, scope: Tuple[int, ...], k: int,
                 correct_value: float = 1.0,
                 violation_penalty: float = 0.0):
        n = len(scope)
        values = [correct_value if i == k else violation_penalty
                  for i in range(n + 1)]
        super().__init__(scope, values)
        self.k = k


class RobustPnPotential(PotentialFunction):
    """
    Robust P^n Potts potential from Kohli et al.

    A higher-order potential that encourages label consistency but is
    robust to a small number of outliers.

    If all variables agree: low energy
    If up to k disagree: gradually increasing energy
    If many disagree: truncated energy (doesn't penalize further)

    This is critical for image segmentation with outliers.
    """

    def __init__(self, scope: Tuple[int, ...],
                 num_labels: int,
                 gamma_max: float = 10.0,
                 truncation: float = 0.5):
        """
        Args:
            scope: Variable indices
            num_labels: Number of possible label values
            gamma_max: Maximum penalty
            truncation: Fraction at which penalty is truncated
        """
        super().__init__(scope)
        self.num_labels = num_labels
        self.gamma_max = gamma_max
        self.truncation = truncation

    def __call__(self, assignment: Dict[int, Any]) -> float:
        # Count label frequencies
        labels = [assignment[v] for v in self.scope]
        label_counts = {}
        for l in labels:
            label_counts[l] = label_counts.get(l, 0) + 1

        # Find majority label
        max_count = max(label_counts.values())
        minority_count = len(labels) - max_count  # Number of "outliers"

        # Compute energy (guard keeps the truncation point at least 1,
        # avoiding division by zero for very small cliques)
        n = len(labels)
        truncation_point = max(1, int(self.truncation * n))

        if minority_count <= truncation_point:
            energy = (minority_count / truncation_point) * self.gamma_max
        else:
            energy = self.gamma_max

        return np.exp(-energy)


# Demonstration
print("Higher-Order Potential Examples:")
print("=" * 50)

# At Most 2 potential over 4 variables
at_most_2 = AtMostKPotential(scope=(0, 1, 2, 3), k=2, violation_penalty=0.01)

print("At Most 2 Potential (4 binary variables):")
for count in range(5):
    # Create assignment with 'count' ones
    assignment = {i: 1 if i < count else 0 for i in range(4)}
    psi = at_most_2(assignment)
    print(f"  Count={count}: ψ = {psi}")

# Exactly 2 potential
print("\nExactly 2 Potential (4 binary variables):")
exactly_2 = ExactlyKPotential(scope=(0, 1, 2, 3), k=2, violation_penalty=0.01)

for count in range(5):
    assignment = {i: 1 if i < count else 0 for i in range(4)}
    psi = exactly_2(assignment)
    print(f"  Count={count}: ψ = {psi}")

# Robust P^n Potts
print("\nRobust P^n Potts (6 variables, 3 labels):")
robust = RobustPnPotential(
    scope=tuple(range(6)),
    num_labels=3,
    gamma_max=5.0,
    truncation=0.33  # Truncate at 2 outliers
)

configs = [
    [0, 0, 0, 0, 0, 0],  # All same
    [0, 0, 0, 0, 0, 1],  # 1 outlier
    [0, 0, 0, 0, 1, 2],  # 2 outliers
    [0, 0, 0, 1, 1, 2],  # 3 outliers
    [0, 0, 1, 1, 2, 2],  # Maximum diversity
]

for config in configs:
    assignment = {i: config[i] for i in range(6)}
    psi = robust(assignment)
    outliers = 6 - max(config.count(l) for l in set(config))
    print(f"  {config} (outliers={outliers}): ψ = {psi:.4f}")
```

Designing effective potential functions requires balancing expressivity, computational tractability, and domain knowledge. Here are key principles that guide practitioners.
For log-linear models, the choice of features $f_k(\mathbf{x}_C)$ is critical. Good features capture meaningful patterns; poor features waste parameters or miss important structure. Feature engineering for MRFs shares much with feature engineering for other ML models: domain knowledge, iteration, and validation are key.
Common Pitfalls to Avoid: interpreting potential values as probabilities, choosing extreme potential values that overflow or underflow in floating point (work in log space instead), and ignoring the cost of the partition function when designing a model that will ultimately need inference.
Potential functions are the mathematical heart of Markov Random Fields: non-negative compatibility scores defined over cliques, whose normalized product gives the joint distribution, with table, log-linear, and energy-based forms as the main parameterizations used in practice.
What's Next:
With potentials defined, a critical question remains: How do we ensure the resulting distribution is properly normalized? The next page explores the Partition Function—the normalization constant that converts products of potentials into a valid probability distribution. We'll see why computing $Z$ exactly is typically intractable and preview the approximate methods that make MRF inference practical.
You now understand how potential functions define compatibility relationships in MRFs. You can create table potentials, log-linear potentials, and specialized potentials for common patterns. You understand the energy-based perspective and can design potentials for practical applications like image segmentation.