In Bayesian networks, we speak the language of conditional probabilities—each factor is a legitimate probability distribution answering questions like "What is the probability of X given its parents?" But Markov Random Fields speak a different language: the language of compatibility and affinity.
Potential functions don't answer probabilistic questions directly. Instead, they answer questions like: "How compatible are these variable values?" or "How favorable is this configuration?" This subtle but profound shift enables MRFs to model symmetric relationships where no variable 'causes' another, and provides deep connections to energy-based formulations from statistical physics.
Understanding potential functions is essential for working with MRFs—they are the mathematical machinery that translates graph structure into probability distributions.
By the end of this page, you will master the mathematical definition of potential functions, understand common parameterizations (log-linear models, table potentials, Gibbs potentials), learn how to construct potentials for different types of relationships, and grasp the critical distinction between potentials and probabilities.
A potential function (also called a factor, compatibility function, or clique potential) is a non-negative function defined over a subset of variables in an MRF.
Let $C \subseteq V$ be a clique in an undirected graph $G = (V, E)$. A potential function over $C$ is a function $\psi_C: \text{dom}(\mathbf{X}_C) \rightarrow \mathbb{R}_{\geq 0}$ that maps each configuration of the variables in $C$ to a non-negative real number.
Key Properties: a potential must be non-negative, it is not required to sum to one, and only its relative magnitudes matter, since rescaling a potential by any positive constant leaves the resulting distribution unchanged.
The Joint Distribution:
Given a set of potential functions $\{\psi_C\}_{C \in \mathcal{C}}$ defined over cliques, the joint distribution is:
$$P(\mathbf{X} = \mathbf{x}) = \frac{1}{Z} \prod_{C \in \mathcal{C}} \psi_C(\mathbf{x}_C)$$
where $Z = \sum_{\mathbf{x}'} \prod_{C \in \mathcal{C}} \psi_C(\mathbf{x}'_C)$ is the partition function ensuring the distribution sums to 1.
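As a small worked instance of this formula, consider a three-node chain $A - B - C$ whose maximal cliques are $\{A, B\}$ and $\{B, C\}$:

$$P(a, b, c) = \frac{1}{Z} \, \psi_{AB}(a, b) \, \psi_{BC}(b, c), \qquad Z = \sum_{a', b', c'} \psi_{AB}(a', b') \, \psi_{BC}(b', c')$$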
```python
import numpy as np
from typing import Dict, Tuple, List, Callable, Any
from abc import ABC, abstractmethod


class PotentialFunction(ABC):
    """
    Abstract base class for potential functions in MRFs.

    A potential function maps configurations of variables to non-negative
    real numbers, measuring the "compatibility" or "favorability" of each
    configuration.
    """

    def __init__(self, scope: Tuple[int, ...]):
        """
        Initialize a potential function.

        Args:
            scope: Tuple of variable indices this potential depends on
        """
        self.scope = scope
        self.arity = len(scope)

    @abstractmethod
    def __call__(self, assignment: Dict[int, Any]) -> float:
        """
        Evaluate the potential for a given assignment.

        Args:
            assignment: Dictionary mapping variable indices to values

        Returns:
            Non-negative potential value
        """
        pass

    def get_scope_values(self, assignment: Dict[int, Any]) -> Tuple:
        """Extract values for variables in this potential's scope."""
        return tuple(assignment[var] for var in self.scope)


class TablePotential(PotentialFunction):
    """
    A potential function represented as an explicit table of values.

    This is the most general representation, but memory-intensive for
    large domains or high-arity potentials.
    """

    def __init__(self, scope: Tuple[int, ...],
                 domain_sizes: Tuple[int, ...],
                 values: np.ndarray = None):
        """
        Initialize a table potential.

        Args:
            scope: Variable indices
            domain_sizes: Size of domain for each variable in scope
            values: Optional initial values (default: all ones)
        """
        super().__init__(scope)
        self.domain_sizes = domain_sizes
        if values is None:
            self.table = np.ones(domain_sizes)
        else:
            assert values.shape == domain_sizes
            assert np.all(values >= 0), "Potentials must be non-negative"
            self.table = values.copy()

    def __call__(self, assignment: Dict[int, Any]) -> float:
        """Look up potential value in the table."""
        index = self.get_scope_values(assignment)
        return float(self.table[index])

    def set_value(self, config: Tuple[int, ...], value: float):
        """Set the potential value for a specific configuration."""
        assert value >= 0, "Potential values must be non-negative"
        self.table[config] = value

    def normalize(self) -> 'TablePotential':
        """Return a normalized copy (sums to 1)."""
        normalized = TablePotential(self.scope, self.domain_sizes,
                                    self.table / self.table.sum())
        return normalized

    def __repr__(self):
        return f"TablePotential(scope={self.scope}, shape={self.domain_sizes})"


# Example: Create a simple pairwise potential
# Two binary variables (X0, X1) that prefer to have the same value

# Scope: variables 0 and 1
# Domain: each variable has domain {0, 1}
pairwise_potential = TablePotential(
    scope=(0, 1),
    domain_sizes=(2, 2)
)

# Set values: high compatibility for matching, low for mismatching
pairwise_potential.set_value((0, 0), 5.0)  # Both 0: high compatibility
pairwise_potential.set_value((0, 1), 1.0)  # Different: low compatibility
pairwise_potential.set_value((1, 0), 1.0)  # Different: low compatibility
pairwise_potential.set_value((1, 1), 5.0)  # Both 1: high compatibility

print("Pairwise 'Agreement' Potential:")
print("=" * 40)
print(f"ψ(X₀=0, X₁=0) = {pairwise_potential({0: 0, 1: 0})}")
print(f"ψ(X₀=0, X₁=1) = {pairwise_potential({0: 0, 1: 1})}")
print(f"ψ(X₀=1, X₁=0) = {pairwise_potential({0: 1, 1: 0})}")
print(f"ψ(X₀=1, X₁=1) = {pairwise_potential({0: 1, 1: 1})}")
print(f"\nInterpretation: Matching values are 5x more compatible than mismatches")
```

One of the most common sources of confusion for newcomers to MRFs is the distinction between potential functions and probability distributions. Let's clarify this critical difference.
| Property | Potential Function $\psi$ | Probability Distribution $P$ |
|---|---|---|
| Range | $[0, \infty)$ | $[0, 1]$ |
| Normalization | Not required | Must sum/integrate to 1 |
| Interpretation | Compatibility score | Likelihood of occurrence |
| Scale invariance | Can be multiplied by any positive constant | Fixed by normalization |
| Combination | Multiply to combine | Multiply, then renormalize |
| Zero values | Forbid a configuration (force zero probability) | Mean the outcome cannot occur |
| Computing from | Usually defined directly | Computed from potentials via $Z$ |
Do NOT interpret potential values as probabilities! A potential of $\psi(x) = 100$ does not mean $P(x) = 100$ (which is impossible). Potentials are only meaningful relative to each other. The actual probability depends on the partition function $Z$, which normalizes the product of all potentials.
Why Use Potentials Instead of Probabilities?
This design choice has several important advantages:
Flexibility: Potentials can encode arbitrary symmetric relationships without satisfying normalization constraints
Modularity: You can define and combine potentials independently, deferring normalization to the end
Physics connection: Many natural phenomena are described by energy functions, which relate to potentials via $\psi = e^{-E}$
Learning convenience: During learning, you can optimize unnormalized scores and only compute $Z$ when needed
Avoiding underflow: Products of many small probabilities underflow; potentials can use any convenient scale
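To make the underflow point concrete, here is a minimal sketch (the numbers are purely illustrative) comparing a naive product of many small factors with the equivalent sum of logs:

```python
import numpy as np

# 500 factors, each a small probability-like value around 1e-3
values = np.full(500, 1e-3)

# Naive product underflows to exactly 0.0 in float64
naive_product = np.prod(values)

# Working in log space keeps the computation stable:
# the log of a product is the sum of the logs
log_score = np.sum(np.log(values))

print(f"Naive product: {naive_product}")   # 0.0 (underflow)
print(f"Sum of logs:   {log_score:.2f}")   # ≈ -3453.88
# Ratios of log-scores between configurations remain perfectly usable
# even when the raw products are far below floating-point range.
```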
The Price We Pay:
The flexibility of potentials comes with a cost: computing the normalization constant $Z$ is typically intractable. This makes exact inference in MRFs harder than in many Bayesian networks, motivating the development of approximate inference algorithms.
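A rough sketch of why brute force fails: summing over every joint configuration of $n$ binary variables requires $2^n$ terms.

```python
# Brute-force computation of Z enumerates every joint configuration.
# For n binary variables that is 2**n terms.
for n in [10, 30, 100]:
    print(f"n = {n:>3} binary variables -> 2**n = {2**n:.3e} configurations")

# Even a modest 100 x 100 binary image has 2**10000 configurations,
# an integer with over 3000 decimal digits -- hopeless to enumerate.
print(f"2**10000 has {len(str(2**10000))} decimal digits")
```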
```python
import numpy as np
from typing import List


def demonstrate_potential_vs_probability():
    """
    Demonstrate the difference between potentials and probabilities.
    """
    # Consider a simple MRF with two binary variables
    # and pairwise + singleton potentials

    # Singleton potentials (unary): bias toward certain values
    psi_0 = {0: 2.0, 1: 3.0}  # Variable 0 prefers value 1
    psi_1 = {0: 4.0, 1: 1.0}  # Variable 1 prefers value 0

    # Pairwise potential: preference for agreement
    psi_01 = {
        (0, 0): 5.0,
        (0, 1): 1.0,
        (1, 0): 1.0,
        (1, 1): 5.0,
    }

    # Compute unnormalized "scores" for each configuration
    print("Computing Unnormalized Scores:")
    print("=" * 50)

    configs = [(0, 0), (0, 1), (1, 0), (1, 1)]
    scores = {}

    for x0, x1 in configs:
        # Product of all potentials
        score = psi_0[x0] * psi_1[x1] * psi_01[(x0, x1)]
        scores[(x0, x1)] = score
        print(f"Config (X₀={x0}, X₁={x1}): "
              f"{psi_0[x0]} × {psi_1[x1]} × {psi_01[(x0, x1)]} = {score}")

    # Compute partition function
    Z = sum(scores.values())
    print(f"\nPartition Function Z = {Z}")

    # Compute actual probabilities
    print("\nActual Probabilities:")
    print("=" * 50)

    for config, score in scores.items():
        prob = score / Z
        print(f"P(X₀={config[0]}, X₁={config[1]}) = {score}/{Z} = {prob:.4f}")

    # Key insight: probabilities sum to 1
    print(f"\nSum of probabilities: {sum(s/Z for s in scores.values()):.6f}")

    # Demonstrate scale invariance of potentials
    print("\n" + "=" * 50)
    print("Scale Invariance Demonstration:")
    print("=" * 50)

    # Multiply all potentials by 100 - probabilities stay the same!
    scale = 100
    scaled_scores = {k: v * scale**3 for k, v in scores.items()}  # 3 potentials
    scaled_Z = sum(scaled_scores.values())

    print(f"After scaling all potentials by {scale}:")
    for config, score in scaled_scores.items():
        prob = score / scaled_Z
        print(f"P(X₀={config[0]}, X₁={config[1]}) = {prob:.4f}")

    print("\nProbabilities unchanged - only relative potentials matter!")


demonstrate_potential_vs_probability()
```

While potentials can be arbitrary non-negative functions, several parameterizations are particularly common and useful in practice. Each offers different tradeoffs between expressivity, interpretability, and computational efficiency.
The most widely used parameterization writes potentials in log-linear (or exponential family) form: $\psi_C(\mathbf{x}_C) = \exp\left(\sum_k \theta_k f_k(\mathbf{x}_C)\right)$, where $f_k$ are feature functions and $\theta_k$ are parameters. This form has deep connections to maximum entropy, convex optimization, and statistical learning.
1. Table Potentials (Tabular Form)
The most general representation explicitly stores one value per configuration; for a clique of $k$ variables, each with $d$ possible values, this requires a table of $d^k$ entries.
2. Log-Linear Potentials
Potentials in exponential form based on feature functions:
$$\psi_C(\mathbf{x}_C) = \exp\left(\sum_k \theta_k f_k(\mathbf{x}_C)\right)$$
3. Gaussian Potentials
For continuous variables, quadratic forms in the exponent:
$$\psi_C(\mathbf{x}_C) = \exp\left(-\frac{1}{2}\mathbf{x}_C^T \Lambda_C \mathbf{x}_C + \mathbf{h}_C^T \mathbf{x}_C\right)$$
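There is no Gaussian example in the code listings on this page, so here is a minimal sketch of how such a potential could be implemented on top of the `PotentialFunction` base class from the first listing; the `GaussianPotential` name and its parameter names are illustrative assumptions, not part of the original code.

```python
class GaussianPotential(PotentialFunction):
    """
    Gaussian (quadratic) potential in information form:

        ψ(x_C) = exp(-0.5 * xᵀ Λ x + hᵀ x)

    where Λ is a symmetric matrix over the clique's variables and h a vector.
    """

    def __init__(self, scope: Tuple[int, ...],
                 precision: np.ndarray, linear: np.ndarray):
        super().__init__(scope)
        self.precision = np.asarray(precision, dtype=float)  # Λ
        self.linear = np.asarray(linear, dtype=float)        # h

    def __call__(self, assignment: Dict[int, Any]) -> float:
        x = np.array([assignment[v] for v in self.scope], dtype=float)
        exponent = -0.5 * x @ self.precision @ x + self.linear @ x
        return float(np.exp(exponent))


# A pairwise potential that favors x0 ≈ x1 (negative off-diagonal coupling)
gauss = GaussianPotential(scope=(0, 1),
                          precision=[[1.0, -0.8], [-0.8, 1.0]],
                          linear=[0.0, 0.0])
print(gauss({0: 1.0, 1: 1.0}))    # similar values -> larger potential
print(gauss({0: 1.0, 1: -1.0}))   # dissimilar values -> smaller potential
```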
```python
import numpy as np
from typing import Callable, List


class LogLinearPotential(PotentialFunction):
    """
    Log-linear (exponential family) potential function.

    ψ(x) = exp(Σ θ_k f_k(x))

    This is the most common parameterization in practice, offering:
    - Compact representation via features
    - Guaranteed positivity (exp is always positive)
    - Convex learning objectives
    - Natural connection to maximum entropy
    """

    def __init__(self, scope: Tuple[int, ...],
                 features: List[Callable],
                 weights: np.ndarray = None):
        """
        Initialize a log-linear potential.

        Args:
            scope: Variable indices
            features: List of feature functions f_k(assignment) -> float
            weights: Parameter weights (default: zeros)
        """
        super().__init__(scope)
        self.features = features
        self.num_features = len(features)
        if weights is None:
            self.weights = np.zeros(self.num_features)
        else:
            assert len(weights) == self.num_features
            self.weights = np.array(weights)

    def compute_feature_vector(self, assignment: Dict[int, Any]) -> np.ndarray:
        """Compute the feature vector for an assignment."""
        return np.array([f(assignment) for f in self.features])

    def __call__(self, assignment: Dict[int, Any]) -> float:
        """Compute exp(θ · f(x))."""
        features = self.compute_feature_vector(assignment)
        log_potential = np.dot(self.weights, features)
        return np.exp(log_potential)

    def log_potential(self, assignment: Dict[int, Any]) -> float:
        """Compute log ψ(x) = θ · f(x) directly (avoids exp)."""
        features = self.compute_feature_vector(assignment)
        return np.dot(self.weights, features)


class IsingPotential(PotentialFunction):
    """
    Ising model pairwise potential.

    The Ising model is a classic MRF from statistical physics:
    ψ(x_i, x_j) = exp(J * x_i * x_j + h_i * x_i + h_j * x_j)

    For binary variables x ∈ {-1, +1}:
    - J > 0 (ferromagnetic): prefers alignment
    - J < 0 (antiferromagnetic): prefers anti-alignment
    - h: external field (bias toward +1 or -1)
    """

    def __init__(self, var_i: int, var_j: int,
                 coupling: float = 1.0,
                 field_i: float = 0.0,
                 field_j: float = 0.0):
        """
        Initialize Ising potential.

        Args:
            var_i, var_j: Variable indices
            coupling: Coupling strength J
            field_i, field_j: External fields
        """
        super().__init__((var_i, var_j))
        self.coupling = coupling
        self.field_i = field_i
        self.field_j = field_j

    def __call__(self, assignment: Dict[int, Any]) -> float:
        """Compute Ising potential."""
        xi = assignment[self.scope[0]]  # Should be -1 or +1
        xj = assignment[self.scope[1]]
        energy = -(self.coupling * xi * xj +
                   self.field_i * xi +
                   self.field_j * xj)
        return np.exp(-energy)  # ψ = exp(-E)


class PottsPotential(PotentialFunction):
    """
    Potts model pairwise potential.

    Generalization of Ising to multi-valued discrete variables:
    ψ(x_i, x_j) = exp(J * δ(x_i, x_j))

    where δ(x_i, x_j) = 1 if x_i == x_j, else 0.

    This encourages neighboring variables to take the same value,
    commonly used in image segmentation.
    """

    def __init__(self, var_i: int, var_j: int, coupling: float = 1.0):
        """
        Initialize Potts potential.

        Args:
            var_i, var_j: Variable indices
            coupling: Coupling strength (higher = stronger preference for agreement)
        """
        super().__init__((var_i, var_j))
        self.coupling = coupling

    def __call__(self, assignment: Dict[int, Any]) -> float:
        """Compute Potts potential."""
        xi = assignment[self.scope[0]]
        xj = assignment[self.scope[1]]
        delta = 1.0 if xi == xj else 0.0
        return np.exp(self.coupling * delta)


# Demonstration of different potential types
print("Log-Linear Potential Example:")
print("=" * 50)

# Define feature functions for a pairwise potential
def feature_same(assignment):
    """1 if variables have same value, 0 otherwise."""
    return 1.0 if assignment[0] == assignment[1] else 0.0

def feature_both_one(assignment):
    """1 if both variables are 1."""
    return 1.0 if assignment[0] == 1 and assignment[1] == 1 else 0.0

log_linear = LogLinearPotential(
    scope=(0, 1),
    features=[feature_same, feature_both_one],
    weights=[2.0, 1.0]  # Strong preference for same, extra bonus for both=1
)

for x0 in [0, 1]:
    for x1 in [0, 1]:
        assignment = {0: x0, 1: x1}
        psi = log_linear(assignment)
        log_psi = log_linear.log_potential(assignment)
        print(f"ψ({x0}, {x1}) = exp({log_psi:.2f}) = {psi:.4f}")

print("\nIsing Potential Example (Ferromagnetic):")
print("=" * 50)

ising = IsingPotential(0, 1, coupling=1.0)  # J=1: ferromagnetic

for xi in [-1, 1]:
    for xj in [-1, 1]:
        assignment = {0: xi, 1: xj}
        psi = ising(assignment)
        print(f"ψ(x₀={xi:+d}, x₁={xj:+d}) = {psi:.4f}")

print("\nPotts Potential Example (3-state):")
print("=" * 50)

potts = PottsPotential(0, 1, coupling=2.0)

for xi in range(3):
    for xj in range(3):
        assignment = {0: xi, 1: xj}
        psi = potts(assignment)
        print(f"ψ(x₀={xi}, x₁={xj}) = {psi:.4f}")
```

Designing appropriate potential functions is both an art and a science. The potential should capture the domain knowledge about how variables interact. Let's explore common patterns for different types of relationships.
```python
class SmoothnessPotential(PotentialFunction):
    """
    Potential that encourages smooth/similar values between variables.

    For continuous: ψ(x_i, x_j) = exp(-λ|x_i - x_j|²)
    For discrete:   ψ(x_i, x_j) = exp(-λ|x_i - x_j|)

    Used extensively in image denoising, depth estimation, etc.
    """

    def __init__(self, var_i: int, var_j: int,
                 smoothness_weight: float = 1.0,
                 distance_type: str = "squared"):
        super().__init__((var_i, var_j))
        self.weight = smoothness_weight
        self.distance_type = distance_type

    def __call__(self, assignment: Dict[int, Any]) -> float:
        xi = assignment[self.scope[0]]
        xj = assignment[self.scope[1]]
        diff = abs(xi - xj)

        if self.distance_type == "squared":
            distance = diff ** 2
        elif self.distance_type == "absolute":
            distance = diff
        elif self.distance_type == "truncated":
            distance = min(diff, 2.0)  # Cap penalty
        else:
            distance = diff

        return np.exp(-self.weight * distance)


class ContrastPotential(PotentialFunction):
    """
    Potential that encourages different values (anti-smoothness).

    Useful for graph coloring, scheduling (avoid conflicts), etc.
    """

    def __init__(self, var_i: int, var_j: int, penalty: float = 10.0):
        super().__init__((var_i, var_j))
        self.penalty = penalty

    def __call__(self, assignment: Dict[int, Any]) -> float:
        xi = assignment[self.scope[0]]
        xj = assignment[self.scope[1]]
        if xi == xj:
            return np.exp(-self.penalty)  # Strong penalty for same value
        else:
            return 1.0  # No penalty for different values


class HardConstraintPotential(PotentialFunction):
    """
    Potential encoding hard constraints.

    Returns 1.0 for allowed configurations, 0.0 for forbidden.
    Zero probability for forbidden configurations.
    """

    def __init__(self, scope: Tuple[int, ...],
                 is_allowed: Callable[[Dict[int, Any]], bool]):
        super().__init__(scope)
        self.is_allowed = is_allowed

    def __call__(self, assignment: Dict[int, Any]) -> float:
        return 1.0 if self.is_allowed(assignment) else 0.0


class DataTermPotential(PotentialFunction):
    """
    Unary potential encoding how well a variable value matches observations.

    Common in image segmentation:
    ψ(x_i) = exp(-λ * ||I_i - μ_{x_i}||²)

    where I_i is observed pixel value and μ_{x_i} is mean for label x_i.
    """

    def __init__(self, var: int, observed_value: float,
                 label_means: Dict[int, float], weight: float = 1.0):
        super().__init__((var,))
        self.observed = observed_value
        self.means = label_means
        self.weight = weight

    def __call__(self, assignment: Dict[int, Any]) -> float:
        label = assignment[self.scope[0]]
        expected = self.means[label]
        error = (self.observed - expected) ** 2
        return np.exp(-self.weight * error)


# Example: Build an image segmentation MRF
def build_segmentation_mrf_potentials(
        image: np.ndarray,
        fg_mean: float = 200,
        bg_mean: float = 50,
        smoothness: float = 2.0) -> Tuple[List[PotentialFunction], dict]:
    """
    Build potentials for binary image segmentation.

    Args:
        image: 2D grayscale image
        fg_mean: Expected foreground pixel value
        bg_mean: Expected background pixel value
        smoothness: Weight for pairwise smoothness term

    Returns:
        List of potential functions and metadata
    """
    height, width = image.shape
    potentials = []

    label_means = {0: bg_mean, 1: fg_mean}  # 0=background, 1=foreground

    # Data terms (unary potentials)
    for row in range(height):
        for col in range(width):
            pixel_idx = row * width + col
            pixel_value = image[row, col]
            data_potential = DataTermPotential(
                var=pixel_idx,
                observed_value=pixel_value,
                label_means=label_means,
                weight=0.01
            )
            potentials.append(data_potential)

    # Smoothness terms (pairwise potentials)
    for row in range(height):
        for col in range(width):
            pixel_idx = row * width + col

            # Right neighbor
            if col < width - 1:
                right_idx = pixel_idx + 1
                potentials.append(
                    PottsPotential(pixel_idx, right_idx, coupling=smoothness)
                )

            # Down neighbor
            if row < height - 1:
                down_idx = pixel_idx + width
                potentials.append(
                    PottsPotential(pixel_idx, down_idx, coupling=smoothness)
                )

    metadata = {
        'num_variables': height * width,
        'num_unary': height * width,
        'num_pairwise': len(potentials) - height * width,
        'labels': {0: 'background', 1: 'foreground'}
    }

    return potentials, metadata


# Demonstrate with a small synthetic image
print("Image Segmentation MRF Construction:")
print("=" * 50)

# Create a 3x3 synthetic image
synthetic_image = np.array([
    [50, 60, 55],      # Dark top row (background)
    [180, 200, 190],   # Bright middle row (foreground)
    [45, 55, 50],      # Dark bottom row (background)
])

potentials, meta = build_segmentation_mrf_potentials(
    synthetic_image, fg_mean=190, bg_mean=55, smoothness=2.0
)

print(f"Number of variables: {meta['num_variables']}")
print(f"Number of unary potentials: {meta['num_unary']}")
print(f"Number of pairwise potentials: {meta['num_pairwise']}")
print(f"Total potentials: {len(potentials)}")
```

There's an illuminating alternative view of MRFs that comes from statistical physics: the energy-based perspective. Instead of thinking about compatibility scores that we want to maximize, we think about energy costs that we want to minimize.
In statistical physics, systems tend toward low-energy configurations. The probability of a configuration is given by the Gibbs (or Boltzmann) distribution: $P(\mathbf{x}) = \frac{1}{Z} \exp(-E(\mathbf{x})/T)$, where $E(\mathbf{x})$ is the energy and $T$ is temperature. Setting $T = 1$, we get the standard MRF form with $\psi(\mathbf{x}) = \exp(-E(\mathbf{x}))$.
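To see what the temperature does, here is a small sketch (with illustrative energy values) of the Boltzmann weights at several temperatures:

```python
import numpy as np

def boltzmann(energies, T):
    """P(x) ∝ exp(-E(x)/T), normalized over the listed configurations."""
    unnorm = np.exp(-np.array(energies) / T)
    return unnorm / unnorm.sum()

energies = [0.0, 1.0, 2.0]            # three hypothetical configurations
for T in [0.25, 1.0, 4.0]:
    probs = boltzmann(energies, T)
    print(f"T = {T:4.2f}: " + ", ".join(f"{p:.3f}" for p in probs))

# Low T concentrates mass on the minimum-energy configuration;
# high T flattens the distribution toward uniform.
```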
The Potential-Energy Relationship:
Potentials and energies are related by a simple transformation:
$$\psi_C(\mathbf{x}_C) = \exp(-E_C(\mathbf{x}_C))$$ $$E_C(\mathbf{x}_C) = -\log \psi_C(\mathbf{x}_C)$$
This relationship means that high-compatibility configurations have low energy, low-compatibility configurations have high energy, and the most probable configuration is exactly the minimum-energy one.
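In code, the conversion is a one-liner in each direction; a quick sketch using the "agreement" values from the first example (5.0 for a match, 1.0 for a mismatch):

```python
import numpy as np

# Potentials from the pairwise "agreement" example
potentials = np.array([[5.0, 1.0],
                       [1.0, 5.0]])

# Energy is the negative log potential ...
energies = -np.log(potentials)        # matching pairs get energy -log 5 ≈ -1.61

# ... and exponentiating the negative energy recovers the potential exactly
recovered = np.exp(-energies)
print(np.allclose(recovered, potentials))   # True
print(energies)
```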
Total Energy:
Since potentials multiply and logarithms convert products to sums:
$$E(\mathbf{x}) = \sum_{C \in \mathcal{C}} E_C(\mathbf{x}_C) = -\sum_{C \in \mathcal{C}} \log \psi_C(\mathbf{x}_C)$$
This additive structure of energy is often more intuitive than the multiplicative structure of potentials.
```python
class EnergyBasedMRF:
    """
    MRF formulated in terms of energy functions instead of potentials.

    P(x) = (1/Z) exp(-E(x))

    where E(x) = Σ E_c(x_c) is the sum of local energy terms.
    """

    def __init__(self):
        self.energy_terms: List[Callable] = []

    def add_energy_term(self, energy_func: Callable):
        """Add an energy term to the model."""
        self.energy_terms.append(energy_func)

    def total_energy(self, assignment: Dict[int, Any]) -> float:
        """Compute total energy (sum of all terms)."""
        return sum(e(assignment) for e in self.energy_terms)

    def unnormalized_probability(self, assignment: Dict[int, Any]) -> float:
        """Compute exp(-E(x)) - unnormalized probability."""
        return np.exp(-self.total_energy(assignment))

    def find_map_brute_force(self,
                             variable_domains: Dict[int, List]
                             ) -> Tuple[Dict[int, Any], float]:
        """
        Find MAP (Maximum A Posteriori) assignment by brute force.

        MAP = argmin E(x) = argmax P(x)

        For small problems only!
        """
        from itertools import product

        variables = sorted(variable_domains.keys())
        domains = [variable_domains[v] for v in variables]

        best_assignment = None
        best_energy = float('inf')

        for values in product(*domains):
            assignment = dict(zip(variables, values))
            energy = self.total_energy(assignment)
            if energy < best_energy:
                best_energy = energy
                best_assignment = assignment

        return best_assignment, best_energy


def create_ising_model_energy(adjacency_list: Dict[int, List[int]],
                              couplings: Dict[Tuple[int, int], float],
                              fields: Dict[int, float]) -> EnergyBasedMRF:
    """
    Create an Ising model in energy form.

    E(x) = -Σ_{i,j} J_{ij} x_i x_j - Σ_i h_i x_i

    Args:
        adjacency_list: Graph structure
        couplings: J_{ij} values for each edge
        fields: h_i values for each node
    """
    model = EnergyBasedMRF()

    # Pairwise energy terms
    added_edges = set()
    for i, neighbors in adjacency_list.items():
        for j in neighbors:
            edge = tuple(sorted([i, j]))
            if edge not in added_edges:
                J = couplings.get(edge, 0.0)

                def pairwise_energy(assignment, i=i, j=j, J=J):
                    return -J * assignment[i] * assignment[j]

                model.add_energy_term(pairwise_energy)
                added_edges.add(edge)

    # Unary energy terms (external field)
    for i, h in fields.items():
        def field_energy(assignment, i=i, h=h):
            return -h * assignment[i]

        model.add_energy_term(field_energy)

    return model


# Example: Small 2D Ising model
print("Energy-Based Ising Model:")
print("=" * 50)

# 2x2 grid of spins
# 0 - 1
# |   |
# 2 - 3

adjacency = {
    0: [1, 2],
    1: [0, 3],
    2: [0, 3],
    3: [1, 2]
}

# Ferromagnetic coupling (J > 0)
couplings = {
    (0, 1): 1.0,
    (0, 2): 1.0,
    (1, 3): 1.0,
    (2, 3): 1.0,
}

# Small external field favoring +1
fields = {0: 0.1, 1: 0.1, 2: 0.1, 3: 0.1}

ising_model = create_ising_model_energy(adjacency, couplings, fields)

# Evaluate some configurations
configs = [
    {0: -1, 1: -1, 2: -1, 3: -1},  # All down
    {0: +1, 1: +1, 2: +1, 3: +1},  # All up
    {0: +1, 1: -1, 2: -1, 3: +1},  # Checkerboard
    {0: +1, 1: +1, 2: -1, 3: -1},  # Half and half
]

for config in configs:
    E = ising_model.total_energy(config)
    P_unnorm = ising_model.unnormalized_probability(config)
    spin_str = "".join("+" if config[i] > 0 else "-" for i in range(4))
    print(f"Config [{spin_str}]: E = {E:+.2f}, exp(-E) = {P_unnorm:.4f}")

# Find MAP assignment
map_assignment, map_energy = ising_model.find_map_brute_force(
    {i: [-1, +1] for i in range(4)}
)
spin_str = "".join("+" if map_assignment[i] > 0 else "-" for i in range(4))
print(f"\nMAP assignment: [{spin_str}] with energy {map_energy:+.2f}")
```

So far, we've focused primarily on unary and pairwise potentials. However, many real-world relationships involve three or more variables simultaneously; these are called higher-order potentials.
Higher-order potentials introduce significant computational challenges. A potential over $k$ binary variables has $2^k$ possible values to store and enumerate. Inference algorithms become substantially more complex. Despite this, higher-order potentials are crucial for capturing rich dependencies that pairwise models cannot express.
Why Higher-Order Potentials Matter:
Richer Expressivity: Some constraints fundamentally involve multiple variables
Better Modeling Accuracy: Pairwise models can sometimes approximate higher-order effects, but often poorly
Natural Problem Structure: Many problems (e.g., XOR constraints, parity checks) are inherently higher-order
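As a concrete illustration of the last point, a parity check can be written directly as a higher-order potential; a minimal sketch built on the `PotentialFunction` base class used throughout this page (the `ParityPotential` name is illustrative, not part of the original listings):

```python
class ParityPotential(PotentialFunction):
    """Hard parity constraint: allows a configuration only if the number
    of ON variables in the scope has the required parity (an XOR check)."""

    def __init__(self, scope: Tuple[int, ...], even: bool = True):
        super().__init__(scope)
        self.even = even

    def __call__(self, assignment: Dict[int, Any]) -> float:
        ones = sum(assignment[v] for v in self.scope)
        satisfied = (ones % 2 == 0) == self.even
        return 1.0 if satisfied else 0.0   # zero potential forbids the config


# No pairwise decomposition reproduces this: the constraint is inherently
# a function of all three variables at once.
xor_check = ParityPotential(scope=(0, 1, 2), even=True)
print(xor_check({0: 1, 1: 1, 2: 0}))   # 1.0 (two ones: even parity, allowed)
print(xor_check({0: 1, 1: 0, 2: 0}))   # 0.0 (one one: odd parity, forbidden)
```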
Common Higher-Order Patterns:
```python
class CardinalityPotential(PotentialFunction):
    """
    Higher-order potential that depends only on the count of ON variables.

    ψ(x_1, ..., x_k) = f(Σ_i x_i)

    This is more efficient than general higher-order potentials because
    we only need to store k+1 values (for counts 0, 1, ..., k).

    Applications:
    - "At least k variables must be ON"
    - "At most k variables can be ON"
    - "Exactly k variables should be ON"
    """

    def __init__(self, scope: Tuple[int, ...],
                 cardinality_values: List[float]):
        """
        Initialize cardinality potential.

        Args:
            scope: Variable indices
            cardinality_values: Potential value for each count (0, 1, ..., k)
        """
        super().__init__(scope)
        k = len(scope)
        assert len(cardinality_values) == k + 1
        self.values = np.array(cardinality_values)

    def __call__(self, assignment: Dict[int, Any]) -> float:
        count = sum(assignment[v] for v in self.scope)
        return float(self.values[count])


class AtMostKPotential(CardinalityPotential):
    """At most K variables can be ON (1)."""

    def __init__(self, scope: Tuple[int, ...], k: int,
                 violation_penalty: float = 0.0):
        n = len(scope)
        values = [1.0 if i <= k else violation_penalty for i in range(n + 1)]
        super().__init__(scope, values)
        self.k = k


class ExactlyKPotential(CardinalityPotential):
    """Exactly K variables should be ON (1)."""

    def __init__(self, scope: Tuple[int, ...], k: int,
                 correct_value: float = 1.0,
                 violation_penalty: float = 0.0):
        n = len(scope)
        values = [correct_value if i == k else violation_penalty
                  for i in range(n + 1)]
        super().__init__(scope, values)
        self.k = k


class RobustPnPotential(PotentialFunction):
    """
    Robust P^n Potts potential from Kohli et al.

    A higher-order potential that encourages label consistency but is
    robust to a small number of outliers.

    If all variables agree: low energy
    If up to k disagree: gradually increasing energy
    If many disagree: truncated energy (doesn't penalize further)

    This is critical for image segmentation with outliers.
    """

    def __init__(self, scope: Tuple[int, ...],
                 num_labels: int,
                 gamma_max: float = 10.0,
                 truncation: float = 0.5):
        """
        Args:
            scope: Variable indices
            num_labels: Number of possible label values
            gamma_max: Maximum penalty
            truncation: Fraction at which penalty is truncated
        """
        super().__init__(scope)
        self.num_labels = num_labels
        self.gamma_max = gamma_max
        self.truncation = truncation

    def __call__(self, assignment: Dict[int, Any]) -> float:
        # Count label frequencies
        labels = [assignment[v] for v in self.scope]
        label_counts = {}
        for l in labels:
            label_counts[l] = label_counts.get(l, 0) + 1

        # Find majority label
        max_count = max(label_counts.values())
        minority_count = len(labels) - max_count  # Number of "outliers"

        # Compute energy (guard keeps the truncation point at least 1,
        # avoiding division by zero for very small cliques)
        n = len(labels)
        truncation_point = max(1, int(self.truncation * n))

        if minority_count <= truncation_point:
            energy = (minority_count / truncation_point) * self.gamma_max
        else:
            energy = self.gamma_max

        return np.exp(-energy)


# Demonstration
print("Higher-Order Potential Examples:")
print("=" * 50)

# At Most 2 potential over 4 variables
at_most_2 = AtMostKPotential(scope=(0, 1, 2, 3), k=2, violation_penalty=0.01)

print("At Most 2 Potential (4 binary variables):")
for count in range(5):
    # Create assignment with 'count' ones
    assignment = {i: 1 if i < count else 0 for i in range(4)}
    psi = at_most_2(assignment)
    print(f"  Count={count}: ψ = {psi}")

# Exactly 2 potential
print("\nExactly 2 Potential (4 binary variables):")
exactly_2 = ExactlyKPotential(scope=(0, 1, 2, 3), k=2, violation_penalty=0.01)

for count in range(5):
    assignment = {i: 1 if i < count else 0 for i in range(4)}
    psi = exactly_2(assignment)
    print(f"  Count={count}: ψ = {psi}")

# Robust P^n Potts
print("\nRobust P^n Potts (6 variables, 3 labels):")
robust = RobustPnPotential(
    scope=tuple(range(6)),
    num_labels=3,
    gamma_max=5.0,
    truncation=0.33  # Truncate at 2 outliers
)

configs = [
    [0, 0, 0, 0, 0, 0],  # All same
    [0, 0, 0, 0, 0, 1],  # 1 outlier
    [0, 0, 0, 0, 1, 2],  # 2 outliers
    [0, 0, 0, 1, 1, 2],  # 3 outliers
    [0, 0, 1, 1, 2, 2],  # Maximum diversity
]

for config in configs:
    assignment = {i: config[i] for i in range(6)}
    psi = robust(assignment)
    outliers = 6 - max(config.count(l) for l in set(config))
    print(f"  {config} (outliers={outliers}): ψ = {psi:.4f}")
```

Designing effective potential functions requires balancing expressivity, computational tractability, and domain knowledge. Here are key principles that guide practitioners.
For log-linear models, the choice of features $f_k(\mathbf{x}_C)$ is critical. Good features capture meaningful patterns; poor features waste parameters or miss important structure. Feature engineering for MRFs shares much with feature engineering for other ML models: domain knowledge, iteration, and validation are key.
Common Pitfalls to Avoid: interpreting potential values as probabilities, choosing extreme potential values that overflow or underflow in floating point (work in log space instead), and ignoring the cost of the partition function when designing a model that will ultimately need inference.
Potential functions are the mathematical heart of Markov Random Fields: non-negative compatibility scores defined over cliques, whose normalized product gives the joint distribution, with table, log-linear, and energy-based forms as the main parameterizations used in practice.
What's Next:
With potentials defined, a critical question remains: How do we ensure the resulting distribution is properly normalized? The next page explores the Partition Function—the normalization constant that converts products of potentials into a valid probability distribution. We'll see why computing $Z$ exactly is typically intractable and preview the approximate methods that make MRF inference practical.
You now understand how potential functions define compatibility relationships in MRFs. You can create table potentials, log-linear potentials, and specialized potentials for common patterns. You understand the energy-based perspective and can design potentials for practical applications like image segmentation.