Every probability distribution must sum to one. In Bayesian networks, this normalization is built-in: conditional probability tables are pre-normalized. But in Markov Random Fields, potential functions are unnormalized compatibility scores. The partition function $Z$ bridges this gap—it's the normalizing constant that transforms products of potentials into valid probabilities.
Understanding the partition function is essential because its computation is the central challenge in MRF inference. Computing $Z$ exactly is typically intractable, making approximate inference necessary for practical applications.
By the end of this page, you will understand the mathematical definition and role of the partition function, appreciate why computing Z is computationally intractable, learn about the log-partition function and its properties, and understand how Z relates to marginal probabilities and expectations.
The partition function $Z$ is defined as the sum (or integral for continuous variables) of the unnormalized probability over all possible configurations:
$$Z = \sum_{\mathbf{x} \in \mathcal{X}} \prod_{C \in \mathcal{C}} \psi_C(\mathbf{x}_C)$$
With the partition function, the joint probability becomes:
$$P(\mathbf{X} = \mathbf{x}) = \frac{1}{Z} \prod_{C \in \mathcal{C}} \psi_C(\mathbf{x}_C)$$
The term 'partition function' comes from statistical physics, where Z (from German 'Zustandssumme', meaning 'sum over states') appears in the Boltzmann distribution describing thermal equilibrium. The partition function 'partitions' probability among all possible states.
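For a quick worked sense of the definition (a hypothetical example, with potential values chosen purely for illustration), suppose two binary variables are joined by a single potential $\psi$ that scores agreement as 2 and disagreement as 1. Then

$$Z = \psi(0,0) + \psi(0,1) + \psi(1,0) + \psi(1,1) = 2 + 1 + 1 + 2 = 6,$$

so, for instance, $P(X_1 = 0, X_2 = 0) = 2/6 = 1/3$, and the four normalized probabilities sum to one.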
```python
import numpy as np
from itertools import product
from typing import Dict, List, Callable, Any, Tuple


class ExactPartitionFunction:
    """
    Exact computation of the partition function via brute-force enumeration.
    Only feasible for small problems!
    """

    def __init__(self, potentials: List[Callable], variable_domains: Dict[int, List]):
        self.potentials = potentials
        self.domains = variable_domains
        self.variables = sorted(variable_domains.keys())

    def compute_unnormalized(self, assignment: Dict[int, Any]) -> float:
        """Compute product of all potentials for an assignment."""
        result = 1.0
        for psi in self.potentials:
            result *= psi(assignment)
        return result

    def compute_Z(self) -> Tuple[float, Dict]:
        """
        Compute Z by summing over all configurations.
        Returns Z and a dictionary of all (assignment, unnorm_prob) pairs.
        """
        domain_lists = [self.domains[v] for v in self.variables]
        Z = 0.0
        all_scores = {}
        for values in product(*domain_lists):
            assignment = dict(zip(self.variables, values))
            score = self.compute_unnormalized(assignment)
            all_scores[values] = score
            Z += score
        return Z, all_scores

    def compute_probabilities(self) -> Dict[Tuple, float]:
        """Compute normalized probabilities for all configurations."""
        Z, scores = self.compute_Z()
        return {config: score / Z for config, score in scores.items()}


# Example: Simple 3-variable binary MRF
def psi_01(a): return 2.0 if a[0] == a[1] else 0.5  # Prefer agreement
def psi_12(a): return 2.0 if a[1] == a[2] else 0.5  # Prefer agreement
def psi_0(a):  return 1.5 if a[0] == 1 else 1.0     # Slight bias toward 1

potentials = [psi_01, psi_12, psi_0]
domains = {0: [0, 1], 1: [0, 1], 2: [0, 1]}

solver = ExactPartitionFunction(potentials, domains)
Z, scores = solver.compute_Z()
probs = solver.compute_probabilities()

print("Partition Function Computation:")
print("=" * 50)
print(f"\nPartition function Z = {Z:.4f}")
print(f"Number of configurations: {len(scores)}")
print("\nAll probabilities:")
for config, prob in sorted(probs.items()):
    print(f"  P(X={config}) = {prob:.4f}")
print(f"\nSum of probabilities: {sum(probs.values()):.6f}")
```

Computing the partition function exactly is #P-hard for general MRFs. This complexity class is even harder than NP-complete: not only is finding a single high-scoring configuration hard, even counting (summing over) all configurations is hard.
For $n$ binary variables, computing $Z$ requires summing over $2^n$ configurations. At 30 variables: roughly a billion terms. At 50 variables: roughly a quadrillion terms. At 100 variables: about $10^{30}$ terms, far more than any computer could ever enumerate. Brute force is simply impossible for realistic problems.
| Variables | Configurations | Time at 1M ops/sec |
|---|---|---|
| 10 | 1,024 | ~1 ms |
| 20 | 1,048,576 | ~1 second |
| 30 | ~1 billion | ~17 minutes |
| 40 | ~1 trillion | ~12 days |
| 50 | ~1 quadrillion | ~35 years |
| 100 | ~10³⁰ | Longer than universe age |
Tractable Special Cases:
Despite general intractability, exact computation is possible for: tree-structured graphs (via belief propagation or variable elimination), graphs of low treewidth (via the junction tree algorithm, whose cost is exponential only in the treewidth), certain restricted model classes such as planar Ising models without an external field, and small models where brute-force enumeration remains feasible.

For general graphs, we must resort to approximation. The sketch below illustrates the tree-structured case on a simple chain.
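To make the tractable case concrete, here is a minimal sketch (my own addition, assuming a binary chain with one pairwise potential per edge, not code supplied with this page): eliminating variables from one end keeps only a short vector of partial sums, so $Z$ comes out in time linear in the chain length. The same recursion is what belief propagation performs on general trees.

```python
import numpy as np
from itertools import product
from typing import List


def chain_partition_function(pairwise: List[np.ndarray]) -> float:
    """
    Exact Z for a chain MRF x_0 - x_1 - ... - x_n via variable elimination.

    pairwise[i] holds the (k x k) table of potential values psi_i(x_i, x_{i+1}).
    Eliminating variables left to right keeps only a length-k vector of
    partial sums, so the cost is O(n * k^2) instead of O(k^n).
    """
    k = pairwise[0].shape[0]
    message = np.ones(k)            # sums over the variables eliminated so far
    for psi in pairwise:
        message = message @ psi     # sum_{x_i} message(x_i) * psi_i(x_i, x_{i+1})
    return float(message.sum())     # finally sum over the last variable


# Tiny check against brute force: 3-node binary chain, "prefer agreement" potentials
psi = np.array([[2.0, 0.5], [0.5, 2.0]])
Z_chain = chain_partition_function([psi, psi])
Z_brute = sum(psi[a, b] * psi[b, c] for a, b, c in product([0, 1], repeat=3))
print(Z_chain, Z_brute)  # both 12.5
```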
In practice, we often work with the log-partition function $\log Z$ rather than $Z$ directly. This has several advantages: sums of extremely large or small products stay in a numerically safe range (via the log-sum-exp trick), $\log Z$ is a convex function of the parameters of a log-linear model, and its derivatives yield expected feature values, which is exactly what maximum likelihood learning requires.
$$\log Z = \log \sum_{\mathbf{x}} \exp\left(\sum_C \theta_C \cdot f_C(\mathbf{x}_C)\right)$$
For log-linear models, this is the log-sum-exp of linear functions in parameters.
```python
import numpy as np
from scipy.special import logsumexp


def compute_log_Z_stable(log_potentials: np.ndarray) -> float:
    """
    Compute log(Z) in a numerically stable way using log-sum-exp.

    log Z = log Σ exp(log ψ_i) = logsumexp(log_potentials)

    Args:
        log_potentials: Array of log-potential values, one per configuration
    """
    return logsumexp(log_potentials)


def demonstrate_numerical_stability():
    """Show why log-space computation is essential."""
    n_configs = 1000

    # Case 1: very negative log-potentials (tiny potentials).
    # In float64, exp() of anything below about -745 underflows to 0.
    log_pots_small = -np.random.uniform(800, 1000, n_configs)
    Z_direct = np.sum(np.exp(log_pots_small))
    print(f"Direct Z (tiny potentials): {Z_direct}")   # 0.0 due to underflow
    print(f"Log-space log(Z): {compute_log_Z_stable(log_pots_small):.4f}")

    # Case 2: very positive log-potentials (huge potentials).
    # In float64, exp() of anything above about 709 overflows to inf.
    log_pots_large = np.random.uniform(800, 1000, n_configs)
    with np.errstate(over='ignore'):
        Z_direct = np.sum(np.exp(log_pots_large))
    print(f"Direct Z (huge potentials): {Z_direct}")   # inf due to overflow
    print(f"Log-space log(Z): {compute_log_Z_stable(log_pots_large):.4f}")


demonstrate_numerical_stability()


# Gradient of log Z gives expected features
def log_Z_gradient_demo():
    """
    For log-linear models: ∂log(Z)/∂θ_k = E[f_k(X)]

    This is crucial for maximum likelihood learning.
    """
    # Simple 2-variable binary model: ψ(x0, x1) = exp(θ * I[x0 == x1])
    theta = 2.0  # Parameter

    # Compute Z and probabilities by enumeration
    configs = [(0, 0), (0, 1), (1, 0), (1, 1)]
    feature = lambda x: 1.0 if x[0] == x[1] else 0.0

    log_pots = np.array([theta * feature(c) for c in configs])
    log_Z = logsumexp(log_pots)
    probs = np.exp(log_pots - log_Z)

    # Expected feature = Σ P(x) * f(x)
    expected_feature = sum(p * feature(c) for p, c in zip(probs, configs))

    # Numerical gradient via central differences
    eps = 1e-5
    log_pots_plus = np.array([(theta + eps) * feature(c) for c in configs])
    log_pots_minus = np.array([(theta - eps) * feature(c) for c in configs])
    numerical_grad = (logsumexp(log_pots_plus) - logsumexp(log_pots_minus)) / (2 * eps)

    print(f"\nGradient of log(Z):")
    print(f"  Expected feature E[f(X)] = {expected_feature:.6f}")
    print(f"  Numerical ∂log(Z)/∂θ = {numerical_grad:.6f}")
    print(f"  Match: {np.isclose(expected_feature, numerical_grad)}")


log_Z_gradient_demo()
```

The partition function appears in almost every inference task in MRFs. Understanding these connections clarifies why $Z$ is so central.
Marginal Probabilities:
$$P(X_i = x_i) = \frac{\sum_{\mathbf{x}' : x'_i = x_i} \prod_C \psi_C(\mathbf{x}'_C)}{Z} = \frac{Z_{x_i}}{Z}$$
The marginal is a ratio of partition functions: one constrained ($Z_{x_i}$) and one unconstrained ($Z$).
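To see the ratio in code, here is a small sketch of my own that assumes the first listing above has already been run in the same session (so `ExactPartitionFunction`, `potentials`, `domains`, `Z`, and `probs` are in scope): clamping $X_0 = 1$ by shrinking its domain yields the constrained sum $Z_{x_0 = 1}$, and the same clamping trick gives the conditional distributions discussed next.

```python
# Marginal P(X_0 = 1) as a ratio of two partition functions.
clamped_domains = dict(domains)
clamped_domains[0] = [1]   # restrict X_0 to the single value 1

Z_clamped, _ = ExactPartitionFunction(potentials, clamped_domains).compute_Z()
print(f"P(X_0 = 1) = Z_x0 / Z = {Z_clamped / Z:.4f}")

# Sanity check against the enumerated probabilities computed earlier
print(sum(p for config, p in probs.items() if config[0] == 1))
```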
Conditional Probabilities:
$$P(X_A \mid X_B = x_B) = \frac{1}{Z_{x_B}} \prod_C \psi_C(\mathbf{x}_C)$$
Conditioning requires a new partition function over remaining variables.
Expectations:
$$\mathbb{E}[f(X)] = \frac{1}{Z} \sum_{\mathbf{x}} f(\mathbf{x}) \prod_C \psi_C(\mathbf{x}_C)$$
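The expectation formula reduces to a weighted sum once the probabilities are available. The snippet below is again my own continuation of the first listing (it reuses `probs` from that example), with a feature chosen purely for illustration: the number of variables that are set to 1.

```python
def f(config):
    """Feature: number of variables set to 1 (an arbitrary illustrative choice)."""
    return sum(config)

expected_f = sum(p * f(config) for config, p in probs.items())
print(f"E[f(X)] = E[number of variables equal to 1] = {expected_f:.4f}")
```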
Nearly every probabilistic query in an MRF requires computing or approximating Z or ratios of Z values. This is why approximate inference methods—variational inference, MCMC, belief propagation—are so important for practical MRF applications.
Since exact computation of $Z$ is generally intractable, numerous approximation methods have been developed. Here's a preview of the main approaches:
| Method | Approach | Guarantees |
|---|---|---|
| Variational (Mean Field) | Find tractable lower bound on log Z | Lower bound, often loose |
| Belief Propagation | Message passing on graph | Exact on trees, approx on loopy |
| MCMC Sampling | Estimate Z via sampling | Asymptotically exact |
| Importance Sampling | Weighted samples from proposal | Unbiased but high variance |
| Annealed Importance Sampling | Tempered distribution sequence | Better variance than naive IS |
```python
import numpy as np
from scipy.special import logsumexp


def mean_field_log_Z_bound(unary_energies: np.ndarray,
                           pairwise_energies: np.ndarray,
                           max_iters: int = 100) -> float:
    """
    Mean-field variational lower bound on log(Z) for a chain-structured
    binary MRF with energy E(x) = Σ_i u_i(x_i) + Σ_i w_i(x_i, x_{i+1}).

    Approximates the true distribution with a factorized one:
        q(x) = Π q_i(x_i)
    and returns the bound  log Z ≥ -E_q[E(x)] + H(q).

    Args:
        unary_energies:    (n, 2) array, u_i(x_i)
        pairwise_energies: (n-1, 2, 2) array, w_i(x_i, x_{i+1})
    """
    n = len(unary_energies)
    q = np.full((n, 2), 0.5)  # q[i, k] = q_i(x_i = k), initialized uniformly

    for _ in range(max_iters):
        for i in range(n):
            # Coordinate update: log q_i(k) ∝ -u_i(k) - Σ_{j ∈ N(i)} E_{q_j}[w(k, x_j)]
            log_q = -unary_energies[i].astype(float)
            if i > 0:                                   # left neighbor
                log_q -= q[i - 1] @ pairwise_energies[i - 1]
            if i < n - 1:                               # right neighbor
                log_q -= pairwise_energies[i] @ q[i + 1]
            q[i] = np.exp(log_q - logsumexp(log_q))

    # Variational bound: -E_q[E(x)] + H(q)
    expected_energy = np.sum(q * unary_energies)
    for i in range(n - 1):
        expected_energy += q[i] @ pairwise_energies[i] @ q[i + 1]
    entropy = -np.sum(q * np.log(q + 1e-10))
    return -expected_energy + entropy


def importance_sampling_Z(log_potential_func, proposal_sampler,
                          log_proposal_func, n_samples: int):
    """
    Estimate log(Z) using importance sampling.

    Z = E_q[ psi(x) / q(x) ]  ≈  (1/N) Σ exp(log_psi(x_i) - log_q(x_i))
    """
    log_weights = []
    for _ in range(n_samples):
        x = proposal_sampler()
        log_weights.append(log_potential_func(x) - log_proposal_func(x))

    # log Z ≈ log(mean(exp(log_weights))), computed stably
    log_weights = np.array(log_weights)
    log_Z_estimate = logsumexp(log_weights) - np.log(n_samples)
    return log_Z_estimate


print("Partition Function Approximation Methods:")
print("=" * 50)
print("See Module 4 (Inference in Graphical Models) for details")
```

What's Next:
The next page presents the Hammersley-Clifford theorem—the fundamental result that formally connects graph structure to probability factorization in MRFs.
You now understand the partition function's role, its computational challenges, and why approximate inference is necessary for practical MRF applications.