We've established that MRFs use potential functions defined over cliques to represent probability distributions. But a fundamental question remains: What is the precise relationship between graph structure and probability factorization?
The Hammersley-Clifford theorem answers this question definitively. It establishes a beautiful equivalence: a positive distribution satisfies the Markov properties with respect to a graph if and only if it factorizes as a product of potentials over the maximal cliques. This theorem is the theoretical cornerstone of undirected graphical models.
By the end of this page, you will understand the precise statement of the Hammersley-Clifford theorem, appreciate the positivity requirement and what happens without it, follow the key ideas of the proof, and understand the theorem's implications for MRF modeling and inference.
Let $P$ be a strictly positive probability distribution over random variables $\mathbf{X} = (X_1, \ldots, X_n)$, and let $G = (V, E)$ be an undirected graph with $V = \{1, \ldots, n\}$. Then the following are equivalent:

1. Markov property: $P$ satisfies the (global) Markov property with respect to $G$; that is, whenever a set $S$ separates $A$ from $B$ in $G$, we have $\mathbf{X}_A \perp\!\!\!\perp \mathbf{X}_B \mid \mathbf{X}_S$.
2. Factorization: $P$ factorizes over $G$ as $P(\mathbf{x}) = \frac{1}{Z}\prod_{C \in \mathcal{C}} \psi_C(\mathbf{x}_C)$, where $\mathcal{C}$ is the set of maximal cliques of $G$ and each $\psi_C$ is a positive potential function.
What This Means:
The theorem provides a two-way bridge:
Independence → Factorization: If we know a distribution has certain conditional independencies (expressible via graph separation), we can represent it as a product of clique potentials.
Factorization → Independence: If we define a distribution via clique potentials, we automatically get all the conditional independencies implied by graph separation.
This equivalence justifies the entire MRF framework: modeling with potentials is principled because it directly corresponds to conditional independence assumptions.
The Positivity Requirement:
The theorem requires $P(\mathbf{x}) > 0$ for all configurations $\mathbf{x}$. This "strictly positive" or "full support" assumption is crucial: the proof constructs potentials from $\log P$, which is only defined when every configuration has nonzero probability, and without positivity the equivalence itself can fail, as the counterexample later on this page shows.
The full proof of Hammersley-Clifford is technical, but the key ideas are accessible. Let's sketch the main arguments.
Direction 1: Factorization ⟹ Markov Properties
This direction is straightforward. If $P$ factorizes as: $$P(\mathbf{x}) = \frac{1}{Z}\prod_{C} \psi_C(\mathbf{x}_C)$$
Consider sets $A$ and $B$ separated by $S$ in the graph, and suppose for this sketch that every variable lies in $A \cup B \cup S$ (any remaining variables can be folded into $A$ or $B$). Because $S$ separates $A$ from $B$, no clique contains a variable from $A$ and a variable from $B$, so each clique lies entirely within $A \cup S$ or entirely within $B \cup S$. Grouping the factors accordingly, we can write: $$P(\mathbf{x}) \propto f(\mathbf{x}_{A \cup S}) \cdot g(\mathbf{x}_{B \cup S})$$
This factorization immediately implies $A \perp\!\!\!\perp B \mid S$.
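To spell out that last step, condition on $\mathbf{x}_S$ and normalize (the constant $1/Z$ cancels in the ratio):

$$P(\mathbf{x}_A, \mathbf{x}_B \mid \mathbf{x}_S) = \frac{f(\mathbf{x}_{A \cup S})\, g(\mathbf{x}_{B \cup S})}{\sum_{\mathbf{x}'_A, \mathbf{x}'_B} f(\mathbf{x}'_A, \mathbf{x}_S)\, g(\mathbf{x}'_B, \mathbf{x}_S)} = \underbrace{\frac{f(\mathbf{x}_{A \cup S})}{\sum_{\mathbf{x}'_A} f(\mathbf{x}'_A, \mathbf{x}_S)}}_{P(\mathbf{x}_A \mid \mathbf{x}_S)} \cdot \underbrace{\frac{g(\mathbf{x}_{B \cup S})}{\sum_{\mathbf{x}'_B} g(\mathbf{x}'_B, \mathbf{x}_S)}}_{P(\mathbf{x}_B \mid \mathbf{x}_S)}$$

The conditional splits into a factor involving only $\mathbf{x}_A$ and a factor involving only $\mathbf{x}_B$, which is exactly the statement $A \perp\!\!\!\perp B \mid S$. Note that this direction never uses positivity.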
Direction 2: Markov Properties ⟹ Factorization
This is the harder direction. The key insight uses the Möbius inversion formula and logarithms.
Define the interaction potentials: $$\log \psi_C(\mathbf{x}_C) = \sum_{D \subseteq C} (-1)^{|C| - |D|} \log P(\mathbf{x}_D, \mathbf{x}^0_{-D})$$
where $\mathbf{x}^0$ is a fixed reference configuration.
The Markov properties ensure that $\psi_C = 1$ (constant) whenever $C$ is not a clique, leaving only clique potentials in the factorization.
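To make this concrete, here is a minimal numerical sketch of the construction on the 3-node chain $X_0 - X_1 - X_2$ (using the same "prefers agreement" pairwise potentials as the demonstration further below; the helper names are illustrative): the interaction terms of the non-cliques $\{0,2\}$ and $\{0,1,2\}$ vanish, and summing the remaining terms recovers $\log P$ exactly.

```python
# Sketch of the Möbius-inversion construction on a strictly positive chain MRF.
import numpy as np
from itertools import product, chain, combinations

psi_01 = np.array([[2.0, 1.0], [1.0, 2.0]])
psi_12 = np.array([[2.0, 1.0], [1.0, 2.0]])

# Strictly positive joint P(x0, x1, x2) that factorizes over the chain's cliques
joint = np.array([[[psi_01[a, b] * psi_12[b, c] for c in range(2)]
                   for b in range(2)] for a in range(2)])
joint /= joint.sum()

x_ref = (0, 0, 0)  # fixed reference configuration x^0

def powerset(s):
    s = list(s)
    return chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))

def log_p_with_reference(x, subset):
    """log P(x_D, x^0_{-D}): variables in `subset` take values from x, others take x^0."""
    cfg = tuple(x[i] if i in subset else x_ref[i] for i in range(3))
    return np.log(joint[cfg])

def phi(subset, x):
    """Interaction term log psi_C(x_C) via Möbius inversion over subsets D of C."""
    return sum((-1) ** (len(subset) - len(D)) * log_p_with_reference(x, set(D))
               for D in powerset(subset))

# The Markov property X0 ⊥⊥ X2 | X1 forces the interaction terms of the
# non-cliques {0,2} and {0,1,2} to vanish, leaving only clique potentials.
for x in product(range(2), repeat=3):
    assert abs(phi({0, 2}, x)) < 1e-10
    assert abs(phi({0, 1, 2}, x)) < 1e-10
    # Summing phi_C over all subsets C recovers log P(x) exactly.
    total = sum(phi(set(C), x) for C in powerset(range(3)))
    assert abs(total - np.log(joint[x])) < 1e-10

print("Non-clique interaction terms vanish; clique terms reconstruct log P.")
```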
```python
import numpy as np
from itertools import product, combinations
from typing import Dict, List, Set, Tuple


def verify_factorization_implies_independence(
    potentials: List[Tuple[Set[int], np.ndarray]],
    variable_domains: Dict[int, List],
    set_a: Set[int],
    set_b: Set[int],
    separator: Set[int]
) -> bool:
    """
    Verify that if P factorizes over cliques, separation implies independence.

    Args:
        potentials: List of (scope, potential_table) pairs
        variable_domains: Domain of each variable
        set_a, set_b: Sets to check independence between
        separator: Proposed separator
    """
    variables = sorted(variable_domains.keys())
    domains = [variable_domains[v] for v in variables]

    def compute_joint(assignment: Dict[int, int]) -> float:
        result = 1.0
        for scope, table in potentials:
            idx = tuple(assignment[v] for v in sorted(scope))
            result *= table[idx]
        return result

    # Compute full joint and marginals
    all_configs = list(product(*domains))
    joint = {}
    for vals in all_configs:
        assignment = dict(zip(variables, vals))
        joint[vals] = compute_joint(assignment)
    Z = sum(joint.values())
    joint = {k: v / Z for k, v in joint.items()}

    # Check: P(A, B | S) = P(A | S) * P(B | S) for all A, B, S values
    tolerance = 1e-10
    independence_holds = True
    # This would require marginalizing and checking - simplified here
    print("Verification: Checking if separation implies conditional independence")
    print(f"  Sets: A={set_a}, B={set_b}, S={separator}")
    return independence_holds  # Full verification would be more complex


def construct_clique_potentials_from_distribution(
    joint_probs: Dict[Tuple, float],
    cliques: List[Set[int]],
    reference_config: Tuple
) -> Dict[frozenset, Dict[Tuple, float]]:
    """
    Construct clique potentials from a joint distribution using the
    Möbius inversion approach from the Hammersley-Clifford proof.
    """
    n = len(reference_config)
    potentials = {}
    for clique in cliques:
        clique = frozenset(clique)
        potential = {}
        # For each configuration of the clique
        # (Simplified - full implementation needs Möbius inversion)
        potentials[clique] = potential
    return potentials


# Demonstrate the theorem with a simple example
print("Hammersley-Clifford Theorem Demonstration:")
print("=" * 50)

# A simple 3-node chain: X0 - X1 - X2
# Cliques: {0,1} and {1,2}
# Independence: X0 ⊥⊥ X2 | X1

# Define potentials
psi_01 = np.array([[2, 1], [1, 2]])  # Prefers agreement
psi_12 = np.array([[2, 1], [1, 2]])  # Prefers agreement


def compute_chain_distribution():
    """Compute distribution for 3-node chain and verify independence."""
    joint = np.zeros((2, 2, 2))
    for x0 in range(2):
        for x1 in range(2):
            for x2 in range(2):
                joint[x0, x1, x2] = psi_01[x0, x1] * psi_12[x1, x2]
    Z = joint.sum()
    joint /= Z

    # Check X0 ⊥⊥ X2 | X1:
    # P(X0, X2 | X1) should equal P(X0 | X1) * P(X2 | X1)
    print("\nJoint distribution P(X0, X1, X2):")
    for x0 in range(2):
        for x1 in range(2):
            for x2 in range(2):
                print(f"  P({x0},{x1},{x2}) = {joint[x0, x1, x2]:.4f}")

    print("\nVerifying X0 ⊥⊥ X2 | X1:")
    for x1 in range(2):
        # P(X1 = x1)
        p_x1 = joint[:, x1, :].sum()
        if p_x1 == 0:
            continue
        for x0 in range(2):
            for x2 in range(2):
                p_x0_x2_given_x1 = joint[x0, x1, x2] / p_x1
                p_x0_given_x1 = joint[x0, x1, :].sum() / p_x1
                p_x2_given_x1 = joint[:, x1, x2].sum() / p_x1
                prod = p_x0_given_x1 * p_x2_given_x1
                match = "✓" if abs(p_x0_x2_given_x1 - prod) < 1e-10 else "✗"
                print(f"  X1={x1}: P(X0={x0},X2={x2}|X1) = {p_x0_x2_given_x1:.4f}, "
                      f"P(X0|X1)P(X2|X1) = {prod:.4f} {match}")


compute_chain_distribution()
```

The positivity requirement ($P(\mathbf{x}) > 0$ for all $\mathbf{x}$) is not merely technical; it is essential for the theorem to hold.
Without strict positivity, a distribution can satisfy the Markov properties with respect to a graph without admitting a clique factorization over that graph. It is the Markov ⟹ factorization direction that breaks down; the converse direction (factorization ⟹ Markov properties) does not require positivity.
Classic Counterexample:
Consider three binary variables that are pairwise independent but deterministically coupled, for example the parity distribution that places probability $1/4$ on each configuration with $x_0 \oplus x_1 \oplus x_2 = 0$ and zero probability elsewhere.
This distribution satisfies $X_0 \perp\!\!\!\perp X_1$, $X_1 \perp\!\!\!\perp X_2$, and $X_0 \perp\!\!\!\perp X_2$ (each pair is marginally independent), suggesting an empty graph (no edges).
But it cannot factorize as $P(x_0)P(x_1)P(x_2)$, because the variables are deterministically related: any two of them determine the third, so half of the configurations have probability zero, whereas a product of the (uniform) marginals would give every configuration probability $1/8$.
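This failure is easy to verify numerically. The short check below is a sketch that uses the parity distribution described above; it confirms that every pair is marginally independent while the joint is not the product of its single-variable marginals.

```python
# Sketch: the parity (XOR) counterexample - pairwise independent, yet not factorizable
# as a product of single-variable marginals.
import numpy as np
from itertools import product

# Uniform over the four configurations with x0 XOR x1 XOR x2 == 0
P = np.zeros((2, 2, 2))
for x0, x1, x2 in product(range(2), repeat=3):
    if x0 ^ x1 ^ x2 == 0:
        P[x0, x1, x2] = 0.25

# Every pair is marginally independent ...
for axes, name in [((2,), "X0,X1"), ((0,), "X1,X2"), ((1,), "X0,X2")]:
    pair = P.sum(axis=axes)
    m_first, m_second = pair.sum(axis=1), pair.sum(axis=0)
    print(name, "independent:", np.allclose(pair, np.outer(m_first, m_second)))

# ... but the joint does not factorize as P(x0) P(x1) P(x2)
m0, m1, m2 = P.sum(axis=(1, 2)), P.sum(axis=(0, 2)), P.sum(axis=(0, 1))
product_of_marginals = np.einsum("i,j,k->ijk", m0, m1, m2)
print("Joint equals product of marginals:", np.allclose(P, product_of_marginals))
```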
Practical Implications:
In practice, positivity is usually guaranteed by construction: defining potentials in exponential form, $\psi_C(\mathbf{x}_C) = \exp(-E_C(\mathbf{x}_C))$, gives every configuration strictly positive probability, so the theorem applies. Models with hard (zero-probability) constraints fall outside the theorem's scope and must be analyzed separately.
The theorem states we can factorize over maximal cliques. But can we use non-maximal cliques too?
Answer: Yes, with care.
Any clique factorization can be converted to a maximal-clique factorization by "absorbing" each smaller clique potential into a maximal clique that contains it:
$$\prod_{C} \psi_C(\mathbf{x}_C) = \prod_{C_{\max}} \left[\prod_{C \subseteq C_{\max}} \psi_C(\mathbf{x}_C)\right]$$
where, to avoid counting a potential twice, each non-maximal clique $C$ is assigned to exactly one maximal clique $C_{\max}$ containing it.
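For a concrete check, the sketch below builds a triangle graph whose only maximal clique is $\{0, 1, 2\}$ and absorbs three pairwise potentials into a single table over that clique; the potential values are illustrative, not taken from this lesson. Both factorizations assign the same unnormalized score to every configuration.

```python
# Sketch: absorbing pairwise potentials into the single maximal clique {0,1,2}
# of a triangle graph. The potential tables below are illustrative assumptions.
import numpy as np
from itertools import product

psi_01 = np.array([[2.0, 1.0], [1.0, 2.0]])   # potential on clique {0,1}
psi_12 = np.array([[3.0, 1.0], [1.0, 3.0]])   # potential on clique {1,2}
psi_02 = np.array([[1.0, 2.0], [2.0, 1.0]])   # potential on clique {0,2}

# Absorb each pairwise potential (exactly once) into one table over {0,1,2}
psi_012 = psi_01[:, :, None] * psi_12[None, :, :] * psi_02[:, None, :]

# The two factorizations give the same unnormalized score for every configuration
for x0, x1, x2 in product(range(2), repeat=3):
    pairwise = psi_01[x0, x1] * psi_12[x1, x2] * psi_02[x0, x2]
    assert np.isclose(psi_012[x0, x1, x2], pairwise)
print("Pairwise and maximal-clique factorizations agree on all configurations.")
```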
However, working with non-maximal cliques has advantages: potential tables grow exponentially with clique size, so smaller scopes mean fewer parameters to specify or learn, and the resulting potentials are easier to interpret.
Common Practice:
In applications like image processing, we typically use unary potentials on individual variables (tying each pixel to its observed value) and pairwise potentials on neighboring pixels (encouraging smoothness).
Even though the graph might have larger cliques, these simpler potentials suffice for the desired model.
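As a minimal illustration (a sketch, not code from this lesson: the array `noisy` and the weights `theta` and `beta` are assumed for the example), a binary denoising model might combine a unary term that ties each pixel to its observation with pairwise terms that reward agreement between 4-connected neighbors:

```python
# Sketch: unary + pairwise potentials for binary image denoising on a 4-connected grid.
import numpy as np

rng = np.random.default_rng(0)
noisy = rng.integers(0, 2, size=(4, 4))  # observed noisy binary image (assumed data)

theta, beta = 1.0, 0.7                   # unary strength, pairwise smoothness (assumed)

def energy(labels: np.ndarray) -> float:
    """E(x) such that P(x) ∝ exp(-E(x)); lower energy for agreeing with the data
    and for neighboring pixels taking the same label."""
    unary = theta * np.sum(labels == noisy)                   # agree with observation
    horiz = beta * np.sum(labels[:, :-1] == labels[:, 1:])    # horizontal neighbors
    vert = beta * np.sum(labels[:-1, :] == labels[1:, :])     # vertical neighbors
    return -(unary + horiz + vert)

print("Energy of the noisy image itself:", energy(noisy))
```

Because every configuration receives a finite energy, $P(\mathbf{x}) \propto \exp(-E(\mathbf{x}))$ is strictly positive, so the Hammersley-Clifford equivalence applies to such a model.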
The Hammersley-Clifford theorem has profound practical implications:
When building an MRF, think carefully about which edges to include. Each edge creates a direct dependency; each missing edge asserts conditional independence. The Hammersley-Clifford theorem guarantees that your factorization respects exactly these assertions.
What's Next:
The final page explores Energy-Based Models—a powerful perspective that connects MRFs to neural networks, contrastive learning, and modern deep generative models.
You now understand the Hammersley-Clifford theorem—the theoretical bedrock of Markov Random Fields that precisely connects graph structure to probability factorization.