Every probability distribution must sum to one. In Bayesian networks, this normalization is built-in: conditional probability tables are pre-normalized. But in Markov Random Fields, potential functions are unnormalized compatibility scores. The partition function $Z$ bridges this gap—it's the normalizing constant that transforms products of potentials into valid probabilities.
Understanding the partition function is essential because its computation is the central challenge in MRF inference. Computing $Z$ exactly is typically intractable, making approximate inference necessary for practical applications.
By the end of this page, you will understand the mathematical definition and role of the partition function, appreciate why computing Z is computationally intractable, learn about the log-partition function and its properties, and understand how Z relates to marginal probabilities and expectations.
The partition function $Z$ is defined as the sum (or integral for continuous variables) of the unnormalized probability over all possible configurations:
$$Z = \sum_{\mathbf{x} \in \mathcal{X}} \prod_{C \in \mathcal{C}} \psi_C(\mathbf{x}_C)$$
With the partition function, the joint probability becomes:
$$P(\mathbf{X} = \mathbf{x}) = \frac{1}{Z} \prod_{C \in \mathcal{C}} \psi_C(\mathbf{x}_C)$$
The term 'partition function' comes from statistical physics, where Z (from German 'Zustandssumme', meaning 'sum over states') appears in the Boltzmann distribution describing thermal equilibrium. The partition function 'partitions' probability among all possible states.
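For a quick worked sense of the definition (a hypothetical example, with potential values chosen purely for illustration), suppose two binary variables are joined by a single potential $\psi$ that scores agreement as 2 and disagreement as 1. Then

$$Z = \psi(0,0) + \psi(0,1) + \psi(1,0) + \psi(1,1) = 2 + 1 + 1 + 2 = 6,$$

so, for instance, $P(X_1 = 0, X_2 = 0) = 2/6 = 1/3$, and the four normalized probabilities sum to one.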
```python
import numpy as np
from itertools import product
from typing import Dict, List, Callable, Any, Tuple


class ExactPartitionFunction:
    """
    Exact computation of the partition function via brute-force enumeration.
    Only feasible for small problems!
    """

    def __init__(self, potentials: List[Callable], variable_domains: Dict[int, List]):
        self.potentials = potentials
        self.domains = variable_domains
        self.variables = sorted(variable_domains.keys())

    def compute_unnormalized(self, assignment: Dict[int, Any]) -> float:
        """Compute product of all potentials for an assignment."""
        result = 1.0
        for psi in self.potentials:
            result *= psi(assignment)
        return result

    def compute_Z(self) -> Tuple[float, Dict]:
        """
        Compute Z by summing over all configurations.
        Returns Z and a dictionary of all (assignment, unnorm_prob) pairs.
        """
        domain_lists = [self.domains[v] for v in self.variables]
        Z = 0.0
        all_scores = {}
        for values in product(*domain_lists):
            assignment = dict(zip(self.variables, values))
            score = self.compute_unnormalized(assignment)
            all_scores[values] = score
            Z += score
        return Z, all_scores

    def compute_probabilities(self) -> Dict[Tuple, float]:
        """Compute normalized probabilities for all configurations."""
        Z, scores = self.compute_Z()
        return {config: score / Z for config, score in scores.items()}


# Example: Simple 3-variable binary MRF
def psi_01(a): return 2.0 if a[0] == a[1] else 0.5  # Prefer agreement
def psi_12(a): return 2.0 if a[1] == a[2] else 0.5  # Prefer agreement
def psi_0(a):  return 1.5 if a[0] == 1 else 1.0     # Slight bias toward 1

potentials = [psi_01, psi_12, psi_0]
domains = {0: [0, 1], 1: [0, 1], 2: [0, 1]}

solver = ExactPartitionFunction(potentials, domains)
Z, scores = solver.compute_Z()
probs = solver.compute_probabilities()

print("Partition Function Computation:")
print("=" * 50)
print(f"\nPartition function Z = {Z:.4f}")
print(f"Number of configurations: {len(scores)}")
print("\nAll probabilities:")
for config, prob in sorted(probs.items()):
    print(f"  P(X={config}) = {prob:.4f}")
print(f"\nSum of probabilities: {sum(probs.values()):.6f}")
```

Computing the partition function exactly is #P-hard for general MRFs. This complexity class is even harder than NP-complete: not only is finding a single high-scoring configuration hard, even counting (summing over) all configurations is hard.
For $n$ binary variables, computing $Z$ requires summing over $2^n$ configurations. At 30 variables: roughly a billion terms. At 50 variables: roughly a quadrillion terms. At 100 variables: about $10^{30}$ terms, far more than any computer could ever enumerate. Brute force is simply impossible for realistic problems.
| Variables | Configurations | Time at 1M ops/sec |
|---|---|---|
| 10 | 1,024 | ~1 ms |
| 20 | 1,048,576 | ~1 second |
| 30 | ~1 billion | ~17 minutes |
| 40 | ~1 trillion | ~12 days |
| 50 | ~1 quadrillion | ~35 years |
| 100 | ~10³⁰ | Longer than universe age |
Tractable Special Cases:
Despite general intractability, exact computation is possible for: tree-structured graphs (via belief propagation or variable elimination), graphs of low treewidth (via the junction tree algorithm, whose cost is exponential only in the treewidth), certain restricted model classes such as planar Ising models without an external field, and small models where brute-force enumeration remains feasible.

For general graphs, we must resort to approximation. The sketch below illustrates the tree-structured case on a simple chain.
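To make the tractable case concrete, here is a minimal sketch (my own addition, assuming a binary chain with one pairwise potential per edge, not code supplied with this page): eliminating variables from one end keeps only a short vector of partial sums, so $Z$ comes out in time linear in the chain length. The same recursion is what belief propagation performs on general trees.

```python
import numpy as np
from itertools import product
from typing import List


def chain_partition_function(pairwise: List[np.ndarray]) -> float:
    """
    Exact Z for a chain MRF x_0 - x_1 - ... - x_n via variable elimination.

    pairwise[i] holds the (k x k) table of potential values psi_i(x_i, x_{i+1}).
    Eliminating variables left to right keeps only a length-k vector of
    partial sums, so the cost is O(n * k^2) instead of O(k^n).
    """
    k = pairwise[0].shape[0]
    message = np.ones(k)            # sums over the variables eliminated so far
    for psi in pairwise:
        message = message @ psi     # sum_{x_i} message(x_i) * psi_i(x_i, x_{i+1})
    return float(message.sum())     # finally sum over the last variable


# Tiny check against brute force: 3-node binary chain, "prefer agreement" potentials
psi = np.array([[2.0, 0.5], [0.5, 2.0]])
Z_chain = chain_partition_function([psi, psi])
Z_brute = sum(psi[a, b] * psi[b, c] for a, b, c in product([0, 1], repeat=3))
print(Z_chain, Z_brute)  # both 12.5
```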
In practice, we often work with the log-partition function $\log Z$ rather than $Z$ directly. This has several advantages: sums of extremely large or small products stay in a numerically safe range (via the log-sum-exp trick), $\log Z$ is a convex function of the parameters of a log-linear model, and its derivatives yield expected feature values, which is exactly what maximum likelihood learning requires.
$$\log Z = \log \sum_{\mathbf{x}} \exp\left(\sum_C \theta_C \cdot f_C(\mathbf{x}_C)\right)$$
For log-linear models, this is the log-sum-exp of linear functions in parameters.
```python
import numpy as np
from scipy.special import logsumexp


def compute_log_Z_stable(log_potentials: np.ndarray) -> float:
    """
    Compute log(Z) in a numerically stable way using log-sum-exp.

    log Z = log Σ exp(log ψ_i) = logsumexp(log_potentials)

    Args:
        log_potentials: Array of log-potential values, one per configuration
    """
    return logsumexp(log_potentials)


def demonstrate_numerical_stability():
    """Show why log-space computation is essential."""
    n_configs = 1000

    # Case 1: very negative log-potentials (tiny potentials).
    # In float64, exp() of anything below about -745 underflows to 0.
    log_pots_small = -np.random.uniform(800, 1000, n_configs)
    Z_direct = np.sum(np.exp(log_pots_small))
    print(f"Direct Z (tiny potentials): {Z_direct}")   # 0.0 due to underflow
    print(f"Log-space log(Z): {compute_log_Z_stable(log_pots_small):.4f}")

    # Case 2: very positive log-potentials (huge potentials).
    # In float64, exp() of anything above about 709 overflows to inf.
    log_pots_large = np.random.uniform(800, 1000, n_configs)
    with np.errstate(over='ignore'):
        Z_direct = np.sum(np.exp(log_pots_large))
    print(f"Direct Z (huge potentials): {Z_direct}")   # inf due to overflow
    print(f"Log-space log(Z): {compute_log_Z_stable(log_pots_large):.4f}")


demonstrate_numerical_stability()


# Gradient of log Z gives expected features
def log_Z_gradient_demo():
    """
    For log-linear models: ∂log(Z)/∂θ_k = E[f_k(X)]

    This is crucial for maximum likelihood learning.
    """
    # Simple 2-variable binary model: ψ(x0, x1) = exp(θ * I[x0 == x1])
    theta = 2.0  # Parameter

    # Compute Z and probabilities by enumeration
    configs = [(0, 0), (0, 1), (1, 0), (1, 1)]
    feature = lambda x: 1.0 if x[0] == x[1] else 0.0

    log_pots = np.array([theta * feature(c) for c in configs])
    log_Z = logsumexp(log_pots)
    probs = np.exp(log_pots - log_Z)

    # Expected feature = Σ P(x) * f(x)
    expected_feature = sum(p * feature(c) for p, c in zip(probs, configs))

    # Numerical gradient via central differences
    eps = 1e-5
    log_pots_plus = np.array([(theta + eps) * feature(c) for c in configs])
    log_pots_minus = np.array([(theta - eps) * feature(c) for c in configs])
    numerical_grad = (logsumexp(log_pots_plus) - logsumexp(log_pots_minus)) / (2 * eps)

    print(f"\nGradient of log(Z):")
    print(f"  Expected feature E[f(X)] = {expected_feature:.6f}")
    print(f"  Numerical ∂log(Z)/∂θ = {numerical_grad:.6f}")
    print(f"  Match: {np.isclose(expected_feature, numerical_grad)}")


log_Z_gradient_demo()
```

The partition function appears in almost every inference task in MRFs. Understanding these connections clarifies why $Z$ is so central.
Marginal Probabilities:
$$P(X_i = x_i) = \frac{\sum_{\mathbf{x}' : x'_i = x_i} \prod_C \psi_C(\mathbf{x}'_C)}{Z} = \frac{Z_{x_i}}{Z}$$
The marginal is a ratio of partition functions: one constrained ($Z_{x_i}$) and one unconstrained ($Z$).
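To see the ratio in code, here is a small sketch of my own that assumes the first listing above has already been run in the same session (so `ExactPartitionFunction`, `potentials`, `domains`, `Z`, and `probs` are in scope): clamping $X_0 = 1$ by shrinking its domain yields the constrained sum $Z_{x_0 = 1}$, and the same clamping trick gives the conditional distributions discussed next.

```python
# Marginal P(X_0 = 1) as a ratio of two partition functions.
clamped_domains = dict(domains)
clamped_domains[0] = [1]   # restrict X_0 to the single value 1

Z_clamped, _ = ExactPartitionFunction(potentials, clamped_domains).compute_Z()
print(f"P(X_0 = 1) = Z_x0 / Z = {Z_clamped / Z:.4f}")

# Sanity check against the enumerated probabilities computed earlier
print(sum(p for config, p in probs.items() if config[0] == 1))
```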
Conditional Probabilities:
$$P(X_A \mid X_B = x_B) = \frac{1}{Z_{x_B}} \prod_C \psi_C(\mathbf{x}_C)$$
Conditioning requires a new partition function over remaining variables.
Expectations:
$$\mathbb{E}[f(X)] = \frac{1}{Z} \sum_{\mathbf{x}} f(\mathbf{x}) \prod_C \psi_C(\mathbf{x}_C)$$
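The expectation formula reduces to a weighted sum once the probabilities are available. The snippet below is again my own continuation of the first listing (it reuses `probs` from that example), with a feature chosen purely for illustration: the number of variables that are set to 1.

```python
def f(config):
    """Feature: number of variables set to 1 (an arbitrary illustrative choice)."""
    return sum(config)

expected_f = sum(p * f(config) for config, p in probs.items())
print(f"E[f(X)] = E[number of variables equal to 1] = {expected_f:.4f}")
```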
Nearly every probabilistic query in an MRF requires computing or approximating Z or ratios of Z values. This is why approximate inference methods—variational inference, MCMC, belief propagation—are so important for practical MRF applications.
Since exact computation of $Z$ is generally intractable, numerous approximation methods have been developed. Here's a preview of the main approaches:
| Method | Approach | Guarantees |
|---|---|---|
| Variational (Mean Field) | Find tractable lower bound on log Z | Lower bound, often loose |
| Belief Propagation | Message passing on graph | Exact on trees, approx on loopy |
| MCMC Sampling | Estimate Z via sampling | Asymptotically exact |
| Importance Sampling | Weighted samples from proposal | Unbiased but high variance |
| Annealed Importance Sampling | Tempered distribution sequence | Better variance than naive IS |
```python
import numpy as np
from scipy.special import logsumexp


def mean_field_log_Z_bound(unary_energies: np.ndarray,
                           pairwise_energies: np.ndarray,
                           max_iters: int = 100) -> float:
    """
    Mean-field variational lower bound on log(Z) for a chain-structured
    binary MRF with energy E(x) = Σ_i u_i(x_i) + Σ_i w_i(x_i, x_{i+1}).

    Approximates the true distribution with a factorized one:
        q(x) = Π q_i(x_i)
    and returns the bound  log Z ≥ -E_q[E(x)] + H(q).

    Args:
        unary_energies:    (n, 2) array, u_i(x_i)
        pairwise_energies: (n-1, 2, 2) array, w_i(x_i, x_{i+1})
    """
    n = len(unary_energies)
    q = np.full((n, 2), 0.5)  # q[i, k] = q_i(x_i = k), initialized uniformly

    for _ in range(max_iters):
        for i in range(n):
            # Coordinate update: log q_i(k) ∝ -u_i(k) - Σ_{j ∈ N(i)} E_{q_j}[w(k, x_j)]
            log_q = -unary_energies[i].astype(float)
            if i > 0:                                   # left neighbor
                log_q -= q[i - 1] @ pairwise_energies[i - 1]
            if i < n - 1:                               # right neighbor
                log_q -= pairwise_energies[i] @ q[i + 1]
            q[i] = np.exp(log_q - logsumexp(log_q))

    # Variational bound: -E_q[E(x)] + H(q)
    expected_energy = np.sum(q * unary_energies)
    for i in range(n - 1):
        expected_energy += q[i] @ pairwise_energies[i] @ q[i + 1]
    entropy = -np.sum(q * np.log(q + 1e-10))
    return -expected_energy + entropy


def importance_sampling_Z(log_potential_func, proposal_sampler,
                          log_proposal_func, n_samples: int):
    """
    Estimate log(Z) using importance sampling.

    Z = E_q[ psi(x) / q(x) ]  ≈  (1/N) Σ exp(log_psi(x_i) - log_q(x_i))
    """
    log_weights = []
    for _ in range(n_samples):
        x = proposal_sampler()
        log_weights.append(log_potential_func(x) - log_proposal_func(x))

    # log Z ≈ log(mean(exp(log_weights))), computed stably
    log_weights = np.array(log_weights)
    log_Z_estimate = logsumexp(log_weights) - np.log(n_samples)
    return log_Z_estimate


print("Partition Function Approximation Methods:")
print("=" * 50)
print("See Module 4 (Inference in Graphical Models) for details")
```

What's Next:
The next page presents the Hammersley-Clifford theorem—the fundamental result that formally connects graph structure to probability factorization in MRFs.
You now understand the partition function's role, its computational challenges, and why approximate inference is necessary for practical MRF applications.