Probabilistic graphical models encode complex probability distributions through elegant graphical structures—but a model is only as useful as our ability to extract answers from it. Inference is the computational process of querying these models: computing marginal probabilities, conditional distributions, and most probable configurations.
Variable Elimination (VE) is the foundational exact inference algorithm that transforms the seemingly intractable problem of summing over exponentially many configurations into a structured, polynomial-time (in favorable cases) sequence of local computations. Understanding VE is essential because it reveals the core computational principles that underpin all exact inference methods in graphical models.
By the end of this page, you will understand: (1) the fundamental inference tasks in graphical models, (2) the variable elimination algorithm in complete detail, (3) how elimination order affects computational complexity, (4) the connection to dynamic programming and the sum-product algorithm, and (5) practical implementation considerations for real-world applications.
Before diving into algorithms, we must precisely define what inference means in the context of probabilistic graphical models. Inference refers to computing quantities derived from a joint probability distribution, and different inference tasks require different computational approaches.
The Core Challenge:
Consider a joint probability distribution P(X₁, X₂, ..., Xₙ) factorized according to a graphical model. The defining characteristic of graphical models is that this joint distribution factors into a product of local functions (conditional probability tables in Bayesian networks, or potential functions in Markov random fields). The inference challenge arises because computing quantities of interest often requires summing over variables—and the number of configurations grows exponentially with the number of variables.
| Inference Task | Mathematical Form | Description | Example Application |
|---|---|---|---|
| Marginal Inference | P(Xᵢ) = Σₓ₋ᵢ P(X) | Compute the probability distribution over a single variable or subset | Medical diagnosis: P(Disease) |
| Conditional Inference | P(Xᵢ \| E=e) | Compute distribution given observed evidence | Document classification: P(Category \| Words) |
| MAP Inference | argmax_X P(X \| E=e) | Find most probable configuration | Image segmentation: best label assignment |
| Marginal MAP | argmax_Y Σ_Z P(Y,Z \| E) | Find most probable assignment to subset | Robot localization: best pose estimate |
| Partition Function | Z = Σₓ ψ(X) | Compute normalizing constant (MRFs) | Model training and comparison |
Why Naive Computation Fails:
Consider the straightforward approach to computing P(X₁):
P(X₁) = Σₓ₂ Σₓ₃ ... Σₓₙ P(X₁, X₂, ..., Xₙ)
If each variable has k possible values, this sum involves k^(n-1) terms. For a modest network of n = 50 binary variables, that is 2⁴⁹ ≈ 5.6 × 10¹⁴ configurations, each of which requires multiplying dozens of local factors; the brute-force sum is already impractical, and the cost doubles with every additional variable.
The Key Insight:
Variable elimination exploits the factorized structure of graphical models. Since the joint distribution is a product of local factors, we can push summations inside products (using the distributive law) and perform local computations. This reduces exponential complexity to polynomial complexity for graphs with bounded treewidth.
The entire foundation of exact inference rests on the distributive law of arithmetic: a(b + c) = ab + ac. By factoring out terms that don't depend on certain variables, we can dramatically reduce computation. This principle—realized as variable elimination, factor-graph marginalization, or, most generally, the generalized distributive law (GDL)—underlies all exact inference methods.
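To make the distributive-law saving concrete, here is a minimal sketch (the chain A → B → C and its probability tables are made-up numbers, not taken from any particular model) that compares the brute-force triple loop with the nested-sum form:

```python
import numpy as np

# Hypothetical CPTs for a chain A -> B -> C, all variables binary.
P_A = np.array([0.6, 0.4])                        # P(A)
P_B_given_A = np.array([[0.7, 0.3], [0.2, 0.8]])  # P(B|A), rows indexed by A
P_C_given_B = np.array([[0.9, 0.1], [0.5, 0.5]])  # P(C|B), rows indexed by B

# Naive: sum the full joint over all 2^3 configurations to get P(A).
naive = np.zeros(2)
for a in range(2):
    for b in range(2):
        for c in range(2):
            naive[a] += P_A[a] * P_B_given_A[a, b] * P_C_given_B[b, c]

# Distributive law: push each sum inside the product.
# P(a) = P(a) * sum_b P(b|a) * (sum_c P(c|b)) -- every sum is now local.
sum_over_c = P_C_given_B.sum(axis=1)                      # one small local table
structured = P_A * (P_B_given_A * sum_over_c).sum(axis=1)

print(naive, structured)  # identical results: [0.6 0.4] and [0.6 0.4]
```

Both versions return the same marginal; the structured one touches each local table once instead of enumerating every joint configuration, which is exactly the saving that variable elimination generalizes.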
Variable elimination operates on factors—multi-dimensional arrays that represent functions of subsets of random variables. Before describing the algorithm, we must understand the fundamental operations on factors.
Definition: Factor
A factor φ(X₁, ..., Xₖ) is a function from configurations of variables {X₁, ..., Xₖ} to non-negative real numbers. The variables in the factor are called its scope, denoted Scope(φ). In a Bayesian network, each conditional probability table P(Xᵢ | Parents(Xᵢ)) is a factor. In a Markov random field, potential functions are factors.
```python
import numpy as np
from typing import Dict, List, Set, Tuple
from dataclasses import dataclass


@dataclass
class Factor:
    """Represents a factor (potential function) over discrete variables.

    Each variable is indexed by position, values stored in a multi-dimensional array.
    """
    variables: Tuple[str, ...]       # Variable names in scope
    cardinalities: Tuple[int, ...]   # Number of values for each variable
    values: np.ndarray               # Multi-dimensional array of factor values

    @property
    def scope(self) -> Set[str]:
        return set(self.variables)

    def get_value(self, assignment: Dict[str, int]) -> float:
        """Get factor value for a specific variable assignment."""
        indices = tuple(assignment[var] for var in self.variables)
        return self.values[indices]


def factor_product(phi1: Factor, phi2: Factor) -> Factor:
    """Compute the product of two factors.

    Result scope is union of input scopes. Each entry is product of
    corresponding entries from input factors.

    Time complexity: O(prod of cardinalities in combined scope)
    Space complexity: O(prod of cardinalities in combined scope)
    """
    # Determine combined scope
    combined_vars = list(phi1.variables)
    phi2_only_vars = [v for v in phi2.variables if v not in phi1.scope]
    combined_vars.extend(phi2_only_vars)

    # Build cardinality mapping
    var_to_card = dict(zip(phi1.variables, phi1.cardinalities))
    var_to_card.update(dict(zip(phi2.variables, phi2.cardinalities)))
    combined_cards = tuple(var_to_card[v] for v in combined_vars)

    # Compute product values
    result_shape = combined_cards
    result_values = np.zeros(result_shape)

    # Iterate over all configurations
    for idx in np.ndindex(result_shape):
        assignment = dict(zip(combined_vars, idx))
        # Get corresponding indices in original factors
        idx1 = tuple(assignment[v] for v in phi1.variables)
        idx2 = tuple(assignment[v] for v in phi2.variables)
        result_values[idx] = phi1.values[idx1] * phi2.values[idx2]

    return Factor(
        variables=tuple(combined_vars),
        cardinalities=combined_cards,
        values=result_values
    )


def factor_marginalize(phi: Factor, var: str) -> Factor:
    """Marginalize (sum out) a variable from the factor.

    Result scope is input scope minus the marginalized variable.
    Each entry is sum over all values of eliminated variable.

    Time complexity: O(size of input factor)
    Space complexity: O(size of output factor)
    """
    if var not in phi.scope:
        return phi  # Variable not in scope, return unchanged

    # Find position and cardinality of variable to eliminate
    var_idx = phi.variables.index(var)

    # New scope and cardinalities
    new_vars = tuple(v for v in phi.variables if v != var)
    new_cards = tuple(c for i, c in enumerate(phi.cardinalities) if i != var_idx)

    # Sum over the variable being eliminated
    result_values = np.sum(phi.values, axis=var_idx)

    return Factor(
        variables=new_vars,
        cardinalities=new_cards,
        values=result_values
    )


def factor_reduce(phi: Factor, evidence: Dict[str, int]) -> Factor:
    """Reduce factor by fixing variables to observed values (evidence).

    Result scope excludes observed variables. Only configurations
    consistent with evidence are retained.

    Time complexity: O(size of output factor)
    Space complexity: O(size of output factor)
    """
    remaining_vars = []
    remaining_cards = []
    slices = []

    for i, var in enumerate(phi.variables):
        if var in evidence:
            # This variable is observed: select specific value
            slices.append(evidence[var])
        else:
            # This variable is free: keep all values
            remaining_vars.append(var)
            remaining_cards.append(phi.cardinalities[i])
            slices.append(slice(None))

    # Extract sub-array corresponding to evidence
    result_values = phi.values[tuple(slices)]

    # Handle case where result is scalar (all vars observed)
    if result_values.ndim == 0:
        result_values = np.array([float(result_values)])
        remaining_vars = []
        remaining_cards = []

    return Factor(
        variables=tuple(remaining_vars),
        cardinalities=tuple(remaining_cards),
        values=result_values
    )


def factor_normalize(phi: Factor) -> Factor:
    """Normalize factor to sum to 1 (valid probability distribution)."""
    total = np.sum(phi.values)
    if total == 0:
        raise ValueError("Cannot normalize factor with zero sum")
    return Factor(
        variables=phi.variables,
        cardinalities=phi.cardinalities,
        values=phi.values / total
    )
```

Efficient factor implementations use sparse representations for factors with many zero entries, log-space arithmetic to prevent underflow, and careful indexing to minimize memory operations. Production implementations (pgmpy, libDAI, OpenGM) optimize these operations heavily since they dominate inference runtime.
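As a quick sanity check of these operations, the sketch below (assuming the Factor class and the functions above are in scope) multiplies a prior P(A) with a conditional P(B | A) and then sums A back out; the numbers are arbitrary:

```python
# Arbitrary example tables for binary A and B.
phi_A = Factor(variables=('A',), cardinalities=(2,),
               values=np.array([0.6, 0.4]))                  # P(A)
phi_B_given_A = Factor(variables=('A', 'B'), cardinalities=(2, 2),
                       values=np.array([[0.7, 0.3],
                                        [0.2, 0.8]]))        # P(B|A)

joint = factor_product(phi_A, phi_B_given_A)   # scope {A, B}: P(A) * P(B|A)
marg_B = factor_marginalize(joint, 'A')        # scope {B}: P(B)

print(marg_B.variables, marg_B.values)         # ('B',) [0.5 0.5]
```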
With factor operations defined, we can now present the Variable Elimination algorithm. The algorithm computes marginal probabilities by systematically eliminating variables one at a time, using factor products and marginalizations.
Algorithm Overview:

To compute P(query | evidence), VE proceeds in four steps: (1) reduce every factor by the observed evidence; (2) choose an elimination order over the hidden (non-query, non-evidence) variables; (3) for each hidden variable in turn, multiply together all factors that contain it and sum the variable out of the product; (4) multiply the remaining factors and normalize.

The elegance of VE lies in its correctness: by the distributive law, eliminating variables one at a time produces the same result as the naive exponential-time summation.
```python
import numpy as np
from typing import List, Dict, Set, Optional
from functools import reduce


class VariableElimination:
    """Complete Variable Elimination implementation for exact inference.

    Computes marginal and conditional probabilities in graphical models
    by systematically eliminating variables through factor operations.
    """

    def __init__(self, factors: List[Factor], variables: Set[str]):
        """
        Initialize with factors and the set of all variables.

        Args:
            factors: List of Factor objects representing the model
            variables: Set of all variable names in the model
        """
        self.original_factors = factors
        self.variables = variables

    def query(
        self,
        query_vars: List[str],
        evidence: Optional[Dict[str, int]] = None,
        elimination_order: Optional[List[str]] = None
    ) -> Factor:
        """
        Compute P(query_vars | evidence) via variable elimination.

        Args:
            query_vars: Variables whose marginal distribution we want
            evidence: Observed variable assignments (variable -> value)
            elimination_order: Order to eliminate hidden variables

        Returns:
            Normalized factor representing P(query_vars | evidence)
        """
        evidence = evidence or {}

        # Step 1: Create working copy of factors
        factors = [Factor(f.variables, f.cardinalities, f.values.copy())
                   for f in self.original_factors]

        # Step 2: Incorporate evidence by reducing factors
        if evidence:
            factors = [factor_reduce(f, evidence) for f in factors]

        # Step 3: Determine variables to eliminate
        # Hidden = all variables - query variables - evidence variables
        hidden_vars = self.variables - set(query_vars) - set(evidence.keys())

        # Step 4: Determine elimination order
        if elimination_order is None:
            # Default: arbitrary order (not optimal!)
            elim_order = list(hidden_vars)
        else:
            # Use specified order, filtered to hidden variables
            elim_order = [v for v in elimination_order if v in hidden_vars]

        # Step 5: Eliminate variables one by one
        for var in elim_order:
            factors = self._eliminate_variable(factors, var)

        # Step 6: Multiply remaining factors
        result = self._multiply_all_factors(factors)

        # Step 7: Normalize to get proper probability distribution
        result = factor_normalize(result)

        return result

    def _eliminate_variable(
        self,
        factors: List[Factor],
        var: str
    ) -> List[Factor]:
        """
        Eliminate a single variable from the factor list.

        1. Identify all factors containing the variable
        2. Multiply them together
        3. Marginalize the variable from the product
        4. Add the new factor back (replacing the originals)

        This is the core operation that makes VE efficient.
        """
        # Partition factors by whether they contain var
        factors_with_var = [f for f in factors if var in f.scope]
        factors_without_var = [f for f in factors if var not in f.scope]

        if not factors_with_var:
            # Variable doesn't appear in any factor
            return factors

        # Multiply all factors containing var
        product = factors_with_var[0]
        for f in factors_with_var[1:]:
            product = factor_product(product, f)

        # Marginalize var from the product
        new_factor = factor_marginalize(product, var)

        # Return updated factor list
        return factors_without_var + [new_factor]

    def _multiply_all_factors(self, factors: List[Factor]) -> Factor:
        """Multiply all factors together."""
        if not factors:
            raise ValueError("No factors to multiply")
        return reduce(factor_product, factors)

    def compute_marginal_likelihood(
        self,
        evidence: Dict[str, int],
        elimination_order: Optional[List[str]] = None
    ) -> float:
        """
        Compute P(evidence) - the marginal likelihood of observations.

        This is done by eliminating ALL non-evidence variables (there are
        no query variables). The result is a scalar representing the
        probability of the evidence.
        """
        # Eliminate all non-evidence variables
        factors = [Factor(f.variables, f.cardinalities, f.values.copy())
                   for f in self.original_factors]
        factors = [factor_reduce(f, evidence) for f in factors]

        hidden_vars = self.variables - set(evidence.keys())
        elim_order = elimination_order or list(hidden_vars)
        elim_order = [v for v in elim_order if v in hidden_vars]

        for var in elim_order:
            factors = self._eliminate_variable(factors, var)

        # Result should be a single scalar factor
        result = self._multiply_all_factors(factors)
        return float(np.sum(result.values))
```

Variable elimination as presented computes exact marginals—but its efficiency depends critically on the elimination order. A poor order can make VE as slow as naive enumeration. We'll address this crucial issue in the next section.
The computational complexity of variable elimination depends entirely on the elimination order—the sequence in which variables are removed. Different orders can lead to vastly different intermediate factor sizes, and thus vastly different computation times.
Why Order Matters:
When we eliminate a variable X, we create a new factor whose scope is the union of the scopes of all factors containing X (minus X itself). The new edges this implicitly adds between previously unconnected neighbors of X are called the fill-in created by eliminating X. If X connects many previously unconnected variables, eliminating X first creates a large factor; if X is peripherally connected, eliminating it creates a small factor.
The Induced Graph View:
Variable elimination can be visualized through the induced graph and its elimination cliques. Start with the original graph (moralized, if the model is a Bayesian network). When eliminating variable X, connect all of X's remaining neighbors to one another (these added edges are the fill-in), then remove X from the graph. The neighbors of X at that moment form a clique in the induced graph, and they are exactly the scope of the factor created by eliminating X.
The width of an elimination order is the maximum number of neighbors any variable has when eliminated. The treewidth of a graph is the minimum width over all possible orderings.
If all variables are binary, the time and space complexity of VE is O(n · 2^w), where w is the width of the elimination order (for general domain size k it is O(n · k^(w+1))). Thus, finding a good order reduces w, making inference tractable. Finding the optimal order is NP-hard, but good heuristics exist.
| Heuristic | Strategy | Strengths | Weaknesses |
|---|---|---|---|
| Min-Neighbors | Eliminate variable with fewest remaining neighbors | Simple, fast to compute | Doesn't account for cardinality |
| Min-Weight | Eliminate variable minimizing product of neighbor cardinalities | Accounts for variable domains | Greedy may miss global optimum |
| Min-Fill | Eliminate variable adding fewest new edges | Directly minimizes fill-in | More expensive to compute |
| Weighted Min-Fill | Minimize weighted sum of added edges | Balances cardinality and structure | Requires tuning weights |
| Max-Cardinality Search | Number vertices by repeatedly picking the one with the most already-numbered neighbors, then eliminate in reverse | Near-linear time; yields an optimal order on chordal graphs | Can produce poor orders on non-chordal graphs |
```python
from typing import List, Set


def min_fill_heuristic(
    variables: Set[str],
    factors: List[Factor]
) -> List[str]:
    """
    Compute elimination order using min-fill heuristic.

    At each step, eliminate the variable that adds the fewest fill edges
    to the induced graph. This greedy approach often produces orderings
    with low width.

    Args:
        variables: Set of variables to eliminate
        factors: Original factors defining the graph structure

    Returns:
        Ordered list of variables (elimination order)
    """
    # Build initial adjacency from factor scopes
    # Two variables are adjacent if they appear in the same factor
    adjacency = {v: set() for v in variables}
    for f in factors:
        for v1 in f.scope:
            for v2 in f.scope:
                if v1 != v2:
                    adjacency[v1].add(v2)
                    adjacency[v2].add(v1)

    remaining = set(variables)
    order = []

    while remaining:
        # Find variable with minimum fill-in
        best_var = None
        best_fill = float('inf')

        for var in remaining:
            # Compute fill: number of edges added if we eliminate var
            neighbors = adjacency[var] & remaining
            fill = 0
            neighbors_list = list(neighbors)
            for i, n1 in enumerate(neighbors_list):
                for n2 in neighbors_list[i+1:]:
                    if n2 not in adjacency[n1]:
                        fill += 1

            if fill < best_fill:
                best_fill = fill
                best_var = var

        # Eliminate best_var: add fill edges
        neighbors = adjacency[best_var] & remaining
        neighbors_list = list(neighbors)
        for i, n1 in enumerate(neighbors_list):
            for n2 in neighbors_list[i+1:]:
                adjacency[n1].add(n2)
                adjacency[n2].add(n1)

        order.append(best_var)
        remaining.remove(best_var)

    return order


def min_neighbors_heuristic(
    variables: Set[str],
    factors: List[Factor]
) -> List[str]:
    """
    Simpler heuristic: always eliminate variable with fewest neighbors.

    Faster than min-fill but often produces slightly worse orderings.
    """
    adjacency = {v: set() for v in variables}
    for f in factors:
        for v1 in f.scope:
            for v2 in f.scope:
                if v1 != v2:
                    adjacency[v1].add(v2)

    remaining = set(variables)
    order = []

    while remaining:
        # Find variable with minimum remaining neighbors
        best_var = min(remaining, key=lambda v: len(adjacency[v] & remaining))

        # Add fill edges
        neighbors = list(adjacency[best_var] & remaining)
        for i, n1 in enumerate(neighbors):
            for n2 in neighbors[i+1:]:
                adjacency[n1].add(n2)
                adjacency[n2].add(n1)

        order.append(best_var)
        remaining.remove(best_var)

    return order


def compute_elimination_width(
    order: List[str],
    factors: List[Factor]
) -> int:
    """
    Compute the width (max clique size - 1) of an elimination order.

    This metric determines the worst-case factor size created during VE.
    """
    adjacency = {v: set() for v in order}
    for f in factors:
        for v1 in f.scope:
            for v2 in f.scope:
                if v1 != v2:
                    adjacency[v1].add(v2)

    remaining = set(order)
    max_width = 0

    for var in order:
        # Width is number of remaining neighbors when eliminated
        width = len(adjacency[var] & remaining)
        max_width = max(max_width, width)

        # Add fill edges
        neighbors = list(adjacency[var] & remaining)
        for i, n1 in enumerate(neighbors):
            for n2 in neighbors[i+1:]:
                adjacency[n1].add(n2)
                adjacency[n2].add(n1)

        remaining.remove(var)

    return max_width
```

A graph has treewidth at most w if and only if it admits an elimination order of width at most w. Models with bounded treewidth have polynomial-time exact inference. Trees have treewidth 1 (any leaf-first order is optimal). An √n × √n grid over n variables has treewidth √n (still expensive). Complete graphs have treewidth n-1 (the worst case).
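To see how strongly the order matters in practice, the sketch below (assuming the Factor class and the functions above are in scope) compares elimination widths on a small star-shaped model in which a hub variable H shares a pairwise factor with each of five leaves; the structure is invented for illustration:

```python
# Star graph: hub H appears in a pairwise factor with every leaf X0..X4.
leaves = [f'X{i}' for i in range(5)]
star_factors = [Factor(('H', leaf), (2, 2), np.ones((2, 2))) for leaf in leaves]
all_vars = {'H', *leaves}

good_order = min_fill_heuristic(all_vars, star_factors)  # leaves first, hub last
bad_order = ['H'] + leaves                                # hub eliminated first

print(compute_elimination_width(good_order, star_factors))  # 1
print(compute_elimination_width(bad_order, star_factors))   # 5
```

Eliminating the hub first couples all five leaves and yields width 5, so the corresponding intermediate factor ranges over five variables; any leaves-first order, which min-fill finds, keeps every intermediate factor over just two variables.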
Variable elimination is fundamentally a dynamic programming algorithm. Understanding this connection illuminates why VE is efficient and how it relates to other algorithms like the Viterbi algorithm for HMMs or the CKY algorithm for parsing.
The DP Perspective:
Consider computing a sum of products:
Σₓ₁ Σₓ₂ ... Σₓₙ f₁(X_S₁) × f₂(X_S₂) × ... × fₘ(X_Sₘ)
where each fᵢ depends on a subset Sᵢ of variables.
Naive approach: Enumerate all configurations—O(kⁿ) where k is the domain size.
DP approach: If we eliminate X₁ first, we can write:
Σₓ₂ ... Σₓₙ [Σₓ₁ ∏_{i: X₁∈Sᵢ} fᵢ] × [∏_{i: X₁∉Sᵢ} fᵢ]
The inner sum Σₓ₁ ∏_{i: X₁∈Sᵢ} fᵢ creates a new function (factor) that we memoize. This is precisely what VE does—and this memoization is the essence of dynamic programming.
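The memoization is easiest to see on a chain X₁ → X₂ → ... → Xₙ, where eliminating variables from left to right reduces to repeated matrix-vector products. Below is a self-contained sketch with an invented transition table:

```python
import numpy as np

# Chain of n binary variables sharing one transition table P(X_{t+1} | X_t).
n = 50
prior = np.array([0.5, 0.5])              # P(X_1)
T = np.array([[0.9, 0.1], [0.2, 0.8]])    # T[i, j] = P(X_{t+1} = j | X_t = i)

# Eliminating X_1, X_2, ... in order memoizes one k-entry "message" per step:
#   message_t(x_{t+1}) = sum_{x_t} message_{t-1}(x_t) * T[x_t, x_{t+1}]
message = prior
for _ in range(n - 1):
    message = message @ T                 # O(k^2) work per eliminated variable

print(message)  # P(X_n), computed with O(n * k^2) work instead of O(k^n)
```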
The Generalized Distributive Law:
Variable elimination is an instance of the Generalized Distributive Law (GDL), also known as the sum-product algorithm on factor graphs. The GDL abstracts the computation: summation is replaced by a general 'marginalization' operator ⊕ and multiplication by a general 'combination' operator ⊗, with the only requirement that ⊗ distributes over ⊕ (the two operations form a commutative semiring).

This abstraction unifies an enormous class of algorithms under one framework.
| Problem | ⊕ (marginalize) | ⊗ (combine) | Result |
|---|---|---|---|
| Marginal probabilities | sum (+) | product (×) | P(query \| evidence) |
| MAP inference | max | product (×) | Most probable config |
| Shortest path | min | sum (+) | Shortest distances |
| Boolean satisfiability | or (∨) | and (∧) | Satisfiability check |
| Counting solutions | sum (+) | product (×) | Number of solutions |
| Constraint satisfaction | max | min | Best partial assignment |
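The table is more than an analogy: the elimination machinery only ever needs a 'combine' and a 'marginalize' operation. The hypothetical helper below (not part of the implementation above) shows how swapping np.sum for np.max turns the same marginalization step from sum-product into max-product:

```python
import numpy as np
from typing import Callable

def marginalize_with(values: np.ndarray, axis: int,
                     reduce_op: Callable[..., np.ndarray]) -> np.ndarray:
    """Eliminate one axis using the semiring's marginalization operator."""
    return reduce_op(values, axis=axis)

# Arbitrary joint table over (A, B).
phi = np.array([[0.42, 0.18],
                [0.08, 0.32]])

print(marginalize_with(phi, axis=0, reduce_op=np.sum))  # sum-product: [0.5  0.5 ]
print(marginalize_with(phi, axis=0, reduce_op=np.max))  # max-product: [0.42 0.32]
```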
Moving from algorithm to implementation requires addressing several practical challenges. Real-world VE implementations must handle numerical issues, memory constraints, and performance optimization.
Numerical Stability:
Probability values in graphical models can be extremely small—products of many probabilities quickly underflow standard floating-point. The solution is to work in log space:
```python
import numpy as np
from scipy.special import logsumexp
from typing import List, Tuple


def log_factor_marginalize(log_phi: np.ndarray, axis: int) -> np.ndarray:
    """
    Marginalize a log-space factor along the specified axis.

    Uses logsumexp for numerical stability:
        log(sum_x exp(log_phi)) = logsumexp(log_phi)

    Args:
        log_phi: Log-space factor values
        axis: Axis to marginalize over

    Returns:
        Log-space marginalized factor values
    """
    return logsumexp(log_phi, axis=axis)


def log_factor_product(log_phi1: np.ndarray, log_phi2: np.ndarray,
                       phi1_axes: List[int], phi2_axes: List[int],
                       result_axes: List[Tuple[int, str]]) -> np.ndarray:
    """
    Log-space factor product using broadcasting.

    In log space, multiplication becomes addition. The complexity is
    managing axis alignment; this sketch assumes each result axis is
    drawn from exactly one input factor (alignment done by the caller).
    """
    # Expand dimensions for broadcasting
    shape1 = [1] * len(result_axes)
    shape2 = [1] * len(result_axes)

    for i, (_, source) in enumerate(result_axes):
        if source == 'phi1':
            shape1[i] = log_phi1.shape[phi1_axes.pop(0)]
        elif source == 'phi2':
            shape2[i] = log_phi2.shape[phi2_axes.pop(0)]

    # Reshape and add (product in log space)
    return log_phi1.reshape(shape1) + log_phi2.reshape(shape2)


def normalize_log_distribution(log_probs: np.ndarray) -> np.ndarray:
    """
    Normalize a log-probability distribution to sum to 1.

    Returns probabilities in standard (not log) space.
    """
    log_Z = logsumexp(log_probs)  # log of normalizing constant
    log_normalized = log_probs - log_Z
    return np.exp(log_normalized)


# Example: Avoiding underflow
def demo_log_space_necessity():
    """Demonstrate why log space is essential."""
    # Consider a chain of 400 low-probability steps
    n = 400
    p = 0.1  # Low probability at each step

    # Standard space: underflows to exactly 0 (doubles bottom out near 1e-308)
    prob_standard = p ** n
    print(f"Standard space: {prob_standard}")  # 0.0 (underflow)

    # Log space: the same quantity is represented without difficulty
    log_prob = n * np.log(p)
    print(f"Log probability: {log_prob}")  # -921.03 (correct)

    # Converting back underflows again -- which is why inference code
    # stays in log space throughout
    print(f"Back to standard space: {np.exp(log_prob)}")  # 0.0
```

Variable elimination is exact but not always tractable. For graphs with high treewidth (dense graphs, grids, complete graphs), even the optimal elimination order produces intermediate factors that exceed available memory. In these cases, approximate inference (belief propagation, sampling) is necessary.
Variable elimination can be adapted for different inference tasks by modifying the marginalization operation or the computation structure.
MAP Inference:
To find the most probable configuration (Maximum A Posteriori assignment), replace summation with maximization:
argmax_X P(X | E) = argmax_X ∏ᵢ φᵢ(X_Sᵢ, E)
The VE algorithm becomes:
```python
import numpy as np
from functools import reduce
from typing import Dict, List, Optional, Set, Tuple


def factor_max_marginalize(phi: Factor, var: str) -> Tuple[Factor, np.ndarray]:
    """
    Max-marginalize a variable (for MAP inference).

    Returns:
        - New factor with max values
        - Argmax array (traceback information) mapping each remaining
          configuration to the maximizing value of the eliminated variable
    """
    var_idx = phi.variables.index(var)
    new_vars = tuple(v for v in phi.variables if v != var)
    new_cards = tuple(c for i, c in enumerate(phi.cardinalities) if i != var_idx)

    # Take max along variable axis
    max_values = np.max(phi.values, axis=var_idx)
    argmax_values = np.argmax(phi.values, axis=var_idx)

    new_factor = Factor(
        variables=new_vars,
        cardinalities=new_cards,
        values=max_values
    )

    return new_factor, argmax_values


class MAPVariableElimination:
    """Variable elimination for MAP inference."""

    def __init__(self, factors: List[Factor], variables: Set[str]):
        self.factors = factors
        self.variables = variables

    def map_query(
        self,
        evidence: Dict[str, int],
        elimination_order: Optional[List[str]] = None
    ) -> Dict[str, int]:
        """
        Compute argmax_X P(X | evidence).

        Returns the most probable assignment to all non-evidence variables.
        """
        # Reduce factors by evidence
        factors = [factor_reduce(f, evidence) for f in self.factors]

        hidden_vars = self.variables - set(evidence.keys())
        elim_order = elimination_order or list(hidden_vars)

        # Track argmax for backtracking
        traceback = {}  # var -> (scope of product, argmax array)

        # Eliminate variables with max instead of sum
        for var in elim_order:
            factors_with_var = [f for f in factors if var in f.scope]
            factors_without_var = [f for f in factors if var not in f.scope]

            if factors_with_var:
                product = reduce(factor_product, factors_with_var)
                new_factor, argmax = factor_max_marginalize(product, var)
                traceback[var] = (product.variables, argmax)
                factors = factors_without_var + [new_factor]

        # Backtrack to recover MAP assignment
        assignment = {}
        # Reverse elimination order for backtracking
        for var in reversed(elim_order):
            if var in traceback:
                orig_vars, argmax = traceback[var]
                # Build index from already-assigned variables
                idx = []
                for v in orig_vars:
                    if v != var and v in assignment:
                        idx.append(assignment[v])
                assignment[var] = int(argmax[tuple(idx)]) if idx else int(argmax)

        return assignment
```

Marginal MAP—maximizing over some variables while summing out others—is NP^PP-complete, which (under standard complexity assumptions) makes it harder than both marginal inference (PP) and full MAP inference (NP). This is because summation and maximization don't commute, so we can't simply interleave sum and max operations in an arbitrary order.
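Returning to plain MAP inference, here is a short usage sketch of the routine above; it assumes the Factor definitions and the same chain model used in the earlier marginal query example are in scope, and the CPT values are illustrative only:

```python
# Chain A -> B -> C, as before.
f_A = Factor(('A',), (2,), np.array([0.6, 0.4]))
f_B = Factor(('A', 'B'), (2, 2), np.array([[0.7, 0.3], [0.2, 0.8]]))
f_C = Factor(('B', 'C'), (2, 2), np.array([[0.9, 0.1], [0.5, 0.5]]))

map_ve = MAPVariableElimination(factors=[f_A, f_B, f_C],
                                variables={'A', 'B', 'C'})

# Most probable joint assignment to A and B given the observation C = 1.
print(map_ve.map_query(evidence={'C': 1},
                       elimination_order=['B', 'A']))  # {'A': 1, 'B': 1}
```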
Variable elimination is the foundational exact inference algorithm for probabilistic graphical models. Its elegance lies in exploiting the factorized structure of joint distributions to convert exponential summations into polynomial computations (when treewidth is bounded).
Key Insights:

Variable elimination works because the joint distribution factorizes: the distributive law lets us interleave sums and products, so each elimination step is just a local factor product followed by a marginalization. The cost is governed by the elimination order: the largest intermediate factor is exponential in the order's width, and the best achievable width is the graph's treewidth. Finding an optimal order is NP-hard, but greedy heuristics such as min-fill work well in practice, and the same elimination machinery extends to MAP queries (max-product) and other semirings via the generalized distributive law.
Connection to What's Next:
Variable elimination answers one query efficiently—but what if we need multiple marginals, or if the graph structure allows more efficient organization? The next page introduces Belief Propagation, an algorithm that computes all marginals simultaneously by passing messages along graph edges. BP exploits the same structural insights as VE but organizes computation as a message-passing protocol, leading to the junction tree algorithm for exact inference and loopy BP for approximate inference.
You now understand Variable Elimination—the foundational exact inference algorithm for graphical models. You can implement VE, analyze its complexity, choose good elimination orders, and adapt it for different query types. Next, we'll see how message passing provides an alternative perspective that enables more efficient computation of multiple marginals.