Probabilistic graphical models encode complex probability distributions through elegant graphical structures—but a model is only as useful as our ability to extract answers from it. Inference is the computational process of querying these models: computing marginal probabilities, conditional distributions, and most probable configurations.
Variable Elimination (VE) is the foundational exact inference algorithm that transforms the seemingly intractable problem of summing over exponentially many configurations into a structured, polynomial-time (in favorable cases) sequence of local computations. Understanding VE is essential because it reveals the core computational principles that underpin all exact inference methods in graphical models.
By the end of this page, you will understand: (1) the fundamental inference tasks in graphical models, (2) the variable elimination algorithm in complete detail, (3) how elimination order affects computational complexity, (4) the connection to dynamic programming and the sum-product algorithm, and (5) practical implementation considerations for real-world applications.
Before diving into algorithms, we must precisely define what inference means in the context of probabilistic graphical models. Inference refers to computing quantities derived from a joint probability distribution, and different inference tasks require different computational approaches.
The Core Challenge:
Consider a joint probability distribution P(X₁, X₂, ..., Xₙ) factorized according to a graphical model. The defining characteristic of graphical models is that this joint distribution factors into a product of local functions (conditional probability tables in Bayesian networks, or potential functions in Markov random fields). The inference challenge arises because computing quantities of interest often requires summing over variables—and the number of configurations grows exponentially with the number of variables.
| Inference Task | Mathematical Form | Description | Example Application |
|---|---|---|---|
| Marginal Inference | P(Xᵢ) = Σₓ₋ᵢ P(X) | Compute the probability distribution over a single variable or subset | Medical diagnosis: P(Disease) |
| Conditional Inference | P(Xᵢ \| E=e) | Compute distribution given observed evidence | Document classification: P(Category \| Words) |
| MAP Inference | argmax_X P(X \| E=e) | Find most probable configuration | Image segmentation: best label assignment |
| Marginal MAP | argmax_Y Σ_Z P(Y,Z \| E) | Find most probable assignment to subset | Robot localization: best pose estimate |
| Partition Function | Z = Σₓ ψ(X) | Compute normalizing constant (MRFs) | Model training and comparison |
Why Naive Computation Fails:
Consider the straightforward approach to computing P(X₁):
P(X₁) = Σₓ₂ Σₓ₃ ... Σₓₙ P(X₁, X₂, ..., Xₙ)
If each variable has k possible values, this sum involves k^(n-1) terms. For a modest network of n = 50 binary variables, that is 2⁴⁹ ≈ 5.6 × 10¹⁴ configurations, each of which requires multiplying dozens of local factors; the brute-force sum is already impractical, and the cost doubles with every additional variable.
The Key Insight:
Variable elimination exploits the factorized structure of graphical models. Since the joint distribution is a product of local factors, we can push summations inside products (using the distributive law) and perform local computations. This reduces exponential complexity to polynomial complexity for graphs with bounded treewidth.
The entire foundation of exact inference rests on the distributive law of arithmetic: a(b + c) = ab + ac. By factoring out terms that don't depend on certain variables, we can dramatically reduce computation. This principle—realized as variable elimination, factor-graph marginalization, or, most generally, the generalized distributive law (GDL)—underlies all exact inference methods.
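To make the distributive-law saving concrete, here is a minimal sketch (the chain A → B → C and its probability tables are made-up numbers, not taken from any particular model) that compares the brute-force triple loop with the nested-sum form:

```python
import numpy as np

# Hypothetical CPTs for a chain A -> B -> C, all variables binary.
P_A = np.array([0.6, 0.4])                        # P(A)
P_B_given_A = np.array([[0.7, 0.3], [0.2, 0.8]])  # P(B|A), rows indexed by A
P_C_given_B = np.array([[0.9, 0.1], [0.5, 0.5]])  # P(C|B), rows indexed by B

# Naive: sum the full joint over all 2^3 configurations to get P(A).
naive = np.zeros(2)
for a in range(2):
    for b in range(2):
        for c in range(2):
            naive[a] += P_A[a] * P_B_given_A[a, b] * P_C_given_B[b, c]

# Distributive law: push each sum inside the product.
# P(a) = P(a) * sum_b P(b|a) * (sum_c P(c|b)) -- every sum is now local.
sum_over_c = P_C_given_B.sum(axis=1)                      # one small local table
structured = P_A * (P_B_given_A * sum_over_c).sum(axis=1)

print(naive, structured)  # identical results: [0.6 0.4] and [0.6 0.4]
```

Both versions return the same marginal; the structured one touches each local table once instead of enumerating every joint configuration, which is exactly the saving that variable elimination generalizes.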
Variable elimination operates on factors—multi-dimensional arrays that represent functions of subsets of random variables. Before describing the algorithm, we must understand the fundamental operations on factors.
Definition: Factor
A factor φ(X₁, ..., Xₖ) is a function from configurations of variables {X₁, ..., Xₖ} to non-negative real numbers. The variables in the factor are called its scope, denoted Scope(φ). In a Bayesian network, each conditional probability table P(Xᵢ | Parents(Xᵢ)) is a factor. In a Markov random field, potential functions are factors.
```python
import numpy as np
from typing import Dict, List, Set, Tuple
from dataclasses import dataclass


@dataclass
class Factor:
    """Represents a factor (potential function) over discrete variables.

    Each variable is indexed by position, values stored in a multi-dimensional array.
    """
    variables: Tuple[str, ...]       # Variable names in scope
    cardinalities: Tuple[int, ...]   # Number of values for each variable
    values: np.ndarray               # Multi-dimensional array of factor values

    @property
    def scope(self) -> Set[str]:
        return set(self.variables)

    def get_value(self, assignment: Dict[str, int]) -> float:
        """Get factor value for a specific variable assignment."""
        indices = tuple(assignment[var] for var in self.variables)
        return self.values[indices]


def factor_product(phi1: Factor, phi2: Factor) -> Factor:
    """Compute the product of two factors.

    Result scope is union of input scopes. Each entry is product of
    corresponding entries from input factors.

    Time complexity: O(prod of cardinalities in combined scope)
    Space complexity: O(prod of cardinalities in combined scope)
    """
    # Determine combined scope
    combined_vars = list(phi1.variables)
    phi2_only_vars = [v for v in phi2.variables if v not in phi1.scope]
    combined_vars.extend(phi2_only_vars)

    # Build cardinality mapping
    var_to_card = dict(zip(phi1.variables, phi1.cardinalities))
    var_to_card.update(dict(zip(phi2.variables, phi2.cardinalities)))
    combined_cards = tuple(var_to_card[v] for v in combined_vars)

    # Compute product values
    result_shape = combined_cards
    result_values = np.zeros(result_shape)

    # Iterate over all configurations
    for idx in np.ndindex(result_shape):
        assignment = dict(zip(combined_vars, idx))
        # Get corresponding indices in original factors
        idx1 = tuple(assignment[v] for v in phi1.variables)
        idx2 = tuple(assignment[v] for v in phi2.variables)
        result_values[idx] = phi1.values[idx1] * phi2.values[idx2]

    return Factor(
        variables=tuple(combined_vars),
        cardinalities=combined_cards,
        values=result_values
    )


def factor_marginalize(phi: Factor, var: str) -> Factor:
    """Marginalize (sum out) a variable from the factor.

    Result scope is input scope minus the marginalized variable.
    Each entry is sum over all values of eliminated variable.

    Time complexity: O(size of input factor)
    Space complexity: O(size of output factor)
    """
    if var not in phi.scope:
        return phi  # Variable not in scope, return unchanged

    # Find position and cardinality of variable to eliminate
    var_idx = phi.variables.index(var)

    # New scope and cardinalities
    new_vars = tuple(v for v in phi.variables if v != var)
    new_cards = tuple(c for i, c in enumerate(phi.cardinalities) if i != var_idx)

    # Sum over the variable being eliminated
    result_values = np.sum(phi.values, axis=var_idx)

    return Factor(
        variables=new_vars,
        cardinalities=new_cards,
        values=result_values
    )


def factor_reduce(phi: Factor, evidence: Dict[str, int]) -> Factor:
    """Reduce factor by fixing variables to observed values (evidence).

    Result scope excludes observed variables. Only configurations
    consistent with evidence are retained.

    Time complexity: O(size of output factor)
    Space complexity: O(size of output factor)
    """
    remaining_vars = []
    remaining_cards = []
    slices = []

    for i, var in enumerate(phi.variables):
        if var in evidence:
            # This variable is observed: select specific value
            slices.append(evidence[var])
        else:
            # This variable is free: keep all values
            remaining_vars.append(var)
            remaining_cards.append(phi.cardinalities[i])
            slices.append(slice(None))

    # Extract sub-array corresponding to evidence
    result_values = phi.values[tuple(slices)]

    # Handle case where result is scalar (all vars observed)
    if result_values.ndim == 0:
        result_values = np.array([float(result_values)])
        remaining_vars = []
        remaining_cards = []

    return Factor(
        variables=tuple(remaining_vars),
        cardinalities=tuple(remaining_cards),
        values=result_values
    )


def factor_normalize(phi: Factor) -> Factor:
    """Normalize factor to sum to 1 (valid probability distribution)."""
    total = np.sum(phi.values)
    if total == 0:
        raise ValueError("Cannot normalize factor with zero sum")
    return Factor(
        variables=phi.variables,
        cardinalities=phi.cardinalities,
        values=phi.values / total
    )
```

Efficient factor implementations use sparse representations for factors with many zero entries, log-space arithmetic to prevent underflow, and careful indexing to minimize memory operations. Production implementations (pgmpy, libDAI, OpenGM) optimize these operations heavily since they dominate inference runtime.
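As a quick sanity check of these operations, the sketch below (assuming the Factor class and the functions above are in scope) multiplies a prior P(A) with a conditional P(B | A) and then sums A back out; the numbers are arbitrary:

```python
# Arbitrary example tables for binary A and B.
phi_A = Factor(variables=('A',), cardinalities=(2,),
               values=np.array([0.6, 0.4]))                  # P(A)
phi_B_given_A = Factor(variables=('A', 'B'), cardinalities=(2, 2),
                       values=np.array([[0.7, 0.3],
                                        [0.2, 0.8]]))        # P(B|A)

joint = factor_product(phi_A, phi_B_given_A)   # scope {A, B}: P(A) * P(B|A)
marg_B = factor_marginalize(joint, 'A')        # scope {B}: P(B)

print(marg_B.variables, marg_B.values)         # ('B',) [0.5 0.5]
```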
With factor operations defined, we can now present the Variable Elimination algorithm. The algorithm computes marginal probabilities by systematically eliminating variables one at a time, using factor products and marginalizations.
Algorithm Overview:

To compute P(query | evidence), VE proceeds in four steps: (1) reduce every factor by the observed evidence; (2) choose an elimination order over the hidden (non-query, non-evidence) variables; (3) for each hidden variable in turn, multiply together all factors that contain it and sum the variable out of the product; (4) multiply the remaining factors and normalize.

The elegance of VE lies in its correctness: by the distributive law, eliminating variables one at a time produces the same result as the naive exponential-time summation.
```python
import numpy as np
from typing import List, Dict, Set, Optional
from functools import reduce


class VariableElimination:
    """Complete Variable Elimination implementation for exact inference.

    Computes marginal and conditional probabilities in graphical models
    by systematically eliminating variables through factor operations.
    """

    def __init__(self, factors: List[Factor], variables: Set[str]):
        """
        Initialize with factors and the set of all variables.

        Args:
            factors: List of Factor objects representing the model
            variables: Set of all variable names in the model
        """
        self.original_factors = factors
        self.variables = variables

    def query(
        self,
        query_vars: List[str],
        evidence: Optional[Dict[str, int]] = None,
        elimination_order: Optional[List[str]] = None
    ) -> Factor:
        """
        Compute P(query_vars | evidence) via variable elimination.

        Args:
            query_vars: Variables whose marginal distribution we want
            evidence: Observed variable assignments (variable -> value)
            elimination_order: Order to eliminate hidden variables

        Returns:
            Normalized factor representing P(query_vars | evidence)
        """
        evidence = evidence or {}

        # Step 1: Create working copy of factors
        factors = [Factor(f.variables, f.cardinalities, f.values.copy())
                   for f in self.original_factors]

        # Step 2: Incorporate evidence by reducing factors
        if evidence:
            factors = [factor_reduce(f, evidence) for f in factors]

        # Step 3: Determine variables to eliminate
        # Hidden = all variables - query variables - evidence variables
        hidden_vars = self.variables - set(query_vars) - set(evidence.keys())

        # Step 4: Determine elimination order
        if elimination_order is None:
            # Default: arbitrary order (not optimal!)
            elim_order = list(hidden_vars)
        else:
            # Use specified order, filtered to hidden variables
            elim_order = [v for v in elimination_order if v in hidden_vars]

        # Step 5: Eliminate variables one by one
        for var in elim_order:
            factors = self._eliminate_variable(factors, var)

        # Step 6: Multiply remaining factors
        result = self._multiply_all_factors(factors)

        # Step 7: Normalize to get proper probability distribution
        result = factor_normalize(result)

        return result

    def _eliminate_variable(
        self,
        factors: List[Factor],
        var: str
    ) -> List[Factor]:
        """
        Eliminate a single variable from the factor list.

        1. Identify all factors containing the variable
        2. Multiply them together
        3. Marginalize the variable from the product
        4. Add the new factor back (replacing the originals)

        This is the core operation that makes VE efficient.
        """
        # Partition factors by whether they contain var
        factors_with_var = [f for f in factors if var in f.scope]
        factors_without_var = [f for f in factors if var not in f.scope]

        if not factors_with_var:
            # Variable doesn't appear in any factor
            return factors

        # Multiply all factors containing var
        product = factors_with_var[0]
        for f in factors_with_var[1:]:
            product = factor_product(product, f)

        # Marginalize var from the product
        new_factor = factor_marginalize(product, var)

        # Return updated factor list
        return factors_without_var + [new_factor]

    def _multiply_all_factors(self, factors: List[Factor]) -> Factor:
        """Multiply all factors together."""
        if not factors:
            raise ValueError("No factors to multiply")
        return reduce(factor_product, factors)

    def compute_marginal_likelihood(
        self,
        evidence: Dict[str, int],
        elimination_order: Optional[List[str]] = None
    ) -> float:
        """
        Compute P(evidence) - the marginal likelihood of observations.

        This is done by eliminating ALL non-evidence variables (there are
        no query variables). The result is a scalar representing the
        probability of the evidence.
        """
        # Eliminate all non-evidence variables
        factors = [Factor(f.variables, f.cardinalities, f.values.copy())
                   for f in self.original_factors]
        factors = [factor_reduce(f, evidence) for f in factors]

        hidden_vars = self.variables - set(evidence.keys())
        elim_order = elimination_order or list(hidden_vars)
        elim_order = [v for v in elim_order if v in hidden_vars]

        for var in elim_order:
            factors = self._eliminate_variable(factors, var)

        # Result should be a single scalar factor
        result = self._multiply_all_factors(factors)
        return float(np.sum(result.values))
```

Variable elimination as presented computes exact marginals—but its efficiency depends critically on the elimination order. A poor order can make VE as slow as naive enumeration. We'll address this crucial issue in the next section.
The computational complexity of variable elimination depends entirely on the elimination order—the sequence in which variables are removed. Different orders can lead to vastly different intermediate factor sizes, and thus vastly different computation times.
Why Order Matters:
When we eliminate a variable X, we create a new factor whose scope is the union of the scopes of all factors containing X (minus X itself). The new edges this implicitly adds between previously unconnected neighbors of X are called the fill-in created by eliminating X. If X connects many previously unconnected variables, eliminating X first creates a large factor; if X is peripherally connected, eliminating it creates a small factor.
The Induced Graph View:
Variable elimination can be visualized through the induced graph and its elimination cliques. Start with the original graph (moralized, if the model is a Bayesian network). When eliminating variable X, connect all of X's remaining neighbors to one another (these added edges are the fill-in), then remove X from the graph. The neighbors of X at that moment form a clique in the induced graph, and they are exactly the scope of the factor created by eliminating X.
The width of an elimination order is the maximum number of neighbors any variable has when eliminated. The treewidth of a graph is the minimum width over all possible orderings.
If all variables are binary, the time and space complexity of VE is O(n · 2^w), where w is the width of the elimination order (for general domain size k it is O(n · k^(w+1))). Thus, finding a good order reduces w, making inference tractable. Finding the optimal order is NP-hard, but good heuristics exist.
| Heuristic | Strategy | Strengths | Weaknesses |
|---|---|---|---|
| Min-Neighbors | Eliminate variable with fewest remaining neighbors | Simple, fast to compute | Doesn't account for cardinality |
| Min-Weight | Eliminate variable minimizing product of neighbor cardinalities | Accounts for variable domains | Greedy may miss global optimum |
| Min-Fill | Eliminate variable adding fewest new edges | Directly minimizes fill-in | More expensive to compute |
| Weighted Min-Fill | Minimize weighted sum of added edges | Balances cardinality and structure | Requires tuning weights |
| Max-Cardinality Search | Number vertices by repeatedly picking the one with the most already-numbered neighbors, then eliminate in reverse | Near-linear time; yields an optimal order on chordal graphs | Can produce poor orders on non-chordal graphs |
```python
from typing import List, Set


def min_fill_heuristic(
    variables: Set[str],
    factors: List[Factor]
) -> List[str]:
    """
    Compute elimination order using min-fill heuristic.

    At each step, eliminate the variable that adds the fewest fill edges
    to the induced graph. This greedy approach often produces orderings
    with low width.

    Args:
        variables: Set of variables to eliminate
        factors: Original factors defining the graph structure

    Returns:
        Ordered list of variables (elimination order)
    """
    # Build initial adjacency from factor scopes
    # Two variables are adjacent if they appear in the same factor
    adjacency = {v: set() for v in variables}
    for f in factors:
        for v1 in f.scope:
            for v2 in f.scope:
                if v1 != v2:
                    adjacency[v1].add(v2)
                    adjacency[v2].add(v1)

    remaining = set(variables)
    order = []

    while remaining:
        # Find variable with minimum fill-in
        best_var = None
        best_fill = float('inf')

        for var in remaining:
            # Compute fill: number of edges added if we eliminate var
            neighbors = adjacency[var] & remaining
            fill = 0
            neighbors_list = list(neighbors)
            for i, n1 in enumerate(neighbors_list):
                for n2 in neighbors_list[i+1:]:
                    if n2 not in adjacency[n1]:
                        fill += 1

            if fill < best_fill:
                best_fill = fill
                best_var = var

        # Eliminate best_var: add fill edges
        neighbors = adjacency[best_var] & remaining
        neighbors_list = list(neighbors)
        for i, n1 in enumerate(neighbors_list):
            for n2 in neighbors_list[i+1:]:
                adjacency[n1].add(n2)
                adjacency[n2].add(n1)

        order.append(best_var)
        remaining.remove(best_var)

    return order


def min_neighbors_heuristic(
    variables: Set[str],
    factors: List[Factor]
) -> List[str]:
    """
    Simpler heuristic: always eliminate variable with fewest neighbors.

    Faster than min-fill but often produces slightly worse orderings.
    """
    adjacency = {v: set() for v in variables}
    for f in factors:
        for v1 in f.scope:
            for v2 in f.scope:
                if v1 != v2:
                    adjacency[v1].add(v2)

    remaining = set(variables)
    order = []

    while remaining:
        # Find variable with minimum remaining neighbors
        best_var = min(remaining, key=lambda v: len(adjacency[v] & remaining))

        # Add fill edges
        neighbors = list(adjacency[best_var] & remaining)
        for i, n1 in enumerate(neighbors):
            for n2 in neighbors[i+1:]:
                adjacency[n1].add(n2)
                adjacency[n2].add(n1)

        order.append(best_var)
        remaining.remove(best_var)

    return order


def compute_elimination_width(
    order: List[str],
    factors: List[Factor]
) -> int:
    """
    Compute the width (max clique size - 1) of an elimination order.

    This metric determines the worst-case factor size created during VE.
    """
    adjacency = {v: set() for v in order}
    for f in factors:
        for v1 in f.scope:
            for v2 in f.scope:
                if v1 != v2:
                    adjacency[v1].add(v2)

    remaining = set(order)
    max_width = 0

    for var in order:
        # Width is number of remaining neighbors when eliminated
        width = len(adjacency[var] & remaining)
        max_width = max(max_width, width)

        # Add fill edges
        neighbors = list(adjacency[var] & remaining)
        for i, n1 in enumerate(neighbors):
            for n2 in neighbors[i+1:]:
                adjacency[n1].add(n2)
                adjacency[n2].add(n1)

        remaining.remove(var)

    return max_width
```

A graph has treewidth at most w if and only if it admits an elimination order of width at most w. Models with bounded treewidth have polynomial-time exact inference. Trees have treewidth 1 (any leaf-first order is optimal). An √n × √n grid over n variables has treewidth √n (still expensive). Complete graphs have treewidth n-1 (the worst case).
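To see how strongly the order matters in practice, the sketch below (assuming the Factor class and the functions above are in scope) compares elimination widths on a small star-shaped model in which a hub variable H shares a pairwise factor with each of five leaves; the structure is invented for illustration:

```python
# Star graph: hub H appears in a pairwise factor with every leaf X0..X4.
leaves = [f'X{i}' for i in range(5)]
star_factors = [Factor(('H', leaf), (2, 2), np.ones((2, 2))) for leaf in leaves]
all_vars = {'H', *leaves}

good_order = min_fill_heuristic(all_vars, star_factors)  # leaves first, hub last
bad_order = ['H'] + leaves                                # hub eliminated first

print(compute_elimination_width(good_order, star_factors))  # 1
print(compute_elimination_width(bad_order, star_factors))   # 5
```

Eliminating the hub first couples all five leaves and yields width 5, so the corresponding intermediate factor ranges over five variables; any leaves-first order, which min-fill finds, keeps every intermediate factor over just two variables.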
Variable elimination is fundamentally a dynamic programming algorithm. Understanding this connection illuminates why VE is efficient and how it relates to other algorithms like the Viterbi algorithm for HMMs or the CKY algorithm for parsing.
The DP Perspective:
Consider computing a sum of products:
Σₓ₁ Σₓ₂ ... Σₓₙ f₁(X_S₁) × f₂(X_S₂) × ... × fₘ(X_Sₘ)
where each fᵢ depends on a subset Sᵢ of variables.
Naive approach: Enumerate all configurations—O(kⁿ) where k is the domain size.
DP approach: If we eliminate X₁ first, we can write:
Σₓ₂ ... Σₓₙ [Σₓ₁ ∏_{i: X₁∈Sᵢ} fᵢ] × [∏_{i: X₁∉Sᵢ} fᵢ]
The inner sum Σₓ₁ ∏_{i: X₁∈Sᵢ} fᵢ creates a new function (factor) that we memoize. This is precisely what VE does—and this memoization is the essence of dynamic programming.
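The memoization is easiest to see on a chain X₁ → X₂ → ... → Xₙ, where eliminating variables from left to right reduces to repeated matrix-vector products. Below is a self-contained sketch with an invented transition table:

```python
import numpy as np

# Chain of n binary variables sharing one transition table P(X_{t+1} | X_t).
n = 50
prior = np.array([0.5, 0.5])              # P(X_1)
T = np.array([[0.9, 0.1], [0.2, 0.8]])    # T[i, j] = P(X_{t+1} = j | X_t = i)

# Eliminating X_1, X_2, ... in order memoizes one k-entry "message" per step:
#   message_t(x_{t+1}) = sum_{x_t} message_{t-1}(x_t) * T[x_t, x_{t+1}]
message = prior
for _ in range(n - 1):
    message = message @ T                 # O(k^2) work per eliminated variable

print(message)  # P(X_n), computed with O(n * k^2) work instead of O(k^n)
```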
The Generalized Distributive Law:
Variable elimination is an instance of the Generalized Distributive Law (GDL), also known as the sum-product algorithm on factor graphs. The GDL abstracts the computation: summation is replaced by a general 'marginalization' operator ⊕ and multiplication by a general 'combination' operator ⊗, with the only requirement that ⊗ distributes over ⊕ (the two operations form a commutative semiring).

This abstraction unifies an enormous class of algorithms under one framework.
| Problem | ⊕ (marginalize) | ⊗ (combine) | Result |
|---|---|---|---|
| Marginal probabilities | sum (+) | product (×) | P(query \| evidence) |
| MAP inference | max | product (×) | Most probable config |
| Shortest path | min | sum (+) | Shortest distances |
| Boolean satisfiability | or (∨) | and (∧) | Satisfiability check |
| Counting solutions | sum (+) | product (×) | Number of solutions |
| Constraint satisfaction | max | min | Best partial assignment |
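The table is more than an analogy: the elimination machinery only ever needs a 'combine' and a 'marginalize' operation. The hypothetical helper below (not part of the implementation above) shows how swapping np.sum for np.max turns the same marginalization step from sum-product into max-product:

```python
import numpy as np
from typing import Callable

def marginalize_with(values: np.ndarray, axis: int,
                     reduce_op: Callable[..., np.ndarray]) -> np.ndarray:
    """Eliminate one axis using the semiring's marginalization operator."""
    return reduce_op(values, axis=axis)

# Arbitrary joint table over (A, B).
phi = np.array([[0.42, 0.18],
                [0.08, 0.32]])

print(marginalize_with(phi, axis=0, reduce_op=np.sum))  # sum-product: [0.5  0.5 ]
print(marginalize_with(phi, axis=0, reduce_op=np.max))  # max-product: [0.42 0.32]
```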
Moving from algorithm to implementation requires addressing several practical challenges. Real-world VE implementations must handle numerical issues, memory constraints, and performance optimization.
Numerical Stability:
Probability values in graphical models can be extremely small—products of many probabilities quickly underflow standard floating-point. The solution is to work in log space:
```python
import numpy as np
from scipy.special import logsumexp
from typing import List, Tuple


def log_factor_marginalize(log_phi: np.ndarray, axis: int) -> np.ndarray:
    """
    Marginalize a log-space factor along the specified axis.

    Uses logsumexp for numerical stability:
        log(sum_x exp(log_phi)) = logsumexp(log_phi)

    Args:
        log_phi: Log-space factor values
        axis: Axis to marginalize over

    Returns:
        Log-space marginalized factor values
    """
    return logsumexp(log_phi, axis=axis)


def log_factor_product(log_phi1: np.ndarray, log_phi2: np.ndarray,
                       phi1_axes: List[int], phi2_axes: List[int],
                       result_axes: List[Tuple[int, str]]) -> np.ndarray:
    """
    Log-space factor product using broadcasting.

    In log space, multiplication becomes addition. The complexity is
    managing axis alignment; this sketch assumes each result axis is
    drawn from exactly one input factor (alignment done by the caller).
    """
    # Expand dimensions for broadcasting
    shape1 = [1] * len(result_axes)
    shape2 = [1] * len(result_axes)

    for i, (_, source) in enumerate(result_axes):
        if source == 'phi1':
            shape1[i] = log_phi1.shape[phi1_axes.pop(0)]
        elif source == 'phi2':
            shape2[i] = log_phi2.shape[phi2_axes.pop(0)]

    # Reshape and add (product in log space)
    return log_phi1.reshape(shape1) + log_phi2.reshape(shape2)


def normalize_log_distribution(log_probs: np.ndarray) -> np.ndarray:
    """
    Normalize a log-probability distribution to sum to 1.

    Returns probabilities in standard (not log) space.
    """
    log_Z = logsumexp(log_probs)  # log of normalizing constant
    log_normalized = log_probs - log_Z
    return np.exp(log_normalized)


# Example: Avoiding underflow
def demo_log_space_necessity():
    """Demonstrate why log space is essential."""
    # Consider a chain of 400 low-probability steps
    n = 400
    p = 0.1  # Low probability at each step

    # Standard space: underflows to exactly 0 (doubles bottom out near 1e-308)
    prob_standard = p ** n
    print(f"Standard space: {prob_standard}")  # 0.0 (underflow)

    # Log space: the same quantity is represented without difficulty
    log_prob = n * np.log(p)
    print(f"Log probability: {log_prob}")  # -921.03 (correct)

    # Converting back underflows again -- which is why inference code
    # stays in log space throughout
    print(f"Back to standard space: {np.exp(log_prob)}")  # 0.0
```

Variable elimination is exact but not always tractable. For graphs with high treewidth (dense graphs, grids, complete graphs), even the optimal elimination order produces intermediate factors that exceed available memory. In these cases, approximate inference (belief propagation, sampling) is necessary.
Variable elimination can be adapted for different inference tasks by modifying the marginalization operation or the computation structure.
MAP Inference:
To find the most probable configuration (Maximum A Posteriori assignment), replace summation with maximization:
argmax_X P(X | E) = argmax_X ∏ᵢ φᵢ(X_Sᵢ, E)
The VE algorithm becomes:
```python
import numpy as np
from functools import reduce
from typing import Dict, List, Optional, Set, Tuple


def factor_max_marginalize(phi: Factor, var: str) -> Tuple[Factor, np.ndarray]:
    """
    Max-marginalize a variable (for MAP inference).

    Returns:
        - New factor with max values
        - Argmax array (traceback information) mapping each remaining
          configuration to the maximizing value of the eliminated variable
    """
    var_idx = phi.variables.index(var)
    new_vars = tuple(v for v in phi.variables if v != var)
    new_cards = tuple(c for i, c in enumerate(phi.cardinalities) if i != var_idx)

    # Take max along variable axis
    max_values = np.max(phi.values, axis=var_idx)
    argmax_values = np.argmax(phi.values, axis=var_idx)

    new_factor = Factor(
        variables=new_vars,
        cardinalities=new_cards,
        values=max_values
    )

    return new_factor, argmax_values


class MAPVariableElimination:
    """Variable elimination for MAP inference."""

    def __init__(self, factors: List[Factor], variables: Set[str]):
        self.factors = factors
        self.variables = variables

    def map_query(
        self,
        evidence: Dict[str, int],
        elimination_order: Optional[List[str]] = None
    ) -> Dict[str, int]:
        """
        Compute argmax_X P(X | evidence).

        Returns the most probable assignment to all non-evidence variables.
        """
        # Reduce factors by evidence
        factors = [factor_reduce(f, evidence) for f in self.factors]

        hidden_vars = self.variables - set(evidence.keys())
        elim_order = elimination_order or list(hidden_vars)

        # Track argmax for backtracking
        traceback = {}  # var -> (scope of product, argmax array)

        # Eliminate variables with max instead of sum
        for var in elim_order:
            factors_with_var = [f for f in factors if var in f.scope]
            factors_without_var = [f for f in factors if var not in f.scope]

            if factors_with_var:
                product = reduce(factor_product, factors_with_var)
                new_factor, argmax = factor_max_marginalize(product, var)
                traceback[var] = (product.variables, argmax)
                factors = factors_without_var + [new_factor]

        # Backtrack to recover MAP assignment
        assignment = {}
        # Reverse elimination order for backtracking
        for var in reversed(elim_order):
            if var in traceback:
                orig_vars, argmax = traceback[var]
                # Build index from already-assigned variables
                idx = []
                for v in orig_vars:
                    if v != var and v in assignment:
                        idx.append(assignment[v])
                assignment[var] = int(argmax[tuple(idx)]) if idx else int(argmax)

        return assignment
```

Marginal MAP—maximizing over some variables while summing out others—is NP^PP-complete, which (under standard complexity assumptions) makes it harder than both marginal inference (PP) and full MAP inference (NP). This is because summation and maximization don't commute, so we can't simply interleave sum and max operations in an arbitrary order.
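Returning to plain MAP inference, here is a short usage sketch of the routine above; it assumes the Factor definitions and the same chain model used in the earlier marginal query example are in scope, and the CPT values are illustrative only:

```python
# Chain A -> B -> C, as before.
f_A = Factor(('A',), (2,), np.array([0.6, 0.4]))
f_B = Factor(('A', 'B'), (2, 2), np.array([[0.7, 0.3], [0.2, 0.8]]))
f_C = Factor(('B', 'C'), (2, 2), np.array([[0.9, 0.1], [0.5, 0.5]]))

map_ve = MAPVariableElimination(factors=[f_A, f_B, f_C],
                                variables={'A', 'B', 'C'})

# Most probable joint assignment to A and B given the observation C = 1.
print(map_ve.map_query(evidence={'C': 1},
                       elimination_order=['B', 'A']))  # {'A': 1, 'B': 1}
```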
Variable elimination is the foundational exact inference algorithm for probabilistic graphical models. Its elegance lies in exploiting the factorized structure of joint distributions to convert exponential summations into polynomial computations (when treewidth is bounded).
Key Insights:

Variable elimination works because the joint distribution factorizes: the distributive law lets us interleave sums and products, so each elimination step is just a local factor product followed by a marginalization. The cost is governed by the elimination order: the largest intermediate factor is exponential in the order's width, and the best achievable width is the graph's treewidth. Finding an optimal order is NP-hard, but greedy heuristics such as min-fill work well in practice, and the same elimination machinery extends to MAP queries (max-product) and other semirings via the generalized distributive law.
Connection to What's Next:
Variable elimination answers one query efficiently—but what if we need multiple marginals, or if the graph structure allows more efficient organization? The next page introduces Belief Propagation, an algorithm that computes all marginals simultaneously by passing messages along graph edges. BP exploits the same structural insights as VE but organizes computation as a message-passing protocol, leading to the junction tree algorithm for exact inference and loopy BP for approximate inference.
You now understand Variable Elimination—the foundational exact inference algorithm for graphical models. You can implement VE, analyze its complexity, choose good elimination orders, and adapt it for different query types. Next, we'll see how message passing provides an alternative perspective that enables more efficient computation of multiple marginals.