Throughout this module, we've developed a comprehensive toolkit for deterministic approximate inference: Laplace approximation for quick Gaussian fits at the mode, variational inference for optimizing over tractable families, and expectation propagation for moment-matched approximations with better calibration.
But knowing these methods individually isn't enough. The real art of Bayesian machine learning lies in selecting the right method for your specific problem—balancing computational constraints, accuracy requirements, model structure, and application needs.
This final page provides a systematic framework for method selection, drawing on everything we've learned. By the end, you'll have both the conceptual understanding and practical decision procedures to choose wisely among approximation strategies.
By the end of this page, you will be able to:
- systematically evaluate inference requirements for any problem,
- apply a structured decision algorithm to select among Laplace, VI, EP, and MCMC,
- understand how model structure influences method selection, and
- diagnose when your chosen method is failing and pivot appropriately.
Before diving into decision procedures, let's consolidate our understanding of what each method offers:
Laplace Approximation
Variational Inference (VI)
Expectation Propagation (EP)
| Criterion | Laplace | VI | EP | MCMC |
|---|---|---|---|---|
| Posterior approximation | N(θ̂, H⁻¹) | q*(θ) ∈ Q | ∏f̃ᵢ(θ) | Samples |
| Objective optimized | log p(θ|D) | ELBO (KL) | Local KL | None (sampling) |
| Implementation difficulty | Low | Medium | High | Medium |
| Convergence guarantee | Yes | Yes (to local) | No | Yes (asymptotic) |
| Uncertainty quality | Poor | Poor-Medium | Medium-Good | Excellent |
| Scalability (n) | Good | Excellent (SVI) | Moderate | Poor |
| Scalability (d) | O(d³) | O(d)-O(d³) | O(d³) | O(d²) |
| Marginal likelihood | Yes (approx) | Yes (lower bound) | Yes (approx) | Difficult |
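To make the table's first row concrete, here is a minimal Laplace fit for a one-dimensional posterior. The model is a hypothetical illustration (k successes in n Bernoulli trials, a standard-normal prior on the log-odds θ); the method is exactly the N(θ̂, H⁻¹) construction from the table:

```python
import numpy as np

# Hypothetical model: k successes in n Bernoulli trials, parameterized by
# the log-odds theta, with a N(0, 1) prior. Laplace approximates the
# posterior by N(theta_hat, H^{-1}), centered at the mode theta_hat.
k, n = 7, 10

theta = 0.0
for _ in range(50):  # Newton's method for the posterior mode
    sig = 1.0 / (1.0 + np.exp(-theta))
    grad = k - n * sig - theta             # d/dtheta log p(theta | D)
    hess = -(n * sig * (1.0 - sig) + 1.0)  # d^2/dtheta^2 log p(theta | D)
    theta -= grad / hess

theta_hat = theta
laplace_var = -1.0 / hess  # H^{-1} in one dimension
print(theta_hat, laplace_var)  # roughly 0.58 and 0.30
```

Note how the prior shrinks the mode below the maximum-likelihood log-odds log(7/3) ≈ 0.85, and how the curvature at the mode directly supplies the approximate posterior variance.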
Effective method selection begins with understanding your requirements across five dimensions:
1. Computational Budget
2. Accuracy Requirements
3. Problem Scale
4. Model Structure
5. Downstream Use
These requirements frequently conflict: you may want exact inference on a large-scale problem with real-time latency, which is impossible. Recognizing such conflicts is the first step toward making principled trade-offs, so document your compromises explicitly.
Based on the requirement dimensions, here's a systematic decision procedure:
Phase 1: Feasibility Filter
1. IF computational budget < 1 second:
→ Use Laplace OR pre-trained/amortized VI
STOP
2. IF n > 100K AND cannot subsample:
→ Use SVI (stochastic VI)
PROCEED to Phase 2 for family selection
3. IF d > 10K:
→ Use mean-field VI OR structured VI
PROCEED to Phase 2
4. IF exact inference required AND budget > 1h:
→ Use MCMC with extensive diagnostics
STOP
Phase 2: Method Selection
5. IF posterior expected to be approximately Gaussian:
5a. IF quick baseline needed:
→ Use Laplace
5b. IF model comparison needed:
→ Use Laplace (marginal likelihood)
5c. IF more flexibility needed:
→ Use full-covariance VI
6. IF non-Gaussian likelihood with GP:
→ Use EP (better calibration than Laplace)
7. IF latent variable model:
7a. IF conjugate:
→ Use coordinate ascent VI
7b. IF non-conjugate:
→ Use SVI with reparameterization
8. IF uncertainty calibration critical:
8a. IF budget allows:
→ Use MCMC
8b. IF budget limited:
→ Use EP > VI > Laplace
9. DEFAULT:
→ Start with mean-field VI
→ Upgrade to full-covariance if underperforming
→ Consider EP if calibration issues persist
Phase 3: Validation and Iteration
10. Run chosen method with:
- Multiple random initializations
- Convergence diagnostics appropriate to method
11. Validate approximation quality:
- Posterior predictive checks
- Calibration plots (reliability diagrams)
- Compare to MCMC on subset if feasible
12. IF validation fails:
- IF underestimating variance: try EP
- IF missing modes: try mixture VI or MCMC
- IF numerical issues: try damping, better parameterization
- IF still failing: invest in MCMC
13. Document method choice, alternatives considered,
and validation results
The structure of your model heavily influences which approximation methods are tractable and effective. Understanding these connections accelerates method selection.
| Model Type | Primary Recommendation | Alternatives |
|---|---|---|
| Linear Regression (Gaussian) | Laplace (exact for Gaussian!) | VI, MCMC |
| Logistic Regression | Laplace (log-concave) | VI, EP |
| Gaussian Process Regression | Exact (Gaussian posteriors) | Sparse GPs with VI |
| GP Classification | EP (best calibration) | Laplace, VI |
| Bayesian Neural Network | VI (last-layer or mean-field) | MC Dropout, SWAG |
| Mixture Model | EM + VI hybrid | MCMC with label switching care |
| Hierarchical Model | MCMC or structured VI | EP for some structures |
| State-Space Model | Kalman (linear) / EP (non-linear) | Particle filters |
| Deep Generative Model (VAE) | Amortized VI | Flow-based VI |
Exploiting Conditional Conjugacy:
Many models have partially conjugate structure where some variables have closed-form conditionals:
$$p(\theta_1, \theta_2 | D) \text{ where } p(\theta_1 | \theta_2, D) \text{ is tractable}$$
Strategies include: collapsing θ₁ analytically (marginalizing it out, i.e. Rao-Blackwellization), keeping the exact conditional inside a structured variational family q(θ₂)p(θ₁ | θ₂, D), or alternating closed-form coordinate updates for the conjugate block with gradient-based updates for the rest.
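As one illustration of exploiting a tractable conditional (a standard collapsing step, not specific to any one model), θ₁ can be integrated out analytically so that inference runs on the smaller problem over θ₂ alone:

$$p(\theta_2 \mid D) \propto \int p(D \mid \theta_1, \theta_2)\, p(\theta_1 \mid \theta_2)\, p(\theta_2)\, d\theta_1$$

Any of the methods above can then target p(θ₂ | D), and θ₁ is recovered afterwards through the tractable conditional p(θ₁ | θ₂, D).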
Exploiting Factorization:
If the posterior factors as: $$p(\theta_1, ..., \theta_K | D) = \prod_k p(\theta_k | D)$$
You can run independent inference on each block, parallelizing computation.
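A toy sketch of this idea, under the assumption of conjugate Gaussian blocks whose likelihoods touch disjoint slices of the data (all names and numbers are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Two parameter blocks whose likelihoods touch disjoint data slices, so
# the posterior factorizes and each block can be inferred in isolation
# (and in parallel). Per block: theta_k ~ N(0, 1) prior, y ~ N(theta_k, 1).
data_blocks = [rng.normal(2.0, 1.0, size=50), rng.normal(-1.0, 1.0, size=80)]

def infer_block(y):
    """Exact Gaussian posterior for one block (unit prior and noise variance)."""
    n = len(y)
    post_var = 1.0 / (n + 1.0)
    post_mean = post_var * y.sum()
    return post_mean, post_var

# Independent blocks: these calls could be farmed out to separate workers.
posteriors = [infer_block(y) for y in data_blocks]
print(posteriors)
```

The joint posterior covariance is block-diagonal by construction, so nothing is lost by never forming it: each block's mean and variance is the complete answer for that block.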
Sometimes a change of variables transforms a difficult inference problem into an easy one. Log-transforming positive parameters, centering data, or using non-centered parameterizations (for hierarchical models) can make posteriors more Gaussian-like, improving Laplace and VI performance dramatically.
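The effect of such a transformation is easy to see numerically. In this sketch (a hypothetical Gamma-shaped posterior over a variance-like parameter), log-transforming the positive parameter substantially reduces skewness, which is exactly what makes a Gaussian fit more faithful:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical posterior over a positive, variance-like parameter,
# represented by Gamma samples: strongly right-skewed on this scale,
# so a single Gaussian fit here would be poor.
sigma2 = rng.gamma(shape=3.0, scale=0.5, size=100_000)

def skewness(x):
    x = x - x.mean()
    return (x**3).mean() / (x**2).mean() ** 1.5

s_raw = skewness(sigma2)          # clearly right-skewed
s_log = skewness(np.log(sigma2))  # much closer to symmetric
print(s_raw, s_log)
```

The same logic motivates non-centered parameterizations in hierarchical models: the transformed posterior geometry, not the model itself, is what Laplace and Gaussian VI actually see.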
Here's a concrete workflow for implementing approximate inference in practice:
```python
import numpy as np
from typing import Dict, Any, Optional
from dataclasses import dataclass
from enum import Enum


class InferenceMethod(Enum):
    LAPLACE = "laplace"
    MEAN_FIELD_VI = "mean_field_vi"
    FULL_COV_VI = "full_cov_vi"
    EP = "expectation_propagation"
    MCMC = "mcmc"


@dataclass
class InferenceRequirements:
    """Capture all requirements for method selection."""
    max_seconds: float   # Computational budget
    n_datapoints: int    # Dataset size
    n_parameters: int    # Number of parameters
    requires_marginal_lik: bool = False
    requires_calibrated_uncertainty: bool = False
    requires_exact_inference: bool = False
    is_gp_classification: bool = False
    is_latent_variable_model: bool = False


def select_inference_method(req: InferenceRequirements) -> InferenceMethod:
    """Apply the decision algorithm to select an inference method."""
    # Phase 1: Feasibility filter
    if req.max_seconds < 1:
        return InferenceMethod.LAPLACE
    if req.n_datapoints > 100_000:
        return InferenceMethod.MEAN_FIELD_VI  # Need SVI
    if req.n_parameters > 10_000:
        return InferenceMethod.MEAN_FIELD_VI
    if req.requires_exact_inference and req.max_seconds > 3600:
        return InferenceMethod.MCMC

    # Phase 2: Method selection
    if req.is_gp_classification:
        return InferenceMethod.EP
    if req.requires_calibrated_uncertainty:
        if req.max_seconds > 3600:
            return InferenceMethod.MCMC
        return InferenceMethod.EP
    if req.requires_marginal_lik:
        if req.n_parameters < 1000:
            return InferenceMethod.LAPLACE
        return InferenceMethod.MEAN_FIELD_VI
    if req.is_latent_variable_model:
        return InferenceMethod.MEAN_FIELD_VI

    # Default: start simple
    if req.n_parameters < 100:
        return InferenceMethod.FULL_COV_VI
    return InferenceMethod.MEAN_FIELD_VI


class InferenceResult:
    """Container for inference results with diagnostics."""

    def __init__(self, method: InferenceMethod):
        self.method = method
        self.converged = False
        self.diagnostics: Dict[str, Any] = {}
        self.posterior_mean: Optional[np.ndarray] = None
        self.posterior_cov: Optional[np.ndarray] = None
        self.samples: Optional[np.ndarray] = None

    def validate(self, X_test, y_test, true_posterior=None):
        """Validate inference quality."""
        validation = {}
        # 1. Posterior predictive check
        if self.posterior_mean is not None:
            # Compute predictions and their uncertainty
            pass  # Model-specific implementation
        # 2. Calibration check
        if self.samples is not None:
            # Compute coverage of credible intervals
            pass
        # 3. Comparison to ground truth
        if true_posterior is not None:
            # Compute KL divergence or other discrepancy
            pass
        return validation


def run_inference_pipeline(model, data, requirements: InferenceRequirements):
    """Complete inference pipeline with validation.

    run_method, select_best_result, and check_convergence are
    model-specific helpers to be supplied by the caller.
    """
    # Step 1: Select method
    method = select_inference_method(requirements)
    print(f"Selected method: {method.value}")

    # Step 2: Run inference with multiple initializations
    results = []
    for init_seed in range(3):  # 3 random restarts
        np.random.seed(init_seed)
        result = run_method(model, data, method)
        results.append(result)

    # Step 3: Select best result (highest ELBO / lowest energy)
    best_result = select_best_result(results)

    # Step 4: Run diagnostics
    check_convergence(best_result, method)

    # Step 5: Validate (if test data available)
    # validation = best_result.validate(X_test, y_test)

    # Step 6: If validation fails, consider re-running with
    # upgrade_method(method) swapped in.

    return best_result


def upgrade_method(current: InferenceMethod) -> InferenceMethod:
    """Suggest a more sophisticated method when the current one fails."""
    upgrades = {
        InferenceMethod.LAPLACE: InferenceMethod.FULL_COV_VI,
        InferenceMethod.MEAN_FIELD_VI: InferenceMethod.FULL_COV_VI,
        InferenceMethod.FULL_COV_VI: InferenceMethod.EP,
        InferenceMethod.EP: InferenceMethod.MCMC,
    }
    return upgrades.get(current, InferenceMethod.MCMC)
```

Even with good method selection, implementation issues can derail inference. Here are common pitfalls and their solutions:
Debugging Decision Tree:
IF ELBO is not increasing:
→ Check gradient computation (finite differences test)
→ Reduce learning rate by 10x
→ Check for NaNs in parameters or gradients
→ Simplify model and re-run
IF ELBO increases but predictions are poor:
→ Approximation may be converged but biased
→ Compare to MCMC on subset
→ Try more flexible variational family
→ Consider model misspecification
IF different runs give very different ELBOs:
→ Multiple local optima exist
→ Run many random restarts
→ Consider if posterior is multimodal
→ Try annealing or better initialization
IF approximation looks good but predictions are overconfident:
→ Typical mean-field behavior
→ Add calibration layer (temperature scaling)
→ Use EP for better variance estimates
→ Validate uncertainty on held-out data
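The temperature-scaling fix mentioned in the last branch is a one-parameter post-hoc calibration: divide the logits by a scalar T fit on held-out data. A minimal sketch on synthetic data (the 2x miscalibration factor is made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical setup: a binary classifier whose logits are overconfident
# by a factor of 2 (factor chosen for illustration). Held-out labels are
# generated from the *calibrated* probabilities.
logits = 2.0 * rng.normal(size=2000)
labels = (rng.random(2000) < 1.0 / (1.0 + np.exp(-logits / 2.0))).astype(float)

def nll(T):
    """Held-out negative log-likelihood at temperature T."""
    p = 1.0 / (1.0 + np.exp(-logits / T))
    return -np.mean(labels * np.log(p) + (1.0 - labels) * np.log(1.0 - p))

# Temperature scaling fits a single scalar, so a 1-D grid search suffices.
temps = np.linspace(0.5, 5.0, 91)
T_best = temps[np.argmin([nll(T) for T in temps])]
print(T_best)
```

Because T rescales all logits uniformly, it fixes average confidence without changing the ranking of predictions; it cannot repair structural problems like a mean-field family that underestimates covariance.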
Let's apply our decision framework to realistic scenarios:
Scenario: Bayesian personalization model for product recommendations
Requirements:
Analysis:
Solution:
Method: Amortized Mean-Field Variational Inference
The field of approximate inference continues to evolve rapidly. Several emerging directions are shaping the future of Bayesian machine learning:
The Convergence of Methods:
Recent research increasingly blurs the line between deterministic and stochastic methods:
The future likely lies not in choosing "VI vs MCMC" but in seamlessly combining their strengths through hybrid methods tailored to specific problems.
Approximate inference is an active research area. What's "best practice" today may be superseded tomorrow. Keep an eye on NeurIPS, ICML, and AISTATS proceedings for new methods. But also remember: a well-validated simple method often beats a poorly-understood complex one.
Choosing the right approximation method is as important as understanding the methods themselves. This page has provided a systematic framework for analyzing requirements, applying a decision algorithm, and validating your choice.
Module Conclusion:
You've now completed a comprehensive exploration of deterministic approximate inference methods. From the elegant simplicity of Laplace approximation through the optimization perspective of variational inference to the moment-matching approach of expectation propagation, you understand the full landscape of tractable Bayesian inference.
More importantly, you can now choose wisely among these methods—understanding their trade-offs, recognizing their failure modes, and combining them with stochastic methods when appropriate. This knowledge transforms you from someone who knows how to do inference into someone who knows which inference to do.
Congratulations! You've mastered deterministic approximate inference—Laplace approximation, variational inference, and expectation propagation—along with practical guidance for method selection. You're now equipped to tackle Bayesian inference problems across the full spectrum of machine learning applications.