Throughout this module, we've developed a comprehensive toolkit for deterministic approximate inference: Laplace approximation for quick Gaussian fits at the mode, variational inference for optimizing over tractable families, and expectation propagation for moment-matched approximations with better calibration.
But knowing these methods individually isn't enough. The real art of Bayesian machine learning lies in selecting the right method for your specific problem—balancing computational constraints, accuracy requirements, model structure, and application needs.
This final page provides a systematic framework for method selection, drawing on everything we've learned. By the end, you'll have both the conceptual understanding and practical decision procedures to choose wisely among approximation strategies.
By the end of this page, you will be able to:
- systematically evaluate inference requirements for any problem,
- apply a structured decision algorithm to select among Laplace, VI, EP, and MCMC,
- understand how model structure influences method selection, and
- diagnose when your chosen method is failing and pivot appropriately.
Before diving into decision procedures, let's consolidate our understanding of what each method offers:
Laplace Approximation
Variational Inference (VI)
Expectation Propagation (EP)
| Criterion | Laplace | VI | EP | MCMC |
|---|---|---|---|---|
| Posterior approximation | N(θ̂, H⁻¹) | q*(θ) ∈ Q | ∏f̃ᵢ(θ) | Samples |
| Objective optimized | log p(θ|D) | ELBO (KL) | Local KL | None (sampling) |
| Implementation difficulty | Low | Medium | High | Medium |
| Convergence guarantee | Yes | Yes (to local) | No | Yes (asymptotic) |
| Uncertainty quality | Poor | Poor-Medium | Medium-Good | Excellent |
| Scalability (n) | Good | Excellent (SVI) | Moderate | Poor |
| Scalability (d) | O(d³) | O(d)-O(d³) | O(d³) | O(d²) |
| Marginal likelihood | Yes (approx) | Yes (lower bound) | Yes (approx) | Difficult |
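To make the table's first row concrete, here is a minimal Laplace fit for a one-dimensional posterior. The model is a hypothetical illustration (k successes in n Bernoulli trials, a standard-normal prior on the log-odds θ); the method is exactly the N(θ̂, H⁻¹) construction from the table:

```python
import numpy as np

# Hypothetical model: k successes in n Bernoulli trials, parameterized by
# the log-odds theta, with a N(0, 1) prior. Laplace approximates the
# posterior by N(theta_hat, H^{-1}), centered at the mode theta_hat.
k, n = 7, 10

theta = 0.0
for _ in range(50):  # Newton's method for the posterior mode
    sig = 1.0 / (1.0 + np.exp(-theta))
    grad = k - n * sig - theta             # d/dtheta log p(theta | D)
    hess = -(n * sig * (1.0 - sig) + 1.0)  # d^2/dtheta^2 log p(theta | D)
    theta -= grad / hess

theta_hat = theta
laplace_var = -1.0 / hess  # H^{-1} in one dimension
print(theta_hat, laplace_var)  # roughly 0.58 and 0.30
```

Note how the prior shrinks the mode below the maximum-likelihood log-odds log(7/3) ≈ 0.85, and how the curvature at the mode directly supplies the approximate posterior variance.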
Effective method selection begins with understanding your requirements across five dimensions:
1. Computational Budget
2. Accuracy Requirements
3. Problem Scale
4. Model Structure
5. Downstream Use
These requirements frequently conflict: you may want exact inference on a large-scale problem with real-time latency, which is impossible. Recognizing such conflicts is the first step toward making principled trade-offs, so document your compromises explicitly.
Based on the requirement dimensions, here's a systematic decision procedure:
Phase 1: Feasibility Filter
1. IF computational budget < 1 second:
→ Use Laplace OR pre-trained/amortized VI
STOP
2. IF n > 100K AND cannot subsample:
→ Use SVI (stochastic VI)
PROCEED to Phase 2 for family selection
3. IF d > 10K:
→ Use mean-field VI OR structured VI
PROCEED to Phase 2
4. IF exact inference required AND budget > 1h:
→ Use MCMC with extensive diagnostics
STOP
Phase 2: Method Selection
5. IF posterior expected to be approximately Gaussian:
5a. IF quick baseline needed:
→ Use Laplace
5b. IF model comparison needed:
→ Use Laplace (marginal likelihood)
5c. IF more flexibility needed:
→ Use full-covariance VI
6. IF non-Gaussian likelihood with GP:
→ Use EP (better calibration than Laplace)
7. IF latent variable model:
7a. IF conjugate:
→ Use coordinate ascent VI
7b. IF non-conjugate:
→ Use SVI with reparameterization
8. IF uncertainty calibration critical:
8a. IF budget allows:
→ Use MCMC
8b. IF budget limited:
→ Use EP > VI > Laplace
9. DEFAULT:
→ Start with mean-field VI
→ Upgrade to full-covariance if underperforming
→ Consider EP if calibration issues persist
Phase 3: Validation and Iteration
10. Run chosen method with:
- Multiple random initializations
- Convergence diagnostics appropriate to method
11. Validate approximation quality:
- Posterior predictive checks
- Calibration plots (reliability diagrams)
- Compare to MCMC on subset if feasible
12. IF validation fails:
- IF underestimating variance: try EP
- IF missing modes: try mixture VI or MCMC
- IF numerical issues: try damping, better parameterization
- IF still failing: invest in MCMC
13. Document method choice, alternatives considered,
and validation results
The structure of your model heavily influences which approximation methods are tractable and effective. Understanding these connections accelerates method selection.
| Model Type | Primary Recommendation | Alternatives |
|---|---|---|
| Linear Regression (Gaussian) | Laplace (exact for Gaussian!) | VI, MCMC |
| Logistic Regression | Laplace (log-concave) | VI, EP |
| Gaussian Process Regression | Exact (Gaussian posteriors) | Sparse GPs with VI |
| GP Classification | EP (best calibration) | Laplace, VI |
| Bayesian Neural Network | VI (last-layer or mean-field) | MC Dropout, SWAG |
| Mixture Model | EM + VI hybrid | MCMC with label switching care |
| Hierarchical Model | MCMC or structured VI | EP for some structures |
| State-Space Model | Kalman (linear) / EP (non-linear) | Particle filters |
| Deep Generative Model (VAE) | Amortized VI | Flow-based VI |
Exploiting Conditional Conjugacy:
Many models have partially conjugate structure where some variables have closed-form conditionals:
$$p(\theta_1, \theta_2 | D) \text{ where } p(\theta_1 | \theta_2, D) \text{ is tractable}$$
Strategies include: collapsing θ₁ analytically (marginalizing it out, i.e. Rao-Blackwellization), keeping the exact conditional inside a structured variational family q(θ₂)p(θ₁ | θ₂, D), or alternating closed-form coordinate updates for the conjugate block with gradient-based updates for the rest.
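As one illustration of exploiting a tractable conditional (a standard collapsing step, not specific to any one model), θ₁ can be integrated out analytically so that inference runs on the smaller problem over θ₂ alone:

$$p(\theta_2 \mid D) \propto \int p(D \mid \theta_1, \theta_2)\, p(\theta_1 \mid \theta_2)\, p(\theta_2)\, d\theta_1$$

Any of the methods above can then target p(θ₂ | D), and θ₁ is recovered afterwards through the tractable conditional p(θ₁ | θ₂, D).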
Exploiting Factorization:
If the posterior factors as: $$p(\theta_1, ..., \theta_K | D) = \prod_k p(\theta_k | D)$$
You can run independent inference on each block, parallelizing computation.
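A toy sketch of this idea, under the assumption of conjugate Gaussian blocks whose likelihoods touch disjoint slices of the data (all names and numbers are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Two parameter blocks whose likelihoods touch disjoint data slices, so
# the posterior factorizes and each block can be inferred in isolation
# (and in parallel). Per block: theta_k ~ N(0, 1) prior, y ~ N(theta_k, 1).
data_blocks = [rng.normal(2.0, 1.0, size=50), rng.normal(-1.0, 1.0, size=80)]

def infer_block(y):
    """Exact Gaussian posterior for one block (unit prior and noise variance)."""
    n = len(y)
    post_var = 1.0 / (n + 1.0)
    post_mean = post_var * y.sum()
    return post_mean, post_var

# Independent blocks: these calls could be farmed out to separate workers.
posteriors = [infer_block(y) for y in data_blocks]
print(posteriors)
```

The joint posterior covariance is block-diagonal by construction, so nothing is lost by never forming it: each block's mean and variance is the complete answer for that block.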
Sometimes a change of variables transforms a difficult inference problem into an easy one. Log-transforming positive parameters, centering data, or using non-centered parameterizations (for hierarchical models) can make posteriors more Gaussian-like, improving Laplace and VI performance dramatically.
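The effect of such a transformation is easy to see numerically. In this sketch (a hypothetical Gamma-shaped posterior over a variance-like parameter), log-transforming the positive parameter substantially reduces skewness, which is exactly what makes a Gaussian fit more faithful:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical posterior over a positive, variance-like parameter,
# represented by Gamma samples: strongly right-skewed on this scale,
# so a single Gaussian fit here would be poor.
sigma2 = rng.gamma(shape=3.0, scale=0.5, size=100_000)

def skewness(x):
    x = x - x.mean()
    return (x**3).mean() / (x**2).mean() ** 1.5

s_raw = skewness(sigma2)          # clearly right-skewed
s_log = skewness(np.log(sigma2))  # much closer to symmetric
print(s_raw, s_log)
```

The same logic motivates non-centered parameterizations in hierarchical models: the transformed posterior geometry, not the model itself, is what Laplace and Gaussian VI actually see.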
Here's a concrete workflow for implementing approximate inference in practice:
```python
import numpy as np
from typing import Dict, Any, Optional
from dataclasses import dataclass
from enum import Enum


class InferenceMethod(Enum):
    LAPLACE = "laplace"
    MEAN_FIELD_VI = "mean_field_vi"
    FULL_COV_VI = "full_cov_vi"
    EP = "expectation_propagation"
    MCMC = "mcmc"


@dataclass
class InferenceRequirements:
    """Capture all requirements for method selection."""
    max_seconds: float   # Computational budget
    n_datapoints: int    # Dataset size
    n_parameters: int    # Number of parameters
    requires_marginal_lik: bool = False
    requires_calibrated_uncertainty: bool = False
    requires_exact_inference: bool = False
    is_gp_classification: bool = False
    is_latent_variable_model: bool = False


def select_inference_method(req: InferenceRequirements) -> InferenceMethod:
    """Apply the decision algorithm to select an inference method."""
    # Phase 1: Feasibility filter
    if req.max_seconds < 1:
        return InferenceMethod.LAPLACE
    if req.n_datapoints > 100_000:
        return InferenceMethod.MEAN_FIELD_VI  # Need SVI
    if req.n_parameters > 10_000:
        return InferenceMethod.MEAN_FIELD_VI
    if req.requires_exact_inference and req.max_seconds > 3600:
        return InferenceMethod.MCMC

    # Phase 2: Method selection
    if req.is_gp_classification:
        return InferenceMethod.EP
    if req.requires_calibrated_uncertainty:
        if req.max_seconds > 3600:
            return InferenceMethod.MCMC
        return InferenceMethod.EP
    if req.requires_marginal_lik:
        if req.n_parameters < 1000:
            return InferenceMethod.LAPLACE
        return InferenceMethod.MEAN_FIELD_VI
    if req.is_latent_variable_model:
        return InferenceMethod.MEAN_FIELD_VI

    # Default: start simple
    if req.n_parameters < 100:
        return InferenceMethod.FULL_COV_VI
    return InferenceMethod.MEAN_FIELD_VI


class InferenceResult:
    """Container for inference results with diagnostics."""

    def __init__(self, method: InferenceMethod):
        self.method = method
        self.converged = False
        self.diagnostics: Dict[str, Any] = {}
        self.posterior_mean: Optional[np.ndarray] = None
        self.posterior_cov: Optional[np.ndarray] = None
        self.samples: Optional[np.ndarray] = None

    def validate(self, X_test, y_test, true_posterior=None):
        """Validate inference quality."""
        validation = {}
        # 1. Posterior predictive check
        if self.posterior_mean is not None:
            # Compute predictions and their uncertainty
            pass  # Model-specific implementation
        # 2. Calibration check
        if self.samples is not None:
            # Compute coverage of credible intervals
            pass
        # 3. Comparison to ground truth
        if true_posterior is not None:
            # Compute KL divergence or other discrepancy
            pass
        return validation


def run_inference_pipeline(model, data, requirements: InferenceRequirements):
    """Complete inference pipeline with validation.

    run_method, select_best_result, and check_convergence are
    model-specific helpers to be supplied by the caller.
    """
    # Step 1: Select method
    method = select_inference_method(requirements)
    print(f"Selected method: {method.value}")

    # Step 2: Run inference with multiple initializations
    results = []
    for init_seed in range(3):  # 3 random restarts
        np.random.seed(init_seed)
        result = run_method(model, data, method)
        results.append(result)

    # Step 3: Select best result (highest ELBO / lowest energy)
    best_result = select_best_result(results)

    # Step 4: Run diagnostics
    check_convergence(best_result, method)

    # Step 5: Validate (if test data available)
    # validation = best_result.validate(X_test, y_test)

    # Step 6: If validation fails, consider re-running with
    # upgrade_method(method) swapped in.

    return best_result


def upgrade_method(current: InferenceMethod) -> InferenceMethod:
    """Suggest a more sophisticated method when the current one fails."""
    upgrades = {
        InferenceMethod.LAPLACE: InferenceMethod.FULL_COV_VI,
        InferenceMethod.MEAN_FIELD_VI: InferenceMethod.FULL_COV_VI,
        InferenceMethod.FULL_COV_VI: InferenceMethod.EP,
        InferenceMethod.EP: InferenceMethod.MCMC,
    }
    return upgrades.get(current, InferenceMethod.MCMC)
```

Even with good method selection, implementation issues can derail inference. Here are common pitfalls and their solutions:
Debugging Decision Tree:
IF ELBO is not increasing:
→ Check gradient computation (finite differences test)
→ Reduce learning rate by 10x
→ Check for NaNs in parameters or gradients
→ Simplify model and re-run
IF ELBO increases but predictions are poor:
→ Approximation may be converged but biased
→ Compare to MCMC on subset
→ Try more flexible variational family
→ Consider model misspecification
IF different runs give very different ELBOs:
→ Multiple local optima exist
→ Run many random restarts
→ Consider if posterior is multimodal
→ Try annealing or better initialization
IF approximation looks good but predictions are overconfident:
→ Typical mean-field behavior
→ Add calibration layer (temperature scaling)
→ Use EP for better variance estimates
→ Validate uncertainty on held-out data
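The temperature-scaling fix mentioned in the last branch is a one-parameter post-hoc calibration: divide the logits by a scalar T fit on held-out data. A minimal sketch on synthetic data (the 2x miscalibration factor is made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical setup: a binary classifier whose logits are overconfident
# by a factor of 2 (factor chosen for illustration). Held-out labels are
# generated from the *calibrated* probabilities.
logits = 2.0 * rng.normal(size=2000)
labels = (rng.random(2000) < 1.0 / (1.0 + np.exp(-logits / 2.0))).astype(float)

def nll(T):
    """Held-out negative log-likelihood at temperature T."""
    p = 1.0 / (1.0 + np.exp(-logits / T))
    return -np.mean(labels * np.log(p) + (1.0 - labels) * np.log(1.0 - p))

# Temperature scaling fits a single scalar, so a 1-D grid search suffices.
temps = np.linspace(0.5, 5.0, 91)
T_best = temps[np.argmin([nll(T) for T in temps])]
print(T_best)
```

Because T rescales all logits uniformly, it fixes average confidence without changing the ranking of predictions; it cannot repair structural problems like a mean-field family that underestimates covariance.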
Let's apply our decision framework to realistic scenarios:
Scenario: Bayesian personalization model for product recommendations
Requirements:
Analysis:
Solution:
Method: Amortized Mean-Field Variational Inference
The field of approximate inference continues to evolve rapidly. Several emerging directions are shaping the future of Bayesian machine learning:
The Convergence of Methods:
Recent research increasingly blurs the line between deterministic and stochastic methods:
The future likely lies not in choosing "VI vs MCMC" but in seamlessly combining their strengths through hybrid methods tailored to specific problems.
Approximate inference is an active research area. What's "best practice" today may be superseded tomorrow. Keep an eye on NeurIPS, ICML, and AISTATS proceedings for new methods. But also remember: a well-validated simple method often beats a poorly-understood complex one.
Choosing the right approximation method is as important as understanding the methods themselves. This page has provided a systematic framework for analyzing requirements, applying a decision algorithm, and validating your choice.
Module Conclusion:
You've now completed a comprehensive exploration of deterministic approximate inference methods. From the elegant simplicity of Laplace approximation through the optimization perspective of variational inference to the moment-matching approach of expectation propagation, you understand the full landscape of tractable Bayesian inference.
More importantly, you can now choose wisely among these methods—understanding their trade-offs, recognizing their failure modes, and combining them with stochastic methods when appropriate. This knowledge transforms you from someone who knows how to do inference into someone who knows which inference to do.
Congratulations! You've mastered deterministic approximate inference—Laplace approximation, variational inference, and expectation propagation—along with practical guidance for method selection. You're now equipped to tackle Bayesian inference problems across the full spectrum of machine learning applications.