In probabilistic machine learning, inferring hidden (latent) variables from observed data is a central challenge. Given observed data x, we often want to compute the posterior distribution p(z|x), which tells us what hidden variables z are likely given what we've observed. However, computing this posterior exactly is typically intractable due to the normalization constant (evidence) in Bayes' theorem.
Variational Inference offers an elegant solution: instead of computing the exact posterior, we approximate it with a simpler, tractable distribution q(z) and optimize this approximation to be as close as possible to the true posterior. The Evidence Lower Bound (ELBO) serves as our optimization objective—a quantity that we maximize to find the best approximation.
The ELBO provides a lower bound on the log-evidence (marginal likelihood) log p(x) and is defined as:
$$\text{ELBO} = \mathbb{E}_{q(z)}[\log p(x|z)] + \mathbb{E}_{q(z)}[\log p(z)] - \mathbb{E}_{q(z)}[\log q(z)]$$
This can be read as three interpretable components:
1. Expected log-likelihood E_q[log p(x|z)]: how well samples from q explain the observed data.
2. Expected log-prior E_q[log p(z)]: how consistent samples from q are with the prior.
3. Entropy H[q] = -E_q[log q(z)]: how much uncertainty the approximation retains.
For this problem, all distributions are Gaussian (Normal):
The Gaussian probability density function is:
$$\mathcal{N}(x|\mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$
The entropy of a Gaussian is:
$$H[q] = \frac{1}{2}\log(2\pi e \sigma_q^2)$$
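These two formulas translate directly into code. The following is a minimal sketch (the function names are illustrative, not part of the problem's API):

```python
import math

def gaussian_log_pdf(x, mu, sigma):
    """log N(x | mu, sigma^2), the log of the density above."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)

def gaussian_entropy(sigma):
    """H[q] = 0.5 * log(2 * pi * e * sigma^2)."""
    return 0.5 * math.log(2 * math.pi * math.e * sigma ** 2)

# Log-density of a standard Gaussian at its mean, and its entropy:
print(gaussian_log_pdf(0.0, 0.0, 1.0))  # log(1/sqrt(2*pi)) ≈ -0.9189
print(gaussian_entropy(1.0))            # ≈ 1.4189
```

Working in log space avoids underflow when densities are small, which is why every quantity in the ELBO is a log-probability.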
Since expectations over q(z) don't always have closed-form solutions in more complex models, we estimate them using Monte Carlo sampling:
$$\mathbb{E}_{q(z)}[f(z)] \approx \frac{1}{N}\sum_{i=1}^{N} f(z_i), \quad z_i \sim q(z)$$
Implement a function that computes the ELBO given: the observed data x, the variational parameters q_mean and q_std, the prior parameters prior_mean and prior_std, the likelihood standard deviation likelihood_std, and the number of Monte Carlo samples n_samples.
Return the estimated ELBO value rounded to 2 decimal places.
Important: Use a fixed random seed of 42 for reproducibility when sampling from q(z).
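Putting the three components together, one possible implementation looks like the sketch below. It assumes NumPy's legacy global seeding (`np.random.seed(42)`) as the sampler; if the grader uses a different RNG, the Monte Carlo estimate may differ slightly in the last decimal. The function name `compute_elbo` is illustrative.

```python
import numpy as np

def compute_elbo(x, q_mean, q_std, prior_mean, prior_std,
                 likelihood_std, n_samples=10000):
    """Monte Carlo estimate of the ELBO with Gaussian q, prior, and likelihood."""
    np.random.seed(42)  # fixed seed for reproducibility, as required
    z = np.random.normal(q_mean, q_std, size=n_samples)  # z_i ~ q(z)

    def log_normal(v, mu, sigma):
        # log N(v | mu, sigma^2), vectorized over v
        return -0.5 * np.log(2 * np.pi * sigma ** 2) - (v - mu) ** 2 / (2 * sigma ** 2)

    # E_q[log p(x|z)]: sum log-likelihoods over all observations, average over samples
    expected_log_lik = np.mean(sum(log_normal(xi, z, likelihood_std) for xi in x))

    # E_q[log p(z)]: average log-prior of the samples
    expected_log_prior = np.mean(log_normal(z, prior_mean, prior_std))

    # H[q]: closed form for a Gaussian, no sampling needed
    entropy = 0.5 * np.log(2 * np.pi * np.e * q_std ** 2)

    return round(expected_log_lik + expected_log_prior + entropy, 2)
```

Note that the entropy term is computed analytically: for a Gaussian q it is exact and free, so only the first two terms need Monte Carlo estimation.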
x = [1.0]
q_mean = 0.5
q_std = 0.7
prior_mean = 0.0
prior_std = 1.0
likelihood_std = 1.0
n_samples = 10000

Expected output: -1.52

The ELBO is computed as the sum of three components:
1. Expected Log-Likelihood E_q[log p(x|z)]: For each sample z drawn from q(z) = N(0.5, 0.7²), we compute the log-probability of observing x=1.0 under the likelihood p(x|z) = N(z, 1.0²). Averaging over 10,000 samples gives approximately -1.29.
2. Expected Log-Prior E_q[log p(z)]: For each sample z from q, we compute log p(z) where p(z) = N(0.0, 1.0²). The average is approximately -1.29.
3. Entropy H[q]: For a Gaussian, H[q] = 0.5 × log(2πe × σ²) = 0.5 × log(2π × 2.718 × 0.49) ≈ 1.06.
ELBO = -1.29 + (-1.29) + 1.06 ≈ -1.52
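The Monte Carlo numbers above can be cross-checked in closed form: for z ~ N(μ_q, σ_q²), E[(c − z)²] = (c − μ_q)² + σ_q². A quick sketch of that check:

```python
import math

mu_q, var_q = 0.5, 0.7 ** 2  # q(z) = N(0.5, 0.49)

# E_q[log p(x=1 | z)] with p(x|z) = N(z, 1):
# E[(1 - z)^2] = (1 - mu_q)^2 + var_q = 0.25 + 0.49 = 0.74
e_log_lik = -0.5 * math.log(2 * math.pi) - 0.74 / 2      # ≈ -1.29

# E_q[log p(z)] with p(z) = N(0, 1): E[z^2] = mu_q^2 + var_q = 0.74
e_log_prior = -0.5 * math.log(2 * math.pi) - 0.74 / 2    # ≈ -1.29

entropy = 0.5 * math.log(2 * math.pi * math.e * var_q)   # ≈ 1.06

print(round(e_log_lik + e_log_prior + entropy, 2))       # -1.52
```

The closed-form total (-1.5156) agrees with the Monte Carlo estimate to two decimal places.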
x = [0.0]
q_mean = 0.0
q_std = 1.0
prior_mean = 0.0
prior_std = 1.0
likelihood_std = 1.0
n_samples = 10000

Expected output: -1.43

In this case, the approximate posterior q(z) exactly matches the prior p(z), both being N(0, 1).
1. Expected Log-Likelihood: For x=0.0 and z samples from N(0, 1), the expected log-likelihood under N(z, 1²) is approximately -1.42.
2. Expected Log-Prior: Since q = p, the samples from q have expected log-prior of approximately -1.42.
3. Entropy: H[N(0, 1)] = 0.5 × log(2πe) ≈ 1.42.
ELBO = -1.42 + (-1.42) + 1.42 ≈ -1.43 (the analytic sum is -1.42; the small difference comes from Monte Carlo sampling noise)
This represents a baseline scenario where the variational approximation matches the prior.
x = [1.0, 2.0, 3.0]
q_mean = 2.0
q_std = 0.5
prior_mean = 0.0
prior_std = 2.0
likelihood_std = 1.0
n_samples = 10000

Expected output: -5.55

With multiple observations, the ELBO accounts for all data points:
1. Expected Log-Likelihood: Now we sum log-likelihoods for x₁=1.0, x₂=2.0, and x₃=3.0 at each sample z. With q_mean=2.0, the approximation is centered at the data mean, giving an expected log-likelihood of approximately -4.13.
2. Expected Log-Prior: With a wider prior (σ=2.0), the penalty for q_mean=2.0 being away from prior_mean=0.0 is reduced. The expected log-prior is approximately -2.14.
3. Entropy: H[N(2, 0.5²)] = 0.5 × log(2πe × 0.25) ≈ 0.73 (lower entropy due to the more concentrated approximation).
ELBO ≈ -4.132 + (-2.143) + 0.726 ≈ -5.55
The ELBO balances fitting the data well (high likelihood) while staying consistent with the prior and maintaining some uncertainty (entropy).
Constraints