In probability theory and information theory, quantifying the difference between two probability distributions is a fundamental problem with widespread applications across machine learning, statistics, and signal processing. One of the most important measures for this purpose is the Kullback-Leibler (KL) Divergence.
The KL divergence, also known as relative entropy, measures how one probability distribution diverges from a reference distribution. It provides an asymmetric measure of the "information lost" when using one distribution to approximate another. For multivariate Gaussian distributions, this divergence can be computed analytically using a closed-form formula.
Problem Statement: Given two multivariate Gaussian (Normal) distributions P and Q, each characterized by their mean vectors (μₚ and μ_q) and covariance matrices (Σₚ and Σ_q), compute the KL divergence D_KL(P || Q).
Mathematical Formulation: For two d-dimensional multivariate Gaussians, the KL divergence is given by:
$$D_{KL}(P \| Q) = \frac{1}{2} \left[ \log\frac{|\Sigma_q|}{|\Sigma_p|} - d + \text{tr}(\Sigma_q^{-1}\Sigma_p) + (\mu_q - \mu_p)^T \Sigma_q^{-1} (\mu_q - \mu_p) \right]$$
Where: • d is the dimensionality of the distributions • μₚ, μ_q are the mean vectors and Σₚ, Σ_q the covariance matrices of P and Q • tr(·) denotes the matrix trace and |·| the determinant
Key Properties: • Non-negativity: D_KL(P || Q) ≥ 0, with equality if and only if P = Q • Asymmetry: in general, D_KL(P || Q) ≠ D_KL(Q || P)
Your Task: Implement a function that computes the KL divergence between two multivariate Gaussian distributions given their mean vectors and covariance matrices. The result should be rounded to 4 decimal places.
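The formula above translates directly into NumPy. A minimal sketch follows; the function name and signature are illustrative, not prescribed by the problem, and `slogdet` is used instead of `det` for numerical stability on the log-determinant term:

```python
import numpy as np

def kl_divergence_gaussian(mu_p, Cov_p, mu_q, Cov_q):
    """Compute D_KL(P || Q) for two multivariate Gaussians, rounded to 4 places."""
    mu_p = np.asarray(mu_p, dtype=float)
    mu_q = np.asarray(mu_q, dtype=float)
    Cov_p = np.asarray(Cov_p, dtype=float)
    Cov_q = np.asarray(Cov_q, dtype=float)

    d = mu_p.shape[0]
    Cov_q_inv = np.linalg.inv(Cov_q)
    diff = mu_q - mu_p

    # log(|Σ_q| / |Σ_p|) computed as a difference of log-determinants
    log_det_ratio = np.linalg.slogdet(Cov_q)[1] - np.linalg.slogdet(Cov_p)[1]
    trace_term = np.trace(Cov_q_inv @ Cov_p)           # tr(Σ_q⁻¹ Σ_p)
    mahalanobis = diff @ Cov_q_inv @ diff              # (μ_q-μ_p)ᵀ Σ_q⁻¹ (μ_q-μ_p)

    return round(0.5 * (log_det_ratio - d + trace_term + mahalanobis), 4)
```

On Example 1 below (identity covariances, means [0, 0] and [1, 1]), this returns 1.0.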
mu_p = [0.0, 0.0]
Cov_p = [[1.0, 0.0], [0.0, 1.0]]
mu_q = [1.0, 1.0]
Cov_q = [[1.0, 0.0], [0.0, 1.0]]
Expected output: 1.0
Both distributions have the same identity covariance matrix, but different means.
Step-by-step calculation:
Dimensionality: d = 2
Log determinant ratio: Since both covariance matrices are identity matrices: • |Σ_q| = |Σ_p| = 1.0 • log(|Σ_q|/|Σ_p|) = log(1) = 0
Trace term: tr(Σ_q⁻¹ Σ_p) = tr(I · I) = tr(I) = 2
Mahalanobis distance: The mean difference is [1.0, 1.0] • (μ_q - μ_p)ᵀ Σ_q⁻¹ (μ_q - μ_p) = [1, 1] · I · [1, 1]ᵀ = 1² + 1² = 2
Final computation: • D_KL = 0.5 × (0 - 2 + 2 + 2) = 0.5 × 2 = 1.0
The divergence of 1.0 comes entirely from the separation of the means: half the squared Mahalanobis distance between the two distribution centers.
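The step-by-step arithmetic above can be checked term by term with NumPy (a quick sketch, computing each term directly rather than through a wrapper function):

```python
import numpy as np

# Example 1 inputs: shared identity covariance, means at the origin and at (1, 1)
mu_p, mu_q = np.zeros(2), np.ones(2)
Cov = np.eye(2)
d = 2

diff = mu_q - mu_p
log_det_ratio = 0.0                              # log(|I| / |I|) = log(1) = 0
trace_term = np.trace(np.linalg.inv(Cov) @ Cov)  # tr(I) = 2
mahalanobis = diff @ np.linalg.inv(Cov) @ diff   # 1² + 1² = 2

kl = 0.5 * (log_det_ratio - d + trace_term + mahalanobis)
print(kl)  # 1.0
```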
mu_p = [2.0, 3.0]
Cov_p = [[2.0, 0.5], [0.5, 1.5]]
mu_q = [2.0, 3.0]
Cov_q = [[2.0, 0.5], [0.5, 1.5]]
Expected output: 0.0
Both distributions are identical: they have the same mean vectors and the same covariance matrices.
Verification using the formula:
Log determinant ratio: log(|Σ_q|/|Σ_p|) = log(1) = 0 (identical matrices)
Trace term: tr(Σ_q⁻¹ Σ_p) = tr(I) = 2 (since Σ_q⁻¹ Σ_p = I when Σ_q = Σ_p)
Mahalanobis distance: (μ_q - μ_p) = [0, 0], so the squared distance = 0
Final computation: • D_KL = 0.5 × (0 - 2 + 2 + 0) = 0.5 × 0 = 0.0
This confirms the fundamental property: KL divergence is zero if and only if the distributions are identical. When P and Q are the same distribution, no information is lost by using Q to represent P.
mu_p = [1.0, 2.0, 3.0]
Cov_p = [[2.0, 0.3, 0.1], [0.3, 1.5, 0.2], [0.1, 0.2, 1.0]]
mu_q = [0.0, 0.0, 0.0]
Cov_q = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
Expected output: 7.2304
This example demonstrates a 3-dimensional case with more complex covariance structures.
Distribution P: Has mean [1, 2, 3] and a non-diagonal covariance matrix with correlations between dimensions.
Distribution Q: Is a standard multivariate normal (mean at origin, identity covariance).
Analysis:
Dimensionality: d = 3
Determinant calculation: • |Σ_p| = 2.827 (computed from the 3×3 determinant) • |Σ_q| = 1.0 (identity matrix) • Log ratio contributes: log(1/2.827) ≈ -1.0392
Trace contribution: tr(Σ_q⁻¹ Σ_p) = tr(Σ_p) = 2 + 1.5 + 1 = 4.5 (since Σ_q⁻¹ = I)
Mahalanobis distance: μ_p is far from μ_q, and with identity Σ_q: • Distance² = 1² + 2² + 3² = 14
Combined result: D_KL = 0.5 × (-1.0392 - 3 + 4.5 + 14) ≈ 7.2304
The high divergence reflects both the mean displacement and the different covariance structures.
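The third example's intermediate values can be reproduced with NumPy; this sketch confirms both the 3×3 determinant and the final divergence:

```python
import numpy as np

# Example 3 inputs
mu_p = np.array([1.0, 2.0, 3.0])
Cov_p = np.array([[2.0, 0.3, 0.1], [0.3, 1.5, 0.2], [0.1, 0.2, 1.0]])
mu_q = np.zeros(3)
Cov_q = np.eye(3)
d = 3

Cov_q_inv = np.linalg.inv(Cov_q)                 # identity here
diff = mu_q - mu_p

log_det_ratio = np.log(np.linalg.det(Cov_q) / np.linalg.det(Cov_p))
trace_term = np.trace(Cov_q_inv @ Cov_p)         # tr(Σ_p) = 2 + 1.5 + 1 = 4.5
mahalanobis = diff @ Cov_q_inv @ diff            # 1² + 2² + 3² = 14

kl = 0.5 * (log_det_ratio - d + trace_term + mahalanobis)
print(round(float(np.linalg.det(Cov_p)), 3))     # 2.827
print(round(float(kl), 4))                       # 7.2304
```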
Constraints