The Gaussian Kernel, also referred to as the Radial Basis Function (RBF) Kernel, is one of the most powerful and widely-used kernels in machine learning. It serves as a fundamental building block for Support Vector Machines (SVMs), Gaussian Processes, and various kernel-based learning algorithms.
At its core, the Gaussian kernel measures the similarity of two data points based on their distance in feature space. Unlike raw distance metrics, which grow without bound, the Gaussian kernel produces values in the range (0, 1]: identical points score exactly 1, and the similarity decays toward 0 as the points move apart.
For two data points x and y, the Gaussian kernel is defined as:
$$K(x, y) = \exp\left(-\gamma \, \lVert x - y \rVert^2\right)$$
Where:
• γ (gamma) is a positive parameter that controls the width of the kernel
• ‖x − y‖² is the squared Euclidean distance between the two points

The gamma parameter critically influences the kernel's behavior:
Large gamma (e.g., γ = 10): Creates a highly localized kernel. Only points that are very close together have significant similarity. This can lead to more complex, wiggly decision boundaries that closely follow the training data.
Small gamma (e.g., γ = 0.01): Creates a broad, smooth kernel. Points that are relatively far apart still maintain meaningful similarity. This produces smoother, more generalized decision boundaries.
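The contrast between the two regimes can be seen by evaluating the kernel for one fixed pair of points under both gamma values (a small illustrative sketch; the point coordinates are chosen arbitrarily):

```python
import numpy as np

# One fixed pair of points at squared distance 2.0.
x = np.array([0.0, 0.0])
y = np.array([1.0, 1.0])
sq_dist = float(np.sum((x - y) ** 2))  # squared Euclidean distance = 2.0

# Evaluate K(x, y) = exp(-gamma * ||x - y||^2) for a large and a small gamma.
similarities = {gamma: float(np.exp(-gamma * sq_dist)) for gamma in (10.0, 0.01)}
# Large gamma (10.0): the pair is treated as almost completely dissimilar.
# Small gamma (0.01): the same pair is still considered highly similar.
```

With γ = 10 the similarity is exp(−20), effectively zero, while with γ = 0.01 it is exp(−0.02) ≈ 0.98, even though the points have not moved.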
Given two sets of samples X1 (containing n1 samples) and X2 (containing n2 samples), each with d features, the kernel matrix K has shape (n1 × n2). Each element K[i, j] represents the Gaussian kernel similarity between the i-th sample from X1 and the j-th sample from X2.
Implement a function that computes the Gaussian kernel similarity matrix between two sets of multi-dimensional samples. Given X1, X2, and gamma, it should return the (n1 × n2) kernel matrix described above.
Efficiency Consideration: For optimal performance, leverage vectorized matrix operations instead of explicit nested loops. NumPy's broadcasting capabilities are particularly well-suited for this computation.
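One vectorized approach uses the identity ‖a − b‖² = ‖a‖² + ‖b‖² − 2·a·b, which lets the full matrix of pairwise squared distances be built from a single matrix product plus broadcasting. A sketch of this approach (the function name `gaussian_kernel_matrix` is an assumption, not mandated by the problem):

```python
import numpy as np

def gaussian_kernel_matrix(X1, X2, gamma):
    """Compute the (n1, n2) Gaussian (RBF) kernel matrix.

    Uses ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b so that all pairwise
    squared distances come from vectorized operations, with no Python loops.
    """
    X1 = np.asarray(X1, dtype=float)
    X2 = np.asarray(X2, dtype=float)
    sq1 = np.sum(X1 ** 2, axis=1)[:, None]   # shape (n1, 1)
    sq2 = np.sum(X2 ** 2, axis=1)[None, :]   # shape (1, n2)
    sq_dists = sq1 + sq2 - 2.0 * X1 @ X2.T   # broadcasts to (n1, n2)
    np.maximum(sq_dists, 0.0, out=sq_dists)  # guard against tiny negative values
    return np.exp(-gamma * sq_dists)
```

The `np.maximum` clamp matters in practice: floating-point cancellation in the expansion can produce values like −1e-16 where the true distance is zero.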
X1 = [[0.0, 0.0], [1.0, 1.0]]
X2 = [[0.0, 0.0], [2.0, 2.0]]
gamma = 0.5

Expected output: [[1.0, 0.0183], [0.3679, 0.3679]]

The kernel matrix K has shape (2, 2) since X1 has 2 samples and X2 has 2 samples.
Computing K[0, 0]: Distance between [0, 0] and [0, 0]
• Squared distance = (0-0)² + (0-0)² = 0
• K = exp(-0.5 × 0) = exp(0) = 1.0

Computing K[0, 1]: Distance between [0, 0] and [2, 2]
• Squared distance = (0-2)² + (0-2)² = 4 + 4 = 8
• K = exp(-0.5 × 8) = exp(-4) = 0.0183

Computing K[1, 0]: Distance between [1, 1] and [0, 0]
• Squared distance = (1-0)² + (1-0)² = 1 + 1 = 2
• K = exp(-0.5 × 2) = exp(-1) = 0.3679

Computing K[1, 1]: Distance between [1, 1] and [2, 2]
• Squared distance = (1-2)² + (1-2)² = 1 + 1 = 2
• K = exp(-0.5 × 2) = exp(-1) = 0.3679
The identical samples at the origin produce perfect similarity (1.0), while more distant pairs produce smaller similarities.
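The walkthrough above can be reproduced element by element with a straightforward (unvectorized) double loop, which is a useful cross-check against any vectorized implementation:

```python
import numpy as np

X1 = np.array([[0.0, 0.0], [1.0, 1.0]])
X2 = np.array([[0.0, 0.0], [2.0, 2.0]])
gamma = 0.5

# Mirror the hand computation: one kernel evaluation per (i, j) pair.
K = np.empty((len(X1), len(X2)))
for i in range(len(X1)):
    for j in range(len(X2)):
        sq_dist = np.sum((X1[i] - X2[j]) ** 2)
        K[i, j] = np.exp(-gamma * sq_dist)
```

Rounded to four decimals, `K` matches the expected output [[1.0, 0.0183], [0.3679, 0.3679]].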
X1 = [[1.0, 2.0]]
X2 = [[1.0, 2.0], [3.0, 4.0]]
gamma = 1.0

Expected output: [[1.0, 0.0003]]

The kernel matrix K has shape (1, 2) since X1 has 1 sample and X2 has 2 samples.
Computing K[0, 0]: Distance between [1, 2] and [1, 2]
• Squared distance = (1-1)² + (2-2)² = 0 + 0 = 0
• K = exp(-1.0 × 0) = exp(0) = 1.0

Computing K[0, 1]: Distance between [1, 2] and [3, 4]
• Squared distance = (1-3)² + (2-4)² = 4 + 4 = 8
• K = exp(-1.0 × 8) = exp(-8) = 0.0003
With gamma = 1.0 (higher than the previous example), the kernel is more sensitive to distance. Points [1.0, 2.0] and [3.0, 4.0] are far apart in feature space, resulting in nearly zero similarity (0.0003).
X1 = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
X2 = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
gamma = 1.0

Expected output: [[1.0, 0.3679, 0.3679], [0.3679, 1.0, 0.1353], [0.3679, 0.1353, 1.0]]

This example computes the kernel matrix for the same set of points (X1 = X2), producing a symmetric similarity matrix.
Diagonal elements (same points):
• K[0,0], K[1,1], K[2,2]: Distance = 0, so K = exp(0) = 1.0

Off-diagonal elements:
• K[0,1] = K[1,0]: Squared distance between [0,0] and [1,0] = 1, so K = exp(-1) = 0.3679
• K[0,2] = K[2,0]: Squared distance between [0,0] and [0,1] = 1, so K = exp(-1) = 0.3679
• K[1,2] = K[2,1]: Squared distance between [1,0] and [0,1] = 1 + 1 = 2, so K = exp(-2) = 0.1353
This symmetric structure demonstrates that when computing K(X, X), the result is always a symmetric positive semi-definite matrix—a crucial property for kernel methods.
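Both properties, symmetry and positive semi-definiteness, can be checked numerically for this example. A small sketch (the broadcasting pattern here is one of several equivalent ways to form the pairwise distances):

```python
import numpy as np

X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
gamma = 1.0

# K(X, X) via broadcasting: (3, 1, 2) - (1, 3, 2) -> pairwise differences.
diffs = X[:, None, :] - X[None, :, :]
K = np.exp(-gamma * np.sum(diffs ** 2, axis=-1))

# Symmetry: K[i, j] == K[j, i] for all pairs.
assert np.allclose(K, K.T)
# Positive semi-definiteness: all eigenvalues are non-negative
# (up to floating-point tolerance).
eigvals = np.linalg.eigvalsh(K)
assert np.all(eigvals >= -1e-10)
```

Positive semi-definiteness is what guarantees, via Mercer's theorem, that the kernel corresponds to an inner product in some feature space, which is the property SVMs and Gaussian Processes rely on.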
Constraints