Centering is often treated as a trivial preprocessing step in standard PCA: subtract the mean from each feature, and proceed. In Kernel PCA, centering becomes far more subtle. We cannot compute the mean in feature space explicitly—we don't have coordinates in that space. Yet centering is absolutely essential: without it, the principal components we compute would be directions from the origin of feature space rather than from the centroid of the data.
Why does this matter? Because the origin in feature space is an arbitrary reference point with no intrinsic meaning, while the centroid represents the "center" of the data distribution. Principal components are fundamentally about directions of maximum variance—deviation from the mean—not about directions maximizing distance from the origin.
This page explores centering in depth. We'll develop geometric intuition for what centering accomplishes, derive the double-centering formula carefully, understand what goes wrong without proper centering, and learn diagnostic techniques to verify correct implementation. Mastering these details is essential for robust Kernel PCA applications.
By the end of this page, you will understand geometrically what centering means in feature space, why the centroid cannot be computed explicitly, how double centering handles this implicitly, what failure modes arise without centering, and how to diagnose centering issues in practice.
Before examining feature space centering, let's establish precisely why centering is essential for PCA in general.
PCA and the Covariance Matrix
PCA finds eigenvectors of the covariance matrix: $$\mathbf{C} = \frac{1}{n}\sum_{i=1}^{n}(\mathbf{x}_i - \bar{\mathbf{x}})(\mathbf{x}_i - \bar{\mathbf{x}})^T$$
This is fundamentally a second-moment matrix about the mean. If we instead used the matrix: $$\mathbf{M} = \frac{1}{n}\sum_{i=1}^{n}\mathbf{x}_i\mathbf{x}_i^T$$
(second moment about the origin), we would get different results—potentially very different.
The Geometric Interpretation
Consider a 2D dataset: points clustered around $(100, 0)$ with small variance in both directions. The true principal components should capture the directions of variance within the cluster (small scale). Computing moments about the origin instead tells a very different story.
Without centering, PCA finds directions maximizing the second moment about zero, which mixes data location with data spread. This is almost never what we want. The effect is most severe when the data is far from the origin: the first "principal component" then simply points at the data centroid.
Mathematical Relationship
The second moment matrix $\mathbf{M}$ relates to the covariance matrix $\mathbf{C}$ by: $$\mathbf{M} = \mathbf{C} + \bar{\mathbf{x}}\bar{\mathbf{x}}^T$$
The term $\bar{\mathbf{x}}\bar{\mathbf{x}}^T$ is a rank-1 matrix whose only nonzero eigendirection points from the origin to the mean, with eigenvalue $\|\bar{\mathbf{x}}\|^2$. If $\|\bar{\mathbf{x}}\|^2$ is large compared to the eigenvalues of $\mathbf{C}$, this rank-1 term dominates and the first eigenvector of $\mathbf{M}$ is approximately $\bar{\mathbf{x}}/\|\bar{\mathbf{x}}\|$.
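The following minimal sketch (the cluster spread and sample size are illustrative choices) makes this dominance concrete: for the $(100, 0)$ cluster described above, the top eigenvector of $\mathbf{C}$ follows the spread while the top eigenvector of $\mathbf{M}$ points at the centroid.

```python
import numpy as np

np.random.seed(0)
# Cluster centered at (100, 0) with small, anisotropic spread
X = np.array([100.0, 0.0]) + np.random.randn(500, 2) * np.array([0.5, 2.0])

C = np.cov(X, rowvar=False, bias=True)   # second moment about the mean
M = (X.T @ X) / len(X)                   # second moment about the origin

# np.linalg.eigh returns eigenvalues in ascending order; take the last column
top_C = np.linalg.eigh(C)[1][:, -1]
top_M = np.linalg.eigh(M)[1][:, -1]

print("Top eigenvector of C:", top_C.round(3))  # aligned with (0, 1): the spread direction
print("Top eigenvector of M:", top_M.round(3))  # aligned with (1, 0): points at the centroid
```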
In Feature Space, the Problem is Worse
In the feature space $\mathcal{F}$ induced by a kernel, the origin has even less meaning. For the RBF kernel, the feature space is infinite-dimensional, and the origin is just a mathematical abstraction. The data will almost certainly not be centered at this origin. Without explicit centering in feature space, Kernel PCA degrades into a method that mixes location effects with variance effects.
| Scenario | With Centering | Without Centering |
|---|---|---|
| Data centered at origin | Correct variance directions | Correct (identical results) |
| Data at $(100, 0)$, small variance | Finds small-scale variance directions | PC1 ≈ (1, 0) toward data location |
| Clustered data, off-center | Captures cluster structure | Dominated by mean location |
| Feature space (RBF kernel) | Proper nonlinear structure | Corrupted by arbitrary origin |
| Multiple clusters, all off-center | Captures inter/intra-cluster variance | PC1 points at grand mean |
The mean (centroid) in feature space is: $$\bar{\phi} = \frac{1}{n}\sum_{i=1}^{n}\phi(\mathbf{x}_i)$$
This is a well-defined element of $\mathcal{F}$—it's the average of all mapped training points. But there's a fundamental problem: we can't compute or store this vector explicitly when $\mathcal{F}$ is high or infinite-dimensional.
The Implicit Representation
Although we cannot represent $\bar{\phi}$ as an explicit vector of coordinates, we can work with it implicitly through inner products. Any quantity involving $\bar{\phi}$ can be rewritten in terms of kernel evaluations:
$$\langle \bar{\phi}, \phi(\mathbf{x}) \rangle = \left\langle \frac{1}{n}\sum_j \phi(\mathbf{x}_j), \phi(\mathbf{x}) \right\rangle = \frac{1}{n}\sum_j \langle \phi(\mathbf{x}_j), \phi(\mathbf{x}) \rangle = \frac{1}{n}\sum_j k(\mathbf{x}_j, \mathbf{x})$$
Similarly: $$|\bar{\phi}|^2 = \langle \bar{\phi}, \bar{\phi} \rangle = \frac{1}{n^2}\sum_i \sum_j k(\mathbf{x}_i, \mathbf{x}_j)$$
Every computation involving the centroid reduces to sums of kernel values.
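As a small illustration (the RBF kernel and random data here are arbitrary choices), the two quantities above are simply a row mean and the grand mean of the kernel matrix:

```python
import numpy as np

np.random.seed(0)
X = np.random.randn(50, 3)

# RBF kernel matrix (gamma chosen arbitrarily for illustration)
gamma = 0.5
sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
K = np.exp(-gamma * sq_dists)

# <phi_bar, phi(x_i)> for every training point: the mean of row i of K
inner_with_centroid = K.mean(axis=1)

# ||phi_bar||^2: the grand mean of K
centroid_sq_norm = K.mean()

print("First few <phi_bar, phi(x_i)>:", inner_with_centroid[:3].round(4))
print("||phi_bar||^2:", round(centroid_sq_norm, 4))
```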
We never need explicit coordinates of $\bar{\phi}$. We only need inner products involving $\bar{\phi}$, and these can always be computed from the kernel matrix. This is the essence of the kernel trick applied to centering.
Geometric Picture
Imagine the feature space $\mathcal{F}$ as a high-dimensional vector space. The mapped training points ${\phi(\mathbf{x}_1), \ldots, \phi(\mathbf{x}_n)}$ form a cloud in this space. The centroid $\bar{\phi}$ is the center of mass of this cloud.
Centering means computing: $$\tilde{\phi}(\mathbf{x}_i) = \phi(\mathbf{x}_i) - \bar{\phi}$$
Geometrically, this translates the entire point cloud so that its center of mass sits at the origin of $\mathcal{F}$. After centering, the mapped data has zero mean, and inner products between centered points measure deviations from the centroid rather than positions relative to an arbitrary origin.
What We're Really Computing
The centered point $\tilde{\phi}(\mathbf{x}_i)$ is the "residual" after subtracting the average. It captures how $\phi(\mathbf{x}_i)$ differs from the typical mapped point. PCA on these residuals finds the principal directions of variation—exactly what we want.
For Test Points
When projecting a new point $\mathbf{x}^*$, we compute: $$\tilde{\phi}(\mathbf{x}^*) = \phi(\mathbf{x}^*) - \bar{\phi}$$
Critically, we subtract the training centroid $\bar{\phi}$, not a new centroid involving the test point. The training set defines the coordinate system; test points are transformed using that same system.
We need to compute the centered kernel matrix: $$\tilde{K}_{ij} = \langle \tilde{\phi}(\mathbf{x}_i), \tilde{\phi}(\mathbf{x}_j) \rangle = \langle \phi(\mathbf{x}_i) - \bar{\phi}, \phi(\mathbf{x}_j) - \bar{\phi} \rangle$$
Let's expand this inner product carefully:
$$\tilde{K}_{ij} = \langle \phi(\mathbf{x}_i), \phi(\mathbf{x}_j) \rangle - \langle \phi(\mathbf{x}_i), \bar{\phi} \rangle - \langle \bar{\phi}, \phi(\mathbf{x}_j) \rangle + \langle \bar{\phi}, \bar{\phi} \rangle$$
Term 1: $\langle \phi(\mathbf{x}_i), \phi(\mathbf{x}_j) \rangle = K_{ij}$
Term 2: $\langle \phi(\mathbf{x}_i), \bar{\phi} \rangle = \frac{1}{n}\sum_m \langle \phi(\mathbf{x}_i), \phi(\mathbf{x}_m) \rangle = \frac{1}{n}\sum_m K_{im}$
This is the mean of the $i$-th row of $\mathbf{K}$.
Term 3: $\langle \bar{\phi}, \phi(\mathbf{x}_j) \rangle = \frac{1}{n}\sum_m K_{mj}$
This is the mean of the $j$-th column of $\mathbf{K}$.
Term 4: $\langle \bar{\phi}, \bar{\phi} \rangle = \frac{1}{n^2}\sum_m \sum_\ell K_{m\ell}$
This is the grand mean of $\mathbf{K}$.
$$\tilde{K}_{ij} = K_{ij} - \frac{1}{n}\sum_m K_{im} - \frac{1}{n}\sum_m K_{mj} + \frac{1}{n^2}\sum_m \sum_\ell K_{m\ell}$$
This subtracts the row means and the column means, then adds back the grand mean (which would otherwise be subtracted twice).
Matrix Formulation
Define the centering matrix: $$\mathbf{H} = \mathbf{I}_n - \frac{1}{n}\mathbf{1}_n\mathbf{1}_n^T$$
where $\mathbf{1}_n$ is the $n$-vector of ones. The matrix $\mathbf{H}$ is symmetric and idempotent ($\mathbf{H}^2 = \mathbf{H}$): it is the orthogonal projection onto the subspace orthogonal to $\mathbf{1}_n$, and applying it to a vector subtracts that vector's mean from every entry.
The centered kernel matrix is: $$\tilde{\mathbf{K}} = \mathbf{H}\mathbf{K}\mathbf{H}$$
Verification: expanding gives $\mathbf{H}\mathbf{K}\mathbf{H} = \mathbf{K} - \frac{1}{n}\mathbf{1}_n\mathbf{1}_n^T\mathbf{K} - \frac{1}{n}\mathbf{K}\mathbf{1}_n\mathbf{1}_n^T + \frac{1}{n^2}\mathbf{1}_n\mathbf{1}_n^T\mathbf{K}\mathbf{1}_n\mathbf{1}_n^T$, whose $(i,j)$ entry is $K_{ij} - \frac{1}{n}\sum_m K_{mj} - \frac{1}{n}\sum_m K_{im} + \frac{1}{n^2}\sum_{m,\ell} K_{m\ell}$, exactly the element-wise formula above.
Why "Double" Centering?
We multiply by $\mathbf{H}$ on both sides, hence "double" centering. In standard PCA we center the data matrix once, subtracting the mean vector from every sample (equivalently, centering each feature column). Here the kernel matrix contains inner products rather than raw features, so both its rows and its columns must be centered, which requires the two applications of $\mathbf{H}$.
Connection to Classical MDS
Double centering also appears in classical Multidimensional Scaling (MDS). If $\mathbf{D}$ is a squared distance matrix and we apply $-\frac{1}{2}\mathbf{H}\mathbf{D}\mathbf{H}$, we get a matrix whose eigendecomposition gives coordinates in Euclidean space. This connection is deep: kernel matrices and distance matrices are closely related, and centering plays analogous roles in both.
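A quick numerical check of this connection (a sketch with arbitrary random data): for squared Euclidean distances, $-\frac{1}{2}\mathbf{H}\mathbf{D}\mathbf{H}$ coincides with the double-centered linear kernel matrix $\mathbf{H}\mathbf{X}\mathbf{X}^T\mathbf{H}$.

```python
import numpy as np

np.random.seed(1)
n = 40
X = np.random.randn(n, 4)

# Squared Euclidean distance matrix and linear-kernel (Gram) matrix
D = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
K = X @ X.T

H = np.eye(n) - np.ones((n, n)) / n      # centering matrix

B = -0.5 * H @ D @ H                     # classical MDS double centering of distances
K_centered = H @ K @ H                   # double-centered linear kernel

print("Max difference:", np.abs(B - K_centered).max())  # numerically ~0
```

The same double-centering operator $\mathbf{H}$ thus underlies both classical MDS and Kernel PCA.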
```python
import numpy as np

def center_kernel_matrix(K: np.ndarray) -> np.ndarray:
    """Center kernel matrix using double centering."""
    n = K.shape[0]
    # Compute means
    row_means = K.mean(axis=1, keepdims=True)
    col_means = K.mean(axis=0, keepdims=True)
    grand_mean = K.mean()
    # Apply centering formula
    K_centered = K - row_means - col_means + grand_mean
    return K_centered

def center_kernel_matrix_explicit(K: np.ndarray) -> np.ndarray:
    """Center using explicit H @ K @ H formula."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H

def verify_centering_properties(K_centered: np.ndarray, tol: float = 1e-10):
    """Verify that the centered matrix has expected properties."""
    n = K_centered.shape[0]

    # Row sums should be zero
    row_sums = K_centered.sum(axis=1)
    row_sum_check = np.abs(row_sums).max() < tol

    # Column sums should be zero
    col_sums = K_centered.sum(axis=0)
    col_sum_check = np.abs(col_sums).max() < tol

    # Matrix should be symmetric
    symmetry_check = np.allclose(K_centered, K_centered.T, atol=tol)

    # Matrix should be positive semi-definite (all eigenvalues >= 0)
    eigenvalues = np.linalg.eigvalsh(K_centered)
    psd_check = eigenvalues.min() >= -tol

    print(f"Row sums zero: {row_sum_check}")
    print(f"Column sums zero: {col_sum_check}")
    print(f"Symmetric: {symmetry_check}")
    print(f"PSD: {psd_check} (min eigenvalue: {eigenvalues.min():.2e})")

    return all([row_sum_check, col_sum_check, symmetry_check, psd_check])

# Example with RBF kernel
np.random.seed(42)
n = 100
X = np.random.randn(n, 5)

# Compute RBF kernel
gamma = 0.5
dist_sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
K = np.exp(-gamma * dist_sq)

print("Original kernel matrix:")
print(f"  Shape: {K.shape}")
print(f"  Row sums range: [{K.sum(axis=1).min():.2f}, {K.sum(axis=1).max():.2f}]")
print()

K_centered = center_kernel_matrix(K)

print("Centered kernel matrix:")
verify_centering_properties(K_centered)
print()

# Verify both methods give same result
K_centered_explicit = center_kernel_matrix_explicit(K)
print(f"Methods equivalent: {np.allclose(K_centered, K_centered_explicit)}")
```

Omitting the centering step is a common implementation error. Let's understand precisely what happens and why it's problematic.
The Uncentered Eigenvalue Problem
Without centering, we solve: $$\mathbf{K}\boldsymbol{\alpha} = \mu\boldsymbol{\alpha}$$
rather than: $$\tilde{\mathbf{K}}\boldsymbol{\alpha} = \mu\boldsymbol{\alpha}$$
The eigenvectors we find correspond to principal components in a space where the data is not centered—meaning they're not true principal components at all.
Mathematical Analysis
Recall the relationship between centered and uncentered second moments: $$\mathbf{C}_{\mathcal{F}} = \mathbf{M}_{\mathcal{F}} - \bar{\phi}\bar{\phi}^T$$
where $\mathbf{M}_{\mathcal{F}} = \frac{1}{n}\sum_i \phi(\mathbf{x}_i)\phi(\mathbf{x}_i)^T$ is the uncentered second moment.
The uncentered kernel matrix $\mathbf{K}$ corresponds to the uncentered second moment in feature space: $\mathbf{K}$ and $n\mathbf{M}_{\mathcal{F}}$ share the same nonzero eigenvalues. Its leading eigenvector is therefore pulled toward $\bar{\phi}$: the component mixes variance with the "mass" of the centroid rather than capturing pure variance.
Severity Depends on Data Location
The effect is most severe when the centroid is far from the feature-space origin relative to the data's spread, that is, when $\|\bar{\phi}\|^2$ is large compared to the eigenvalues of the feature-space covariance.
With the RBF kernel, every mapped point has unit norm ($k(\mathbf{x}, \mathbf{x}) = 1$), so the data lies on the unit sphere in $\mathcal{F}$, at distance 1 from the origin. Moreover, all kernel values are positive, so $\|\bar{\phi}\|^2 = \frac{1}{n^2}\sum_{i,j} K_{ij} > 0$ and the centroid never coincides with the origin. Centering is therefore always necessary.
If your first kernel principal component appears to capture little useful structure—representing a "global offset" rather than meaningful variation—suspect a centering problem. The first component may simply point at the centroid in feature space rather than capturing variance.
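A quick way to spot this symptom (a sketch on arbitrary random data with an illustrative RBF bandwidth): the leading eigenvector of an uncentered RBF kernel matrix has entries that all share one sign and vary little, so the first "component" behaves like a global offset.

```python
import numpy as np

np.random.seed(0)
X = np.random.randn(80, 2)

# Uncentered RBF kernel matrix (gamma chosen arbitrarily)
gamma = 0.05
sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
K = np.exp(-gamma * sq)

# Leading eigenvector of the UNCENTERED kernel matrix
eigenvalues, eigenvectors = np.linalg.eigh(K)
v_top = eigenvectors[:, -1]

print("All entries share one sign:", bool(np.all(v_top > 0) or np.all(v_top < 0)))
print("Relative spread std/|mean|:", float(np.std(v_top) / abs(v_top.mean())))
# A same-sign, low-spread eigenvector is the "global offset" symptom described above
```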
A Concrete Example
Consider data forming two clusters in 2D, both far from the origin. With an RBF kernel and proper centering, the leading kernel principal components capture the genuine structure: the first component typically separates the two clusters, and later components describe variation within them. Without centering, the leading component instead reflects each point's overall similarity to the feature-space centroid, a nearly constant "offset" direction, and the cluster structure is pushed into later, weaker components.
Partial Centering Errors
Another failure mode: centering in input space but not in feature space. If you subtract $\bar{\mathbf{x}}$ from each $\mathbf{x}_i$ and then apply the kernel, you have not centered in feature space, because the feature map is nonlinear: $$\phi(\mathbf{x} - \bar{\mathbf{x}}) \neq \phi(\mathbf{x}) - \phi(\bar{\mathbf{x}})$$
The mean in feature space is $\bar{\phi} = \frac{1}{n}\sum_i \phi(\mathbf{x}_i)$, which is generally not equal to $\phi(\bar{\mathbf{x}})$.
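A small sketch of this failure mode (arbitrary data, illustrative RBF bandwidth): centering the inputs and then building the kernel still leaves the feature-space data uncentered, which shows up as row sums that are far from zero.

```python
import numpy as np

np.random.seed(0)
X = np.random.randn(60, 3) + 5.0                 # off-center data

def rbf(X, gamma=0.5):
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
    return np.exp(-gamma * sq)

# "Centering" in input space only, then building the kernel
K_input_centered = rbf(X - X.mean(axis=0))

# Proper double centering of the kernel matrix itself
K = rbf(X)
n = len(X)
H = np.eye(n) - np.ones((n, n)) / n
K_feature_centered = H @ K @ H

# For the RBF kernel, input-space centering changes nothing at all
# (the kernel is translation-invariant), so the rows still do not sum to zero.
print("Max |row sum|, input-space centering:  ", np.abs(K_input_centered.sum(axis=1)).max())
print("Max |row sum|, feature-space centering:", np.abs(K_feature_centered.sum(axis=1)).max())
```

The longer example below runs Kernel PCA end to end with and without kernel-matrix centering to show how much the resulting components differ.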
```python
import numpy as np
import matplotlib.pyplot as plt

def rbf_kernel(X, Y, gamma=0.1):
    """RBF kernel function."""
    X_sq = np.sum(X**2, axis=1, keepdims=True)
    Y_sq = np.sum(Y**2, axis=1, keepdims=True)
    dist_sq = X_sq + Y_sq.T - 2 * X @ Y.T
    return np.exp(-gamma * dist_sq)

def kpca_with_centering(K, n_components):
    """Kernel PCA with proper centering."""
    # Center the kernel matrix
    n = K.shape[0]
    row_means = K.mean(axis=1, keepdims=True)
    col_means = K.mean(axis=0, keepdims=True)
    grand_mean = K.mean()
    K_centered = K - row_means - col_means + grand_mean

    # Eigendecomposition
    eigenvalues, eigenvectors = np.linalg.eigh(K_centered)
    idx = np.argsort(eigenvalues)[::-1]
    eigenvalues = eigenvalues[idx][:n_components]
    eigenvectors = eigenvectors[:, idx][:, :n_components]

    # Normalize and project
    valid = eigenvalues > 1e-10
    alphas = eigenvectors[:, valid] / np.sqrt(eigenvalues[valid])
    Z = K_centered @ alphas
    return Z, eigenvalues

def kpca_without_centering(K, n_components):
    """INCORRECT: Kernel PCA without centering."""
    # Skip centering (this is wrong!)
    eigenvalues, eigenvectors = np.linalg.eigh(K)
    idx = np.argsort(eigenvalues)[::-1]
    eigenvalues = eigenvalues[idx][:n_components]
    eigenvectors = eigenvectors[:, idx][:, :n_components]

    valid = eigenvalues > 1e-10
    alphas = eigenvectors[:, valid] / np.sqrt(eigenvalues[valid])
    Z = K @ alphas
    return Z, eigenvalues

# Create two circles FAR from origin to highlight centering importance
np.random.seed(42)
n_per_class = 100
theta = np.linspace(0, 2*np.pi, n_per_class)

# Inner circle at (100, 100)
r1 = 1.0
X1 = np.column_stack([100 + r1*np.cos(theta) + 0.1*np.random.randn(n_per_class),
                      100 + r1*np.sin(theta) + 0.1*np.random.randn(n_per_class)])

# Outer circle at (100, 100)
r2 = 2.0
X2 = np.column_stack([100 + r2*np.cos(theta) + 0.1*np.random.randn(n_per_class),
                      100 + r2*np.sin(theta) + 0.1*np.random.randn(n_per_class)])

X = np.vstack([X1, X2])
y = np.array([0]*n_per_class + [1]*n_per_class)

# Compute kernel
K = rbf_kernel(X, X, gamma=0.5)

# Compare with/without centering
Z_centered, eig_centered = kpca_with_centering(K, 2)
Z_uncentered, eig_uncentered = kpca_without_centering(K, 2)

print("Top eigenvalues with centering:   ", eig_centered.round(2))
print("Top eigenvalues without centering:", eig_uncentered.round(2))

# The difference is dramatic - a plot would show proper separation
# only with centering
```

Projecting new test points requires careful centering using the training statistics. This distinction is crucial and a common source of bugs.
The Problem Setup
We have: a trained Kernel PCA model (the training kernel matrix $\mathbf{K}$, its centered eigenvectors $\boldsymbol{\alpha}^{(k)}$ with eigenvalues $\lambda_k$, and the training points $\mathbf{x}_1, \ldots, \mathbf{x}_n$) and a new point $\mathbf{x}^*$ that we want to project onto the learned components.
We need to compute: $$z_k(\mathbf{x}^*) = \langle \mathbf{v}_k, \tilde{\phi}(\mathbf{x}^*) \rangle = \sum_i \alpha_i^{(k)} \langle \tilde{\phi}(\mathbf{x}_i), \tilde{\phi}(\mathbf{x}^*) \rangle$$
where $\tilde{\phi}(\mathbf{x}^*) = \phi(\mathbf{x}^*) - \bar{\phi}_{\text{train}}$.
The Centered Test Kernel Formula
For the test point, we need: $$\tilde{k}_i^* = \langle \tilde{\phi}(\mathbf{x}_i), \tilde{\phi}(\mathbf{x}^*) \rangle$$
Expanding: $$\tilde{k}_i^* = \langle \phi(\mathbf{x}_i) - \bar{\phi}, \phi(\mathbf{x}^*) - \bar{\phi} \rangle = k(\mathbf{x}_i, \mathbf{x}^*) - \frac{1}{n}\sum_m k(\mathbf{x}_m, \mathbf{x}^*) - \frac{1}{n}\sum_m k(\mathbf{x}_i, \mathbf{x}_m) + \frac{1}{n^2}\sum_m\sum_\ell k(\mathbf{x}_m, \mathbf{x}_\ell)$$
Let $\mathbf{k}^* = [k(\mathbf{x}_1, \mathbf{x}^*), \ldots, k(\mathbf{x}_n, \mathbf{x}^*)]^T$. The centered version is:
$$\tilde{\mathbf{k}}^* = \mathbf{k}^* - \mathbf{K}\mathbf{1}/n - (\mathbf{1}^T\mathbf{k}^*/n)\mathbf{1} + (\mathbf{1}^T\mathbf{K}\mathbf{1}/n^2)\mathbf{1}$$
Interpreting Each Term
In the formula above, $\mathbf{k}^*$ holds the raw kernel values between the test point and each training point; $\mathbf{K}\mathbf{1}/n$ is the vector of training row means (the inner products $\langle \phi(\mathbf{x}_i), \bar{\phi} \rangle$); $(\mathbf{1}^T\mathbf{k}^*/n)\mathbf{1}$ removes the test point's average similarity to the training set ($\langle \bar{\phi}, \phi(\mathbf{x}^*) \rangle$); and $(\mathbf{1}^T\mathbf{K}\mathbf{1}/n^2)\mathbf{1}$ adds back the grand mean, $\|\bar{\phi}\|^2$. A minimal sketch of the formula follows below.
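Here is a minimal sketch of the vector formula for a single test point (the helper name `center_test_kernel_vector` is illustrative, not a library function); the class-based implementation further below applies the same formula to a whole batch of test points:

```python
import numpy as np

def center_test_kernel_vector(K, k_star):
    """Center the kernel vector of one test point against the training set.

    K:      (n, n) uncentered training kernel matrix
    k_star: (n,)   kernel values k(x_i, x*) for all training points x_i
    """
    n = K.shape[0]
    ones = np.ones(n)
    return (k_star
            - K @ ones / n                       # training row means:  K 1 / n
            - (ones @ k_star / n) * ones         # test point's mean:   (1^T k* / n) 1
            + (ones @ K @ ones / n**2) * ones)   # grand mean of K:     (1^T K 1 / n^2) 1

# Tiny usage example with a linear kernel
np.random.seed(0)
X_train = np.random.randn(30, 4)
x_star = np.random.randn(4)
K = X_train @ X_train.T
k_star = X_train @ x_star
print(center_test_kernel_vector(K, k_star)[:3].round(3))
```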
What Must Be Stored
To project test points, we must store during training: the training data (or enough of it to evaluate $k(\mathbf{x}_i, \mathbf{x}^*)$), the eigenvectors $\boldsymbol{\alpha}^{(k)}$ and eigenvalues $\lambda_k$ of the centered kernel matrix, the vector of training row means $\mathbf{K}\mathbf{1}/n$, and the grand mean $\mathbf{1}^T\mathbf{K}\mathbf{1}/n^2$.
Common Errors
Typical mistakes include centering the test kernel vector with statistics recomputed from the test data (the training set alone defines the centering), forgetting to center $\mathbf{k}^*$ at all while projecting onto eigenvectors of the centered training matrix, and centering the inputs instead of the kernel, as discussed above.
```python
import numpy as np

class CenteringStatistics:
    """Stores centering statistics from training for use at test time."""

    def __init__(self, K_train: np.ndarray):
        """
        Compute and store centering statistics from training kernel matrix.

        Parameters:
            K_train: Training kernel matrix of shape (n, n)
        """
        self.n_train = K_train.shape[0]
        # Row means (same as column means since K is symmetric)
        self.row_means = K_train.mean(axis=0)  # Shape: (n,)
        # Grand mean
        self.grand_mean = K_train.mean()

    def center_test_kernel(self, K_test: np.ndarray) -> np.ndarray:
        """
        Center kernel values for test points using training statistics.

        Parameters:
            K_test: Kernel matrix between test and training points
                    Shape: (n_test, n_train)

        Returns:
            K_test_centered: Centered kernel matrix, shape (n_test, n_train)
        """
        n_test = K_test.shape[0]
        # Mean of test-train kernel values for each test point
        test_means = K_test.mean(axis=1, keepdims=True)  # (n_test, 1)

        # Apply centering formula
        K_test_centered = (K_test
                           - self.row_means.reshape(1, -1)  # Subtract training row means
                           - test_means                     # Subtract test row means
                           + self.grand_mean)               # Add back grand mean
        return K_test_centered

def verify_test_centering(X_train: np.ndarray, X_test: np.ndarray, kernel: callable):
    """
    Verify test centering by comparing implicit and explicit centering.
    """
    # Training kernel and centering
    K_train = kernel(X_train, X_train)
    stats = CenteringStatistics(K_train)

    # Test-train kernel
    K_test = kernel(X_test, X_train)  # (n_test, n_train)
    K_test_centered = stats.center_test_kernel(K_test)

    # Verify: after centering, projections should be consistent
    # The rows of K_test_centered should have mean close to
    # the corresponding terms from the exact centering formula
    print("Test centering verification:")
    print(f"  K_test shape: {K_test.shape}")
    print(f"  K_test_centered row sums: min={K_test_centered.sum(axis=1).min():.4f}, "
          f"max={K_test_centered.sum(axis=1).max():.4f}")

    # These should NOT be exactly zero (unlike training matrix centering)
    # because test points are not included in defining the centering
    return K_test_centered

# Example
np.random.seed(42)
n_train, n_test = 100, 20
X_train = np.random.randn(n_train, 5)
X_test = np.random.randn(n_test, 5)

def rbf_kernel(X, Y, gamma=0.1):
    X_sq = np.sum(X**2, axis=1, keepdims=True)
    Y_sq = np.sum(Y**2, axis=1, keepdims=True)
    dist_sq = X_sq + Y_sq.T - 2 * X @ Y.T
    return np.exp(-gamma * dist_sq)

K_centered = verify_test_centering(X_train, X_test, rbf_kernel)
```

How can you verify that centering is implemented correctly? Several diagnostic checks can reveal problems.
Check 1: Row and Column Sums
For a properly centered training kernel matrix $\tilde{\mathbf{K}}$: $$\sum_j \tilde{K}_{ij} = 0 \quad \text{for all } i, \qquad \sum_i \tilde{K}_{ij} = 0 \quad \text{for all } j$$
Row and column sums should be zero (up to numerical precision). If they're not, centering is wrong.
Check 2: Symmetry Preservation
The centered kernel matrix should remain symmetric: $$\tilde{\mathbf{K}} = \tilde{\mathbf{K}}^T$$
Some centering implementations can accidentally break symmetry through numerical errors.
Check 3: Positive Semi-Definiteness
The centered kernel matrix should be positive semi-definite (all eigenvalues ≥ 0). Negative eigenvalues beyond numerical precision indicate problems.
Check 4: Eigenvalue Spectrum
The top eigenvalue of an uncentered kernel matrix is often much larger than subsequent eigenvalues (it captures the "constant" direction toward the centroid). After centering, eigenvalue decay should reflect genuine variance structure.
If your first kernel principal component seems to capture a 'constant offset' rather than meaningful variation—with all samples having similar projections of the same sign—suspect a centering issue. Properly centered KPCA components have zero mean projection on training data.
Check 5: Zero-Mean Projections
After projecting the training data onto kernel principal components, the projections should have zero mean: $$\frac{1}{n}\sum_i z_k^{(i)} = 0 \quad \text{for all } k$$
This follows because $\mathbf{1}^T\tilde{\mathbf{K}} = \mathbf{0}$, so for any eigenvector with $\lambda_k \neq 0$ we have $\lambda_k \mathbf{1}^T\boldsymbol{\alpha}^{(k)} = \mathbf{1}^T\tilde{\mathbf{K}}\boldsymbol{\alpha}^{(k)} = 0$; the training projections are proportional to the entries of $\boldsymbol{\alpha}^{(k)}$ and therefore sum to zero.
Check 6: Agreement with Linear PCA (Linear Kernel)
For the linear kernel $k(\mathbf{x}, \mathbf{y}) = \mathbf{x}^T\mathbf{y}$, Kernel PCA should reproduce standard PCA (up to sign flips of individual components). Compare results with scikit-learn's PCA or your own linear PCA implementation. Any discrepancy beyond sign indicates an implementation bug.
Check 7: Consistency Across Reformulations
The element-wise formula and matrix formula ($\tilde{\mathbf{K}} = \mathbf{H}\mathbf{K}\mathbf{H}$) should give identical results. If they differ, one implementation has a bug.
```python
import numpy as np
from sklearn.decomposition import PCA

def run_centering_diagnostics(K_centered: np.ndarray,
                              K_original: np.ndarray,
                              X: np.ndarray = None) -> dict:
    """
    Run comprehensive diagnostics on centered kernel matrix.

    Parameters:
        K_centered: Centered kernel matrix
        K_original: Original uncentered kernel matrix
        X: Original data (optional, for linear kernel comparison)

    Returns:
        Dictionary of diagnostic results
    """
    results = {}
    tol = 1e-10

    # Check 1: Row sums
    row_sums = K_centered.sum(axis=1)
    results['row_sums_zero'] = np.abs(row_sums).max() < tol
    results['max_row_sum'] = np.abs(row_sums).max()

    # Check 2: Column sums
    col_sums = K_centered.sum(axis=0)
    results['col_sums_zero'] = np.abs(col_sums).max() < tol
    results['max_col_sum'] = np.abs(col_sums).max()

    # Check 3: Symmetry
    results['symmetric'] = np.allclose(K_centered, K_centered.T, atol=tol)
    results['max_asymmetry'] = np.abs(K_centered - K_centered.T).max()

    # Check 4: Positive semi-definiteness
    eigenvalues = np.linalg.eigvalsh(K_centered)
    results['psd'] = eigenvalues.min() >= -tol
    results['min_eigenvalue'] = eigenvalues.min()
    results['negative_eigenvalues'] = (eigenvalues < -tol).sum()

    # Check 5: Eigenvalue spectrum comparison
    eig_orig = np.linalg.eigvalsh(K_original)
    results['top_eigenvalue_ratio'] = (np.max(eig_orig) / np.max(eigenvalues)
                                       if np.max(eigenvalues) > 0 else np.inf)

    # Check 6: Zero-mean projections (on training data)
    valid = eigenvalues > tol
    n = K_centered.shape[0]
    eigenvectors = np.linalg.eigh(K_centered)[1]
    idx = np.argsort(eigenvalues)[::-1]
    eigenvalues = eigenvalues[idx]
    eigenvectors = eigenvectors[:, idx]

    # Project using first few components
    n_components = min(5, (eigenvalues > tol).sum())
    if n_components > 0:
        alphas = eigenvectors[:, :n_components] / np.sqrt(eigenvalues[:n_components])
        Z = K_centered @ alphas
        projection_means = Z.mean(axis=0)
        results['projections_zero_mean'] = np.abs(projection_means).max() < tol
        results['max_projection_mean'] = np.abs(projection_means).max()

    return results

def compare_with_linear_pca(X: np.ndarray, n_components: int = 5):
    """
    Compare Kernel PCA (linear kernel) with standard PCA.
    Should give identical results up to sign.
    """
    n = X.shape[0]

    # Standard PCA
    pca = PCA(n_components=n_components)
    Z_pca = pca.fit_transform(X)

    # Kernel PCA with linear kernel
    K = X @ X.T  # Linear kernel

    # Center the kernel
    row_means = K.mean(axis=1, keepdims=True)
    col_means = K.mean(axis=0, keepdims=True)
    grand_mean = K.mean()
    K_centered = K - row_means - col_means + grand_mean

    # Eigendecomposition
    eigenvalues, eigenvectors = np.linalg.eigh(K_centered)
    idx = np.argsort(eigenvalues)[::-1]
    eigenvalues = eigenvalues[idx][:n_components]
    eigenvectors = eigenvectors[:, idx][:, :n_components]

    # Normalize and project
    alphas = eigenvectors / np.sqrt(np.maximum(eigenvalues, 1e-10))
    Z_kpca = K_centered @ alphas

    # Compare (accounts for sign ambiguity)
    correlations = []
    for k in range(n_components):
        corr = np.abs(np.corrcoef(Z_pca[:, k], Z_kpca[:, k])[0, 1])
        correlations.append(corr)

    print("Correlation between PCA and KPCA (linear kernel) components:")
    for k, corr in enumerate(correlations):
        status = "✓" if corr > 0.99 else "✗"
        print(f"  Component {k+1}: {corr:.6f} {status}")

    return np.mean(correlations) > 0.99

# Run diagnostics
np.random.seed(42)
n = 100
X = np.random.randn(n, 10)
K = X @ X.T  # Linear kernel

# Center
row_means = K.mean(axis=1, keepdims=True)
col_means = K.mean(axis=0, keepdims=True)
grand_mean = K.mean()
K_centered = K - row_means - col_means + grand_mean

print("Centering Diagnostics:")
results = run_centering_diagnostics(K_centered, K)
for key, value in results.items():
    print(f"  {key}: {value}")

print()
compare_with_linear_pca(X)
```

Centering is a crucial step in Kernel PCA that is easy to get wrong. We've developed a deep understanding of what centering means, why it's necessary, how to implement it correctly, and how to verify correct implementation.
You now have a thorough understanding of centering in Kernel PCA—both the mathematical foundation and practical implementation. The next page tackles the pre-image problem: given a point in the reduced kernel PCA space, how do we find a corresponding point in the original input space?
What's Next:
The next page addresses the pre-image problem—one of the most challenging aspects of Kernel PCA. Given a projection in the kernel principal component space, can we find a point in the original input space that projects to that location? This problem has no closed-form solution for nonlinear kernels, requiring iterative approximation methods. Understanding the pre-image problem is essential for applications like denoising and reconstruction.