Color space augmentation is a powerful technique in computer vision that introduces realistic variations in image color distributions during model training. Unlike simple brightness or contrast adjustments, PCA-based color perturbation captures the natural correlations between color channels in real-world images, producing augmentations that preserve photorealistic appearance while increasing dataset diversity.
Natural images exhibit strong correlations between RGB color channels. For instance, shadows tend to affect all three channels similarly, and lighting conditions create predictable shifts across the color spectrum. Principal Component Analysis (PCA) of the RGB pixel values reveals these underlying patterns by identifying the directions of maximum variance in the 3D color space.
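The shadow example can be illustrated with a small PCA run in NumPy. This sketch builds synthetic pixels that share one hypothetical base color and differ only in brightness, as a stand-in for shading. The base color and brightness range are made-up values. The first principal component should line up with the base color direction and carry essentially all of the variance.

```python
import numpy as np

# Hypothetical base color, dimmed by different amounts to mimic shadowing
base = np.array([200.0, 150.0, 100.0])
brightness = np.linspace(0.3, 1.0, 50)
pixels = brightness[:, None] * base          # shape (50, 3)

# 3x3 covariance of the RGB values and its eigendecomposition
cov = np.cov(pixels, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)       # eigenvalues in ascending order
top = eigvecs[:, -1]                         # direction of maximum variance

# Fraction of the total variance captured by the top component
explained = eigvals[-1] / eigvals.sum()
# Cosine between the top component and the base color (eigenvector sign is arbitrary)
alignment = abs(top @ base) / np.linalg.norm(base)
print(f"explained variance: {explained:.4f}, alignment: {alignment:.4f}")
# prints: explained variance: 1.0000, alignment: 1.0000
```

In this sketch the pixels lie exactly on one line in RGB space, so the covariance matrix has rank one. Real images are noisier, but the same correlation shows up as a dominant first component.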
The technique operates as follows:
Flatten and Center: Reshape the image from (H × W × 3) to (N × 3), where N = H × W is the total number of pixels, and treat each row as a pixel vector $\mathbf{p}_i$. Compute the per-channel mean $\boldsymbol{\mu}$ and subtract it from every pixel.
Compute Covariance Matrix: Calculate the 3 × 3 covariance matrix of the centered RGB values: $$\Sigma = \frac{1}{N-1} \sum_{i=1}^{N} (\mathbf{p}_i - \boldsymbol{\mu})(\mathbf{p}_i - \boldsymbol{\mu})^T$$
Eigendecomposition: Perform eigendecomposition on the symmetric covariance matrix to obtain eigenvectors $\{\mathbf{e}_1, \mathbf{e}_2, \mathbf{e}_3\}$ (the principal components) and corresponding eigenvalues $\{\lambda_1, \lambda_2, \lambda_3\}$, ordered so that $\lambda_1 \ge \lambda_2 \ge \lambda_3$.
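Compute Perturbation Vector: Given random alpha values $\{\alpha_1, \alpha_2, \alpha_3\}$, compute the color offset: $$\Delta\mathbf{c} = \sum_{k=1}^{3} \alpha_k \sqrt{\lambda_k}\,\mathbf{e}_k$$ The factor $\sqrt{\lambda_k}$ is the standard deviation of the pixels along $\mathbf{e}_k$. This scaling is the one the example outputs below are consistent with. Scaling by $\lambda_k$ itself would shift the grayscale example by roughly 1,700 levels per channel instead of fewer than 10.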
Apply Perturbation: Add the perturbation vector to every pixel in the image: $$\mathbf{p}'_i = \mathbf{p}_i + \Delta\mathbf{c}$$
Clamp Values: Clip every resulting value to the valid range [0, 255] and convert back to 8-bit integers. The examples below truncate any fractional part rather than rounding it.
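Steps 1 to 3 can be checked numerically with NumPy. This sketch uses the 2 × 2 image from the first worked example. It computes the covariance directly from the summation formula and confirms that the result matches `np.cov`, which also normalizes by N − 1. It then sorts the eigenpairs returned by `np.linalg.eigh` in descending order of eigenvalue. All variable names are illustrative.

```python
import numpy as np

# 2 x 2 image from the first worked example
image = np.array([[[102, 179, 92], [14, 106, 71]],
                  [[188, 20, 102], [121, 210, 214]]], dtype=np.uint8)

# Step 1: flatten to (N, 3) and center on the per-channel mean mu
pixels = image.reshape(-1, 3).astype(np.float64)
n = pixels.shape[0]
mu = pixels.mean(axis=0)
centered = pixels - mu

# Step 2: Sigma = 1/(N-1) * sum_i (p_i - mu)(p_i - mu)^T, written as one matrix product
cov = centered.T @ centered / (n - 1)
assert np.allclose(cov, np.cov(pixels, rowvar=False))   # np.cov also divides by N - 1

# Step 3: eigh is meant for symmetric matrices and returns eigenvalues in
# ascending order, so reverse them to get lambda_1 >= lambda_2 >= lambda_3
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]    # column k is e_k

# Each pair satisfies Sigma e_k = lambda_k e_k
assert np.allclose(cov @ eigvecs, eigvecs * eigvals)
print(np.round(eigvals, 1))
```

Steps 4 to 6 then need only a weighted sum of the eigenvector columns, one broadcast addition, and a clip.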
The eigenvalues set the perturbation magnitude. Directions with higher variance (larger eigenvalues) receive larger perturbations for the same alpha, so the color shifts follow the natural color variation patterns present in the image. The alpha values are typically drawn from a Gaussian distribution with mean 0 (often with σ ≈ 0.1), and a new alpha vector is drawn each time an image is augmented. This introduces controlled randomness that simulates variations in lighting and atmospheric conditions.
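The sampling and the eigenvalue scaling can be sketched in a few lines of NumPy. `sample_alpha` is a hypothetical helper, and the eigenvalues are made-up numbers. The step lengths use the √λ_k scaling that the worked examples in this section are consistent with. Using the raw eigenvalues instead would keep the same ordering but widen the spread.

```python
import numpy as np

def sample_alpha(rng: np.random.Generator, sigma: float = 0.1) -> np.ndarray:
    """Draw one perturbation vector alpha of shape (3,) from N(0, sigma^2)."""
    return rng.normal(loc=0.0, scale=sigma, size=3)

rng = np.random.default_rng(seed=0)   # seeded only so the draw is reproducible
alpha = sample_alpha(rng)

# Hypothetical eigenvalues, sorted descending, to show how they scale one alpha
eigvals = np.array([9000.0, 5000.0, 1000.0])
same_alpha = np.full(3, 0.1)
# Step length alpha_k * sqrt(lambda_k) along each unit eigenvector e_k
steps = same_alpha * np.sqrt(eigvals)
print(alpha, steps)
```

With equal alphas, the step along e_1 is the longest and the step along e_3 is the shortest, which is the behavior described above.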
Implement a function that applies PCA-based color perturbation to an RGB image. Given an image array of shape (H, W, 3) with pixel values in [0, 255] and an alpha array of shape (3,) containing perturbation coefficients, compute and apply the color offset to produce an augmented image.
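A minimal end-to-end sketch of the requested function follows. The name `pca_color_augment` is hypothetical. Several details are assumptions rather than requirements stated in the problem:

- The covariance is computed per image with N − 1 normalization.
- Each component is scaled by √λ_k, the scaling the worked examples are consistent with.
- Each eigenvector's sign is chosen so that its entries sum to a nonnegative value.
- The clamped result is truncated back to uint8.

Eigenvectors are only defined up to sign, so different conventions give different but equally valid outputs for the same alpha. By hand calculation, this convention matches the listed outputs of the first and third examples. The second example's listed output appears to use the opposite sign for its first eigenvector.

```python
import numpy as np

def pca_color_augment(image: np.ndarray, alpha: np.ndarray) -> np.ndarray:
    """PCA color perturbation for an (H, W, 3) RGB image with values in [0, 255].

    Assumptions (not fixed by the problem statement): per-image covariance
    with N - 1 normalization, components scaled by sqrt(eigenvalue), each
    eigenvector's sign chosen so its entries sum to >= 0, and truncation
    when converting the clamped result back to uint8.
    """
    image = np.asarray(image)
    alpha = np.asarray(alpha, dtype=np.float64)
    if image.ndim != 3 or image.shape[2] != 3:
        raise ValueError("image must have shape (H, W, 3)")
    if alpha.shape != (3,):
        raise ValueError("alpha must have shape (3,)")

    # Step 1: flatten to an (N, 3) float matrix of pixel vectors
    pixels = image.reshape(-1, 3).astype(np.float64)
    if pixels.shape[0] < 2:
        raise ValueError("need at least two pixels to estimate a covariance")

    # Step 2: 3x3 covariance; np.cov centers the data and divides by N - 1
    cov = np.cov(pixels, rowvar=False)

    # Step 3: eigendecomposition of the symmetric matrix, largest eigenvalue first
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]
    eigvals = np.clip(eigvals[order], 0.0, None)   # drop tiny negative round-off
    eigvecs = eigvecs[:, order]                    # column k is e_k

    # Eigenvectors are only defined up to sign; pick a deterministic convention
    signs = np.where(eigvecs.sum(axis=0) < 0, -1.0, 1.0)
    eigvecs = eigvecs * signs

    # Step 4: delta_c = sum_k alpha_k * sqrt(lambda_k) * e_k
    delta = eigvecs @ (alpha * np.sqrt(eigvals))

    # Steps 5-6: shift every pixel by the same offset, clamp, convert back
    shifted = np.clip(pixels + delta, 0.0, 255.0)
    return shifted.astype(np.uint8).reshape(image.shape)
```

During training, a fresh alpha would be drawn for each image, for example `pca_color_augment(img, np.random.default_rng().normal(0.0, 0.1, size=3))`.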
Implementation Requirements:
Example 1:
Input:
image = np.array([[[102, 179, 92], [14, 106, 71]], [[188, 20, 102], [121, 210, 214]]], dtype=np.uint8)
alpha = np.array([0.1, -0.05, 0.03])
Output:
[[[96, 187, 93], [8, 114, 72]], [[182, 28, 103], [115, 218, 215]]]
Reasoning: The 2 × 2 RGB image has 4 pixels with diverse color values, and PCA finds the principal directions of color variance across them. With alpha = [0.1, -0.05, 0.03], the first principal component (highest variance) is scaled by 0.1, the second by -0.05, and the third by 0.03. The resulting perturbation vector Δc shifts all pixels uniformly in color space. After conversion back to integers, every pixel's red value decreases by 6, its green value increases by 8, and its blue value increases by 1. The exact offset depends on the eigenstructure of this particular image's color distribution.
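The uniform shift in this example can be read directly off the listed input and output. This short NumPy check subtracts the two arrays in a signed integer type, because subtracting uint8 arrays would wrap around below zero.

```python
import numpy as np

# Input and listed output of the 2 x 2 example
image = np.array([[[102, 179, 92], [14, 106, 71]],
                  [[188, 20, 102], [121, 210, 214]]], dtype=np.uint8)
expected = np.array([[[96, 187, 93], [8, 114, 72]],
                     [[182, 28, 103], [115, 218, 215]]], dtype=np.uint8)

# Per-pixel change as signed integers (uint8 subtraction would wrap around)
diff = expected.astype(int) - image.astype(int)
print(diff.reshape(-1, 3))   # every row is [-6, 8, 1]
```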
Example 2:
Input:
image = np.array([[[102, 220, 225], [95, 179, 61], [234, 203, 92]], [[3, 98, 243], [14, 149, 245], [46, 106, 244]], [[99, 187, 71], [212, 153, 199], [188, 174, 65]]], dtype=np.uint8)
alpha = np.array([0.05, 0.0, -0.05])
Output:
[[[98, 217, 228], [91, 176, 64], [230, 200, 95]], [[0, 95, 246], [10, 146, 248], [42, 103, 247]], [[95, 184, 74], [208, 150, 202], [184, 171, 68]]]
Reasoning: This 3 × 3 image with 9 pixels provides a richer color distribution for PCA. The second alpha value is 0.0, so the second principal component contributes nothing to the perturbation. The first and third components are scaled by opposite amounts (0.05 and -0.05). After conversion back to integers, every pixel shifts by (-4, -3, +3) unless clamping applies. The blue values 243, 245, and 244 rise to 246, 248, and 247, which approach but do not exceed the 255 maximum. The red value 3 would become negative and is clamped to 0.
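The clamping in this example can be reproduced from the listed arrays alone. The true offset Δc is fractional. The integer shift below is simply what remains after the conversion back to integers, read off the unclamped pixels. The shift is applied in a signed type so that the negative intermediate value is visible before clipping.

```python
import numpy as np

image = np.array([[[102, 220, 225], [95, 179, 61], [234, 203, 92]],
                  [[3, 98, 243], [14, 149, 245], [46, 106, 244]],
                  [[99, 187, 71], [212, 153, 199], [188, 174, 65]]], dtype=np.uint8)
expected = np.array([[[98, 217, 228], [91, 176, 64], [230, 200, 95]],
                     [[0, 95, 246], [10, 146, 248], [42, 103, 247]],
                     [[95, 184, 74], [208, 150, 202], [184, 171, 68]]], dtype=np.uint8)

# Integer shift observed on the unclamped pixels of the listed output
shift = np.array([-4, -3, 3])

# Apply it in a signed type, then clamp to [0, 255] as in the final step
shifted = image.astype(int) + shift
reconstructed = np.clip(shifted, 0, 255).astype(np.uint8)
print(np.array_equal(reconstructed, expected))   # True: only 3 - 4 = -1 needed clamping
```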
Example 3:
Input:
image = np.array([[[0, 0, 0], [85, 85, 85], [170, 170, 170], [255, 255, 255]], [[0, 0, 0], [85, 85, 85], [170, 170, 170], [255, 255, 255]], [[0, 0, 0], [85, 85, 85], [170, 170, 170], [255, 255, 255]]], dtype=np.uint8)
alpha = np.array([0.1, 0.1, 0.1])
Output:
[[[9, 9, 9], [94, 94, 94], [179, 179, 179], [255, 255, 255]], [[9, 9, 9], [94, 94, 94], [179, 179, 179], [255, 255, 255]], [[9, 9, 9], [94, 94, 94], [179, 179, 179], [255, 255, 255]]]
Reasoning: This grayscale gradient image has R = G = B in every pixel, so all of its color variance lies along the luminance axis. Only one principal component has a nonzero eigenvalue, and it points equally along all three channels: [1/√3, 1/√3, 1/√3]. The other two eigenvalues are zero, so only α₁ matters, and the perturbation shifts all channels by the same amount (between 9 and 10 levels, truncated to 9). Black (0, 0, 0) becomes (9, 9, 9), dark gray (85, 85, 85) becomes (94, 94, 94), and light gray (170, 170, 170) becomes (179, 179, 179). White (255, 255, 255) stays at (255, 255, 255) because of clamping at the maximum value.
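The grayscale case has a closed form. Every covariance entry equals the sample variance s of the gray levels, so Σ = s·11ᵀ. Its only nonzero eigenvalue is λ₁ = 3s, along e₁ = [1, 1, 1]/√3. The per-channel offset is therefore α₁·√(3s)/√3 = α₁·√s. This sketch assumes the √λ scaling and N − 1 normalization that the examples are consistent with.

```python
import numpy as np

# Gray levels of the 3 x 4 example: each column value appears in 3 rows
gray = np.repeat(np.array([0.0, 85.0, 170.0, 255.0]), 3)

# Every covariance entry equals the sample variance s (N - 1 normalization)
s = gray.var(ddof=1)

# Only lambda_1 = 3s is nonzero, with e_1 = [1, 1, 1] / sqrt(3), so the
# per-channel offset is alpha_1 * sqrt(3s) / sqrt(3) = alpha_1 * sqrt(s)
alpha_1 = 0.1
shift = alpha_1 * np.sqrt(s)

levels = np.array([0.0, 85.0, 170.0, 255.0])
result = np.clip(levels + shift, 0, 255).astype(np.uint8)
print(shift, result)   # shift is about 9.93; result holds 9, 94, 179, 255
```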
Constraints