Principal Component Analysis (PCA) is one of the most powerful and widely used techniques in data science and machine learning for dimensionality reduction. It transforms high-dimensional data into a lower-dimensional representation while preserving as much variance (information) as possible.
The core idea behind PCA is to identify the directions (called principal components) along which the data varies the most. These directions are the eigenvectors of the data's covariance matrix, and the eigenvalues indicate how much variance each direction captures.
Given a dataset X with n samples and m features, PCA proceeds as follows:
Step 1: Standardize the Data
Center the data by subtracting the mean of each feature (here, standardization means centering only; features are not divided by their standard deviations): $$X_{standardized} = X - \mu$$
where $\mu$ is the mean vector computed across all samples for each feature.
Step 2: Compute the Covariance Matrix
Calculate the covariance matrix C of the standardized data: $$C = \frac{1}{n-1} X_{standardized}^T X_{standardized}$$
The covariance matrix captures the relationships between all pairs of features.
Step 3: Eigenvalue Decomposition
Find the eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_m$ and corresponding eigenvectors $v_1, v_2, \ldots, v_m$ of the covariance matrix. Each eigenvector represents a principal component direction, and its eigenvalue indicates the variance captured along that direction.
Step 4: Select Top k Components
Sort the eigenvectors by their corresponding eigenvalues in descending order and select the top k eigenvectors as the principal components.
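The four steps can be sketched with NumPy on a small 3×2 dataset (variable names are illustrative):

```python
import numpy as np

# Illustrative walk through Steps 1-4 on a small 3x2 dataset
X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
Xc = X - X.mean(axis=0)                 # Step 1: center each feature
C = (Xc.T @ Xc) / (X.shape[0] - 1)      # Step 2: covariance matrix
vals, vecs = np.linalg.eigh(C)          # Step 3: eigh gives ascending eigenvalues for symmetric C
order = np.argsort(vals)[::-1]          # Step 4: reorder by descending variance
components = vecs[:, order[:1]]         # top k=1 component, shape (2, 1)
```

Here `np.linalg.eigh` is used rather than `np.linalg.eig` because the covariance matrix is symmetric, which guarantees real eigenvalues and orthonormal eigenvectors.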
Mathematically, if v is an eigenvector, then -v is also a valid eigenvector pointing in the opposite direction. To ensure consistent, reproducible results, apply the following normalization rule:
For each eigenvector, check the first non-zero element. If it is negative, multiply the entire eigenvector by -1 to flip its direction.
This convention ensures that all implementations produce identical results regardless of the underlying numerical library used.
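The steps above, together with the sign convention, can be combined into one function. This is a minimal sketch; the function name `pca` and the rounding to 4 decimal places are assumptions based on the expected outputs shown in the examples:

```python
import numpy as np

def pca(data, k):
    """Return the top-k principal components as columns of an m x k matrix.

    Note: the name `pca` and the 4-decimal rounding are inferred from the
    examples, not specified verbatim in the problem text.
    """
    X = np.asarray(data, dtype=float)
    X = X - X.mean(axis=0)               # Step 1: center each feature
    C = (X.T @ X) / (X.shape[0] - 1)     # Step 2: covariance matrix
    vals, vecs = np.linalg.eigh(C)       # Step 3: eigendecomposition (ascending)
    order = np.argsort(vals)[::-1]       # Step 4: sort by descending eigenvalue
    vecs = vecs[:, order[:k]]
    # Sign convention: make the first non-zero entry of each component positive
    for j in range(vecs.shape[1]):
        col = vecs[:, j]
        nz = np.flatnonzero(np.abs(col) > 1e-12)
        if nz.size and col[nz[0]] < 0:
            vecs[:, j] = -col
    return np.round(vecs, 4).tolist()
```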
Implement a function that performs Principal Component Analysis from scratch. The function should take the dataset and the number of components k, follow the four steps above (centering, covariance computation, eigendecomposition, component selection), apply the sign convention, and return the top k principal components as the columns of an m × k matrix, with values rounded to 4 decimal places.
Input:
data = [[1, 2], [3, 4], [5, 6]]
k = 1
Output:
[[0.7071], [0.7071]]
For this 3×2 dataset:
Step 1: Standardization
Column means: [3.0, 4.0]
Centered data:
[1-3, 2-4] = [-2, -2]
[3-3, 4-4] = [0, 0]
[5-3, 6-4] = [2, 2]
Step 2: Covariance Matrix
Computing the 2×2 covariance matrix yields C = [[4, 4], [4, 4]]: equal variance in both features with perfect correlation.
Step 3: Eigendecomposition
The dominant eigenvector (eigenvalue 8; the other eigenvalue is 0) is [0.7071, 0.7071], which points along the diagonal direction where the data varies most.
Step 4: Result
Since k=1, we return only the first principal component. The first element (0.7071) is positive, so no sign flip is needed.
Input:
data = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
k = 2
Output:
[[0.5774, 0.0585], [0.5774, 0.6761], [0.5774, -0.7345]]
For this 4×3 dataset:
Step 1: Centering
The data is centered by subtracting column means [5.5, 6.5, 7.5].
Step 2-3: Covariance & Eigendecomposition
Every centered row is a multiple of [1, 1, 1], so the covariance matrix has rank 1. The eigenanalysis finds that the dominant eigenvector is [0.5774, 0.5774, 0.5774] (i.e. [1, 1, 1]/√3), which captures all of the variance; the remaining eigenvalues are zero.
Step 4: Result
With k=2, we return both eigenvectors as columns, forming a 3×2 matrix (3 features × 2 components).
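Because all of the variance lies along one direction here, the second component spans a zero-variance subspace whose exact basis can depend on the numerical library. The sketch below is a hypothetical check that verifies only the library-independent properties of this example (the dominant direction and the orthonormality of the returned columns), not the exact second column:

```python
import numpy as np

X = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0],
              [7.0, 8.0, 9.0], [10.0, 11.0, 12.0]])
Xc = X - X.mean(axis=0)                      # subtract column means [5.5, 6.5, 7.5]
C = (Xc.T @ Xc) / (X.shape[0] - 1)           # rank-1 covariance: every entry is 15
vals, vecs = np.linalg.eigh(C)
top2 = vecs[:, np.argsort(vals)[::-1][:2]]   # columns by descending eigenvalue
if top2[0, 0] < 0:                           # sign convention on the first component
    top2[:, 0] = -top2[:, 0]
# First column is [1, 1, 1]/sqrt(3) ~ 0.5774; both columns are orthonormal.
```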
Input:
data = [[1, 0], [0, 1], [-1, 0], [0, -1]]
k = 2
Output:
[[0.0, 1.0], [1.0, 0.0]]
This dataset contains 4 points forming a cross pattern centered at the origin:
Geometric Interpretation
The points lie along two perpendicular axes. After centering (the mean is already [0, 0]) and covariance computation, the covariance matrix is diagonal with equal entries of 2/3 on the diagonal.
Both eigenvalues are equal (the data has equal spread in both directions). The eigenvectors from eigh() are deterministic and align with the coordinate axes for this diagonal covariance matrix.
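With equal eigenvalues, any orthonormal basis of the plane is a valid set of eigenvectors, so a robust check verifies the eigenvector equation C v = λ v rather than hard-coding one particular basis. A sketch of that check:

```python
import numpy as np

X = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])
Xc = X - X.mean(axis=0)                 # already centered: the mean is (0, 0)
C = (Xc.T @ Xc) / (X.shape[0] - 1)      # diagonal: (2/3) * identity
vals, vecs = np.linalg.eigh(C)
# Both eigenvalues equal 2/3; every unit vector is an eigenvector,
# so we verify C @ v == lambda * v column by column instead of
# asserting one particular eigenvector basis.
```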
Constraints