Pearson Correlation Matrix Computation

Medium · 20 pts

The Pearson correlation coefficient is one of the most fundamental measures in statistics and machine learning for quantifying the linear relationship between two variables. When working with multi-dimensional datasets, we often need to compute the correlation matrix, which captures the pairwise correlation coefficients between all features in the dataset.

Understanding Correlation

The Pearson correlation coefficient between two variables X and Y is defined as:

$$\rho_{X,Y} = \frac{\text{Cov}(X, Y)}{\sigma_X \cdot \sigma_Y} = \frac{\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n}(X_i - \bar{X})^2} \cdot \sqrt{\sum_{i=1}^{n}(Y_i - \bar{Y})^2}}$$

Where:

Cov(X, Y) is the covariance between X and Y
σ_X and σ_Y are the standard deviations of X and Y respectively
X̄ and Ȳ are the means of X and Y
The correlation coefficient ranges from -1 to +1
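
The sum form of the definition above can be sketched directly in NumPy. This is a minimal illustration, not a reference solution; the helper name `pearson` is hypothetical, and it assumes both inputs have nonzero variance.

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation of two 1D sequences, using the sum formula."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()          # X_i - X̄
    dy = y - y.mean()          # Y_i - Ȳ
    numerator = np.sum(dx * dy)
    denominator = np.sqrt(np.sum(dx ** 2)) * np.sqrt(np.sum(dy ** 2))
    # Assumption: denominator is nonzero (neither input is constant).
    return float(numerator / denominator)

# Two series that rise together, then one that falls as the other rises.
print(pearson([1.0, 3.0, 5.0], [2.0, 4.0, 6.0]))   # close to +1
print(pearson([1.0, 2.0, 3.0], [6.0, 4.0, 2.0]))   # close to -1
```

Because of floating-point rounding, the results may differ from ±1 in the last few digits, so comparisons are best made with a tolerance.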

Interpretation of Values

Correlation	Interpretation
+1.0	Perfect positive linear relationship
+0.7 to +0.9	Strong positive correlation
+0.4 to +0.6	Moderate positive correlation
+0.1 to +0.3	Weak positive correlation
0.0	No linear relationship
-1.0	Perfect negative linear relationship

Your Task

Implement a function that computes the correlation matrix for a given dataset. The function should:

Accept a primary 2D array X of shape (n_samples, n_features)
Optionally accept a secondary 2D array Y of shape (n_samples, m_features)
If Y is not provided, compute the autocorrelation matrix of X (correlation of each feature with every other feature in X)
If Y is provided, compute the cross-correlation matrix between features of X and features of Y
Return the correlation matrix as a 2D numpy array

Note: The diagonal elements of an autocorrelation matrix are 1.0 for every feature with nonzero variance (each such feature is perfectly correlated with itself).
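
One possible shape for such a function is sketched below. It is an illustration under stated assumptions rather than the expected solution: the names `correlation_matrix` and `_center` are hypothetical, constant features are given a correlation of 0 as required by the Constraints, and as a consequence a constant feature's own diagonal entry also comes out as 0 in this sketch.

```python
import numpy as np

def _center(A):
    """Subtract each column's mean; exactly constant columns become all zeros."""
    Ac = A - A.mean(axis=0)
    # Guard against rounding in the mean creating fake variance.
    Ac[:, A.max(axis=0) == A.min(axis=0)] = 0.0
    return Ac

def correlation_matrix(X, Y=None):
    X = np.asarray(X, dtype=float)
    Y = X if Y is None else np.asarray(Y, dtype=float)
    if X.shape[0] != Y.shape[0]:
        raise ValueError("X and Y must have the same number of samples")

    Xc, Yc = _center(X), _center(Y)
    numerator = Xc.T @ Yc                        # sums of cross-products
    sx = np.sqrt((Xc ** 2).sum(axis=0))          # per-feature root sum of squares
    sy = np.sqrt((Yc ** 2).sum(axis=0))
    denominator = np.outer(sx, sy)

    # Assumption: entries with zero denominator (a constant feature) are set to 0.
    result = np.zeros_like(numerator)
    np.divide(numerator, denominator, out=result, where=denominator > 0)
    return np.clip(result, -1.0, 1.0)            # remove tiny rounding overshoot

print(correlation_matrix([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]))
```

The result has shape (n_features, n_features) when only X is given and (n_features, m_features) when Y is supplied.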

Example

Input

X = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

Output

[[1.0, 1.0], [1.0, 1.0]]

Explanation

The dataset X has 3 samples and 2 features. Both features increase by 2 from each sample to the next, so feature_1 is always feature_0 plus 1. The correlation between feature_0 (values: 1, 3, 5) and feature_1 (values: 2, 4, 6) is therefore exactly 1.0, indicating a perfect positive linear relationship.

• corr(feature_0, feature_0) = 1.0 (self-correlation)
• corr(feature_0, feature_1) = 1.0 (perfect positive correlation)
• corr(feature_1, feature_0) = 1.0 (symmetric)
• corr(feature_1, feature_1) = 1.0 (self-correlation)
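
As a cross-check on the values listed above, NumPy's built-in `np.corrcoef` can be applied to the same input. The sketch passes `rowvar=False` because here each column, not each row, is a feature.

```python
import numpy as np

X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])

# rowvar=False: columns are variables (features), rows are observations (samples).
R = np.corrcoef(X, rowvar=False)

print(R.shape)               # (2, 2)
print(np.allclose(R, 1.0))   # True
```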

Example

Input

X = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]

Output

[[1.0, 1.0, 1.0], [1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]

Explanation

This dataset has 3 samples and 3 features, so X is 3×3. All three features follow the same increasing pattern (each increases by 3 from one sample to the next). Since they move in perfect lockstep, every pairwise correlation coefficient equals 1.0, and the resulting 3×3 correlation matrix has all elements equal to 1.0.

Example

Input

X = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
Y = [[2.0, 3.0], [4.0, 5.0], [6.0, 7.0]]

Output

[[1.0, 1.0], [1.0, 1.0]]

Explanation

When a secondary dataset Y is provided, we compute the cross-correlation between features of X and features of Y. Each feature in X is perfectly correlated with each feature in Y because every feature increases by 2 from one sample to the next. The output matrix shape is (n_features × m_features) = (2 × 2).
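
One way to check this cross-correlation result with a library call is sketched below. When `np.corrcoef` is given both arrays, it returns the correlation matrix of all X and Y features stacked together, so the X-versus-Y part is the top-right block. The variable names `full` and `cross` are illustrative only.

```python
import numpy as np

X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
Y = np.array([[2.0, 3.0], [4.0, 5.0], [6.0, 7.0]])

n_features = X.shape[1]

# Full matrix covers the features of X followed by the features of Y.
full = np.corrcoef(X, Y, rowvar=False)

# Rows for X features, columns for Y features.
cross = full[:n_features, n_features:]

print(full.shape)    # (4, 4)
print(cross.shape)   # (2, 2)
print(np.allclose(cross, 1.0))   # True
```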

Constraints

2 ≤ n_samples ≤ 1000 (at least 2 samples required to compute correlation)
1 ≤ n_features, m_features ≤ 100
-10⁶ ≤ X[i][j], Y[i][j] ≤ 10⁶
X and Y (if provided) must have the same number of samples (rows)
Features with zero variance (constant values) should have a correlation of 0 with other features
All values are floating-point numbers
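
The zero-variance constraint matters because the formula divides by the standard deviation, which is 0 for a constant feature. The sketch below shows what `np.corrcoef` returns in that case (NaN) and one possible way to map those entries to 0. Treating the constant feature's own diagonal entry as 0 as well is an assumption of this sketch, not something the problem statement specifies.

```python
import numpy as np

# Second feature is constant, so its variance is zero.
X = np.array([[1.0, 5.0], [2.0, 5.0], [3.0, 5.0]])

# Suppress the divide-by-zero warnings that the constant column triggers.
with np.errstate(divide="ignore", invalid="ignore"):
    raw = np.corrcoef(X, rowvar=False)

# Replace undefined (NaN) correlations with 0.
fixed = np.nan_to_num(raw, nan=0.0)

print(np.isnan(raw[0, 1]))   # True
print(fixed)
```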


