The Pearson correlation coefficient is one of the most fundamental measures in statistics and machine learning for quantifying the linear relationship between two variables. When working with multi-dimensional datasets, we often need to compute the correlation matrix, which captures the pairwise correlation coefficients between all features in the dataset.
The Pearson correlation coefficient between two variables X and Y is defined as:
$$\rho_{X,Y} = \frac{\text{Cov}(X, Y)}{\sigma_X \cdot \sigma_Y} = \frac{\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n}(X_i - \bar{X})^2} \cdot \sqrt{\sum_{i=1}^{n}(Y_i - \bar{Y})^2}}$$
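Where $\text{Cov}(X, Y)$ is the covariance of X and Y, $\sigma_X$ and $\sigma_Y$ are their standard deviations, $\bar{X}$ and $\bar{Y}$ are their sample means, and $n$ is the number of samples. The coefficient always lies between -1 and +1 and is commonly interpreted as follows: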
| Correlation | Interpretation |
|---|---|
| +1.0 | Perfect positive linear relationship |
| +0.7 to +0.9 | Strong positive correlation |
| +0.4 to +0.6 | Moderate positive correlation |
| +0.1 to +0.3 | Weak positive correlation |
| 0.0 | No linear relationship |
| -1.0 | Perfect negative linear relationship |
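The definition above can be sketched in plain Python for a single pair of variables. The function name `pearson` and the choice to raise an error for constant inputs (where the denominator is zero and the coefficient is undefined) are assumptions made for illustration, not part of the problem statement.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (hypothetical helper)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")

    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)

    # Deviations from the mean: (X_i - mean of X) and (Y_i - mean of Y).
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]

    # Numerator: sum of products of deviations.
    numerator = sum(a * b for a, b in zip(dx, dy))

    # Denominator: product of the square roots of the summed squared deviations.
    denominator = math.sqrt(sum(a * a for a in dx)) * math.sqrt(sum(b * b for b in dy))

    # Assumption: treat a constant sequence as an error rather than returning NaN.
    if denominator == 0.0:
        raise ValueError("correlation is undefined for a constant sequence")

    return numerator / denominator
```

For instance, `pearson([1.0, 3.0, 5.0], [2.0, 4.0, 6.0])` evaluates to 1.0 up to floating-point rounding, and reversing one of the sequences gives a value of about -1.0.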
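Implement a function that computes the correlation matrix for a given dataset X of shape (samples × features). When an optional second dataset Y is provided, the function should instead return the cross-correlation between the features of X and the features of Y.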
Note: When a dataset is correlated with itself (the autocorrelation case), the diagonal elements of the resulting matrix are always 1.0, because each feature is perfectly correlated with itself.
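One possible way to approach the task is sketched below using only the standard library. The name `correlation_matrix`, the `Y=None` default, the list-of-rows input layout, and the decision to raise `ValueError` for constant features are all assumptions for illustration; the problem statement may prescribe a different signature or a different treatment of zero-variance features.

```python
import math


def correlation_matrix(X, Y=None):
    """Pearson correlations between the columns of X and the columns of Y.

    X and Y are lists of rows (samples x features). If Y is None, X is
    correlated with itself. Hypothetical sketch, not a reference solution.
    """
    if Y is None:
        Y = X
    if len(X) != len(Y):
        raise ValueError("X and Y must have the same number of samples")

    def centered_columns(data):
        # Transpose rows into columns, then subtract each column's mean.
        centered = []
        for col in zip(*data):
            mean = sum(col) / len(col)
            centered.append([v - mean for v in col])
        return centered

    cx = centered_columns(X)
    cy = centered_columns(Y)

    # Square root of the summed squared deviations for every feature.
    norm_x = [math.sqrt(sum(v * v for v in col)) for col in cx]
    norm_y = [math.sqrt(sum(v * v for v in col)) for col in cy]

    result = []
    for i, a in enumerate(cx):
        row = []
        for j, b in enumerate(cy):
            denom = norm_x[i] * norm_y[j]
            # Assumption: a constant feature makes the coefficient undefined.
            if denom == 0.0:
                raise ValueError("correlation is undefined for a constant feature")
            row.append(sum(p * q for p, q in zip(a, b)) / denom)
        result.append(row)
    return result
```

Called with a single dataset such as `[[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]`, this sketch returns a 2 × 2 matrix whose entries are all 1.0 up to floating-point rounding. Called with both X and Y, it returns a matrix of shape (features_in_X × features_in_Y).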
Input: X = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

Output: [[1.0, 1.0], [1.0, 1.0]]

The dataset X has 3 samples and 2 features. Both features increase linearly, with a constant difference between them across all samples. The correlation between feature_0 (values: 1, 3, 5) and feature_1 (values: 2, 4, 6) is exactly 1.0, indicating a perfect positive linear relationship.

- corr(feature_0, feature_0) = 1.0 (self-correlation)
- corr(feature_0, feature_1) = 1.0 (perfect positive correlation)
- corr(feature_1, feature_0) = 1.0 (symmetric)
- corr(feature_1, feature_1) = 1.0 (self-correlation)
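As a check on this example, the sums in the definition can be worked out by hand, writing $X = (1, 3, 5)$ for the first feature and $Y = (2, 4, 6)$ for the second:

$$\bar{X} = \frac{1 + 3 + 5}{3} = 3, \qquad \bar{Y} = \frac{2 + 4 + 6}{3} = 4$$

$$\sum_{i=1}^{3}(X_i - \bar{X})(Y_i - \bar{Y}) = (-2)(-2) + (0)(0) + (2)(2) = 8$$

$$\sum_{i=1}^{3}(X_i - \bar{X})^2 = 4 + 0 + 4 = 8, \qquad \sum_{i=1}^{3}(Y_i - \bar{Y})^2 = 4 + 0 + 4 = 8$$

$$\rho_{X,Y} = \frac{8}{\sqrt{8} \cdot \sqrt{8}} = 1$$

Both features have identical deviations from their means, which is why the numerator equals the denominator.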
Input: X = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]

Output: [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]

This 3×3 dataset has 3 samples and 3 features. All three features follow the same increasing pattern (each increases by 3 from one sample to the next). Since they all move in perfect lockstep, every pairwise correlation coefficient equals 1.0. The resulting 3×3 correlation matrix has all elements equal to 1.0.
Input: X = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]], Y = [[2.0, 3.0], [4.0, 5.0], [6.0, 7.0]]

Output: [[1.0, 1.0], [1.0, 1.0]]

When a secondary dataset Y is provided, we compute the cross-correlation between the features of X and the features of Y. Each feature in X is perfectly correlated with each feature in Y because they all follow the same linear progression. The output matrix shape is (features_in_X × features_in_Y) = (2 × 2).
Constraints