Linear regression is one of the most foundational techniques in machine learning and statistics. While iterative optimization methods like gradient descent are commonly used, there exists an elegant closed-form analytical solution that directly computes the optimal coefficients in a single calculation.
Given a feature matrix X of dimensions n × m (where n is the number of data samples and m is the number of features including the bias term) and a target vector y of length n, the goal is to find the coefficient vector θ (theta) of length m that minimizes the mean squared error between predicted and actual values.
The closed-form solution is derived by setting the gradient of the cost function to zero and solving for the coefficients:
$$\theta = (X^T X)^{-1} X^T y$$
Where: • X^T is the transpose of the feature matrix X • (X^T X)^{-1} is the inverse of the m × m matrix X^T X • θ is the resulting coefficient vector of length m
Mathematical Intuition: This formula finds the unique set of coefficients that projects the target vector y onto the column space of X in a way that minimizes the Euclidean distance (squared error) between the true targets and the predictions.
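A minimal sketch of this formula in Python, assuming NumPy is available (the function name `closed_form_regression` is illustrative, not prescribed by the problem statement):

```python
import numpy as np

def closed_form_regression(X, y):
    """Return theta = (X^T X)^{-1} X^T y for feature matrix X and target vector y."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Normal equations: (X^T X) theta = X^T y
    return np.linalg.inv(X.T @ X) @ X.T @ y
```

This assumes X^T X is invertible (X has full column rank); for rank-deficient inputs, `np.linalg.pinv` is the usual fallback.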
Your Task: Write a Python function that computes the optimal linear model coefficients using the closed-form solution.

Example 1:

Input:
X = [[1, 1], [1, 2], [1, 3]]
y = [1, 2, 3]

Output: [0.0, 1.0]

Explanation: The data represents three points: (1, 1), (2, 2), (3, 3). The first column of X is all 1s (the intercept term), and the second column contains the feature values.
Using the closed-form solution: • θ₀ (intercept) = 0.0 • θ₁ (slope) = 1.0
This gives us the linear model: y = 0.0 + 1.0·x, which perfectly fits through all three data points with zero residual error.
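The numbers above can be checked directly with NumPy (a verification sketch, not a required part of the solution):

```python
import numpy as np

X = np.array([[1, 1], [1, 2], [1, 3]], dtype=float)
y = np.array([1, 2, 3], dtype=float)

# Closed-form solution: theta = (X^T X)^{-1} X^T y
theta = np.linalg.inv(X.T @ X) @ X.T @ y
predictions = X @ theta
# theta is approximately [0.0, 1.0] and the predictions match y,
# so the residual error is zero (up to floating-point rounding)
```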
Example 2:

Input:
X = [[1, 0], [1, 1], [1, 2], [1, 3]]
y = [2, 5, 8, 11]

Output: [2.0, 3.0]

Explanation: The data consists of four samples with feature values 0, 1, 2, 3 and target values 2, 5, 8, 11, which follow a linear relationship.
Applying the closed-form solution: • θ₀ (intercept) = 2.0 • θ₁ (slope) = 3.0
The resulting model is: y = 2.0 + 3.0·x, which predicts: • x=0: y = 2.0 + 3.0·0 = 2 ✓ • x=1: y = 2.0 + 3.0·1 = 5 ✓ • x=2: y = 2.0 + 3.0·2 = 8 ✓ • x=3: y = 2.0 + 3.0·3 = 11 ✓
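The same check for this dataset, here solving the normal equations with `np.linalg.solve` instead of forming an explicit inverse (a common numerical-stability choice; the problem statement does not mandate either approach):

```python
import numpy as np

X = np.array([[1, 0], [1, 1], [1, 2], [1, 3]], dtype=float)
y = np.array([2, 5, 8, 11], dtype=float)

# Solve (X^T X) theta = X^T y without explicitly inverting X^T X
theta = np.linalg.solve(X.T @ X, X.T @ y)
```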
Example 3:

Input:
X = [[1, 1, 1], [1, 2, 1], [1, 1, 2], [1, 2, 2], [1, 3, 1]]
y = [6, 8, 9, 11, 10]

Output: [1.0, 2.0, 3.0]

Explanation: This is a multiple linear regression problem with two features (plus intercept). The feature matrix has 5 samples and 3 columns: an intercept column of 1s and two feature columns.
The closed-form solution yields: • θ₀ (intercept) = 1.0 • θ₁ (coefficient for feature 1) = 2.0 • θ₂ (coefficient for feature 2) = 3.0
The model: y = 1.0 + 2.0·x₁ + 3.0·x₂ fits the data perfectly: • (1,1): y = 1 + 2(1) + 3(1) = 6 ✓ • (2,1): y = 1 + 2(2) + 3(1) = 8 ✓ • (1,2): y = 1 + 2(1) + 3(2) = 9 ✓ • (2,2): y = 1 + 2(2) + 3(2) = 11 ✓ • (3,1): y = 1 + 2(3) + 3(1) = 10 ✓
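The multiple-regression case works identically. As a sketch, `np.linalg.lstsq` gives the same least-squares answer as the closed form when X has full column rank, and it also copes with rank-deficient X:

```python
import numpy as np

X = np.array([[1, 1, 1], [1, 2, 1], [1, 1, 2], [1, 2, 2], [1, 3, 1]], dtype=float)
y = np.array([6, 8, 9, 11, 10], dtype=float)

# Least-squares solve; equivalent to (X^T X)^{-1} X^T y for full-rank X
theta, residuals, rank, singular_values = np.linalg.lstsq(X, y, rcond=None)
```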
Constraints