In machine learning, training robust predictive models requires balancing two competing objectives: fitting the training data accurately and preventing overfitting to noise or irrelevant patterns. L2 regularization, commonly known as Ridge regularization, provides an elegant mathematical framework to achieve this balance.
The L2-regularized linear regression loss (or Ridge loss) combines two essential components:
1. Mean Squared Error (MSE): This measures how well the model's predictions match the actual target values. For a feature matrix X of shape (n_samples × n_features), weight vector w of shape (n_features,), and true labels y of shape (n_samples,), the MSE is:
$$\text{MSE} = \frac{1}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i)^2 = \frac{1}{n} \|X \cdot w - y\|^2$$
where ŷ = X · w represents the model's predictions.
2. L2 Regularization Penalty: This term penalizes large weight values, encouraging the model to find simpler solutions with smaller coefficients:
$$\text{Regularization} = \alpha \cdot \|w\|^2 = \alpha \sum_{j=1}^{m} w_j^2$$
where m is the number of features and α (alpha) is the regularization strength hyperparameter that controls the trade-off between fitting the data and keeping the weights small.
Combined Ridge Loss: The total loss function is the sum of both components:
$$\mathcal{L}_{\text{Ridge}} = \text{MSE} + \text{Regularization} = \frac{1}{n} \|X \cdot w - y\|^2 + \alpha \|w\|^2$$
Why Regularization Matters: Without the penalty term, the weights are free to grow large in order to fit noise in the training data, which leads to overfitting and poor generalization. The α-scaled penalty shrinks the weights toward zero, trading a small increase in training error for a simpler, more stable model; larger α means stronger shrinkage.
Your Task: Write a function that computes the L2-regularized linear regression loss given a feature matrix, weight vector, true labels, and regularization parameter. Round the final result to 4 decimal places.
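One way to implement this is a direct translation of the formula into NumPy (a sketch; the function name `ridge_loss` is an assumption, since the problem does not fix one):

```python
import numpy as np

def ridge_loss(X, w, y_true, alpha):
    """L2-regularized linear regression (Ridge) loss, rounded to 4 decimals."""
    X = np.asarray(X, dtype=float)
    w = np.asarray(w, dtype=float)
    y_true = np.asarray(y_true, dtype=float)
    y_pred = X @ w                         # predictions, shape (n_samples,)
    mse = np.mean((y_pred - y_true) ** 2)  # (1/n) * ||Xw - y||^2
    reg = alpha * np.sum(w ** 2)           # alpha * ||w||^2
    return round(mse + reg, 4)
```

The `@` operator performs the matrix-vector product, so the whole loss is three vectorized lines with no explicit loops over samples or features.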
X = [[1.0, 1.0], [2.0, 1.0], [3.0, 1.0], [4.0, 1.0]]
w = [0.2, 2.0]
y_true = [2.0, 3.0, 4.0, 5.0]
alpha = 0.1

Expected Output: 2.204

Step 1: Compute predictions (ŷ = X · w)
• Sample 1: (1.0 × 0.2) + (1.0 × 2.0) = 0.2 + 2.0 = 2.2
• Sample 2: (2.0 × 0.2) + (1.0 × 2.0) = 0.4 + 2.0 = 2.4
• Sample 3: (3.0 × 0.2) + (1.0 × 2.0) = 0.6 + 2.0 = 2.6
• Sample 4: (4.0 × 0.2) + (1.0 × 2.0) = 0.8 + 2.0 = 2.8

Predictions: ŷ = [2.2, 2.4, 2.6, 2.8]

Step 2: Compute MSE
• Errors: [2.2-2.0, 2.4-3.0, 2.6-4.0, 2.8-5.0] = [0.2, -0.6, -1.4, -2.2]
• Squared errors: [0.04, 0.36, 1.96, 4.84]
• MSE = (0.04 + 0.36 + 1.96 + 4.84) / 4 = 7.2 / 4 = 1.8

Step 3: Compute L2 regularization term
• ||w||² = 0.2² + 2.0² = 0.04 + 4.0 = 4.04
• Regularization = α × ||w||² = 0.1 × 4.04 = 0.404

Step 4: Combine loss components
• Ridge Loss = MSE + Regularization = 1.8 + 0.404 = 2.204
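The four steps above can be checked numerically (a sketch assuming NumPy):

```python
import numpy as np

X = np.array([[1.0, 1.0], [2.0, 1.0], [3.0, 1.0], [4.0, 1.0]])
w = np.array([0.2, 2.0])
y_true = np.array([2.0, 3.0, 4.0, 5.0])
alpha = 0.1

y_pred = X @ w                         # Step 1: [2.2, 2.4, 2.6, 2.8]
mse = np.mean((y_pred - y_true) ** 2)  # Step 2: ≈ 1.8
reg = alpha * np.sum(w ** 2)           # Step 3: ≈ 0.404
loss = round(mse + reg, 4)             # Step 4: 2.204
```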
X = [[1.0, 2.0], [3.0, 4.0]]
w = [0.5, 0.5]
y_true = [1.0, 2.0]
alpha = 0.5

Expected Output: 1.5

Step 1: Compute predictions (ŷ = X · w)
• Sample 1: (1.0 × 0.5) + (2.0 × 0.5) = 0.5 + 1.0 = 1.5
• Sample 2: (3.0 × 0.5) + (4.0 × 0.5) = 1.5 + 2.0 = 3.5

Predictions: ŷ = [1.5, 3.5]

Step 2: Compute MSE
• Errors: [1.5-1.0, 3.5-2.0] = [0.5, 1.5]
• Squared errors: [0.25, 2.25]
• MSE = (0.25 + 2.25) / 2 = 2.5 / 2 = 1.25

Step 3: Compute L2 regularization term
• ||w||² = 0.5² + 0.5² = 0.25 + 0.25 = 0.5
• Regularization = α × ||w||² = 0.5 × 0.5 = 0.25

Step 4: Combine loss components
• Ridge Loss = MSE + Regularization = 1.25 + 0.25 = 1.5
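The same check for this example (a sketch assuming NumPy; every intermediate value here is exactly representable in floating point, so the results match the hand calculation exactly):

```python
import numpy as np

X = np.array([[1.0, 2.0], [3.0, 4.0]])
w = np.array([0.5, 0.5])
y_true = np.array([1.0, 2.0])
alpha = 0.5

y_pred = X @ w                         # [1.5, 3.5]
mse = np.mean((y_pred - y_true) ** 2)  # 1.25
reg = alpha * np.sum(w ** 2)           # 0.5 * 0.5 = 0.25
loss = mse + reg                       # 1.5
```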
X = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
w = [1.0, 2.0, 3.0]
y_true = [1.0, 2.0, 3.0]
alpha = 0.01

Expected Output: 0.14

Step 1: Compute predictions (ŷ = X · w)
This is an identity matrix, so predictions equal the corresponding weights:
• Sample 1: (1.0 × 1.0) + (0.0 × 2.0) + (0.0 × 3.0) = 1.0
• Sample 2: (0.0 × 1.0) + (1.0 × 2.0) + (0.0 × 3.0) = 2.0
• Sample 3: (0.0 × 1.0) + (0.0 × 2.0) + (1.0 × 3.0) = 3.0

Predictions: ŷ = [1.0, 2.0, 3.0]

Step 2: Compute MSE
• Since ŷ = y_true exactly, all errors are zero
• MSE = (0² + 0² + 0²) / 3 = 0

Step 3: Compute L2 regularization term
• ||w||² = 1.0² + 2.0² + 3.0² = 1 + 4 + 9 = 14
• Regularization = α × ||w||² = 0.01 × 14 = 0.14

Step 4: Combine loss components
• Ridge Loss = MSE + Regularization = 0 + 0.14 = 0.14
This example demonstrates that even with perfect predictions (zero MSE), the regularization term still contributes to the total loss based on the magnitude of the weights.
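This zero-MSE case can be reproduced in a few lines (a sketch assuming NumPy):

```python
import numpy as np

X = np.eye(3)                # identity matrix: each prediction equals one weight
w = np.array([1.0, 2.0, 3.0])
y_true = np.array([1.0, 2.0, 3.0])
alpha = 0.01

mse = np.mean((X @ w - y_true) ** 2)  # 0.0, since predictions are exact
reg = alpha * np.sum(w ** 2)          # 0.01 * 14 = 0.14
loss = round(mse + reg, 4)            # 0.14: the penalty alone sets the loss
```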
Constraints