In machine learning, training robust predictive models requires balancing two competing objectives: fitting the training data accurately and preventing overfitting to noise or irrelevant patterns. L2 regularization, commonly known as Ridge regularization, provides an elegant mathematical framework to achieve this balance.
The L2-regularized linear regression loss (or Ridge loss) combines two essential components:
1. Mean Squared Error (MSE): This measures how well the model's predictions match the actual target values. For a feature matrix X of shape (n_samples × n_features), weight vector w of shape (n_features,), and true labels y of shape (n_samples,), the MSE is:
$$\text{MSE} = \frac{1}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i)^2 = \frac{1}{n} \|X \cdot w - y\|^2$$
where ŷ = X · w represents the model's predictions.
2. L2 Regularization Penalty: This term penalizes large weight values, encouraging the model to find simpler solutions with smaller coefficients:
$$\text{Regularization} = \alpha \cdot \|w\|^2 = \alpha \sum_{j=1}^{m} w_j^2$$
where m is the number of features and α (alpha) is the regularization strength hyperparameter that controls the trade-off between fitting the data and keeping the weights small.
Combined Ridge Loss: The total loss function is the sum of both components:
$$\mathcal{L}_{\text{Ridge}} = \text{MSE} + \text{Regularization} = \frac{1}{n} \|X \cdot w - y\|^2 + \alpha \|w\|^2$$
Why Regularization Matters: Without the penalty term, the weights are free to grow large in order to fit noise in the training data, which leads to overfitting and poor generalization. The α-scaled penalty shrinks the weights toward zero, trading a small increase in training error for a simpler, more stable model; larger α means stronger shrinkage.
Your Task: Write a function that computes the L2-regularized linear regression loss given a feature matrix, weight vector, true labels, and regularization parameter. Round the final result to 4 decimal places.
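One way to implement this is a direct translation of the formula into NumPy (a sketch; the function name `ridge_loss` is an assumption, since the problem does not fix one):

```python
import numpy as np

def ridge_loss(X, w, y_true, alpha):
    """L2-regularized linear regression (Ridge) loss, rounded to 4 decimals."""
    X = np.asarray(X, dtype=float)
    w = np.asarray(w, dtype=float)
    y_true = np.asarray(y_true, dtype=float)
    y_pred = X @ w                         # predictions, shape (n_samples,)
    mse = np.mean((y_pred - y_true) ** 2)  # (1/n) * ||Xw - y||^2
    reg = alpha * np.sum(w ** 2)           # alpha * ||w||^2
    return round(mse + reg, 4)
```

The `@` operator performs the matrix-vector product, so the whole loss is three vectorized lines with no explicit loops over samples or features.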
X = [[1.0, 1.0], [2.0, 1.0], [3.0, 1.0], [4.0, 1.0]]
w = [0.2, 2.0]
y_true = [2.0, 3.0, 4.0, 5.0]
alpha = 0.1

Expected Output: 2.204

Step 1: Compute predictions (ŷ = X · w)
• Sample 1: (1.0 × 0.2) + (1.0 × 2.0) = 0.2 + 2.0 = 2.2
• Sample 2: (2.0 × 0.2) + (1.0 × 2.0) = 0.4 + 2.0 = 2.4
• Sample 3: (3.0 × 0.2) + (1.0 × 2.0) = 0.6 + 2.0 = 2.6
• Sample 4: (4.0 × 0.2) + (1.0 × 2.0) = 0.8 + 2.0 = 2.8

Predictions: ŷ = [2.2, 2.4, 2.6, 2.8]

Step 2: Compute MSE
• Errors: [2.2-2.0, 2.4-3.0, 2.6-4.0, 2.8-5.0] = [0.2, -0.6, -1.4, -2.2]
• Squared errors: [0.04, 0.36, 1.96, 4.84]
• MSE = (0.04 + 0.36 + 1.96 + 4.84) / 4 = 7.2 / 4 = 1.8

Step 3: Compute L2 regularization term
• ||w||² = 0.2² + 2.0² = 0.04 + 4.0 = 4.04
• Regularization = α × ||w||² = 0.1 × 4.04 = 0.404

Step 4: Combine loss components
• Ridge Loss = MSE + Regularization = 1.8 + 0.404 = 2.204
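The four steps above can be checked numerically (a sketch assuming NumPy):

```python
import numpy as np

X = np.array([[1.0, 1.0], [2.0, 1.0], [3.0, 1.0], [4.0, 1.0]])
w = np.array([0.2, 2.0])
y_true = np.array([2.0, 3.0, 4.0, 5.0])
alpha = 0.1

y_pred = X @ w                         # Step 1: [2.2, 2.4, 2.6, 2.8]
mse = np.mean((y_pred - y_true) ** 2)  # Step 2: ≈ 1.8
reg = alpha * np.sum(w ** 2)           # Step 3: ≈ 0.404
loss = round(mse + reg, 4)             # Step 4: 2.204
```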
X = [[1.0, 2.0], [3.0, 4.0]]
w = [0.5, 0.5]
y_true = [1.0, 2.0]
alpha = 0.5

Expected Output: 1.5

Step 1: Compute predictions (ŷ = X · w)
• Sample 1: (1.0 × 0.5) + (2.0 × 0.5) = 0.5 + 1.0 = 1.5
• Sample 2: (3.0 × 0.5) + (4.0 × 0.5) = 1.5 + 2.0 = 3.5

Predictions: ŷ = [1.5, 3.5]

Step 2: Compute MSE
• Errors: [1.5-1.0, 3.5-2.0] = [0.5, 1.5]
• Squared errors: [0.25, 2.25]
• MSE = (0.25 + 2.25) / 2 = 2.5 / 2 = 1.25

Step 3: Compute L2 regularization term
• ||w||² = 0.5² + 0.5² = 0.25 + 0.25 = 0.5
• Regularization = α × ||w||² = 0.5 × 0.5 = 0.25

Step 4: Combine loss components
• Ridge Loss = MSE + Regularization = 1.25 + 0.25 = 1.5
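The same check for this example (a sketch assuming NumPy; every intermediate value here is exactly representable in floating point, so the results match the hand calculation exactly):

```python
import numpy as np

X = np.array([[1.0, 2.0], [3.0, 4.0]])
w = np.array([0.5, 0.5])
y_true = np.array([1.0, 2.0])
alpha = 0.5

y_pred = X @ w                         # [1.5, 3.5]
mse = np.mean((y_pred - y_true) ** 2)  # 1.25
reg = alpha * np.sum(w ** 2)           # 0.5 * 0.5 = 0.25
loss = mse + reg                       # 1.5
```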
X = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
w = [1.0, 2.0, 3.0]
y_true = [1.0, 2.0, 3.0]
alpha = 0.01

Expected Output: 0.14

Step 1: Compute predictions (ŷ = X · w)
This is an identity matrix, so predictions equal the corresponding weights:
• Sample 1: (1.0 × 1.0) + (0.0 × 2.0) + (0.0 × 3.0) = 1.0
• Sample 2: (0.0 × 1.0) + (1.0 × 2.0) + (0.0 × 3.0) = 2.0
• Sample 3: (0.0 × 1.0) + (0.0 × 2.0) + (1.0 × 3.0) = 3.0

Predictions: ŷ = [1.0, 2.0, 3.0]

Step 2: Compute MSE
• Since ŷ = y_true exactly, all errors are zero
• MSE = (0² + 0² + 0²) / 3 = 0

Step 3: Compute L2 regularization term
• ||w||² = 1.0² + 2.0² + 3.0² = 1 + 4 + 9 = 14
• Regularization = α × ||w||² = 0.01 × 14 = 0.14

Step 4: Combine loss components
• Ridge Loss = MSE + Regularization = 0 + 0.14 = 0.14
This example demonstrates that even with perfect predictions (zero MSE), the regularization term still contributes to the total loss based on the magnitude of the weights.
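This zero-MSE case can be reproduced in a few lines (a sketch assuming NumPy):

```python
import numpy as np

X = np.eye(3)                # identity matrix: each prediction equals one weight
w = np.array([1.0, 2.0, 3.0])
y_true = np.array([1.0, 2.0, 3.0])
alpha = 0.01

mse = np.mean((X @ w - y_true) ** 2)  # 0.0, since predictions are exact
reg = alpha * np.sum(w ** 2)          # 0.01 * 14 = 0.14
loss = round(mse + reg, 4)            # 0.14: the penalty alone sets the loss
```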
Constraints