In supervised learning, linear regression models are foundational tools for predicting continuous outcomes. However, standard linear regression can suffer from overfitting, especially when dealing with high-dimensional data or when features exhibit multicollinearity (strong correlations among predictors). Regularization techniques address these challenges by penalizing large coefficient values.
Regularization Techniques:
L1 Regularization (Lasso) adds the absolute values of the weights to the loss function: $$\text{L1 Penalty} = \lambda_1 \cdot \sum_{j=1}^{m} |w_j|$$
This promotes sparsity by driving some coefficients exactly to zero, effectively performing automatic feature selection.
L2 Regularization (Ridge) adds the squared values of the weights to the loss function: $$\text{L2 Penalty} = \lambda_2 \cdot \sum_{j=1}^{m} w_j^2$$
This encourages smaller, more distributed weights but rarely sets coefficients exactly to zero.
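The two penalties above can be sketched as plain-Python helpers (the function names and the example strength `lam` are illustrative, not part of the required solution):

```python
def l1_penalty(w, lam=0.1):
    # Sum of absolute weights: encourages exact zeros (sparsity).
    return lam * sum(abs(wj) for wj in w)

def l2_penalty(w, lam=0.1):
    # Sum of squared weights: shrinks all weights smoothly toward zero.
    return lam * sum(wj ** 2 for wj in w)

w = [0.5, -1.5, 0.0]
print(l1_penalty(w))  # 0.1 * (0.5 + 1.5 + 0.0) = 0.2
print(l2_penalty(w))  # 0.1 * (0.25 + 2.25 + 0.0) = 0.25
```

Note how the L2 term penalizes the large weight (-1.5) much more heavily than the small one, while the L1 term treats each unit of magnitude equally.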
The Hybrid Approach:
The hybrid regularized linear optimizer combines both penalties, balancing the benefits of each. The combined loss function is:
$$L(w, b) = \frac{1}{2n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 + \alpha \cdot \rho \cdot \sum_{j=1}^{m} |w_j| + \frac{\alpha \cdot (1 - \rho)}{2} \cdot \sum_{j=1}^{m} w_j^2$$
Where:
- $n$ is the number of training samples and $m$ is the number of features,
- $\hat{y}_i = \sum_{j=1}^{m} w_j x_{ij} + b$ is the model's prediction for sample $i$,
- $\alpha$ is the overall regularization strength,
- $\rho$ (the l1_ratio) controls the mix between the L1 and L2 penalties.
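As a hedged sketch, the combined loss can be evaluated directly from the formula (the helper name `elastic_net_loss` and the use of plain Python lists are assumptions for illustration):

```python
def elastic_net_loss(X, y, w, b, alpha, rho):
    # Mean squared error term (with the 1/(2n) scaling from the formula).
    n = len(X)
    preds = [sum(wj * xj for wj, xj in zip(w, row)) + b for row in X]
    mse = sum((yi - pi) ** 2 for yi, pi in zip(y, preds)) / (2 * n)
    # L1 and L2 penalty terms, mixed by rho.
    l1 = alpha * rho * sum(abs(wj) for wj in w)
    l2 = alpha * (1 - rho) / 2 * sum(wj ** 2 for wj in w)
    return mse + l1 + l2

# With zero weights and bias, both penalties vanish and only the MSE remains:
# errors are 0, 1, 2, so the loss is (0 + 1 + 4) / (2 * 3) = 5/6.
print(elastic_net_loss([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]],
                       [0.0, 1.0, 2.0], [0.0, 0.0], 0.0, 0.1, 0.5))
```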
Gradient Descent Optimization:
The model parameters are updated iteratively using gradient descent:
$$w_j \leftarrow w_j - \eta \cdot \left( \frac{\partial L}{\partial w_j} \right)$$ $$b \leftarrow b - \eta \cdot \left( \frac{\partial L}{\partial b} \right)$$
Where η is the learning rate. The gradient with respect to each weight includes contributions from both the mean squared error and the regularization terms:
$$\frac{\partial L}{\partial w_j} = -\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i) \cdot x_{ij} + \alpha \cdot \rho \cdot \text{sign}(w_j) + \alpha \cdot (1 - \rho) \cdot w_j$$
The gradient with respect to the bias is simply: $$\frac{\partial L}{\partial b} = -\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)$$
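A single update epoch under these gradients can be sketched as follows (the name `gradient_step` and the convention sign(0) = 0 are assumptions, not mandated by the task):

```python
def gradient_step(X, y, w, b, alpha, rho, eta):
    # One full-batch gradient-descent update of the weights and bias.
    n, m = len(X), len(X[0])
    preds = [sum(wj * xj for wj, xj in zip(w, row)) + b for row in X]
    errors = [yi - pi for yi, pi in zip(y, preds)]
    new_w = []
    for j in range(m):
        # MSE term plus the L1 (sign) and L2 (linear) penalty gradients.
        sign = 1.0 if w[j] > 0 else -1.0 if w[j] < 0 else 0.0
        grad = (-sum(e * X[i][j] for i, e in enumerate(errors)) / n
                + alpha * rho * sign
                + alpha * (1 - rho) * w[j])
        new_w.append(w[j] - eta * grad)
    # The bias gradient carries no regularization term.
    new_b = b - eta * (-sum(errors) / n)
    return new_w, new_b
```

Repeating this update for the given number of epochs drives the parameters toward the regularized optimum; the exact values reached depend on the initialization and epoch count.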
Your Task:
Implement a function that trains a linear model using gradient descent with combined L1 and L2 regularization. Your function should:
- accept a feature matrix X, a target vector y, and the hyperparameters alpha, l1_ratio, learning_rate, and epochs;
- update the weights and bias each epoch using the gradients derived above;
- return the learned parameters as a dictionary of the form {"weights": [...], "bias": ...}.
Example 1:

Input:
X = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
y = [0.0, 1.0, 2.0]
alpha = 0.1
l1_ratio = 0.5
learning_rate = 0.01
epochs = 1000

Output:
{"weights": [0.44, 0.44], "bias": 0.13}

Explanation: The model learns a linear relationship where both features contribute equally (weights ≈ 0.44 each). The regularization prevents the weights from growing too large while allowing the model to capture the underlying pattern y ≈ 0.44x₁ + 0.44x₂ + 0.13. With the hybrid penalty, the weights converge to similar values due to the symmetry in the data.
Example 2:

Input:
X = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0], [10.0, 11.0, 12.0]]
y = [6.0, 15.0, 24.0, 33.0]
alpha = 0.1
l1_ratio = 0.5
learning_rate = 0.01
epochs = 1000

Output:
{"weights": [1.39, 0.99, 0.6], "bias": 0.91}

Explanation: With a 4x3 feature matrix, the model distributes weights across all features to fit the target. The first feature receives the highest weight (1.39) while the third feature receives the lowest (0.6). The hybrid regularization ensures weights are neither too sparse nor too uniform, achieving a balance between L1 and L2 behaviors.
Example 3:

Input:
X = [[1.0], [2.0], [3.0], [4.0], [5.0]]
y = [2.0, 4.0, 6.0, 8.0, 10.0]
alpha = 0.1
l1_ratio = 0.5
learning_rate = 0.01
epochs = 1000

Output:
{"weights": [1.85], "bias": 0.5}

Explanation: This is a simple univariate regression problem where y = 2x. The regularization slightly shrinks the ideal weight of 2.0 down to 1.85, while the bias compensates to minimize overall error. This demonstrates how regularization trades off between fitting the training data perfectly and keeping weights small.
Constraints: