In machine learning, building predictive models that are both accurate and interpretable is a fundamental challenge. One powerful technique to achieve this balance is L1-regularized linear regression, commonly known as the Lasso (Least Absolute Shrinkage and Selection Operator) method.
Unlike standard linear regression, which can produce models with many non-zero coefficients, L1 regularization adds a penalty proportional to the absolute values of the model weights. This penalty serves two crucial purposes:

- Shrinkage: it pulls the coefficient values toward zero, reducing variance and helping to prevent overfitting.
- Feature selection: it can drive some coefficients to exactly zero, effectively removing the corresponding features from the model.
The Optimization Objective
The algorithm seeks to minimize a composite loss function that balances prediction accuracy with model simplicity:
$$J(\mathbf{w}, b) = \frac{1}{2n} \sum_{i=1}^{n} \left( y_i - \left( \sum_{j=1}^{p} X_{ij} w_j + b \right) \right)^2 + \alpha \sum_{j=1}^{p} |w_j|$$
Where:

- $n$ is the number of training samples and $p$ is the number of features
- $X_{ij}$ is the value of feature $j$ for sample $i$, and $y_i$ is the corresponding target
- $w_j$ are the model weights and $b$ is the bias (intercept)
- $\alpha \ge 0$ controls the strength of the L1 penalty
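This objective can be evaluated directly in code. The sketch below assumes NumPy arrays matching the symbols above; the function name `lasso_objective` is illustrative, not part of the task:

```python
import numpy as np

def lasso_objective(X, y, w, b, alpha):
    """Evaluate J(w, b): halved mean squared error plus the L1 penalty."""
    n = X.shape[0]
    residuals = y - (X @ w + b)              # y_i - (sum_j X_ij w_j + b)
    mse_term = np.sum(residuals ** 2) / (2 * n)
    l1_term = alpha * np.sum(np.abs(w))
    return mse_term + l1_term

# With zero weights and bias, J reduces to the data term alone
X = np.array([[1.0], [2.0], [3.0]])
y = np.array([2.0, 4.0, 6.0])
J = lasso_objective(X, y, np.zeros(1), 0.0, alpha=0.1)
```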
Gradient Descent Optimization
Since the L1 penalty is not differentiable at zero, we use the subgradient method. For each weight update, the subgradient of the L1 term uses the sign function, with $\text{sign}(0)$ taken as $0$:
$$\frac{\partial}{\partial w_j} |w_j| = \text{sign}(w_j)$$
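In NumPy, `np.sign` implements exactly this element-wise subgradient, returning 0 at 0, which is a valid subgradient choice there:

```python
import numpy as np

w = np.array([-1.5, 0.0, 2.3])
sub = np.sign(w)   # element-wise subgradient of |w_j|: [-1., 0., 1.]
```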
The update rules become:

$$w_j \leftarrow w_j - \eta \left( \frac{1}{n} \sum_{i=1}^{n} \left( \hat{y}_i - y_i \right) X_{ij} + \alpha \, \text{sign}(w_j) \right)$$

$$b \leftarrow b - \eta \cdot \frac{1}{n} \sum_{i=1}^{n} \left( \hat{y}_i - y_i \right)$$

where $\hat{y}_i = \sum_{j=1}^{p} X_{ij} w_j + b$ is the model's prediction for sample $i$.
Where η is the learning rate.
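Putting the update rules together, one possible implementation is sketched below. It assumes NumPy inputs and zero initialization of the parameters; the function name and signature are illustrative, not prescribed by the task:

```python
import numpy as np

def l1_regularized_gradient_descent(X, y, alpha=0.1, learning_rate=0.01, max_iter=1000):
    """Train L1-regularized linear regression by subgradient descent on J(w, b)."""
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    n, p = X.shape
    w = np.zeros(p)   # assumption: parameters start at zero
    b = 0.0
    for _ in range(max_iter):
        y_pred = X @ w + b
        error = y_pred - y
        grad_w = (X.T @ error) / n + alpha * np.sign(w)  # data term + L1 subgradient
        grad_b = error.mean()                            # bias is not penalized
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b
```

Note that the bias is excluded from the penalty, matching the objective above: only the weights appear inside the $\alpha \sum_j |w_j|$ term, so only `grad_w` carries the sign contribution.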
Your Task
Implement the gradient descent optimization algorithm to train an L1-regularized linear regression model. Your function should iteratively update the weights and bias to minimize the objective function, returning the final optimized parameters.
Example 1:

Input:
X = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
y = [0.0, 1.0, 2.0]
alpha = 0.1
learning_rate = 0.01
max_iter = 1000

Output:
weights = [0.42371644, 0.42371644], bias = 0.15385068

Explanation: With two identical features (both columns have the same values), the L1 regularization distributes the learned weight equally between them. Each feature receives approximately 0.424 of the total weight. The bias term of ~0.154 adjusts the intercept to minimize the overall prediction error. The symmetric weight distribution demonstrates how L1 regularization handles perfectly correlated features.
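The equal split in this example follows from symmetry: identical columns receive identical gradients at every step, so weights that start equal stay equal. A quick check of a single update from zero initialization (variable names are illustrative):

```python
import numpy as np

X = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])  # two identical columns
y = np.array([0.0, 1.0, 2.0])
w, b, alpha = np.zeros(2), 0.0, 0.1

err = X @ w + b - y
grad_w = (X.T @ err) / len(y) + alpha * np.sign(w)
# Both components of grad_w are identical, so the weights remain tied
```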
Example 2:

Input:
X = [[1.0], [2.0], [3.0], [4.0], [5.0]]
y = [2.0, 4.0, 6.0, 8.0, 10.0]
alpha = 0.01
learning_rate = 0.01
max_iter = 1000

Output:
weights = [1.96953246], bias = 0.10694592

Explanation: This represents a perfect linear relationship where y = 2x. The learned weight of ~1.97 is slightly less than 2.0 due to the L1 penalty pushing the coefficient toward zero. The small regularization strength (α = 0.01) allows the weight to remain close to the true value while providing some shrinkage. The small bias (~0.107) compensates for this shrinkage effect.
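The shrinkage effect can be checked directly by training on the same data with two penalty strengths. The helper below is a compact restatement of the update rules (the name `lasso_gd` is illustrative):

```python
import numpy as np

def lasso_gd(X, y, alpha, lr=0.01, iters=1000):
    """Subgradient descent on the L1-regularized least-squares objective."""
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    n, p = X.shape
    w, b = np.zeros(p), 0.0
    for _ in range(iters):
        err = X @ w + b - y
        w -= lr * ((X.T @ err) / n + alpha * np.sign(w))
        b -= lr * err.mean()
    return w, b

X = [[1.0], [2.0], [3.0], [4.0], [5.0]]
y = [2.0, 4.0, 6.0, 8.0, 10.0]
w_small, _ = lasso_gd(X, y, alpha=0.01)
w_large, _ = lasso_gd(X, y, alpha=0.5)
# A stronger penalty shrinks the weight further below the true slope of 2.0
```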
Example 3:

Input:
X = [[1.0, 2.0], [2.0, 3.0], [3.0, 4.0], [4.0, 5.0]]
y = [5.0, 8.0, 11.0, 14.0]
alpha = 0.05
learning_rate = 0.01
max_iter = 500

Output:
weights = [1.26664079, 1.6892517], bias = 0.42261091

Explanation: The true relationship is y = x₁ + 2x₂. After 500 iterations with moderate L1 regularization (α = 0.05), the model learns weights that approximate this pattern. The second feature receives a larger weight (~1.69 vs ~1.27) because it contributes more to the target. The L1 penalty causes both weights to be slightly shrunk from their true values, with the model achieving a balance between fitting the data and keeping weights small.
Constraints