In supervised learning, linear regression models are foundational tools for predicting continuous outcomes. However, standard linear regression can suffer from overfitting, especially when dealing with high-dimensional data or when features exhibit multicollinearity (strong correlations among predictors). Regularization techniques address these challenges by penalizing large coefficient values.
Regularization Techniques:
L1 Regularization (Lasso) adds the absolute values of the weights to the loss function: $$\text{L1 Penalty} = \lambda_1 \cdot \sum_{j=1}^{m} |w_j|$$
This promotes sparsity by driving some coefficients exactly to zero, effectively performing automatic feature selection.
L2 Regularization (Ridge) adds the squared values of the weights to the loss function: $$\text{L2 Penalty} = \lambda_2 \cdot \sum_{j=1}^{m} w_j^2$$
This encourages smaller, more distributed weights but rarely sets coefficients exactly to zero.
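The two penalties above can be sketched as plain-Python helpers (the function names and the example strength `lam` are illustrative, not part of the required solution):

```python
def l1_penalty(w, lam=0.1):
    # Sum of absolute weights: encourages exact zeros (sparsity).
    return lam * sum(abs(wj) for wj in w)

def l2_penalty(w, lam=0.1):
    # Sum of squared weights: shrinks all weights smoothly toward zero.
    return lam * sum(wj ** 2 for wj in w)

w = [0.5, -1.5, 0.0]
print(l1_penalty(w))  # 0.1 * (0.5 + 1.5 + 0.0) = 0.2
print(l2_penalty(w))  # 0.1 * (0.25 + 2.25 + 0.0) = 0.25
```

Note how the L2 term penalizes the large weight (-1.5) much more heavily than the small one, while the L1 term treats each unit of magnitude equally.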
The Hybrid Approach:
The hybrid regularized linear optimizer combines both penalties, balancing the benefits of each. The combined loss function is:
$$L(w, b) = \frac{1}{2n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 + \alpha \cdot \rho \cdot \sum_{j=1}^{m} |w_j| + \frac{\alpha \cdot (1 - \rho)}{2} \cdot \sum_{j=1}^{m} w_j^2$$
Where:
- $n$ is the number of training samples and $m$ is the number of features,
- $\hat{y}_i = \sum_{j=1}^{m} w_j x_{ij} + b$ is the model's prediction for sample $i$,
- $\alpha$ is the overall regularization strength,
- $\rho$ (the l1_ratio) controls the mix between the L1 and L2 penalties.
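As a hedged sketch, the combined loss can be evaluated directly from the formula (the helper name `elastic_net_loss` and the use of plain Python lists are assumptions for illustration):

```python
def elastic_net_loss(X, y, w, b, alpha, rho):
    # Mean squared error term (with the 1/(2n) scaling from the formula).
    n = len(X)
    preds = [sum(wj * xj for wj, xj in zip(w, row)) + b for row in X]
    mse = sum((yi - pi) ** 2 for yi, pi in zip(y, preds)) / (2 * n)
    # L1 and L2 penalty terms, mixed by rho.
    l1 = alpha * rho * sum(abs(wj) for wj in w)
    l2 = alpha * (1 - rho) / 2 * sum(wj ** 2 for wj in w)
    return mse + l1 + l2

# With zero weights and bias, both penalties vanish and only the MSE remains:
# errors are 0, 1, 2, so the loss is (0 + 1 + 4) / (2 * 3) = 5/6.
print(elastic_net_loss([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]],
                       [0.0, 1.0, 2.0], [0.0, 0.0], 0.0, 0.1, 0.5))
```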
Gradient Descent Optimization:
The model parameters are updated iteratively using gradient descent:
$$w_j \leftarrow w_j - \eta \cdot \left( \frac{\partial L}{\partial w_j} \right)$$ $$b \leftarrow b - \eta \cdot \left( \frac{\partial L}{\partial b} \right)$$
Where η is the learning rate. The gradient with respect to each weight includes contributions from both the mean squared error and the regularization terms:
$$\frac{\partial L}{\partial w_j} = -\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i) \cdot x_{ij} + \alpha \cdot \rho \cdot \text{sign}(w_j) + \alpha \cdot (1 - \rho) \cdot w_j$$
The gradient with respect to the bias is simply: $$\frac{\partial L}{\partial b} = -\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)$$
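A single update epoch under these gradients can be sketched as follows (the name `gradient_step` and the convention sign(0) = 0 are assumptions, not mandated by the task):

```python
def gradient_step(X, y, w, b, alpha, rho, eta):
    # One full-batch gradient-descent update of the weights and bias.
    n, m = len(X), len(X[0])
    preds = [sum(wj * xj for wj, xj in zip(w, row)) + b for row in X]
    errors = [yi - pi for yi, pi in zip(y, preds)]
    new_w = []
    for j in range(m):
        # MSE term plus the L1 (sign) and L2 (linear) penalty gradients.
        sign = 1.0 if w[j] > 0 else -1.0 if w[j] < 0 else 0.0
        grad = (-sum(e * X[i][j] for i, e in enumerate(errors)) / n
                + alpha * rho * sign
                + alpha * (1 - rho) * w[j])
        new_w.append(w[j] - eta * grad)
    # The bias gradient carries no regularization term.
    new_b = b - eta * (-sum(errors) / n)
    return new_w, new_b
```

Repeating this update for the given number of epochs drives the parameters toward the regularized optimum; the exact values reached depend on the initialization and epoch count.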
Your Task:
Implement a function that trains a linear model using gradient descent with combined L1 and L2 regularization. Your function should:
- accept a feature matrix X, a target vector y, and the hyperparameters alpha, l1_ratio, learning_rate, and epochs;
- update the weights and bias each epoch using the gradients derived above;
- return the learned parameters as a dictionary of the form {"weights": [...], "bias": ...}.
Example 1:

Input:
X = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
y = [0.0, 1.0, 2.0]
alpha = 0.1
l1_ratio = 0.5
learning_rate = 0.01
epochs = 1000

Output:
{"weights": [0.44, 0.44], "bias": 0.13}

Explanation: The model learns a linear relationship where both features contribute equally (weights ≈ 0.44 each). The regularization prevents the weights from growing too large while allowing the model to capture the underlying pattern y ≈ 0.44x₁ + 0.44x₂ + 0.13. With the hybrid penalty, the weights converge to similar values due to the symmetry in the data.
Example 2:

Input:
X = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0], [10.0, 11.0, 12.0]]
y = [6.0, 15.0, 24.0, 33.0]
alpha = 0.1
l1_ratio = 0.5
learning_rate = 0.01
epochs = 1000

Output:
{"weights": [1.39, 0.99, 0.6], "bias": 0.91}

Explanation: With a 4x3 feature matrix, the model distributes weights across all features to fit the target. The first feature receives the highest weight (1.39) while the third feature receives the lowest (0.6). The hybrid regularization ensures weights are neither too sparse nor too uniform, achieving a balance between L1 and L2 behaviors.
Example 3:

Input:
X = [[1.0], [2.0], [3.0], [4.0], [5.0]]
y = [2.0, 4.0, 6.0, 8.0, 10.0]
alpha = 0.1
l1_ratio = 0.5
learning_rate = 0.01
epochs = 1000

Output:
{"weights": [1.85], "bias": 0.5}

Explanation: This is a simple univariate regression problem where y = 2x. The regularization slightly shrinks the ideal weight of 2.0 down to 1.85, while the bias compensates to minimize overall error. This demonstrates how regularization trades off between fitting the training data perfectly and keeping weights small.
Constraints: