Gradient descent is one of the most fundamental optimization algorithms in machine learning. It provides an iterative approach to finding the minimum of a function by repeatedly taking steps proportional to the negative of the gradient (or slope) at the current point.
In the context of linear regression, gradient descent is used to find the optimal coefficients (weights) that minimize the difference between predicted values and actual target values. This difference is typically measured using the Mean Squared Error (MSE) cost function.
A linear regression model predicts outputs using the equation:
$$\hat{y} = X \cdot \theta$$
Where:
• $\hat{y}$ is the vector of predicted values,
• $X$ is the feature matrix, whose first column is all ones for the intercept,
• $\theta$ is the vector of coefficients (intercept and feature weights).
The algorithm works by iteratively updating the coefficients in the direction that reduces the cost function:

$$\theta := \theta - \frac{\alpha}{m} X^T (X\theta - y)$$

Where α (alpha) is the learning rate that controls the step size and m is the number of training examples.
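The update rule above can be sketched directly in NumPy. This is a minimal reference implementation, not the required solution; the function name and the zero initialization of θ are assumptions:

```python
import numpy as np

def linear_regression_gradient_descent(X, y, alpha, iterations):
    """Fit linear regression coefficients by batch gradient descent.

    X is the m x n feature matrix (first column of ones for the
    intercept), y the length-m target vector, alpha the learning
    rate, and iterations the number of update steps.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    m, n = X.shape
    theta = np.zeros(n)  # assumed starting point: all-zero coefficients
    for _ in range(iterations):
        gradient = X.T @ (X @ theta - y) / m  # gradient of the MSE cost
        theta -= alpha * gradient             # step against the gradient
    return theta
```

Each iteration computes the full-batch gradient of the MSE cost and moves θ a step of size α against it.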
Implement a function that performs linear regression using the gradient descent optimization algorithm. The function should take the feature matrix X, the target vector y, the learning rate alpha, and the number of iterations, and return the fitted coefficient vector θ.
Example 1:

Input:
X = [[1, 1], [1, 2], [1, 3]]
y = [1, 2, 3]
alpha = 0.01
iterations = 1000

Output:
[0.1107, 0.9513]

Explanation: We have 3 data points with a single feature (plus an intercept column), and the targets follow the perfect linear relationship y = x. After 1000 iterations of gradient descent with learning rate 0.01:
• The intercept coefficient converges to approximately 0.1107
• The slope coefficient converges to approximately 0.9513

Note: With more iterations or a different learning rate, these would converge closer to the true values of [0, 1]; gradient descent is still approaching the optimal solution.
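That convergence claim can be checked numerically. The standalone sketch below (assuming zero-initialized coefficients and the batch update rule from above) fits Example 1's data with two different iteration budgets:

```python
import numpy as np

# Example 1's data: y = x exactly, so the optimal coefficients are [0, 1].
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 2.0, 3.0])

def fit(iterations, alpha=0.01):
    """Batch gradient descent from a zero start, for a given budget."""
    theta = np.zeros(X.shape[1])
    m = len(y)
    for _ in range(iterations):
        theta -= alpha / m * X.T @ (X @ theta - y)
    return theta

print(fit(1_000))    # roughly [0.1107, 0.9513], as in the example
print(fit(100_000))  # much closer to the true [0, 1]
```

With 100× the iterations, the remaining error shrinks to numerical noise, confirming that the 1000-iteration answer is simply a partially converged estimate.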
Example 2:

Input:
X = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0], [1.0, 5.0]]
y = [5.0, 8.0, 11.0, 14.0, 17.0]
alpha = 0.01
iterations = 1000

Output:
[1.8, 3.0554]

Explanation: This dataset follows the relationship y = 2 + 3x (intercept of 2, slope of 3). With 5 data points and the same hyperparameters:
• The intercept coefficient reaches approximately 1.8
• The slope coefficient reaches approximately 3.0554

The algorithm is converging toward the true parameters [2.0, 3.0]; additional iterations would bring the coefficients even closer to these optimal values.
Example 3:

Input:
X = [[1, 1, 1], [1, 2, 2], [1, 3, 3], [1, 4, 4]]
y = [3, 6, 9, 12]
alpha = 0.01
iterations = 1000

Output:
[0.0986, 1.4834, 1.4834]

Explanation: This is a multivariate case with 4 data points and 2 features (plus an intercept). The targets follow y = 1.5x₁ + 1.5x₂. After optimization:
• Intercept: 0.0986
• Coefficient for feature 1: 1.4834
• Coefficient for feature 2: 1.4834

Notice that both feature coefficients are equal: the two feature columns have identical values, so the algorithm distributes the weight equally between the perfectly correlated features.
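The equal-weights behavior in Example 3 follows directly from the update rule: identical feature columns produce identical gradient components at every step, so coefficients that start equal stay equal. A self-contained sketch (assuming the batch update rule and zero initialization from above) makes this observable:

```python
import numpy as np

# Example 3's data: the two feature columns are identical copies.
X = np.array([[1, 1, 1], [1, 2, 2], [1, 3, 3], [1, 4, 4]], dtype=float)
y = np.array([3.0, 6.0, 9.0, 12.0])

theta = np.zeros(3)
m = len(y)
for _ in range(1000):
    # Rows of X.T for the two duplicate columns are identical, so the
    # corresponding gradient entries (and hence updates) are identical too.
    theta -= 0.01 / m * X.T @ (X @ theta - y)

print(np.round(theta, 4))
```

Because the duplicated columns make X singular, there is no unique least-squares solution; gradient descent from a symmetric starting point picks the one that splits the weight evenly.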
Constraints