Understanding how a single neuron learns is the foundation of all neural network training. In this problem, you will implement the complete training loop for a single artificial neuron (also called a perceptron with sigmoid activation), including forward propagation, loss computation, and backpropagation with gradient descent.
A single neuron performs the following computation:
Weighted Sum (Pre-activation): For an input feature vector x with weights w and bias b: $$z = \sum_{i=1}^{n} w_i \cdot x_i + b = \mathbf{w}^T \mathbf{x} + b$$
Sigmoid Activation: The pre-activation value is passed through the sigmoid function to produce a probability: $$\hat{y} = \sigma(z) = \frac{1}{1 + e^{-z}}$$
We use the Mean Squared Error (MSE) loss to measure the difference between predictions and true labels: $$\mathcal{L} = \frac{1}{N} \sum_{i=1}^{N} (\hat{y}_i - y_i)^2$$
where N is the number of training samples, ŷᵢ is the predicted output, and yᵢ is the true binary label.
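Putting the forward pass and the loss together, a minimal sketch in Python (the helper names `sigmoid`, `forward`, and `mse` are illustrative, not prescribed by the problem):

```python
import math

def sigmoid(z):
    """Sigmoid activation: squashes z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, w, b):
    """Forward pass for one sample: y_hat = sigmoid(w . x + b)."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return sigmoid(z)

def mse(predictions, labels):
    """Mean squared error over a batch of predictions."""
    n = len(labels)
    return sum((p - y) ** 2 for p, y in zip(predictions, labels)) / n
```

With the first example's initial parameters (w = [0.1, -0.2], b = 0.0), `forward([1.0, 2.0], [0.1, -0.2], 0.0)` gives roughly 0.426, matching the worked calculation below.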
Backpropagation computes the gradients of the loss with respect to each parameter. Applying the chain rule over the whole batch:
$$\frac{\partial \mathcal{L}}{\partial w_j} = \frac{2}{N} \sum_{i=1}^{N} (\hat{y}_i - y_i) \cdot \sigma'(z_i) \cdot x_{ij}$$
$$\frac{\partial \mathcal{L}}{\partial b} = \frac{2}{N} \sum_{i=1}^{N} (\hat{y}_i - y_i) \cdot \sigma'(z_i)$$
where the derivative of the sigmoid is: $\sigma'(z) = \sigma(z) \cdot (1 - \sigma(z))$
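The sigmoid derivative identity can be sanity-checked numerically against a central finite difference; a quick sketch:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_prime(z):
    # Analytic derivative via the identity sigma'(z) = sigma(z) * (1 - sigma(z))
    s = sigmoid(z)
    return s * (1.0 - s)

def numeric_derivative(f, z, h=1e-6):
    # Central finite difference as an independent check
    return (f(z + h) - f(z - h)) / (2.0 * h)

for z in [-2.0, -0.3, 0.0, 0.5, 2.0]:
    assert abs(sigmoid_prime(z) - numeric_derivative(sigmoid, z)) < 1e-8
```

Note that σ'(0) = 0.5 × 0.5 = 0.25 is the maximum of the derivative; for large |z| the gradient shrinks toward zero (the classic "vanishing gradient" of sigmoid).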
The parameters are updated using: $$w_j \leftarrow w_j - \eta \cdot \frac{\partial \mathcal{L}}{\partial w_j}$$ $$b \leftarrow b - \eta \cdot \frac{\partial \mathcal{L}}{\partial b}$$
where η is the learning rate.
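A single update step follows directly from the formulas above; a sketch for the N = 1 case (the `gradient_step` helper is an illustrative name):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def gradient_step(x, y, w, b, lr):
    """One gradient-descent step for a single training sample (N = 1)."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    y_hat = sigmoid(z)
    # dL/dz = 2 * (y_hat - y) * sigma'(z), with sigma'(z) = y_hat * (1 - y_hat)
    delta = 2.0 * (y_hat - y) * y_hat * (1.0 - y_hat)
    w = [wj - lr * delta * xj for wj, xj in zip(w, x)]
    b = b - lr * delta
    return w, b
```

Starting from w = [0.5], b = 0.0 with x = [1.0], y = 1, and η = 0.1 (the initial state of the single-sample example below), one step moves the parameters to roughly w ≈ [0.5177], b ≈ 0.0177: the prediction was too low, so both parameters increase.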
Implement a function that trains a single neuron: for each epoch, run the forward pass on every sample, record the MSE, compute the batch gradients via backpropagation, and apply one gradient-descent update to the weights and bias.
Return the updated weights, updated bias, and a list of MSE values (one per epoch), all rounded to four decimal places.
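The full loop might look like the following sketch. The function name `train_neuron` and the per-epoch convention (record the MSE of the forward pass before applying one batch update) are assumptions chosen to match the worked examples:

```python
import math

def train_neuron(features, labels, initial_weights, initial_bias,
                 learning_rate, epochs):
    """Train a single sigmoid neuron with batch gradient descent on MSE."""
    w = list(initial_weights)
    b = initial_bias
    n = len(labels)
    mse_values = []

    for _ in range(epochs):
        # Forward pass: z_i = w . x_i + b, y_hat_i = sigmoid(z_i)
        preds = []
        for x in features:
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            preds.append(1.0 / (1.0 + math.exp(-z)))

        # Record this epoch's MSE (computed before the parameter update)
        mse_values.append(round(
            sum((p - y) ** 2 for p, y in zip(preds, labels)) / n, 4))

        # Backward pass: accumulate batch gradients via the chain rule
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for x, p, y in zip(features, preds, labels):
            delta = 2.0 / n * (p - y) * p * (1.0 - p)  # dL/dz_i
            for j, xj in enumerate(x):
                grad_w[j] += delta * xj
            grad_b += delta

        # Gradient-descent update
        w = [wj - learning_rate * gj for wj, gj in zip(w, grad_w)]
        b -= learning_rate * grad_b

    return [round(wj, 4) for wj in w], round(b, 4), mse_values
```

On the single-sample example below, this sketch reproduces the expected weights [0.5516], bias 0.0516, and MSE sequence [0.1425, 0.1363, 0.1305].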
Example 1:

Input:
features = [[1.0, 2.0], [2.0, 1.0], [-1.0, -2.0]]
labels = [1, 0, 0]
initial_weights = [0.1, -0.2]
initial_bias = 0.0
learning_rate = 0.1
epochs = 2

Output:
updated_weights = [0.1036, -0.1425]
updated_bias = -0.0167
mse_values = [0.3033, 0.2942]

Reasoning:
Epoch 1:
• For each sample, compute z = w·x + b, then ŷ = sigmoid(z)
• Sample 1: z = 0.1×1 + (-0.2)×2 + 0 = -0.3, ŷ = sigmoid(-0.3) ≈ 0.426
• Sample 2: z = 0.1×2 + (-0.2)×1 + 0 = 0, ŷ = sigmoid(0) = 0.5
• Sample 3: z = 0.1×(-1) + (-0.2)×(-2) + 0 = 0.3, ŷ = sigmoid(0.3) ≈ 0.574
• MSE = [(0.426-1)² + (0.5-0)² + (0.574-0)²] / 3 ≈ 0.3033
• Gradients are computed and parameters updated using learning rate 0.1
Epoch 2:
• Process repeats with updated weights, yielding MSE ≈ 0.2942
• Final weights: [0.1036, -0.1425], bias: -0.0167
Example 2:

Input:
features = [[1.0]]
labels = [1]
initial_weights = [0.5]
initial_bias = 0.0
learning_rate = 0.1
epochs = 3

Output:
updated_weights = [0.5516]
updated_bias = 0.0516
mse_values = [0.1425, 0.1363, 0.1305]

Reasoning:
With a single training sample where the feature is 1.0 and the label is 1:
• The neuron starts with weight 0.5 and bias 0.0
• Initial prediction: sigmoid(0.5×1 + 0) = sigmoid(0.5) ≈ 0.622
• The error is (0.622 - 1) = -0.378, so the neuron needs to increase its output
• Over 3 epochs, the weight and bias are gradually increased
• MSE decreases from 0.1425 → 0.1363 → 0.1305, showing learning progress
• Each epoch nudges the weights to reduce the prediction error
Example 3:

Input:
features = [[1.0, 1.0], [2.0, 2.0], [-1.0, -1.0], [-2.0, -2.0]]
labels = [1, 1, 0, 0]
initial_weights = [0.0, 0.0]
initial_bias = 0.0
learning_rate = 0.5
epochs = 5

Output:
updated_weights = [0.4937, 0.4937]
updated_bias = 0.0
mse_values = [0.25, 0.1344, 0.0874, 0.0653, 0.0526]

Reasoning:
This is a symmetric binary classification problem:
• Positive samples have positive feature values, negative samples have negative values
• Starting from zero weights, the neuron outputs 0.5 for all samples (sigmoid(0) = 0.5)
• Initial MSE = [(0.5-1)² + (0.5-1)² + (0.5-0)² + (0.5-0)²] / 4 = 0.25
• Due to symmetry, the bias remains 0.0 throughout training
• Both weights converge to the same value (0.4937) since features are symmetric
• The MSE steadily decreases from 0.25 to 0.0526 over 5 epochs
• Higher learning rate (0.5) enables faster convergence
Constraints