Understanding how a single neuron learns is the foundation of all neural network training. In this problem, you will implement the complete training loop for a single artificial neuron (also called a perceptron with sigmoid activation), including forward propagation, loss computation, and backpropagation with gradient descent.
A single neuron performs the following computation:
Weighted Sum (Pre-activation): For an input feature vector x with weights w and bias b: $$z = \sum_{i=1}^{n} w_i \cdot x_i + b = \mathbf{w}^T \mathbf{x} + b$$
Sigmoid Activation: The pre-activation value is passed through the sigmoid function to produce a probability: $$\hat{y} = \sigma(z) = \frac{1}{1 + e^{-z}}$$
We use the Mean Squared Error (MSE) loss to measure the difference between predictions and true labels: $$\mathcal{L} = \frac{1}{N} \sum_{i=1}^{N} (\hat{y}_i - y_i)^2$$
where N is the number of training samples, ŷᵢ is the predicted output, and yᵢ is the true binary label.
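Putting the forward pass and the loss together, a minimal sketch in Python (the helper names `sigmoid`, `forward`, and `mse` are illustrative, not prescribed by the problem):

```python
import math

def sigmoid(z):
    """Sigmoid activation: squashes z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, w, b):
    """Forward pass for one sample: y_hat = sigmoid(w . x + b)."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return sigmoid(z)

def mse(predictions, labels):
    """Mean squared error over a batch of predictions."""
    n = len(labels)
    return sum((p - y) ** 2 for p, y in zip(predictions, labels)) / n
```

With the first example's initial parameters (w = [0.1, -0.2], b = 0.0), `forward([1.0, 2.0], [0.1, -0.2], 0.0)` gives roughly 0.426, matching the worked calculation below.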
Backpropagation computes the gradients of the loss with respect to each parameter. Applying the chain rule over the whole batch:
$$\frac{\partial \mathcal{L}}{\partial w_j} = \frac{2}{N} \sum_{i=1}^{N} (\hat{y}_i - y_i) \cdot \sigma'(z_i) \cdot x_{ij}$$
$$\frac{\partial \mathcal{L}}{\partial b} = \frac{2}{N} \sum_{i=1}^{N} (\hat{y}_i - y_i) \cdot \sigma'(z_i)$$
where the derivative of the sigmoid is: $\sigma'(z) = \sigma(z) \cdot (1 - \sigma(z))$
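The sigmoid derivative identity can be sanity-checked numerically against a central finite difference; a quick sketch:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_prime(z):
    # Analytic derivative via the identity sigma'(z) = sigma(z) * (1 - sigma(z))
    s = sigmoid(z)
    return s * (1.0 - s)

def numeric_derivative(f, z, h=1e-6):
    # Central finite difference as an independent check
    return (f(z + h) - f(z - h)) / (2.0 * h)

for z in [-2.0, -0.3, 0.0, 0.5, 2.0]:
    assert abs(sigmoid_prime(z) - numeric_derivative(sigmoid, z)) < 1e-8
```

Note that σ'(0) = 0.5 × 0.5 = 0.25 is the maximum of the derivative; for large |z| the gradient shrinks toward zero (the classic "vanishing gradient" of sigmoid).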
The parameters are updated using: $$w_j \leftarrow w_j - \eta \cdot \frac{\partial \mathcal{L}}{\partial w_j}$$ $$b \leftarrow b - \eta \cdot \frac{\partial \mathcal{L}}{\partial b}$$
where η is the learning rate.
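A single update step follows directly from the formulas above; a sketch for the N = 1 case (the `gradient_step` helper is an illustrative name):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def gradient_step(x, y, w, b, lr):
    """One gradient-descent step for a single training sample (N = 1)."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    y_hat = sigmoid(z)
    # dL/dz = 2 * (y_hat - y) * sigma'(z), with sigma'(z) = y_hat * (1 - y_hat)
    delta = 2.0 * (y_hat - y) * y_hat * (1.0 - y_hat)
    w = [wj - lr * delta * xj for wj, xj in zip(w, x)]
    b = b - lr * delta
    return w, b
```

Starting from w = [0.5], b = 0.0 with x = [1.0], y = 1, and η = 0.1 (the initial state of the single-sample example below), one step moves the parameters to roughly w ≈ [0.5177], b ≈ 0.0177: the prediction was too low, so both parameters increase.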
Implement a function that trains a single neuron: for each epoch, run the forward pass on every sample, record the MSE, compute the batch gradients via backpropagation, and apply one gradient-descent update to the weights and bias.
Return the updated weights, updated bias, and a list of MSE values (one per epoch), all rounded to four decimal places.
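The full loop might look like the following sketch. The function name `train_neuron` and the per-epoch convention (record the MSE of the forward pass before applying one batch update) are assumptions chosen to match the worked examples:

```python
import math

def train_neuron(features, labels, initial_weights, initial_bias,
                 learning_rate, epochs):
    """Train a single sigmoid neuron with batch gradient descent on MSE."""
    w = list(initial_weights)
    b = initial_bias
    n = len(labels)
    mse_values = []

    for _ in range(epochs):
        # Forward pass: z_i = w . x_i + b, y_hat_i = sigmoid(z_i)
        preds = []
        for x in features:
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            preds.append(1.0 / (1.0 + math.exp(-z)))

        # Record this epoch's MSE (computed before the parameter update)
        mse_values.append(round(
            sum((p - y) ** 2 for p, y in zip(preds, labels)) / n, 4))

        # Backward pass: accumulate batch gradients via the chain rule
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for x, p, y in zip(features, preds, labels):
            delta = 2.0 / n * (p - y) * p * (1.0 - p)  # dL/dz_i
            for j, xj in enumerate(x):
                grad_w[j] += delta * xj
            grad_b += delta

        # Gradient-descent update
        w = [wj - learning_rate * gj for wj, gj in zip(w, grad_w)]
        b -= learning_rate * grad_b

    return [round(wj, 4) for wj in w], round(b, 4), mse_values
```

On the single-sample example below, this sketch reproduces the expected weights [0.5516], bias 0.0516, and MSE sequence [0.1425, 0.1363, 0.1305].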
Example 1:

Input:
features = [[1.0, 2.0], [2.0, 1.0], [-1.0, -2.0]]
labels = [1, 0, 0]
initial_weights = [0.1, -0.2]
initial_bias = 0.0
learning_rate = 0.1
epochs = 2

Output:
updated_weights = [0.1036, -0.1425]
updated_bias = -0.0167
mse_values = [0.3033, 0.2942]

Reasoning:
Epoch 1:
• For each sample, compute z = w·x + b, then ŷ = sigmoid(z)
• Sample 1: z = 0.1×1 + (-0.2)×2 + 0 = -0.3, ŷ = sigmoid(-0.3) ≈ 0.426
• Sample 2: z = 0.1×2 + (-0.2)×1 + 0 = 0, ŷ = sigmoid(0) = 0.5
• Sample 3: z = 0.1×(-1) + (-0.2)×(-2) + 0 = 0.3, ŷ = sigmoid(0.3) ≈ 0.574
• MSE = [(0.426-1)² + (0.5-0)² + (0.574-0)²] / 3 ≈ 0.3033
• Gradients are computed and parameters updated using learning rate 0.1
Epoch 2:
• Process repeats with updated weights, yielding MSE ≈ 0.2942
• Final weights: [0.1036, -0.1425], bias: -0.0167
Example 2:

Input:
features = [[1.0]]
labels = [1]
initial_weights = [0.5]
initial_bias = 0.0
learning_rate = 0.1
epochs = 3

Output:
updated_weights = [0.5516]
updated_bias = 0.0516
mse_values = [0.1425, 0.1363, 0.1305]

Reasoning:
With a single training sample where the feature is 1.0 and the label is 1:
• The neuron starts with weight 0.5 and bias 0.0
• Initial prediction: sigmoid(0.5×1 + 0) = sigmoid(0.5) ≈ 0.622
• The error is (0.622 - 1) = -0.378, so the neuron needs to increase its output
• Over 3 epochs, the weight and bias are gradually increased
• MSE decreases from 0.1425 → 0.1363 → 0.1305, showing learning progress
• Each epoch nudges the weights to reduce the prediction error
Example 3:

Input:
features = [[1.0, 1.0], [2.0, 2.0], [-1.0, -1.0], [-2.0, -2.0]]
labels = [1, 1, 0, 0]
initial_weights = [0.0, 0.0]
initial_bias = 0.0
learning_rate = 0.5
epochs = 5

Output:
updated_weights = [0.4937, 0.4937]
updated_bias = 0.0
mse_values = [0.25, 0.1344, 0.0874, 0.0653, 0.0526]

Reasoning:
This is a symmetric binary classification problem:
• Positive samples have positive feature values, negative samples have negative values
• Starting from zero weights, the neuron outputs 0.5 for all samples (sigmoid(0) = 0.5)
• Initial MSE = [(0.5-1)² + (0.5-1)² + (0.5-0)² + (0.5-0)²] / 4 = 0.25
• Due to symmetry, the bias remains 0.0 throughout training
• Both weights converge to the same value (0.4937) since features are symmetric
• The MSE steadily decreases from 0.25 to 0.0526 over 5 epochs
• Higher learning rate (0.5) enables faster convergence
Constraints