The sigmoid function (also called the logistic function) is one of the most fundamental building blocks in machine learning and neural networks. It maps any real-valued number to a value between 0 and 1 that can be interpreted as a probability, making it ideal for binary classification tasks.
The sigmoid function is mathematically defined as:
$$\sigma(z) = \frac{1}{1 + e^{-z}}$$
This S-shaped curve has several elegant properties:

• Its output always lies strictly between 0 and 1, so it can be interpreted as a probability.
• σ(0) = 0.5, so the sign of z determines which class is more likely.
• It is monotonically increasing: larger z always yields a larger probability.
• It is symmetric about the origin: σ(−z) = 1 − σ(z).
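These properties are easy to check numerically. A minimal sketch in NumPy (the function name `sigmoid` is illustrative):

```python
import numpy as np

def sigmoid(z):
    """Logistic function: maps any real z into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Midpoint of the curve: z = 0 maps to exactly 0.5
print(sigmoid(0))                     # 0.5

# Symmetry about the origin: sigmoid(-z) == 1 - sigmoid(z)
print(sigmoid(-2), 1 - sigmoid(2))    # both ~0.1192
```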
In a binary classification setting using logistic regression, predictions follow a two-step process:
For each input sample x (a feature vector), compute the linear combination:
$$z = \mathbf{x} \cdot \mathbf{w} + b$$
Where:

• **x** is the feature vector for one sample,
• **w** is the weight vector (one weight per feature),
• b is the scalar bias term.
For a dataset matrix X with multiple samples (rows), this becomes:
$$\mathbf{z} = X\mathbf{w} + b$$
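In NumPy this vectorized form is a single matrix-vector product; the scalar bias is broadcast across all rows. A small sketch with made-up example data:

```python
import numpy as np

# Hypothetical data: 3 samples with 2 features each
X = np.array([[1.0, 2.0],
              [0.0, 1.0],
              [-1.0, -1.0]])
w = np.array([0.5, -0.5])
b = 1.0

# One matrix-vector product computes z for every sample at once
z = X @ w + b
print(z)   # one linear combination per row of X
```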
Transform the linear combination using the sigmoid function:
$$p = \sigma(z) = \frac{1}{1 + e^{-z}}$$
Finally, convert probabilities to binary predictions using a decision threshold of 0.5:
$$\hat{y} = \begin{cases} 1 & \text{if } p \geq 0.5 \\ 0 & \text{if } p < 0.5 \end{cases}$$
Implement a function that performs binary classification using the sigmoid activation function. Given a matrix of input samples X, a weight vector w, and a bias term b, compute the binary class predictions for all samples.
The function should:

1. Compute the linear combination z = Xw + b for every sample.
2. Apply the sigmoid function to obtain a probability for each sample.
3. Threshold the probabilities at 0.5 to produce 0/1 class predictions.
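One way to implement these steps is sketched below using NumPy (the function name `predict` is illustrative, not prescribed by the task):

```python
import numpy as np

def predict(X, w, b):
    """Binary classification with sigmoid activation.

    X: (n_samples, n_features) matrix, w: weight vector, b: scalar bias.
    Returns an array of 0/1 class predictions.
    """
    X = np.asarray(X, dtype=float)
    w = np.asarray(w, dtype=float)
    z = X @ w + b                      # step 1: linear combination
    p = 1.0 / (1.0 + np.exp(-z))      # step 2: sigmoid probabilities
    return (p >= 0.5).astype(int)     # step 3: threshold at 0.5

print(predict([[1, 1], [2, 2], [-1, -1], [-2, -2]], [1, 1], 0))
# [1 1 0 0]
```

Note that `p >= 0.5` (rather than `>`) ensures the boundary case z = 0, where σ(0) = 0.5, is assigned to class 1.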
X = [[1, 1], [2, 2], [-1, -1], [-2, -2]]
w = [1, 1]
b = 0

Output: [1, 1, 0, 0]

Computing the linear combination z = Xw + b for each sample:
• Sample [1, 1]: z = (1×1) + (1×1) + 0 = 2
• Sample [2, 2]: z = (2×1) + (2×1) + 0 = 4
• Sample [-1, -1]: z = (-1×1) + (-1×1) + 0 = -2
• Sample [-2, -2]: z = (-2×1) + (-2×1) + 0 = -4
Applying the sigmoid function σ(z) = 1/(1 + e^(-z)):
• σ(2) ≈ 0.88 → classified as 1 (since 0.88 ≥ 0.5)
• σ(4) ≈ 0.98 → classified as 1 (since 0.98 ≥ 0.5)
• σ(-2) ≈ 0.12 → classified as 0 (since 0.12 < 0.5)
• σ(-4) ≈ 0.02 → classified as 0 (since 0.02 < 0.5)
The weight vector [1, 1] gives equal importance to both features. Samples with positive feature sums get classified as 1, while those with negative sums get classified as 0.
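The arithmetic above can be checked directly; a short verification sketch in NumPy:

```python
import numpy as np

X = np.array([[1, 1], [2, 2], [-1, -1], [-2, -2]], dtype=float)
w = np.array([1.0, 1.0])
b = 0.0

z = X @ w + b                       # [2, 4, -2, -4]
p = 1.0 / (1.0 + np.exp(-z))       # sigmoid probabilities
print(np.round(p, 2))              # [0.88 0.98 0.12 0.02]
print((p >= 0.5).astype(int))      # [1 1 0 0]
```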
X = [[0, 0], [1, 0], [0, 1]]
w = [0.5, 0.5]
b = 0.5

Output: [1, 1, 1]

Computing the linear combination z = Xw + b for each sample:
• Sample [0, 0]: z = (0×0.5) + (0×0.5) + 0.5 = 0.5
• Sample [1, 0]: z = (1×0.5) + (0×0.5) + 0.5 = 1.0
• Sample [0, 1]: z = (0×0.5) + (1×0.5) + 0.5 = 1.0
Applying the sigmoid function:
• σ(0.5) ≈ 0.62 → classified as 1 (since 0.62 ≥ 0.5)
• σ(1.0) ≈ 0.73 → classified as 1 (since 0.73 ≥ 0.5)
• σ(1.0) ≈ 0.73 → classified as 1 (since 0.73 ≥ 0.5)
The positive bias term (b = 0.5) shifts the decision boundary, making the model more inclined to predict class 1. Even the all-zero sample [0, 0] is classified as positive, because σ(0.5) ≈ 0.62 > 0.5.
X = [[2, 2], [1, 1], [0, 0], [-1, -1]]
w = [1, 1]
b = -2

Output: [1, 1, 0, 0]

Computing the linear combination z = Xw + b for each sample:
• Sample [2, 2]: z = (2×1) + (2×1) + (-2) = 2
• Sample [1, 1]: z = (1×1) + (1×1) + (-2) = 0
• Sample [0, 0]: z = (0×1) + (0×1) + (-2) = -2
• Sample [-1, -1]: z = (-1×1) + (-1×1) + (-2) = -4
Applying the sigmoid function:
• σ(2) ≈ 0.88 → classified as 1
• σ(0) = 0.50 → classified as 1 (exactly at the boundary, treated as positive)
• σ(-2) ≈ 0.12 → classified as 0
• σ(-4) ≈ 0.02 → classified as 0
The negative bias (b = -2) shifts the decision boundary, requiring larger feature values to exceed the 0.5 probability threshold. Note that when z = 0, σ(0) = 0.5 exactly, which meets the threshold for class 1.
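The effect of the bias term in the last two examples can be demonstrated side by side; a sketch using a hypothetical `predict` helper:

```python
import numpy as np

def predict(X, w, b):
    """Sigmoid classifier: z = Xw + b, threshold probabilities at 0.5."""
    z = np.asarray(X, dtype=float) @ np.asarray(w, dtype=float) + b
    return (1.0 / (1.0 + np.exp(-z)) >= 0.5).astype(int)

# Positive bias pushes borderline samples toward class 1:
print(predict([[0, 0], [1, 0], [0, 1]], [0.5, 0.5], 0.5))       # [1 1 1]

# Negative bias demands a larger feature sum; z = 0 still maps to class 1:
print(predict([[2, 2], [1, 1], [0, 0], [-1, -1]], [1, 1], -2))  # [1 1 0 0]
```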
Constraints