Every classifier must ultimately make a decision: class 0 or class 1? In logistic regression, this decision crystallizes at the decision boundary—a geometric surface in feature space that separates regions predicted as one class from regions predicted as the other.
Understanding decision boundaries transforms logistic regression from an abstract probability machine into a geometric concept you can visualize and reason about. You'll see exactly where the model is confident, where it's uncertain, and why certain points get classified as they do.
This geometric perspective is not just intellectually satisfying—it's practically essential for debugging models, understanding their limitations, and knowing when to choose more complex alternatives.
By the end of this page, you will understand: (1) the mathematical definition of the decision boundary, (2) how to visualize boundaries in 2D and understand them in higher dimensions, (3) the relationship between distance from boundary and prediction confidence, (4) the concept of margin and its connection to generalization, and (5) the inherent limitations of linear decision boundaries.
The decision boundary is the set of all points in feature space where the classifier is exactly undecided—where $P(Y=1|\mathbf{x}) = P(Y=0|\mathbf{x}) = 0.5$.
Derivation
Since $P(Y=1|\mathbf{x}) = \sigma(\mathbf{w}^T\mathbf{x} + b)$ and $\sigma(0) = 0.5$, the decision boundary occurs when:
$$\mathbf{w}^T\mathbf{x} + b = 0$$
Expanding in terms of individual features:
$$w_1 x_1 + w_2 x_2 + \cdots + w_d x_d + b = 0$$
This is the equation of a hyperplane in $\mathbb{R}^d$.
Hyperplane Terminology
Key Properties
| Dimensions (d) | Boundary Type | Equation | Example |
|---|---|---|---|
| 1 | Point | w₁x₁ + b = 0 → x₁ = -b/w₁ | Age threshold for approval |
| 2 | Line | w₁x₁ + w₂x₂ + b = 0 | Height-weight classification |
| 3 | Plane | w₁x₁ + w₂x₂ + w₃x₃ + b = 0 | RGB color classification |
| d | (d-1)-hyperplane | w^T x + b = 0 | High-dimensional text classification |
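The 1D row of the table can be checked concretely. A minimal sketch with invented numbers (the weight 0.15 and bias -6.0 are hypothetical, not from any fitted model):

```python
# Hypothetical 1D model: z = w1 * age + b (numbers invented for illustration)
w1, b = 0.15, -6.0

# In one dimension the "hyperplane" is a single point: w1 * x1 + b = 0
threshold = -b / w1  # ≈ 40.0

def predict(age):
    return 1 if w1 * age + b > 0 else 0

assert predict(threshold + 1) == 1  # above the threshold → class 1
assert predict(threshold - 1) == 0  # below the threshold → class 0
```

With these numbers, applicants older than 40 fall on the positive side of the boundary and younger ones on the negative side, exactly as the "age threshold" example in the table suggests.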
Prediction Rule Based on Boundary
The classification rule becomes simple:
$$\hat{y} = \begin{cases} 1 & \text{if } \mathbf{w}^T\mathbf{x} + b > 0 \\ 0 & \text{if } \mathbf{w}^T\mathbf{x} + b < 0 \end{cases}$$
(Points exactly on the boundary are typically assigned to class 1 by convention.)
The side of the boundary determines the class. The distance from the boundary determines the confidence.
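The rule is a one-liner in code. A minimal sketch with hypothetical weights (any fitted `w` and `b` would work the same way):

```python
import numpy as np

w = np.array([2.0, -1.0])  # hypothetical weights
b = 0.5

def predict(x, w, b):
    """Classify by which side of the hyperplane x falls on.
    Points exactly on the boundary (z == 0) go to class 1 by convention."""
    z = w @ x + b
    return int(z >= 0)

print(predict(np.array([1.0, 1.0]), w, b))   # z = 1.5 → 1
print(predict(np.array([-1.0, 1.0]), w, b))  # z = -2.5 → 0
print(predict(np.array([0.0, 0.5]), w, b))   # z = 0, on the boundary → 1
```

Note that only the sign of $z$ matters for the class; its magnitude is what carries the confidence, as the next sections develop.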
The linear boundary is a direct consequence of using a linear combination (w^T x + b) as input to the sigmoid. To get nonlinear boundaries, we must either (1) engineer nonlinear features, (2) use kernel methods, or (3) use inherently nonlinear models like neural networks or decision trees.
Two-dimensional feature spaces allow us to see decision boundaries directly. This visualization builds intuition that transfers to higher dimensions.
Anatomy of a 2D Decision Boundary
For features $(x_1, x_2)$ with model parameters $(w_1, w_2, b)$:
$$w_1 x_1 + w_2 x_2 + b = 0$$
Rearranging to slope-intercept form:
$$x_2 = -\frac{w_1}{w_2} x_1 - \frac{b}{w_2}$$
This is a line with slope $-w_1/w_2$ and $x_2$-intercept $-b/w_2$ (assuming $w_2 \neq 0$; if $w_2 = 0$, the boundary is the vertical line $x_1 = -b/w_1$).
The Weight Vector as Normal
The weight vector $\mathbf{w} = (w_1, w_2)^T$ is always perpendicular to the decision line. To see this, note that for any two points $\mathbf{x}_a$ and $\mathbf{x}_b$ on the boundary:
$$\mathbf{w}^T\mathbf{x}_a = -b = \mathbf{w}^T\mathbf{x}_b$$
Therefore: $$\mathbf{w}^T(\mathbf{x}_a - \mathbf{x}_b) = 0$$
The vector $\mathbf{w}$ is orthogonal to any vector lying in the boundary—exactly what 'normal' means.
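This orthogonality is easy to verify numerically. A small sketch with invented parameters: construct two points on the boundary by solving for $x_2$, then check that $\mathbf{w}$ is perpendicular to the direction between them:

```python
import numpy as np

w = np.array([3.0, 4.0])  # hypothetical parameters
b = -2.0

def on_boundary(x1):
    """A point on the boundary: solve w1*x1 + w2*x2 + b = 0 for x2."""
    return np.array([x1, -(w[0] * x1 + b) / w[1]])

xa, xb = on_boundary(0.0), on_boundary(5.0)
assert abs(w @ xa + b) < 1e-12 and abs(w @ xb + b) < 1e-12  # both on boundary

# The direction along the boundary is orthogonal to w
direction = xa - xb
print(w @ direction)  # → 0.0 (up to floating point)
```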
```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_classification

# Generate 2D data
np.random.seed(42)
X, y = make_classification(n_samples=200, n_features=2, n_redundant=0,
                           n_informative=2, n_clusters_per_class=1,
                           class_sep=1.5, random_state=42)

# Fit logistic regression
model = LogisticRegression()
model.fit(X, y)

# Extract parameters
w = model.coef_[0]
b = model.intercept_[0]

print("Decision Boundary Analysis")
print("=" * 50)
print(f"Weight vector w: [{w[0]:.4f}, {w[1]:.4f}]")
print(f"Bias b: {b:.4f}")
print(f"Decision boundary equation: {w[0]:.3f}*x₁ + {w[1]:.3f}*x₂ + {b:.3f} = 0")
print(f"Slope: {-w[0]/w[1]:.4f}")
print(f"y-intercept: {-b/w[1]:.4f}")

# Create visualization
fig, axes = plt.subplots(1, 2, figsize=(14, 6))

# Plot 1: Decision boundary with data
ax1 = axes[0]
x1_range = np.linspace(X[:, 0].min() - 1, X[:, 0].max() + 1, 200)
x2_range = np.linspace(X[:, 1].min() - 1, X[:, 1].max() + 1, 200)
X1, X2 = np.meshgrid(x1_range, x2_range)
Z = model.predict_proba(np.c_[X1.ravel(), X2.ravel()])[:, 1].reshape(X1.shape)

# Probability contours
contour = ax1.contourf(X1, X2, Z, levels=20, cmap='RdBu_r', alpha=0.6)
plt.colorbar(contour, ax=ax1, label='P(Y=1)')

# Decision boundary (P = 0.5, i.e. z = 0)
ax1.contour(X1, X2, Z, levels=[0.5], colors='black', linewidths=2)

# Data points
ax1.scatter(X[y==0, 0], X[y==0, 1], c='blue', label='Class 0', edgecolors='k', s=50)
ax1.scatter(X[y==1, 0], X[y==1, 1], c='red', label='Class 1', edgecolors='k', s=50)

# Weight vector (scaled for visibility)
center = np.array([0, -b/w[1]])  # Point on boundary
scale = 1.5
ax1.quiver(center[0], center[1], w[0]*scale, w[1]*scale, color='green',
           scale=5, width=0.015, label='w (normal)', zorder=10)

ax1.set_xlabel('x₁')
ax1.set_ylabel('x₂')
ax1.set_title('Decision Boundary with Probability Contours')
ax1.legend(loc='upper left')

# Plot 2: Confidence bands
ax2 = axes[1]
ax2.contourf(X1, X2, Z, levels=[0, 0.25, 0.5, 0.75, 1.0],
             colors=['#2166ac', '#92c5de', '#f4a582', '#b2182b'], alpha=0.7)
ax2.contour(X1, X2, Z, levels=[0.25, 0.5, 0.75], colors='black',
            linewidths=1, linestyles=['--', '-', '--'])
ax2.scatter(X[y==0, 0], X[y==0, 1], c='blue', edgecolors='k', s=30)
ax2.scatter(X[y==1, 0], X[y==1, 1], c='red', edgecolors='k', s=30)
ax2.set_xlabel('x₁')
ax2.set_ylabel('x₂')
ax2.set_title('Confidence Bands (P<0.25, 0.25-0.5, 0.5-0.75, P>0.75)')

plt.tight_layout()
plt.savefig('decision_boundary_2d.png', dpi=150)
plt.show()
```

The probability contours (iso-probability lines) are parallel to the decision boundary. Moving perpendicular to the boundary (in the $\mathbf{w}$ direction) changes probability most rapidly; moving parallel to the boundary doesn't change probability at all. The contours are equally spaced in log-odds, but compressed near 0 and 1 in probability.
The distance from a point to the decision boundary directly determines how confident the model is in its prediction. This relationship is fundamental to understanding logistic regression's behavior.
Signed Distance to the Boundary
For a point $\mathbf{x}$, the signed distance to the decision boundary is:
$$d(\mathbf{x}) = \frac{\mathbf{w}^T\mathbf{x} + b}{|\mathbf{w}|}$$
where $|\mathbf{w}| = \sqrt{w_1^2 + w_2^2 + \cdots + w_d^2}$ is the Euclidean norm of the weight vector.
Properties of the Signed Distance
Sign: $d(\mathbf{x}) > 0$ on the class-1 side of the boundary, $d(\mathbf{x}) < 0$ on the class-0 side, and $d(\mathbf{x}) = 0$ exactly on the boundary.
Magnitude: $|d(\mathbf{x})|$ is the Euclidean distance from $\mathbf{x}$ to the hyperplane.
Scale invariance: multiplying $(\mathbf{w}, b)$ by a positive constant leaves $d(\mathbf{x})$ unchanged, even though it changes the predicted probabilities.
From Distance to Probability
The probability prediction depends on $z = \mathbf{w}^T\mathbf{x} + b = |\mathbf{w}| \cdot d$. Thus:
$$P(Y=1|\mathbf{x}) = \sigma(|\mathbf{w}| \cdot d)$$
Larger $|\mathbf{w}|$ means the same distance produces more confident predictions. This is why regularization (which shrinks $|\mathbf{w}|$) produces gentler, less overconfident probability estimates.
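This scaling effect can be verified directly. A minimal sketch with invented parameters: scaling $(\mathbf{w}, b)$ by a constant $c > 0$ leaves the boundary, and hence every signed distance, unchanged, but sharpens the probabilities:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Hypothetical model parameters and a test point
w0, b0 = np.array([2.0, 1.0]), 0.3
x = np.array([1.5, -0.5])

for c in [0.5, 1.0, 4.0]:
    w, b = c * w0, c * b0
    d = (w @ x + b) / np.linalg.norm(w)  # same for every c
    p = sigmoid(np.linalg.norm(w) * d)   # identical to sigmoid(w @ x + b)
    print(f"c={c:.1f}: distance={d:.3f}, P(Y=1)={p:.3f}")
```

The printed distance is constant across `c`, while the probability moves toward 1 as `c` grows: same geometry, more confident model.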
```python
import numpy as np
import matplotlib.pyplot as plt

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Model parameters
w = np.array([2.0, 1.0])
b = -1.0
w_norm = np.linalg.norm(w)

def signed_distance(x, w, b):
    """Compute signed distance from point to decision boundary."""
    return (w @ x + b) / np.linalg.norm(w)

def probability_from_distance(d, w_norm):
    """Convert signed distance to probability."""
    z = w_norm * d
    return sigmoid(z)

# Analyze relationship between distance and probability
distances = np.linspace(-3, 3, 100)
probabilities = probability_from_distance(distances, w_norm)

# Create visualization
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Left: Distance vs Probability
ax1 = axes[0]
ax1.plot(distances, probabilities, 'b-', linewidth=2, label=f'||w|| = {w_norm:.2f}')

# Show effect of different ||w||
for w_scale in [0.5, 1.0, 2.0]:
    probs = probability_from_distance(distances, w_norm * w_scale)
    ax1.plot(distances, probs, '--', alpha=0.7, label=f'||w|| = {w_norm * w_scale:.2f}')

ax1.axhline(y=0.5, color='gray', linestyle=':', alpha=0.5)
ax1.axvline(x=0, color='gray', linestyle=':', alpha=0.5)
ax1.fill_between([-3, 0], 0, 1, alpha=0.1, color='blue', label='Class 0 side')
ax1.fill_between([0, 3], 0, 1, alpha=0.1, color='red', label='Class 1 side')
ax1.set_xlabel('Signed Distance from Boundary')
ax1.set_ylabel('P(Y=1)')
ax1.set_title('Distance to Boundary vs Prediction Confidence')
ax1.legend(loc='upper left')
ax1.grid(True, alpha=0.3)
ax1.set_ylim(0, 1)

# Right: Sample points with their distances and probabilities
ax2 = axes[1]

test_points = np.array([
    [0, 0.5],    # Near boundary
    [1, 0],      # On class 1 side
    [-1, 0],     # On class 0 side
    [2, 1],      # Far on class 1 side
    [-2, -1],    # Far on class 0 side
])

# Plot decision boundary
x1_line = np.linspace(-3, 3, 100)
x2_line = -(w[0] * x1_line + b) / w[1]
ax2.plot(x1_line, x2_line, 'k-', linewidth=2, label='Boundary')

# Plot points, colored by predicted probability
for pt in test_points:
    d = signed_distance(pt, w, b)
    p = sigmoid(w @ pt + b)
    color = plt.cm.RdBu_r(p)
    ax2.scatter(pt[0], pt[1], c=[color], s=150, edgecolors='k', zorder=5)
    ax2.annotate(f'd={d:.2f}\nP={p:.2f}', (pt[0]+0.1, pt[1]+0.15), fontsize=9)

# Draw normal vector
origin = np.array([0, -b/w[1]])
w_normalized = w / w_norm
ax2.quiver(origin[0], origin[1], w_normalized[0]*1.5, w_normalized[1]*1.5,
           color='green', scale=5, width=0.02, label='Normal (w/||w||)')

ax2.set_xlim(-3, 3)
ax2.set_ylim(-3, 3)
ax2.set_xlabel('x₁')
ax2.set_ylabel('x₂')
ax2.set_title('Sample Points: Distance and Probability')
ax2.legend(loc='upper left')
ax2.set_aspect('equal')
ax2.grid(True, alpha=0.3)

plt.tight_layout()
plt.savefig('distance_confidence.png', dpi=150)
plt.show()

# Print summary
print("Distance-Probability Relationship")
print("=" * 50)
print(f"Weight vector: {w}")
print(f"||w||: {w_norm:.4f}")
print("For various distances:")
for d in [-2, -1, -0.5, 0, 0.5, 1, 2]:
    p = probability_from_distance(d, w_norm)
    print(f"  Distance {d:>5.1f} → Probability {p:.4f}")
```

The magnitude ||w|| controls how 'sharp' the probability transition is across the boundary. Large ||w|| → sharp transition (confident predictions close to the boundary). Small ||w|| → gradual transition (uncertain predictions even far from the boundary). Regularization reduces ||w||, producing softer, more calibrated probability estimates.
The margin is a fundamental concept linking logistic regression to Support Vector Machines (SVMs) and broader notions of classifier robustness.
Functional Margin
The functional margin of a point $(\mathbf{x}_i, y_i)$ is:
$$\gamma_i^{(f)} = y_i^* \cdot (\mathbf{w}^T\mathbf{x}_i + b)$$
where $y_i^* = 2y_i - 1 \in \{-1, +1\}$ converts labels to $\pm 1$.
A positive functional margin means correct classification; negative means incorrect.
Geometric Margin
The geometric margin is the functional margin normalized by $|\mathbf{w}|$:
$$\gamma_i^{(g)} = \frac{y_i^* \cdot (\mathbf{w}^T\mathbf{x}_i + b)}{|\mathbf{w}|}$$
This equals the signed distance from $\mathbf{x}_i$ to the boundary (with sign indicating correctness).
Minimum Margin
The minimum margin of a classifier is the smallest margin among all training points:
$$\gamma_{\min} = \min_i \gamma_i^{(g)}$$
This represents the 'safety buffer' of the classifier—how close the nearest correctly classified point is to the boundary.
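The three margin definitions above can be computed in a few lines. A minimal numeric sketch with an invented classifier and four invented training points:

```python
import numpy as np

w = np.array([1.0, 1.0])  # hypothetical classifier
b = -1.0

X = np.array([[2.0, 1.0], [0.5, 0.2], [0.0, 0.0], [-1.0, 0.5]])
y = np.array([1, 0, 0, 0])
y_star = 2 * y - 1  # labels in {-1, +1}

functional = y_star * (X @ w + b)            # functional margins
geometric = functional / np.linalg.norm(w)   # geometric margins

print(functional)         # all positive → every point correctly classified
print(geometric.min())    # the minimum margin (the 'safety buffer')
```

Here the second point sits closest to the boundary, so its geometric margin is the minimum.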
Logistic Regression vs. Maximum Margin (SVM)
Logistic regression maximizes likelihood (equivalently, minimizes log-loss), which implicitly considers all points. SVMs explicitly maximize the minimum margin, focusing only on the nearest points (support vectors).
With L2 regularization, logistic regression tends toward larger margins, but it's a softer constraint than SVM's hard margin maximization. This leads to differences:
| Aspect | Logistic Regression | SVM |
|---|---|---|
| Objective | Maximize likelihood | Maximize margin |
| All points matter | Yes (weighted by error) | Only support vectors |
| Probabilistic output | Yes (calibrated) | No (requires calibration) |
| Margin enforcement | Soft (via regularization) | Hard (primal constraint) |
L2 regularization is closely related to margin maximization: minimizing ||w||² while keeping the training points correctly classified (functional margins at least 1) is precisely the SVM's margin-maximization objective. This is why well-regularized logistic regression and a linear SVM often produce similar decision boundaries, even though the two objectives are not strictly equivalent.
While 2D visualizations build intuition, real problems typically have many more dimensions. The key insights transfer directly, though visualization becomes impossible.
The Hyperplane in $\mathbb{R}^d$
The decision boundary $\mathbf{w}^T\mathbf{x} + b = 0$ defines a $(d-1)$-dimensional hyperplane: a 9-dimensional "slice" of a 10-dimensional feature space, a 999-dimensional one in 1000 dimensions, and so on.
Though we can't visualize these, their properties—linearity, normal vector $\mathbf{w}$, distance formulas—all hold.
Understanding Without Visualizing
Even when visualization fails, we can understand the boundary through:
Feature importance: Which $w_j$ are large tells us which features most influence the boundary orientation
Distance distributions: Plotting the distribution of distances from boundary for each class reveals how well-separated they are
2D projections: Projecting data onto the $\mathbf{w}$ direction (and one orthogonal direction) shows how the boundary separates classes
Probability distributions: Examining predicted probability distributions for each class shows separation quality
```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_classification

# Generate high-dimensional data
np.random.seed(42)
n_samples = 500
n_features = 50
X, y = make_classification(n_samples=n_samples, n_features=n_features,
                           n_informative=10, n_redundant=5,
                           class_sep=1.0, random_state=42)

# Fit model
model = LogisticRegression(C=1.0, max_iter=1000)
model.fit(X, y)

w = model.coef_[0]
b = model.intercept_[0]
w_norm = np.linalg.norm(w)

# Compute distances from boundary for all points
z = X @ w + b
distances = z / w_norm
probabilities = 1 / (1 + np.exp(-z))

# Analysis
fig, axes = plt.subplots(2, 2, figsize=(12, 10))

# Plot 1: Feature importance (|w|)
ax1 = axes[0, 0]
importance_order = np.argsort(np.abs(w))[::-1]
top_k = 20
ax1.barh(range(top_k), np.abs(w[importance_order[:top_k]])[::-1])
ax1.set_yticks(range(top_k))
ax1.set_yticklabels([f'Feature {importance_order[top_k-1-i]}' for i in range(top_k)])
ax1.set_xlabel('|Coefficient|')
ax1.set_title(f'Top {top_k} Feature Importances (of {n_features})')

# Plot 2: Distance distributions by class
ax2 = axes[0, 1]
ax2.hist(distances[y==0], bins=30, alpha=0.7, label='Class 0', color='blue')
ax2.hist(distances[y==1], bins=30, alpha=0.7, label='Class 1', color='red')
ax2.axvline(x=0, color='black', linestyle='--', linewidth=2, label='Boundary')
ax2.set_xlabel('Signed Distance from Boundary')
ax2.set_ylabel('Count')
ax2.set_title('Distance Distribution by Class')
ax2.legend()

# Plot 3: 1D projection onto w
ax3 = axes[1, 0]
projection = X @ w / w_norm  # Project onto unit normal
ax3.scatter(projection[y==0], np.zeros(sum(y==0)) + np.random.randn(sum(y==0))*0.1,
            alpha=0.5, c='blue', label='Class 0', s=20)
ax3.scatter(projection[y==1], np.zeros(sum(y==1)) + np.random.randn(sum(y==1))*0.1 + 1,
            alpha=0.5, c='red', label='Class 1', s=20)
ax3.axvline(x=-b/w_norm, color='black', linestyle='--', linewidth=2)
ax3.set_xlabel('Projection onto w Direction')
ax3.set_ylabel('Class (with jitter)')
ax3.set_title('Data Projected onto Normal Direction')
ax3.legend()

# Plot 4: Probability distributions
ax4 = axes[1, 1]
ax4.hist(probabilities[y==0], bins=30, alpha=0.7, label='Class 0', color='blue')
ax4.hist(probabilities[y==1], bins=30, alpha=0.7, label='Class 1', color='red')
ax4.axvline(x=0.5, color='black', linestyle='--', linewidth=2, label='P=0.5')
ax4.set_xlabel('Predicted Probability P(Y=1)')
ax4.set_ylabel('Count')
ax4.set_title('Probability Distribution by Class')
ax4.legend()

plt.tight_layout()
plt.savefig('high_dimensional_analysis.png', dpi=150)
plt.show()

# Summary statistics
print(f"High-Dimensional Boundary Analysis (d={n_features})")
print("=" * 60)
print(f"||w||: {w_norm:.4f}")
print(f"Number of features with |w| > 0.1: {sum(np.abs(w) > 0.1)}")
print("Class separation:")
print(f"  Mean distance (Class 0): {distances[y==0].mean():.4f}")
print(f"  Mean distance (Class 1): {distances[y==1].mean():.4f}")
print(f"  Minimum margin: {min(distances[y==1].min(), -distances[y==0].max()):.4f}")
```

In very high dimensions, interesting phenomena occur. Most points lie near the surface of the hypercube, and distances between random points concentrate around the mean. These effects make high-dimensional classification both easier (more room to separate) and harder (fewer points per region). Understanding the boundary analytically becomes essential when visualization fails.
The linearity of logistic regression's decision boundary is both a strength (interpretability, stability) and a limitation (inability to capture complex patterns). Understanding when linearity fails is crucial for model selection.
The XOR Problem
The classic example is the XOR (exclusive or) pattern:
| $x_1$ | $x_2$ | $y$ |
|---|---|---|
| 0 | 0 | 0 |
| 0 | 1 | 1 |
| 1 | 0 | 1 |
| 1 | 1 | 0 |
No single line can separate the two classes. This is the archetypal linearly non-separable dataset.
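This can be confirmed empirically, and it also previews the fix discussed below: adding the interaction feature $x_1 x_2$ makes XOR linearly separable in the lifted space. A small sketch using sklearn (the large `C` just weakens regularization so the separable fit can be found):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0])  # XOR labels

# Linear features: no hyperplane separates XOR
linear = LogisticRegression().fit(X, y)
print(linear.score(X, y))  # at most 0.75, never 1.0

# Adding the interaction x1*x2 makes the classes separable
# (e.g. x1 + x2 - 2*x1*x2 - 0.5 has the right sign on all four points)
X_int = np.hstack([X, (X[:, 0] * X[:, 1]).reshape(-1, 1)])
interact = LogisticRegression(C=1e6, max_iter=1000).fit(X_int, y)
print(interact.score(X_int, y))  # → 1.0
```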
Types of Nonlinear Patterns
Common patterns a linear boundary cannot capture include XOR/checkerboard arrangements, concentric rings (one class surrounding another), multiple disjoint clusters of the same class, and smoothly curved class boundaries.
Recognizing Linear Inseparability
Signs that a linear boundary may be insufficient: training accuracy stuck near the class base rate even after tuning; errors concentrated in particular regions of feature space; 2D projections showing interleaved or nested classes; domain knowledge suggesting strong interactions between features.
```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.datasets import make_circles

# Generate XOR-like data
np.random.seed(42)
n = 200
X_xor = np.vstack([
    np.random.randn(n//4, 2) + [1, 1],
    np.random.randn(n//4, 2) + [-1, -1],
    np.random.randn(n//4, 2) + [1, -1],
    np.random.randn(n//4, 2) + [-1, 1],
])
# One label per cluster block (np.repeat matches the vstack order above)
y_xor = np.repeat([0, 0, 1, 1], n // 4)

# Generate concentric circles
X_circles, y_circles = make_circles(n_samples=n, noise=0.1, factor=0.5,
                                    random_state=42)

# Create figure
fig, axes = plt.subplots(2, 3, figsize=(14, 9))

datasets = [
    (X_xor, y_xor, 'XOR Pattern'),
    (X_circles, y_circles, 'Concentric Circles'),
]

for row, (X, y, title) in enumerate(datasets):
    # Original with linear boundary
    ax = axes[row, 0]
    model_linear = LogisticRegression()
    model_linear.fit(X, y)
    acc_linear = model_linear.score(X, y)

    x1r = np.linspace(X[:,0].min()-0.5, X[:,0].max()+0.5, 100)
    x2r = np.linspace(X[:,1].min()-0.5, X[:,1].max()+0.5, 100)
    X1, X2 = np.meshgrid(x1r, x2r)
    Z = model_linear.predict(np.c_[X1.ravel(), X2.ravel()]).reshape(X1.shape)

    ax.contourf(X1, X2, Z, alpha=0.3, cmap='RdBu')
    ax.scatter(X[y==0,0], X[y==0,1], c='blue', s=20, edgecolors='k')
    ax.scatter(X[y==1,0], X[y==1,1], c='red', s=20, edgecolors='k')
    ax.set_title(f'{title}\nLinear: {acc_linear:.1%}')
    ax.set_aspect('equal')

    # With polynomial features (degree 2)
    ax = axes[row, 1]
    poly = PolynomialFeatures(degree=2)
    X_poly = poly.fit_transform(X)
    model_poly2 = LogisticRegression(max_iter=1000)
    model_poly2.fit(X_poly, y)
    acc_poly2 = model_poly2.score(X_poly, y)

    Z_poly = model_poly2.predict(poly.transform(np.c_[X1.ravel(), X2.ravel()])).reshape(X1.shape)
    ax.contourf(X1, X2, Z_poly, alpha=0.3, cmap='RdBu')
    ax.scatter(X[y==0,0], X[y==0,1], c='blue', s=20, edgecolors='k')
    ax.scatter(X[y==1,0], X[y==1,1], c='red', s=20, edgecolors='k')
    ax.set_title(f'Poly Degree 2: {acc_poly2:.1%}')
    ax.set_aspect('equal')

    # With polynomial features (degree 3)
    ax = axes[row, 2]
    poly3 = PolynomialFeatures(degree=3)
    X_poly3 = poly3.fit_transform(X)
    model_poly3 = LogisticRegression(max_iter=1000, C=10)
    model_poly3.fit(X_poly3, y)
    acc_poly3 = model_poly3.score(X_poly3, y)

    Z_poly3 = model_poly3.predict(poly3.transform(np.c_[X1.ravel(), X2.ravel()])).reshape(X1.shape)
    ax.contourf(X1, X2, Z_poly3, alpha=0.3, cmap='RdBu')
    ax.scatter(X[y==0,0], X[y==0,1], c='blue', s=20, edgecolors='k')
    ax.scatter(X[y==1,0], X[y==1,1], c='red', s=20, edgecolors='k')
    ax.set_title(f'Poly Degree 3: {acc_poly3:.1%}')
    ax.set_aspect('equal')

plt.suptitle('Linear Limitations and Polynomial Feature Solutions', fontsize=14, y=1.02)
plt.tight_layout()
plt.savefig('linear_limitations.png', dpi=150)
plt.show()
```

Polynomial features, interaction terms, and domain-specific transformations can make logistic regression capture nonlinear patterns. But this increases dimensionality rapidly (degree-d polynomials in p features → O(p^d) features), risking overfitting and computational explosion. For complex nonlinear patterns, consider kernel methods, neural networks, or tree-based models.
By default, we classify as class 1 when $P(Y=1|\mathbf{x}) > 0.5$. But this threshold can be adjusted to balance different types of errors.
The Standard Threshold
With threshold $\tau = 0.5$: $$\hat{y} = \begin{cases} 1 & \text{if } P(Y=1|\mathbf{x}) > 0.5 \\ 0 & \text{otherwise} \end{cases}$$
This corresponds to classifying based on which class has higher probability—the Bayes optimal rule when costs are equal.
Adjusting the Threshold
With threshold $\tau \neq 0.5$: $$\hat{y} = \begin{cases} 1 & \text{if } P(Y=1|\mathbf{x}) > \tau \\ 0 & \text{otherwise} \end{cases}$$
Effect on Decision Boundary
Changing $\tau$ shifts the effective decision boundary: raising $\tau$ above 0.5 moves the boundary into the class-1 region (fewer, more confident positive predictions), while lowering it expands the class-1 region (more positives, higher recall). The shifted boundary remains a hyperplane parallel to the original.
Mathematically, predicting class 1 when $\sigma(z) > \tau$ is equivalent to: $$z > \sigma^{-1}(\tau) = \log\left(\frac{\tau}{1-\tau}\right)$$
The new boundary is $z = \sigma^{-1}(\tau)$ instead of $z = 0$.
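The equivalence rests on $\sigma$ and the logit being inverses. A quick numeric check (the `logit` helper is just a local name for $\sigma^{-1}$):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def logit(tau):
    """Inverse sigmoid: the log-odds corresponding to probability tau."""
    return np.log(tau / (1 - tau))

# sigma and logit are inverses, so thresholding P(Y=1) at tau
# is the same as thresholding z at logit(tau)
for tau in [0.1, 0.3, 0.5, 0.7, 0.9]:
    z_thresh = logit(tau)
    assert abs(sigmoid(z_thresh) - tau) < 1e-12
    print(f"tau={tau}: z threshold = {z_thresh:.4f}")

z = np.linspace(-4, 4, 101)
assert np.array_equal(sigmoid(z) > 0.7, z > logit(0.7))
```

At $\tau = 0.5$ the z-threshold is $\log(1) = 0$, recovering the standard boundary.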
```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, precision_score, recall_score, f1_score

# Generate imbalanced data
np.random.seed(42)
n = 1000
X = np.random.randn(n, 2)
# Imbalanced: roughly 20% class 1
z = 0.8 * X[:, 0] - 0.5 * X[:, 1] - 1.5  # Shifted to create imbalance
y = (1 / (1 + np.exp(-z)) > np.random.rand(n)).astype(int)

print(f"Class distribution: Class 0: {sum(y==0)}, Class 1: {sum(y==1)}")

# Fit model
model = LogisticRegression()
model.fit(X, y)

probabilities = model.predict_proba(X)[:, 1]

# Evaluate at different thresholds
thresholds = [0.3, 0.5, 0.7]
fig, axes = plt.subplots(1, 3, figsize=(15, 5))

print("Metrics at Different Thresholds")
print("=" * 70)
print(f"{'Threshold':<12} | {'Precision':>10} | {'Recall':>10} | {'F1':>10} | {'TP':>6} | {'FP':>6}")
print("-" * 70)

for i, tau in enumerate(thresholds):
    predictions = (probabilities > tau).astype(int)
    precision = precision_score(y, predictions, zero_division=0)
    recall = recall_score(y, predictions, zero_division=0)
    f1 = f1_score(y, predictions, zero_division=0)
    cm = confusion_matrix(y, predictions)

    print(f"{tau:<12} | {precision:>10.4f} | {recall:>10.4f} | {f1:>10.4f} | {cm[1,1]:>6} | {cm[0,1]:>6}")

    # Visualize
    ax = axes[i]
    x1r = np.linspace(X[:,0].min()-1, X[:,0].max()+1, 100)
    x2r = np.linspace(X[:,1].min()-1, X[:,1].max()+1, 100)
    X1, X2 = np.meshgrid(x1r, x2r)
    Z = model.predict_proba(np.c_[X1.ravel(), X2.ravel()])[:, 1].reshape(X1.shape)

    ax.contourf(X1, X2, Z > tau, alpha=0.3, cmap='RdBu')
    ax.contour(X1, X2, Z, levels=[tau], colors='black', linewidths=2)
    ax.scatter(X[y==0,0], X[y==0,1], c='blue', s=15, alpha=0.6, label='Class 0')
    ax.scatter(X[y==1,0], X[y==1,1], c='red', s=15, alpha=0.6, label='Class 1')
    ax.set_title(f'Threshold τ = {tau}\nPrecision={precision:.2f}, Recall={recall:.2f}')
    ax.set_xlabel('x₁')
    ax.set_ylabel('x₂')
    if i == 0:
        ax.legend(loc='upper left')

plt.tight_layout()
plt.savefig('threshold_adjustment.png', dpi=150)
plt.show()

# Log-odds of different thresholds
print("Log-odds (z) at different thresholds:")
for tau in [0.1, 0.3, 0.5, 0.7, 0.9]:
    z_thresh = np.log(tau / (1 - tau))
    print(f"  τ = {tau}: z = {z_thresh:.4f}")
```

Adjust the threshold when: (1) classes are imbalanced and the minority class matters more, (2) false positives and false negatives have different costs (e.g., medical diagnosis), or (3) you need to meet a specific precision or recall target. Use ROC curves and precision-recall curves to select the optimal threshold for your use case.
We've explored the decision boundary from multiple perspectives—mathematical, geometric, and practical. This geometric understanding is essential for reasoning about classifier behavior and limitations. The key insights: the boundary is the hyperplane $\mathbf{w}^T\mathbf{x} + b = 0$; $\mathbf{w}$ is its normal vector; signed distance from the boundary, scaled by $|\mathbf{w}|$, determines prediction confidence; margins connect logistic regression to SVMs and generalization; and linear boundaries fail on patterns like XOR unless nonlinear features are engineered.
What's Next:
Having understood the decision boundary, we conclude this module with the probabilistic interpretation page—examining how logistic regression produces calibrated probabilities, what calibration means, and why probabilistic outputs are often more valuable than hard classifications.
You now have a thorough geometric understanding of logistic regression's decision boundary—how it's defined, visualized, and related to prediction confidence. This perspective is invaluable for model interpretation, debugging, and knowing when linear models are insufficient.