Logistic regression appears nonlinear—its predictions follow an S-shaped curve, and the relationship between features and predicted probabilities is decidedly non-linear. Yet beneath this curved surface lies a perfectly linear model.
The key insight is that logistic regression is linear on the log-odds scale. While probabilities range from 0 to 1 and odds range from 0 to infinity, log-odds span all real numbers—exactly the domain where linear models are natural.
This log-odds perspective transforms logistic regression from a 'magic squashing function' into an intuitive, interpretable model. Coefficients gain clear meaning: each unit increase in a feature multiplies the odds by a specific factor. Decision boundaries become linear hyperplanes. The entire framework suddenly makes sense.
By the end of this page, you will understand: (1) the mathematical relationships among probabilities, odds, and log-odds, (2) why log-odds linearize the logistic model, (3) how to interpret logistic regression coefficients as log-odds changes and odds multipliers, (4) the geometric meaning of odds ratios, and (5) why this interpretation matters for practical applications.
Before diving into log-odds, we need a solid understanding of odds themselves—a concept that predates probability theory and remains central to gambling, epidemiology, and statistical inference.
Definition of Odds
Given a probability $p$ of an event occurring (where $0 < p < 1$), the odds of that event are defined as:
$$\text{odds} = \frac{p}{1-p}$$
This is the ratio of the probability of success to the probability of failure. If an event has probability 0.75, its odds are:
$$\text{odds} = \frac{0.75}{0.25} = 3$$
We often express this as "3 to 1 odds" or "odds of 3:1 in favor."
Intuitive Interpretation
Odds express how many times more likely success is than failure, and they take values in $(0, \infty)$:
| Probability (p) | Odds (p/(1-p)) | Odds Notation | Interpretation |
|---|---|---|---|
| 0.01 | 0.0101 | 1:99 | Highly unlikely (failure 99× more likely) |
| 0.10 | 0.111 | 1:9 | Unlikely (failure 9× more likely) |
| 0.25 | 0.333 | 1:3 | Unlikely (failure 3× more likely) |
| 0.50 | 1.000 | 1:1 | Equally likely (no preference) |
| 0.75 | 3.000 | 3:1 | Likely (success 3× more likely) |
| 0.90 | 9.000 | 9:1 | Very likely (success 9× more likely) |
| 0.99 | 99.00 | 99:1 | Near certain (success 99× more likely) |
Converting Back from Odds to Probability
Given odds $\omega$, we can recover the probability:
$$p = \frac{\omega}{1 + \omega}$$
This is exactly the sigmoid function applied to $\log(\omega)$! The relationships form a complete system:
$$p \to \frac{p}{1-p} = \omega \to \log(\omega) = z \to \sigma(z) = p$$
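For example, starting from $p = 0.75$ and going once around the loop:

$$0.75 \;\to\; \frac{0.75}{0.25} = 3 \;\to\; \log 3 \approx 1.099 \;\to\; \sigma(1.099) = \frac{1}{1 + e^{-1.099}} \approx 0.75$$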
Why Odds Matter
Odds have several advantages over probabilities in certain contexts:
Multiplicative effects are natural: Doubling the odds has a clear meaning; doubling a probability often doesn't make sense (what if $p = 0.75$?).
Symmetric treatment of outcomes: Odds of $k$ and odds of $1/k$ are equally 'extreme' in opposite directions.
Range matches linear predictors better: Probabilities are bounded; odds can take any positive value, making them easier to model linearly after a log transform.
The odds formulation comes from gambling, where payouts are calculated based on odds rather than probabilities. A fair bet at 3:1 odds against means you risk $1 to potentially win $3, which is appropriate when the probability of winning is 0.25 (1 in 4 outcomes favorable). Understanding this historical context helps demystify the seemingly odd choice to work with odds.
While odds improve upon probabilities for modeling purposes, their range $(0, \infty)$ is still asymmetric around the neutral value of 1 (even odds). Taking the logarithm of odds—the log-odds or logit—creates a symmetric scale spanning all real numbers.
Definition of Log-Odds (Logit)
$$\text{logit}(p) = \log\left(\frac{p}{1-p}\right) = \log(p) - \log(1-p)$$
The logit function is the inverse of the sigmoid:
$$\sigma(z) = p \iff z = \text{logit}(p)$$
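To see why, solve $p = \sigma(z)$ for $z$:

$$p = \frac{1}{1 + e^{-z}} \;\;\Longrightarrow\;\; e^{-z} = \frac{1-p}{p} \;\;\Longrightarrow\;\; z = \log\left(\frac{p}{1-p}\right) = \text{logit}(p)$$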
Properties of the Log-Odds Scale
Domain and Range: logit maps $(0, 1) \to (-\infty, +\infty)$
Symmetry around zero: $\text{logit}(1-p) = -\text{logit}(p)$
Zero at neutrality: $\text{logit}(0.5) = \log(1) = 0$
Additive structure: Multiplicative changes in odds become additive changes in log-odds
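The symmetry property follows directly from the rules of logarithms:

$$\text{logit}(1-p) = \log\!\left(\frac{1-p}{p}\right) = -\log\!\left(\frac{p}{1-p}\right) = -\,\text{logit}(p)$$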
```python
import numpy as np

def odds(p):
    """Convert probability to odds."""
    return p / (1 - p)

def logit(p):
    """Convert probability to log-odds (logit)."""
    return np.log(p / (1 - p))

def sigmoid(z):
    """Convert log-odds back to probability."""
    return 1 / (1 + np.exp(-z))

# Explore the transformations
probabilities = np.array([0.01, 0.05, 0.10, 0.25, 0.50, 0.75, 0.90, 0.95, 0.99])

print("Complete Probability ↔ Odds ↔ Log-Odds Table")
print("=" * 70)
print(f"{'Probability':>12} | {'Odds':>12} | {'Log-Odds':>12} | {'Back to Prob':>14}")
print("-" * 70)

for p in probabilities:
    o = odds(p)
    lo = logit(p)
    p_back = sigmoid(lo)
    print(f"{p:>12.4f} | {o:>12.4f} | {lo:>12.4f} | {p_back:>14.4f}")

# Demonstrate additive property of log-odds
print("\n" + "=" * 70)
print("Log-Odds Additive Property: Doubling the Odds")
print("=" * 70)

p1 = 0.25
odds1 = odds(p1)
odds2 = odds1 * 2  # double the odds
p2 = odds2 / (1 + odds2)

lo1 = logit(p1)
lo2 = logit(p2)

print(f"Initial: p = {p1:.4f}, odds = {odds1:.4f}, log-odds = {lo1:.4f}")
print(f"After doubling odds: p = {p2:.4f}, odds = {odds2:.4f}, log-odds = {lo2:.4f}")
print(f"Change in log-odds: {lo2 - lo1:.4f}")
print(f"log(2) = {np.log(2):.4f}")
print("→ Doubling odds adds log(2) ≈ 0.693 to log-odds")
```

The log-odds scale is where logistic regression 'lives.' When we say a coefficient β = 0.7, we mean that a one-unit increase in the corresponding feature adds 0.7 to the log-odds, which is equivalent to multiplying the odds by e^0.7 ≈ 2. This multiplicative interpretation of odds is the key to understanding logistic regression coefficients.
We can now state the logistic regression model equation in its most illuminating form.
The Core Equation: Linear in Log-Odds
$$\log\left(\frac{P(Y=1|X)}{1-P(Y=1|X)}\right) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p = \mathbf{w}^T \mathbf{x} + b$$
Or equivalently:
$$\text{logit}(P(Y=1|X)) = \mathbf{w}^T \mathbf{x} + b$$
This reveals the essential structure: logistic regression assumes log-odds are a linear function of the features. The nonlinearity comes entirely from inverting this relationship to obtain probabilities.
Three Equivalent Forms
The same model can be written in three equivalent ways, each revealing different aspects:
Form 1: Log-Odds (Logit) Form $$\log\left(\frac{p}{1-p}\right) = \mathbf{w}^T \mathbf{x} + b$$
Shows the linear structure directly. Coefficients are log-odds changes.
Form 2: Odds Form $$\frac{p}{1-p} = e^{\mathbf{w}^T \mathbf{x} + b} = e^{b} \cdot e^{w_1 x_1} \cdot e^{w_2 x_2} \cdots$$
Shows multiplicative structure. Coefficients are exponentiated to get odds multipliers.
Form 3: Probability Form $$p = P(Y=1|X) = \frac{e^{\mathbf{w}^T \mathbf{x} + b}}{1 + e^{\mathbf{w}^T \mathbf{x} + b}} = \sigma(\mathbf{w}^T \mathbf{x} + b)$$
The form we use for predictions. Shows the sigmoid transformation.
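To confirm that the three forms describe the same model, here is a small numeric check using hypothetical weights, bias, and feature values (any numbers would do):

```python
import numpy as np

# Hypothetical weights, bias, and feature vector (for illustration only)
w = np.array([0.7, -1.2])
b = 0.3
x = np.array([1.5, 0.4])

z = w @ x + b  # linear predictor

# Form 1: log-odds are linear in x
log_odds = z

# Form 2: odds are a product of per-feature multipliers
odds = np.exp(b) * np.prod(np.exp(w * x))

# Form 3: probability via the sigmoid
p = np.exp(z) / (1 + np.exp(z))

# Consistency checks: all three forms agree
print(np.isclose(np.log(p / (1 - p)), log_odds))  # True
print(np.isclose(p / (1 - p), odds))              # True
```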
Why This Matters
The log-odds formulation explains several key properties of logistic regression:
Bounded predictions: No matter how extreme the linear predictor, probabilities stay in (0, 1).
Interpretable coefficients: Each coefficient has a clear meaning in terms of odds.
Proper uncertainty: Near the decision boundary ($z \approx 0$), small changes in features produce noticeable probability changes. Far from the boundary (saturated regions), even large feature changes barely affect the probability—encoding appropriate confidence.
Multiplicative odds model: Effects multiply rather than add. If drug A doubles survival odds and drug B triples them (independently), together they multiply to 6× the odds.
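On the log-odds scale this is just addition before exponentiating:

$$\log(2) + \log(3) = 0.693 + 1.099 = 1.792, \qquad e^{1.792} = e^{\log 6} = 6$$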
This structure—linear predictor transformed by a link function—is the essence of Generalized Linear Models (GLMs). The logit link maps from probabilities to a scale where linear models make sense. Other link functions (probit, log, identity) serve similar purposes for different response types.
The log-odds formulation provides the foundation for interpreting logistic regression coefficients. Each coefficient $\beta_j$ tells us how the log-odds change per unit increase in $X_j$, holding other features constant.
The Coefficient as Log-Odds Change
Consider a single feature model: $\text{logit}(p) = \beta_0 + \beta_1 X$
When $X$ increases by 1 unit:
$$\text{logit}(p_{new}) = \beta_0 + \beta_1(X+1) = \text{logit}(p_{old}) + \beta_1$$
The log-odds increase by exactly $\beta_1$.
The Odds Ratio Interpretation
Exponentiating both sides:
$$\text{odds}_{\text{new}} = e^{\beta_1} \times \text{odds}_{\text{old}}$$
The quantity $e^{\beta_1}$ is called the odds ratio (OR). A one-unit increase in $X$ multiplies the odds by $e^{\beta_1}$.
Examples:
```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Create synthetic data with known coefficients
np.random.seed(42)
n_samples = 10000

# Single feature example
X_single = np.random.randn(n_samples, 1)
true_beta = 1.0  # true coefficient
log_odds = -0.5 + true_beta * X_single.ravel()
probs = 1 / (1 + np.exp(-log_odds))
y = (np.random.rand(n_samples) < probs).astype(int)

# Fit model
model = LogisticRegression(fit_intercept=True, solver='lbfgs')
model.fit(X_single, y)

print("Coefficient Interpretation Demo")
print("=" * 60)
print(f"True coefficient: β₁ = {true_beta:.4f}")
print(f"Estimated coefficient: β̂₁ = {model.coef_[0][0]:.4f}")
print(f"Estimated intercept: β̂₀ = {model.intercept_[0]:.4f}")
print()

# Odds ratio interpretation
beta_hat = model.coef_[0][0]
odds_ratio = np.exp(beta_hat)
print(f"Odds Ratio: e^β̂₁ = e^{beta_hat:.4f} = {odds_ratio:.4f}")
print(f"Interpretation: A 1-unit increase in X multiplies odds by {odds_ratio:.4f}")
print()

# Demonstrate with concrete predictions
X_test = np.array([[0], [1], [2]])
probs_predicted = model.predict_proba(X_test)[:, 1]
odds_predicted = probs_predicted / (1 - probs_predicted)

print("Predictions at different X values:")
print("-" * 60)
print(f"{'X':>5} | {'P(Y=1)':>10} | {'Odds':>12} | {'Odds Ratio':>12}")
print("-" * 60)
for i in range(len(X_test)):
    x_val = X_test[i, 0]
    p = probs_predicted[i]
    o = odds_predicted[i]
    ratio = odds_predicted[i] / odds_predicted[0] if i > 0 else 1.0
    print(f"{x_val:>5} | {p:>10.4f} | {o:>12.4f} | {ratio:>12.4f}")

print(f"\nNote: Each unit increase multiplies odds by ~{odds_ratio:.2f}")
```

| Coefficient (β) | Odds Ratio (e^β) | Effect on Odds | Interpretation |
|---|---|---|---|
| -2.0 | 0.14 | ÷7.4 | Strongly decreases odds (86% reduction) |
| -1.0 | 0.37 | ÷2.7 | Decreases odds (63% reduction) |
| -0.5 | 0.61 | ÷1.6 | Moderately decreases odds (39% reduction) |
| 0.0 | 1.00 | ×1.0 | No effect on odds |
| 0.5 | 1.65 | ×1.6 | Moderately increases odds (65% increase) |
| 1.0 | 2.72 | ×2.7 | Increases odds (172% increase) |
| 2.0 | 7.39 | ×7.4 | Strongly increases odds (639% increase) |
The coefficient does NOT tell you how much the probability changes per unit of X. Probability changes depend on where you start. A coefficient of 0.5 means the odds multiply by 1.65, but the probability might increase by about 0.12 (starting at p = 0.5) or by only about 0.02 (starting at p = 0.95). Always interpret coefficients on the odds or log-odds scale.
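A quick numerical check of this caveat, using a hypothetical coefficient of 0.5 and the model's own logit and sigmoid:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def logit(p):
    return np.log(p / (1 - p))

beta1 = 0.5  # one unit of X adds 0.5 to the log-odds

for p_start in [0.50, 0.95]:
    # probability after a one-unit increase in X, starting from p_start
    p_new = sigmoid(logit(p_start) + beta1)
    print(f"start p = {p_start:.2f} -> new p = {p_new:.3f} (change = {p_new - p_start:+.3f})")
```

The same odds multiplier of 1.65 produces a change of about +0.12 from p = 0.50 but only about +0.02 from p = 0.95.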
While log-odds effects are constant (linear), probability effects are not. The marginal effect of a feature on probability depends on where we are in the probability space.
Mathematical Derivation
Starting from $p = \sigma(\mathbf{w}^T \mathbf{x} + b)$, the marginal effect of $X_j$ on $p$ is:
$$\frac{\partial p}{\partial X_j} = \frac{\partial \sigma(z)}{\partial z} \cdot \frac{\partial z}{\partial X_j} = \sigma(z)(1-\sigma(z)) \cdot w_j = p(1-p) \cdot w_j$$
This reveals the critical insight: the marginal effect of feature $j$ on probability is $w_j \cdot p(1-p)$.
The factor $p(1-p)$ is maximized at $p = 0.5$ (where it equals 0.25) and approaches zero as $p \to 0$ or $p \to 1$.
Implications
Near certainty, effects are small: If $p = 0.99$, then $p(1-p) = 0.0099$. Even a large coefficient $w_j = 2$ produces only a marginal effect of $\approx 0.02$.
At maximum uncertainty, effects are largest: If $p = 0.5$, then $p(1-p) = 0.25$. The same $w_j = 2$ produces a marginal effect of $0.5$.
Symmetry around 0.5: The marginal effect at $p = 0.1$ equals that at $p = 0.9$ (both have $p(1-p) = 0.09$).
```python
import numpy as np
import matplotlib.pyplot as plt

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Consider a model with coefficient w = 1
w = 1.0

# The marginal effect on probability depends on where we are
z_values = np.linspace(-5, 5, 100)
p_values = sigmoid(z_values)
marginal_effects = p_values * (1 - p_values) * w

# Visualization
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Left: Probability as function of linear predictor
ax1 = axes[0]
ax1.plot(z_values, p_values, 'b-', linewidth=2)
ax1.axhline(y=0.5, color='gray', linestyle='--', alpha=0.5)
ax1.axvline(x=0, color='gray', linestyle='--', alpha=0.5)

# Add tangent lines at different points to show varying slopes
for z_point in [-3, 0, 3]:
    p_point = sigmoid(z_point)
    slope = p_point * (1 - p_point) * w
    z_line = np.linspace(z_point - 1.5, z_point + 1.5, 50)
    p_line = p_point + slope * (z_line - z_point)
    ax1.plot(z_line, p_line, 'r--', alpha=0.7)
    ax1.scatter([z_point], [p_point], color='red', s=50, zorder=5)

ax1.set_xlabel('Linear Predictor (z = wx + b)')
ax1.set_ylabel('Probability P(Y=1)')
ax1.set_title('Probability Curve with Tangent Lines')
ax1.set_ylim(-0.1, 1.1)
ax1.grid(True, alpha=0.3)

# Right: Marginal effect as function of probability
ax2 = axes[1]
ax2.plot(p_values, marginal_effects, 'r-', linewidth=2)
ax2.fill_between(p_values, 0, marginal_effects, alpha=0.2, color='red')
ax2.axvline(x=0.5, color='gray', linestyle='--', alpha=0.5)
ax2.set_xlabel('Current Probability')
ax2.set_ylabel('Marginal Effect = dp/dz')
ax2.set_title(f'Marginal Effect on Probability (w={w})')
ax2.set_xlim(0, 1)
ax2.grid(True, alpha=0.3)

plt.tight_layout()
plt.savefig('marginal_effects.png', dpi=150)
plt.show()

# Print specific values
print("Marginal Effects at Different Probability Levels")
print("-" * 50)
probs_example = [0.01, 0.1, 0.25, 0.5, 0.75, 0.9, 0.99]
for p in probs_example:
    me = p * (1 - p) * w
    print(f"At p = {p:.2f}: marginal effect = {me:.4f}")
```

In practice, researchers often report the Average Marginal Effect (AME)—the average of dp/dX across all observations in the dataset. This provides a single summary number while acknowledging that true effects vary. Modern software like statsmodels and R's margins package can compute AMEs automatically.
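A minimal sketch of the AME computed by hand, using the identity $\partial p / \partial X_j = w_j \cdot p(1-p)$ averaged over observations. The synthetic dataset here is hypothetical, purely for illustration:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical synthetic data (for illustration only)
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
z = 1.0 * X[:, 0] - 0.5 * X[:, 1]
y = (rng.random(500) < 1 / (1 + np.exp(-z))).astype(int)

model = LogisticRegression().fit(X, y)
p_hat = model.predict_proba(X)[:, 1]  # fitted probability for each observation
w = model.coef_[0]                    # estimated log-odds coefficients

# AME_j = average over observations of w_j * p_i * (1 - p_i)
ame = w * np.mean(p_hat * (1 - p_hat))
print("Average marginal effects:", np.round(ame, 4))
```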
Let's work through a complete example to solidify our understanding of log-odds interpretation.
Scenario: Predicting Heart Disease
A logistic regression model predicts heart disease probability from three features: Age (in years), Cholesterol level, and Smoker status (1 = smoker, 0 = non-smoker).
Fitted model (hypothetical): $$\text{logit}(p) = -6.0 + 0.05 \cdot \text{Age} + 0.01 \cdot \text{Cholesterol} + 0.8 \cdot \text{Smoker}$$
Interpreting Each Coefficient
Intercept ($\beta_0 = -6.0$): the log-odds of heart disease for a non-smoker with Age = 0 and Cholesterol = 0. This baseline lies far outside any realistic data range, so it anchors the model rather than describing an actual patient.
Age ($\beta_{Age} = 0.05$): each additional year adds 0.05 to the log-odds, multiplying the odds by $e^{0.05} \approx 1.05$ (about a 5% increase per year). Ten additional years multiply the odds by $e^{0.5} \approx 1.65$.
Cholesterol ($\beta_{Chol} = 0.01$): each additional unit of cholesterol adds 0.01 to the log-odds, multiplying the odds by $e^{0.01} \approx 1.01$. A 50-unit increase multiplies the odds by $e^{0.5} \approx 1.65$.
Smoker ($\beta_{Smoker} = 0.8$): being a smoker adds 0.8 to the log-odds, multiplying the odds by $e^{0.8} \approx 2.23$ relative to an otherwise identical non-smoker.
```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Model coefficients
beta_0 = -6.0
beta_age = 0.05
beta_chol = 0.01
beta_smoker = 0.8

def predict_prob(age, cholesterol, smoker):
    """Predict heart disease probability."""
    z = beta_0 + beta_age * age + beta_chol * cholesterol + beta_smoker * smoker
    return sigmoid(z)

def predict_odds(age, cholesterol, smoker):
    """Predict odds of heart disease."""
    p = predict_prob(age, cholesterol, smoker)
    return p / (1 - p)

# Calculate odds ratios
print("Odds Ratios for Each Feature")
print("=" * 50)
print(f"Age (per year): e^{beta_age:.2f} = {np.exp(beta_age):.4f}")
print(f"Cholesterol (per unit): e^{beta_chol:.2f} = {np.exp(beta_chol):.4f}")
print(f"Smoker (yes vs no): e^{beta_smoker:.2f} = {np.exp(beta_smoker):.4f}")

# Example predictions
print("\nExample Predictions")
print("=" * 50)

# Person 1: 50-year-old non-smoker with cholesterol 200
p1 = predict_prob(50, 200, 0)
o1 = predict_odds(50, 200, 0)
print(f"50yo, non-smoker, chol=200: P = {p1:.4f}, odds = {o1:.4f}")

# Person 2: Same but smoker
p2 = predict_prob(50, 200, 1)
o2 = predict_odds(50, 200, 1)
print(f"50yo, smoker, chol=200: P = {p2:.4f}, odds = {o2:.4f}")
print(f"Odds ratio (smoker/non): {o2/o1:.4f} ≈ e^0.8 = {np.exp(0.8):.4f}")

# Person 3: 10 years older
p3 = predict_prob(60, 200, 0)
o3 = predict_odds(60, 200, 0)
print(f"60yo, non-smoker, chol=200: P = {p3:.4f}, odds = {o3:.4f}")
print(f"Odds ratio (60yo/50yo): {o3/o1:.4f} ≈ e^0.5 = {np.exp(0.5):.4f}")

# Combine effects
print("\nCombined Effects")
print("=" * 50)
p4 = predict_prob(60, 200, 1)
o4 = predict_odds(60, 200, 1)
print(f"60yo, smoker, chol=200: P = {p4:.4f}, odds = {o4:.4f}")
print(f"Combined odds ratio: {o4/o1:.4f}")
print(f"Expected (1.65 × 2.23): {1.65 * 2.23:.4f}")
```

Notice that combined effects multiply on the odds scale. A 60-year-old smoker has 1.65 × 2.23 ≈ 3.7× the odds of a 50-year-old non-smoker. This multiplicative structure is one of the elegant properties of logistic regression—effects combine independently on the log-odds (additively) or odds (multiplicatively) scale.
In practice, we need uncertainty estimates for our odds ratios. Since maximum likelihood estimation gives us standard errors on the log-odds scale, we construct confidence intervals there and then exponentiate.
The Procedure
Estimate coefficient: $\hat{\beta}$ with standard error $SE(\hat{\beta})$
Construct CI for log-odds coefficient: $$\hat{\beta} \pm z_{\alpha/2} \cdot SE(\hat{\beta})$$
For 95% CI with $z_{0.025} = 1.96$: $$[\hat{\beta} - 1.96 \cdot SE(\hat{\beta}), \hat{\beta} + 1.96 \cdot SE(\hat{\beta})]$$
Exponentiate to get CI for odds ratio: $$[e^{\hat{\beta} - 1.96 \cdot SE(\hat{\beta})}, e^{\hat{\beta} + 1.96 \cdot SE(\hat{\beta})}]$$
Why Exponentiate the Entire Interval?
Because the exponential function is monotonic, the confidence interval transforms correctly: $$P(L \leq \beta \leq U) = P(e^L \leq e^\beta \leq e^U)$$
Note that the odds ratio CI is asymmetric around the point estimate (because exponentiation is nonlinear), but the coverage probability is preserved.
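For example, with a hypothetical estimate $\hat{\beta} = 0.8$ and $SE(\hat{\beta}) = 0.1$:

$$\text{log-odds CI: } [0.8 - 1.96(0.1),\; 0.8 + 1.96(0.1)] = [0.604,\, 0.996], \qquad \text{OR CI: } [e^{0.604},\, e^{0.996}] \approx [1.83,\, 2.71]$$

The point estimate is $e^{0.8} \approx 2.23$, so the interval extends about 0.40 below it but 0.48 above it: asymmetric, yet with the same 95% coverage.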
```python
import numpy as np
import statsmodels.api as sm

# Generate sample data
np.random.seed(42)
n = 1000
X = np.random.randn(n, 2)
true_beta = np.array([0.8, -0.5])
z = X @ true_beta + 0.2
p = 1 / (1 + np.exp(-z))
y = (np.random.rand(n) < p).astype(int)

# Fit model using statsmodels for proper inference
X_with_const = sm.add_constant(X)
model = sm.Logit(y, X_with_const)
result = model.fit(disp=0)

print("Logistic Regression with Confidence Intervals")
print("=" * 70)

# Extract coefficients and standard errors
coef_names = ['intercept', 'X1', 'X2']
for i, name in enumerate(coef_names):
    beta = result.params[i]
    se = result.bse[i]
    p_val = result.pvalues[i]

    # 95% CI for coefficient (log-odds)
    ci_low_logodds = beta - 1.96 * se
    ci_high_logodds = beta + 1.96 * se

    # 95% CI for odds ratio
    or_point = np.exp(beta)
    ci_low_or = np.exp(ci_low_logodds)
    ci_high_or = np.exp(ci_high_logodds)

    print(f"\n{name}:")
    print(f"  Coefficient (log-odds): {beta:.4f} (SE: {se:.4f})")
    print(f"  95% CI (log-odds): [{ci_low_logodds:.4f}, {ci_high_logodds:.4f}]")
    print(f"  Odds Ratio: {or_point:.4f}")
    print(f"  95% CI (OR): [{ci_low_or:.4f}, {ci_high_or:.4f}]")
    print(f"  p-value: {p_val:.4f} {'***' if p_val < 0.001 else '**' if p_val < 0.01 else '*' if p_val < 0.05 else ''}")

print("\n" + "=" * 70)
print("Statistical Significance:")
print("An odds ratio CI that does NOT contain 1.0 indicates")
print("statistically significant association at the chosen α level.")
```

If the 95% CI for an odds ratio contains 1.0, we cannot reject the null hypothesis that the feature has no effect on odds. If the entire CI is above 1.0, the effect is significantly positive; if entirely below 1.0, significantly negative. The width of the CI indicates precision—narrower intervals mean more precise estimates.
We've explored the log-odds interpretation of logistic regression in depth. This perspective transforms logistic regression from seemingly nonlinear magic into a principled, interpretable linear model. The essential insights:
Linear in log-odds: logistic regression models the logit of the probability as a linear function of the features; the sigmoid merely converts log-odds back to probabilities.
Coefficients as odds multipliers: each $\beta_j$ is the change in log-odds per unit increase in $X_j$; exponentiating gives the odds ratio $e^{\beta_j}$.
Nonconstant probability effects: the marginal effect on probability is $w_j \cdot p(1-p)$, largest at $p = 0.5$ and vanishing as $p$ approaches 0 or 1.
Inference on the log-odds scale: confidence intervals are constructed for coefficients and then exponentiated, yielding asymmetric but valid intervals for odds ratios.
What's Next:
With the log-odds interpretation mastered, we turn to the model parameters page—examining the roles of weights and bias, how they're estimated via maximum likelihood, and their geometric interpretation as defining a separating hyperplane in feature space.
You now understand the log-odds interpretation of logistic regression—the key that unlocks coefficient interpretation and reveals the linear structure hidden beneath the S-curve. This understanding is essential for proper model interpretation and communication of results.