In the GLM framework, the link function plays a deceptively simple but profoundly important role: it transforms the expected value of the response into a quantity that can be modeled as a linear combination of predictors.
Consider the challenge: when modeling probabilities, the mean must lie in (0,1); when modeling counts, the mean must be non-negative; when modeling durations, the mean must be strictly positive. Yet our linear predictor $\eta = \mathbf{x}^\top \boldsymbol{\beta}$ can take any real value. How do we reconcile these incompatible domains?
The link function $g(\cdot)$ is the mathematical bridge. It maps the constrained mean space to the entire real line, allowing us to say $g(\mu) = \eta$ for any $\eta \in \mathbb{R}$. The properties of this bridge—its shape, its derivatives, its interpretability—profoundly affect the behavior and meaning of our model.
By the end of this page, you will deeply understand: (1) the mathematical requirements for valid link functions, (2) the properties and interpretations of common links (identity, logit, probit, log, etc.), (3) how to choose an appropriate link for your problem, and (4) the subtle tradeoffs between canonical and non-canonical links.
Not every function can serve as a link. For a function $g: \mathcal{M} \to \mathbb{R}$ to be a valid link function, it must satisfy several mathematical requirements.
The link function must be defined on the set $\mathcal{M}$ of possible mean values for the chosen distribution:
| Distribution | Mean Space $\mathcal{M}$ |
|---|---|
| Normal | $(-\infty, \infty)$ |
| Bernoulli/Binomial | $(0, 1)$ |
| Poisson | $(0, \infty)$ |
| Gamma | $(0, \infty)$ |
| Inverse Gaussian | $(0, \infty)$ |
For example, the log link $g(\mu) = \log(\mu)$ is appropriate when $\mathcal{M} = (0, \infty)$ but not when negative means are possible.
The link function must be strictly monotonic (either strictly increasing or strictly decreasing throughout $\mathcal{M}$). This ensures a one-to-one correspondence between $\mu$ and $\eta$.
Monotonicity guarantees that the inverse link $g^{-1}$ exists and is itself monotonic, so every linear predictor value maps back to exactly one mean and the model remains identifiable.
The link function should be twice differentiable on the interior of $\mathcal{M}$. Differentiability is needed for likelihood-based estimation: the score equations, the Fisher information, and the IRLS algorithm all involve derivatives of the link.
The first derivative $g'(\mu)$ appears in the weight matrix of iteratively reweighted least squares: $$w_i = \frac{1}{V(\mu_i) [g'(\mu_i)]^2}$$
The link function must map $\mathcal{M}$ onto the entire real line $\mathbb{R}$: $$g: \mathcal{M} \to \mathbb{R} \text{ is surjective}$$
This ensures that for any possible linear predictor value, there exists a valid mean. Without this, certain predictor combinations might produce undefined predictions.
A valid link function is a smooth, invertible transformation that takes the constrained mean (like a probability in (0,1)) and stretches it out to cover the whole real line (where the linear predictor lives). The stretching must be done smoothly and without any folds or kinks.
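To make these requirements concrete, here is a minimal numerical sanity check, a sketch using the logit link as the example (the grid and tolerances are arbitrary). It verifies strict monotonicity, the unbounded range near the boundaries of (0, 1), and the IRLS weight formula quoted above, which collapses to p(1-p) for the logit link.

```python
import numpy as np

# Requirements check for a candidate link: logit on the mean space M = (0, 1).
def g(p):            # link: logit
    return np.log(p / (1 - p))

def g_prime(p):      # derivative g'(p) = 1 / (p(1-p))
    return 1.0 / (p * (1 - p))

def V(p):            # Bernoulli variance function
    return p * (1 - p)

p = np.linspace(1e-6, 1 - 1e-6, 100_000)

# 1) Strict monotonicity: g'(p) > 0 everywhere on (0, 1)
assert np.all(g_prime(p) > 0)

# 2) Range: |g(p)| grows without bound as p approaches 0 or 1
print("logit values on grid span", g(p).min(), "to", g(p).max())   # ≈ -13.8 to 13.8

# 3) IRLS weight w = 1 / (V(p) * g'(p)^2) simplifies to p(1-p) for the logit link
w = 1.0 / (V(p) * g_prime(p) ** 2)
print("weights match p(1-p):", np.allclose(w, p * (1 - p)))
```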
The identity link is the simplest possible link function:
$$g(\mu) = \mu \qquad g^{-1}(\eta) = \eta$$
With the identity link, the mean equals the linear predictor directly: $$\mu_i = \mathbf{x}_i^\top \boldsymbol{\beta}$$
Derivative: $g'(\mu) = 1$ (constant)
Interpretation: Parameters have direct, additive effects on the mean: $$\frac{\partial \mu}{\partial X_j} = \beta_j$$
A one-unit increase in $X_j$ increases the expected response by exactly $\beta_j$ units, regardless of the current value of $X_j$ or other predictors.
The identity link is the canonical link for the Normal distribution. It's also appropriate when the response is genuinely unbounded, when effects are believed to act additively on the original scale, or when direct interpretation of coefficients in response units is the priority.
Using the identity link with a constrained distribution (like Poisson or binomial) can produce predictions outside the valid range. For example, if the fitted model gives xᵀβ = -2 for some covariate pattern, the predicted mean is μ = -2, which is impossible for a count. Most software will fit the model but produce warnings or errors at prediction time for such cases.
```python
import numpy as np
import matplotlib.pyplot as plt

# Identity link: g(μ) = μ, so μ = η directly

# Linear predictor values
eta = np.linspace(-3, 3, 100)

# With identity link, mean equals linear predictor
mu = eta  # g^{-1}(η) = η

# Visualization
fig, axes = plt.subplots(1, 2, figsize=(12, 4))

# Left: Link function
axes[0].plot(mu, eta, 'b-', linewidth=2)
axes[0].plot(mu, mu, 'k--', alpha=0.3, label='45° line')
axes[0].set_xlabel('Mean μ', fontsize=12)
axes[0].set_ylabel('Linear Predictor η = g(μ)', fontsize=12)
axes[0].set_title('Identity Link Function\ng(μ) = μ', fontsize=14)
axes[0].legend()
axes[0].grid(True, alpha=0.3)

# Right: Response function (inverse link)
axes[1].plot(eta, mu, 'r-', linewidth=2)
axes[1].set_xlabel('Linear Predictor η', fontsize=12)
axes[1].set_ylabel('Mean μ = g⁻¹(η)', fontsize=12)
axes[1].set_title('Response Function (Inverse Link)\nμ = η', fontsize=14)
axes[1].grid(True, alpha=0.3)

# Annotation showing linearity
axes[1].annotate('Slope = 1\n(constant marginal effect)',
                 xy=(0, 0), xytext=(1.5, -1.5), fontsize=10, ha='left',
                 arrowprops=dict(arrowstyle='->', color='gray'))

plt.tight_layout()
plt.savefig('identity_link.png', dpi=150, bbox_inches='tight')
plt.show()
```

The logit link (also called the log-odds link) is the canonical link for the Bernoulli and binomial distributions:
$$g(p) = \log\left(\frac{p}{1-p}\right) = \text{logit}(p) \qquad g^{-1}(\eta) = \frac{e^\eta}{1 + e^\eta} = \frac{1}{1 + e^{-\eta}}$$
The inverse function $g^{-1}$ is the logistic (or sigmoid) function, which maps the entire real line to (0,1).
Domain: $p \in (0, 1)$ Range: $\eta \in (-\infty, +\infty)$
Derivative: $$g'(p) = \frac{1}{p(1-p)}$$
Note that $g'(p)$ is large near 0 and 1 (where small changes in probability correspond to large changes in log-odds) and smallest at $p = 0.5$.
Symmetry: $$\text{logit}(p) = -\text{logit}(1-p)$$
This symmetry means that the effect of predictors on the probability of success equals (in magnitude but opposite sign) the effect on the probability of failure.
The logit link gives rise to one of the most important quantities in epidemiology and social science: the odds ratio.
Recall that for a binary outcome, the odds of success are: $$\text{odds} = \frac{p}{1-p}$$
With the logit link: $$\log\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p$$
Exponentiating both sides: $$\frac{p}{1-p} = e^{\beta_0} \cdot e^{\beta_1 X_1} \cdot \ldots \cdot e^{\beta_p X_p}$$
Now consider what happens when $X_j$ increases by 1 unit (holding others constant):
$$\frac{\text{odds after}}{\text{odds before}} = \frac{e^{\beta_0 + \beta_1 X_1 + \cdots + \beta_j(X_j+1) + \cdots}}{e^{\beta_0 + \beta_1 X_1 + \cdots + \beta_j X_j + \cdots}} = e^{\beta_j}$$
Thus $e^{\beta_j}$ is the multiplicative change in odds per unit increase in $X_j$—the odds ratio.
| Coefficient β_j | Odds Ratio e^(β_j) | Interpretation |
|---|---|---|
| β_j = 0 | OR = 1.00 | No effect on odds |
| β_j = 0.1 | OR ≈ 1.11 | 11% increase in odds per unit X_j |
| β_j = 0.5 | OR ≈ 1.65 | 65% increase in odds per unit X_j |
| β_j = 1.0 | OR ≈ 2.72 | Odds multiply by ~2.7 per unit X_j |
| β_j = -0.5 | OR ≈ 0.61 | 39% decrease in odds per unit X_j |
| β_j = -1.0 | OR ≈ 0.37 | Odds reduced to ~1/3 per unit X_j |
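To see the odds-ratio interpretation in action, the sketch below fits a logistic regression to simulated data with statsmodels (the data, seed, and true coefficient of 0.5 are made up for illustration) and exponentiates the coefficients and their confidence limits.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)

# Simulated data: one predictor with true coefficient 0.5, so the true OR ≈ e^{0.5} ≈ 1.65
n = 5000
x = rng.normal(size=n)
eta = -0.3 + 0.5 * x
p = 1 / (1 + np.exp(-eta))
y = rng.binomial(1, p)

X = sm.add_constant(x)
fit = sm.Logit(y, X).fit(disp=0)

# Odds ratios: exponentiate coefficients (and the confidence interval limits)
odds_ratios = np.exp(fit.params)
or_ci = np.exp(fit.conf_int())
print("Odds ratio for x:", odds_ratios[1])   # ≈ 1.65
print("95% CI:", or_ci[1])
```

The plotting example that follows visualizes the logit link, its sigmoid inverse, and the sigmoid's derivative.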
```python
import numpy as np
import matplotlib.pyplot as plt

# Logit link: g(p) = log(p/(1-p))
# Inverse (sigmoid): g^{-1}(η) = 1 / (1 + exp(-η))

p = np.linspace(0.001, 0.999, 1000)
eta = np.linspace(-6, 6, 1000)

# Logit function
logit_p = np.log(p / (1 - p))

# Sigmoid (inverse logit)
sigmoid_eta = 1 / (1 + np.exp(-eta))

fig, axes = plt.subplots(1, 3, figsize=(15, 4))

# Left: Logit function
axes[0].plot(p, logit_p, 'b-', linewidth=2)
axes[0].axhline(y=0, color='gray', linestyle='--', alpha=0.5)
axes[0].axvline(x=0.5, color='gray', linestyle='--', alpha=0.5)
axes[0].set_xlabel('Probability p', fontsize=12)
axes[0].set_ylabel('Log-Odds = logit(p)', fontsize=12)
axes[0].set_title('Logit Link Function\ng(p) = log(p/(1-p))', fontsize=14)
axes[0].set_xlim(0, 1)
axes[0].set_ylim(-6, 6)
axes[0].grid(True, alpha=0.3)

# Middle: Sigmoid function
axes[1].plot(eta, sigmoid_eta, 'r-', linewidth=2)
axes[1].axhline(y=0.5, color='gray', linestyle='--', alpha=0.5)
axes[1].axvline(x=0, color='gray', linestyle='--', alpha=0.5)
axes[1].set_xlabel('Linear Predictor η', fontsize=12)
axes[1].set_ylabel('Probability p = σ(η)', fontsize=12)
axes[1].set_title('Sigmoid (Inverse Logit)\np = 1/(1 + e^{-η})', fontsize=14)
axes[1].set_ylim(0, 1)
axes[1].grid(True, alpha=0.3)

# Right: Derivative of sigmoid (sensitivity)
sigmoid_deriv = sigmoid_eta * (1 - sigmoid_eta)
axes[2].plot(eta, sigmoid_deriv, 'g-', linewidth=2)
axes[2].axvline(x=0, color='gray', linestyle='--', alpha=0.5)
axes[2].set_xlabel('Linear Predictor η', fontsize=12)
axes[2].set_ylabel("Sensitivity dp/dη", fontsize=12)
axes[2].set_title("Sigmoid Derivative\n∂p/∂η = p(1-p)", fontsize=14)
axes[2].annotate('Maximum sensitivity\nat η=0 (p=0.5)', xy=(0, 0.25), xytext=(2, 0.2),
                 fontsize=10, arrowprops=dict(arrowstyle='->', color='gray'))
axes[2].grid(True, alpha=0.3)

plt.tight_layout()
plt.savefig('logit_link.png', dpi=150, bbox_inches='tight')
plt.show()
```

The sigmoid's S-shape captures a crucial phenomenon: marginal effects diminish at extremes. When p is near 0.5, a small change in the linear predictor has the largest effect on probability. But when p is near 0 or 1, the same change in η has a much smaller effect on p. This 'saturation' behavior is often realistic for binary outcomes.
The probit link is an alternative to the logit for binary response data:
$$g(p) = \Phi^{-1}(p) \qquad g^{-1}(\eta) = \Phi(\eta)$$
where $\Phi$ is the cumulative distribution function (CDF) of the standard normal distribution:
$$\Phi(z) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{z} e^{-t^2/2} \, dt$$
Probit regression has an elegant interpretation through latent variables. Suppose there exists an unobserved continuous variable $Y^*_i$ such that:
$$Y^*_i = \mathbf{x}_i^\top \boldsymbol{\beta} + \varepsilon_i, \qquad \varepsilon_i \sim \mathcal{N}(0, 1)$$
We observe $Y_i = 1$ if $Y^*_i > 0$ and $Y_i = 0$ otherwise. Then:
$$P(Y_i = 1) = P(Y^*_i > 0) = P(\mathbf{x}_i^\top \boldsymbol{\beta} + \varepsilon_i > 0) = P(\varepsilon_i > -\mathbf{x}_i^\top \boldsymbol{\beta})$$
Since $\varepsilon_i \sim \mathcal{N}(0,1)$ is symmetric: $$P(Y_i = 1) = \Phi(\mathbf{x}_i^\top \boldsymbol{\beta})$$
This latent variable interpretation is widely used in economics (discrete choice theory) and psychometrics.
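A small Monte Carlo sketch (the coefficients and simulation settings are hypothetical) confirms the latent-variable story: thresholding $Y^* = \eta + \varepsilon$ at zero reproduces success probabilities equal to $\Phi(\eta)$.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)

# Hypothetical linear predictor values to check
beta0, beta1 = 0.2, 0.8
x = np.array([-1.0, 0.0, 1.0])
eta = beta0 + beta1 * x

# Latent-variable simulation: Y = 1 exactly when Y* = η + ε > 0 with ε ~ N(0, 1)
n_sim = 200_000
eps = rng.normal(size=(n_sim, len(x)))
y = (eta + eps > 0).astype(int)

print("Simulated P(Y=1):", y.mean(axis=0))   # Monte Carlo estimate
print("Φ(η):            ", norm.cdf(eta))    # probit inverse link
```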
In practice, logit and probit give nearly identical predictions for most datasets. The relationship between them is approximately:
$$\text{logit}(p) \approx \frac{\pi}{\sqrt{3}} \cdot \Phi^{-1}(p) \approx 1.81 \cdot \Phi^{-1}(p)$$
Thus, for the same data, logit coefficients are roughly 1.8 times larger in magnitude than the corresponding probit coefficients.
The main difference is in the tails: the logistic distribution has heavier tails than the standard normal, so logit probabilities approach 0 and 1 more gradually, while probit probabilities move toward the extremes more quickly as $|\eta|$ grows.
When to choose one over the other:
For most applications, the choice between logit and probit is a matter of discipline convention and interpretive preference—not model fit. If you're unsure, use logit: it's more common, computationally faster, and provides the interpretable odds ratio. Only switch to probit if your field expects it or the latent variable story is central to your analysis.
The log link is the canonical link for the Poisson distribution and commonly used for Gamma regression:
$$g(\mu) = \log(\mu) \qquad g^{-1}(\eta) = e^\eta$$
Domain: $\mu \in (0, \infty)$ Range: $\eta \in (-\infty, +\infty)$
Derivative: $g'(\mu) = 1/\mu$
Key Feature: The inverse link $\mu = e^\eta$ is always positive, regardless of $\eta$. This automatically satisfies the positivity constraint for counts and positive continuous responses.
With the log link: $$\log(\mu) = \beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p$$
Exponentiating: $$\mu = e^{\beta_0} \cdot e^{\beta_1 X_1} \cdot \ldots \cdot e^{\beta_p X_p}$$
Predictor effects are multiplicative on the mean. When $X_j$ increases by 1:
$$\frac{\mu_{\text{after}}}{\mu_{\text{before}}} = e^{\beta_j}$$
The quantity $e^{\beta_j}$ is the rate ratio or incidence rate ratio (IRR) in epidemiology.
| Coefficient β_j | Rate Ratio e^(β_j) | Interpretation |
|---|---|---|
| β_j = 0 | RR = 1.00 | No effect on expected count |
| β_j = 0.1 | RR ≈ 1.11 | 11% increase in expected count per unit X_j |
| β_j = 0.5 | RR ≈ 1.65 | 65% increase in expected count per unit X_j |
| β_j = -0.3 | RR ≈ 0.74 | 26% decrease in expected count per unit X_j |
| β_j = ln(2) ≈ 0.69 | RR = 2.00 | Expected count doubles per unit X_j |
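As an illustration of reading rate ratios from a fitted model, the sketch below simulates Poisson counts (the coefficients and sample size are made up) and fits a log-link Poisson GLM with statsmodels; exponentiated coefficients are the incidence rate ratios.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)

# Simulated counts: log(μ) = 0.5 + 0.7 x, so the true rate ratio for x is e^{0.7} ≈ 2.01
n = 4000
x = rng.normal(size=n)
mu = np.exp(0.5 + 0.7 * x)
y = rng.poisson(mu)

X = sm.add_constant(x)
fit = sm.GLM(y, X, family=sm.families.Poisson()).fit()   # canonical log link by default

rate_ratios = np.exp(fit.params)
print("Rate ratio (IRR) for x:", rate_ratios[1])          # ≈ 2.0
print("95% CI:", np.exp(fit.conf_int()[1]))
```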
The log link is appropriate when effects combine multiplicatively rather than additively. Many phenomena exhibit this pattern: disease risks that scale with exposure, populations and prices that grow by a constant percentage per period, and counts that rise proportionally with a rate covariate.
In these contexts, the log link provides coefficients with a natural percentage-change interpretation and fitted means that are guaranteed to stay positive.
```python
import numpy as np
import matplotlib.pyplot as plt

# Log link: g(μ) = log(μ)
# Inverse: g^{-1}(η) = exp(η)

mu = np.linspace(0.01, 10, 1000)
eta = np.linspace(-3, 3, 1000)

# Log function
log_mu = np.log(mu)

# Exponential (inverse)
exp_eta = np.exp(eta)

fig, axes = plt.subplots(1, 2, figsize=(12, 4))

# Left: Log link function
axes[0].plot(mu, log_mu, 'b-', linewidth=2)
axes[0].axhline(y=0, color='gray', linestyle='--', alpha=0.5)
axes[0].axvline(x=1, color='gray', linestyle='--', alpha=0.5)
axes[0].set_xlabel('Mean μ (must be > 0)', fontsize=12)
axes[0].set_ylabel('Linear Predictor η = log(μ)', fontsize=12)
axes[0].set_title('Log Link Function\ng(μ) = log(μ)', fontsize=14)
axes[0].set_xlim(0, 10)
axes[0].set_ylim(-5, 3)
axes[0].grid(True, alpha=0.3)

# Right: Exponential (inverse log)
axes[1].plot(eta, exp_eta, 'r-', linewidth=2)
axes[1].axhline(y=1, color='gray', linestyle='--', alpha=0.5)
axes[1].axvline(x=0, color='gray', linestyle='--', alpha=0.5)
axes[1].set_xlabel('Linear Predictor η', fontsize=12)
axes[1].set_ylabel('Mean μ = exp(η)', fontsize=12)
axes[1].set_title('Exponential (Inverse Log)\nμ = e^η (always positive!)', fontsize=14)
axes[1].set_ylim(0, 15)
axes[1].grid(True, alpha=0.3)

# Add annotation showing multiplicative interpretation
axes[1].annotate('η goes from 1 to 2:\nμ increases by factor e ≈ 2.72',
                 xy=(1.5, np.exp(1.5)), xytext=(0, 10), fontsize=10,
                 arrowprops=dict(arrowstyle='->', color='gray'))

plt.tight_layout()
plt.savefig('log_link.png', dpi=150, bbox_inches='tight')
plt.show()
```

Using a log link is NOT the same as fitting linear regression to log(Y). With a log link: (1) we model E[Y], not E[log(Y)]; (2) we don't need Y > 0 for every observation (zeros are handled via the distribution); (3) we properly account for heteroscedasticity. Log-transforming Y changes the question being asked and biases predictions when back-transformed.
The complementary log-log (cloglog) link is an asymmetric alternative to logit and probit for binary data:
$$g(p) = \log(-\log(1-p)) \qquad g^{-1}(\eta) = 1 - \exp(-\exp(\eta))$$
Unlike the symmetric logit and probit, the cloglog link is asymmetric: its inverse $1 - \exp(-\exp(\eta))$ approaches 1 very sharply as $\eta$ increases but approaches 0 much more slowly as $\eta$ decreases, so it is not symmetric around $p = 0.5$.
The cloglog link arises naturally in survival analysis and extreme value theory. If we have a binary outcome arising from whether an event occurs before a fixed time point, and the underlying hazard is constant (exponential survival), the appropriate link is cloglog.
Specifically, if $T \sim \text{Exponential}(\lambda)$ and we observe $Y = I(T \leq t_0)$:
$$P(Y = 1) = P(T \leq t_0) = 1 - e^{-\lambda t_0}$$
If $\log(\lambda t_0) = \eta$, then $P(Y=1) = 1 - e^{-e^\eta}$, which is the cloglog inverse link.
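The sketch below checks this correspondence numerically by simulating exponential survival times (the hazard and follow-up time are chosen arbitrarily) and comparing the empirical event probability to the cloglog inverse link evaluated at $\eta = \log(\lambda t_0)$.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical hazard and follow-up time
lam, t0 = 0.4, 2.0
eta = np.log(lam * t0)

# Simulate exponential survival times and the binary "event by t0" indicator
T = rng.exponential(scale=1 / lam, size=500_000)
y = (T <= t0).astype(int)

# Compare: empirical P(Y=1), exact 1 - exp(-λ t0), and the cloglog inverse link at η
print("empirical:       ", y.mean())
print("1 - exp(-λ t0):  ", 1 - np.exp(-lam * t0))
print("cloglog inverse: ", 1 - np.exp(-np.exp(eta)))
```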
| Property | Logit | Probit | Cloglog |
|---|---|---|---|
| Symmetry | Symmetric around p=0.5 | Symmetric around p=0.5 | Asymmetric |
| Tail behavior | Heavy tails (logistic) | Moderate tails (normal) | Asymmetric tails |
| Canonical for | Binomial | None | None |
| Interpretation | Log-odds (odds ratio) | Latent normal threshold | Hazard model |
| Common in | Medicine, ML, general | Economics, psychometrics | Survival, epidemiology |
In practice, logit, probit, and cloglog often give similar predictions in the middle range (p between 0.2 and 0.8). They differ mainly in the tails and in interpretation. Choose based on: (1) interpretive needs (odds ratios → logit), (2) field conventions, (3) theoretical model (survival → cloglog, latent normal → probit).
The inverse link is the canonical link for the Gamma distribution:
$$g(\mu) = \frac{1}{\mu} \qquad g^{-1}(\eta) = \frac{1}{\eta}$$
Properties: the link is strictly decreasing on $(0, \infty)$, and its inverse $\mu = 1/\eta$ yields a valid positive mean only when $\eta > 0$, so the inverse link does not map all of $\mathbb{R}$ into the mean space and the linear predictor must be kept positive during fitting.
Interpretation: Not intuitive. A unit increase in $X_j$ changes $1/\mu$ by $\beta_j$. This makes the inverse link unpopular despite being canonical—practitioners often prefer the log link for Gamma regression.
The inverse squared link is canonical for the Inverse Gaussian distribution:
$$g(\mu) = \frac{1}{\mu^2} \qquad g^{-1}(\eta) = \frac{1}{\sqrt{\eta}}$$
The square root link is sometimes used for Poisson data:
$$g(\mu) = \sqrt{\mu} \qquad g^{-1}(\eta) = \eta^2$$
This link maps $[0, \infty)$ to $[0, \infty)$ rather than onto all of $\mathbb{R}$, so in practice the linear predictor is restricted to non-negative values. Its main appeal is variance stabilization for Poisson data: the square-root transformation makes the variance approximately constant.
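A quick numerical check of the variance-stabilizing claim (a sketch; the means and simulation size are arbitrary): for Poisson draws, the variance of $\sqrt{Y}$ settles near 1/4 once the mean is not too small, while the variance of $Y$ itself grows with the mean.

```python
import numpy as np

rng = np.random.default_rng(4)

# Var(Y) grows with λ, but Var(√Y) approaches ~1/4 as λ grows
for lam in [2, 5, 20, 100]:
    y = rng.poisson(lam, size=200_000)
    print(f"λ={lam:>3}: Var(Y) ≈ {y.var():7.2f},  Var(√Y) ≈ {np.sqrt(y).var():.3f}")
```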
| Link Name | g(μ) | g⁻¹(η) | Mean Space | Typical Use |
|---|---|---|---|---|
| Identity | μ | η | ℝ | Normal regression |
| Log | log(μ) | e^η | (0,∞) | Poisson, Gamma |
| Logit | log(p/(1-p)) | 1/(1+e^(-η)) | (0,1) | Binomial |
| Probit | Φ⁻¹(p) | Φ(η) | (0,1) | Binomial (economics) |
| Cloglog | log(-log(1-p)) | 1-exp(-exp(η)) | (0,1) | Binomial (survival) |
| Inverse | 1/μ | 1/η | (0,∞) | Gamma (canonical) |
| Inverse squared | 1/μ² | 1/√η | (0,∞) | Inverse Gaussian |
| Square root | √μ | η² | [0,∞) | Poisson (var-stab) |
You can define custom link functions for special applications. As long as your function is monotonic, differentiable, and maps the mean space to ℝ, standard GLM estimation applies. Some software (like R's glm) allows user-specified link functions.
Selecting an appropriate link function involves balancing several considerations:
Step 1: Ensure Validity The link must map the mean space to ℝ. For binary data, you need a link that maps (0,1) to ℝ; for count data, one that maps (0,∞) to ℝ.
Step 2: Consider the Canonical Link The canonical link has nice mathematical properties (sufficient statistics, concavity). Start with the canonical unless you have reasons to deviate.
Step 4: Prioritize Interpretability Coefficients should be meaningful for your application: the logit link yields odds ratios, the log link yields rate ratios and percentage changes, and the identity link yields direct additive effects in response units.
Step 4: Consider Domain Knowledge Does theory or prior research suggest a particular relationship? In pharmacokinetics, log-linear relationships are standard; in psychometrics, probit has theoretical justification.
When unsure, you can compare models with different links using information criteria such as AIC or BIC, deviance, and residual diagnostics, as sketched in the example below.
However, often different links give very similar fits, and the choice comes down to interpretation and convention.
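The sketch below illustrates such a comparison on simulated binary data, refitting the same binomial GLM under logit, probit, and cloglog links and ranking the fits by AIC. It assumes a recent statsmodels release in which the link classes are named Logit, Probit, and CLogLog (older versions expose lowercase aliases such as probit).

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)

# Simulated binary data generated from a logistic model
n = 3000
x = rng.normal(size=n)
p = 1 / (1 + np.exp(-(0.4 + 1.2 * x)))
y = rng.binomial(1, p)
X = sm.add_constant(x)

# Refit the same model under three different links and compare AIC
# (link classes Logit / Probit / CLogLog; older statsmodels uses lowercase aliases)
candidate_links = [
    ("logit", sm.families.links.Logit()),
    ("probit", sm.families.links.Probit()),
    ("cloglog", sm.families.links.CLogLog()),
]
for name, link in candidate_links:
    fit = sm.GLM(y, X, family=sm.families.Binomial(link=link)).fit()
    print(f"{name:8s} AIC = {fit.aic:.1f}   coefficients = {np.round(fit.params, 3)}")
```

Because the data here were generated from a logistic model, the logit fit will usually have the lowest AIC, and the probit coefficients come out noticeably smaller in magnitude than the logit ones, consistent with the scaling discussed earlier.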
In practice, for most applications: use logit for binary outcomes, log for counts and positive continuous, and identity for unbounded continuous. This covers 95% of cases. Only deviate when you have specific theoretical or interpretive reasons.
We've explored link functions—the critical bridge between constrained means and unbounded linear predictors. Let's consolidate the key concepts:
- A valid link must be defined on the mean space, strictly monotonic, twice differentiable, and map the mean space onto the whole real line.
- The identity, logit, probit, log, cloglog, inverse, and square-root links each pair naturally with particular response types and mean spaces.
- Link choice drives interpretation: identity gives additive effects, logit gives odds ratios, log gives rate ratios, probit ties to a latent normal threshold, and cloglog ties to hazard models.
- Canonical links are a sensible default, but interpretability, domain knowledge, and field convention can justify non-canonical choices.
What's Next:
In the next page, we'll explore the exponential family of distributions—the mathematical foundation that makes the GLM framework possible. Understanding the exponential family reveals why certain distribution-link combinations work well and provides the tools for deriving new GLMs.
You now have a deep understanding of link functions—their requirements, properties, and interpretations. You can choose appropriate links for different response types and interpret the resulting coefficients correctly. Next, we'll see how the exponential family provides the theoretical foundation for GLMs.