Fitting a regression model yields numbers: $\hat{\beta}_0 = 23.5$, $\hat{\beta}_1 = 4.2$. But what do these numbers mean? How do we communicate them to stakeholders? What claims can we legitimately make?
Coefficient interpretation is where regression meets the real world. A technically correct model is worthless if we interpret it incorrectly or overclaim its implications. This page develops the skills to interpret coefficients rigorously, honestly, and usefully.
By the end of this page, you will interpret slope and intercept correctly with units, understand standardized coefficients, distinguish correlation from causation, recognize common interpretation errors, and communicate regression results to both technical and non-technical audiences.
The slope $\hat{\beta}_1$ is the heart of simple linear regression. It quantifies the relationship between $x$ and $y$.
Marginal Effect Interpretation
The slope represents the expected change in $y$ for a one-unit increase in $x$:
$$\hat{\beta}_1 = \frac{\Delta \hat{y}}{\Delta x}$$
If $\hat{\beta}_1 = 4.2$, then increasing $x$ by 1 unit is associated with an expected increase of 4.2 units in $y$.
Example: In a regression of house price ($y$, in thousands of dollars) on square footage ($x$, in hundreds of square feet):
$$\widehat{\text{Price}} = 50 + 4.2 \times \text{SqFt}$$
Interpretation: "For each additional 100 square feet, the expected price increases by $4,200."
The slope's numerical value depends entirely on the units of measurement. If we measured square footage in thousands (not hundreds), the slope would be 42, not 4.2. Always report units: 'β₁ = 4.2 thousand dollars per hundred square feet' is complete; 'β₁ = 4.2' is meaningless without context.
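The unit-dependence of the slope is easy to demonstrate. The sketch below (synthetic data, numpy only; the numbers mirror the house-price example but are simulated, not real) fits the same data twice with the predictor in different units:

```python
import numpy as np

rng = np.random.default_rng(0)
sqft_hundreds = rng.uniform(10, 30, size=200)                    # 1,000-3,000 sqft, in hundreds
price = 50 + 4.2 * sqft_hundreds + rng.normal(0, 10, size=200)   # price in $1000s (simulated)

# Slope when x is measured in hundreds of square feet
b1_hundreds, b0 = np.polyfit(sqft_hundreds, price, 1)

# Same data with x in thousands of square feet: the slope scales by 10
sqft_thousands = sqft_hundreds / 10
b1_thousands, _ = np.polyfit(sqft_thousands, price, 1)

print(round(b1_hundreds, 2))    # ≈ 4.2  ($1000s per 100 sqft)
print(round(b1_thousands, 2))   # ≈ 42   ($1000s per 1000 sqft)
```

The fitted line is identical either way; only the numeric label on the slope changes, which is why the units must always travel with the number.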
| Response (y) | Predictor (x) | Slope | Interpretation |
|---|---|---|---|
| Exam score (points) | Study hours | 5.3 | Each additional study hour → 5.3 points higher expected score |
| Salary ($1000s) | Years experience | 2.1 | Each additional year → $2,100 higher expected salary |
| Blood pressure (mmHg) | Weight (kg) | 0.4 | Each additional kg → 0.4 mmHg higher expected BP |
| Click-through rate (%) | Ad position | -0.8 | Each one-position drop in placement → 0.8 percentage points lower expected CTR |
Sign of the Slope
The sign of $\hat{\beta}_1$ gives the direction of the association: positive means $y$ tends to increase with $x$, negative means $y$ tends to decrease with $x$, and a slope near zero suggests no linear association.
Magnitude of the Slope
The magnitude $|\hat{\beta}_1|$ indicates the steepness of the relationship. But be careful: a "large" slope in one context (prices) may be "small" in another (probabilities). Practical significance depends on the domain.
The intercept $\hat{\beta}_0$ is often overlooked, but understanding it correctly prevents common errors.
Mathematical Definition
The intercept is the predicted value of $y$ when $x = 0$:
$$\hat{y}|_{x=0} = \hat{\beta}_0$$
It's where the regression line crosses the vertical axis.
When the Intercept is Meaningful
The intercept has a natural interpretation only when $x = 0$ is:
- A possible, physically meaningful value of the predictor
- Within (or at least near) the range of the observed data
Example of meaningful intercept: In a regression of test score on hours studied, $\hat{\beta}_0$ represents the expected score with zero study time (the baseline before any studying).
In many cases, x = 0 is outside the observed range or nonsensical. Weight = 0 kg? Year = 0 AD? Temperature = 0°F? In such cases, the intercept is a mathematical anchor that enables accurate predictions in the observed range—don't interpret it literally.
The Intercept's Role in Prediction
Even when not directly interpretable, the intercept is essential for accurate predictions. It positions the line vertically so that predictions are correct on average. Removing or constraining the intercept to zero (when inappropriate) biases predictions.
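The cost of dropping the intercept can be shown directly. A minimal sketch with simulated data (a true intercept of 100 is assumed for illustration) compares residuals from a model with an intercept against one forced through the origin:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(10, 50, size=300)
y = 100 + 2.0 * x + rng.normal(0, 5, size=300)   # true intercept is 100

# With intercept: OLS on the design matrix [1, x]
X = np.column_stack([np.ones_like(x), x])
b0, b1 = np.linalg.lstsq(X, y, rcond=None)[0]

# Forced through the origin: OLS on [x] alone
b1_no_int = np.linalg.lstsq(x[:, None], y, rcond=None)[0][0]

resid_with = y - (b0 + b1 * x)
resid_without = y - b1_no_int * x
print(round(resid_with.mean(), 3))     # ~0: residuals centered, predictions correct on average
print(round(resid_without.mean(), 3))  # clearly nonzero: predictions systematically biased
```

With an intercept, OLS residuals sum to exactly zero; without one, the line tilts to compensate and the predictions are biased throughout the range.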
Centering to Create Meaningful Intercepts
A useful technique: center $x$ by subtracting its mean. If we regress $y$ on $(x - \bar{x})$:
$$\hat{y} = \hat{\beta}_0^* + \hat{\beta}_1(x - \bar{x})$$
Now $\hat{\beta}_0^* = \bar{y}$ (the mean of $y$), and the intercept represents the predicted value at the average $x$. This is often more interpretable.
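This identity is easy to verify numerically. A short sketch with synthetic data fits the regression on centered $x$ and compares the intercept to $\bar{y}$:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(0, 10, size=100)
y = 3 + 1.5 * x + rng.normal(0, 1, size=100)

# Regress y on centered x: the intercept becomes the mean of y
xc = x - x.mean()
b1, b0_star = np.polyfit(xc, y, 1)

print(round(b0_star, 4), round(y.mean(), 4))   # the two values coincide
```

Because the centered predictor has mean zero, the OLS identity $\hat{\beta}_0^* = \bar{y} - \hat{\beta}_1 \cdot 0 = \bar{y}$ holds exactly.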
The raw slope $\hat{\beta}_1$ depends on the scales of $x$ and $y$. To compare effects across different variables or studies, we standardize.
Standardized (Beta) Coefficients
If we standardize both variables to have mean 0 and standard deviation 1:
$$z_x = \frac{x - \bar{x}}{s_x}, \quad z_y = \frac{y - \bar{y}}{s_y}$$
And regress $z_y$ on $z_x$, the slope is:
$$\hat{\beta}_1^{\text{std}} = r_{xy}$$
The standardized slope equals the correlation coefficient! This is unit-free.
You can convert without re-running the regression: β₁ˢᵗᵈ = β₁ × (sₓ/sᵧ). The standardized slope tells you: 'A one standard deviation increase in x is associated with a β₁ˢᵗᵈ standard deviation change in y.'
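The conversion formula and the identity with the correlation coefficient can both be checked in a few lines (synthetic data, numpy only):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(50, 8, size=500)
y = 10 + 0.7 * x + rng.normal(0, 5, size=500)

# Raw slope from the usual regression of y on x
b1 = np.polyfit(x, y, 1)[0]

# Convert without re-running: beta_std = b1 * (s_x / s_y)
beta_std = b1 * x.std(ddof=1) / y.std(ddof=1)

# The standardized slope equals the correlation coefficient
r = np.corrcoef(x, y)[0, 1]
print(round(beta_std, 6), round(r, 6))   # identical up to rounding
```

The agreement is exact, not approximate: algebraically, $\hat{\beta}_1 \cdot s_x / s_y = (S_{xy}/S_{xx}) \cdot s_x/s_y = r_{xy}$.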
When to Use Standardized Coefficients
| Use Standardized When | Use Raw When |
|---|---|
| Comparing effects of variables on different scales | Units have practical meaning |
| Communicating to audiences unfamiliar with units | Stakeholders need specific predictions |
| Summarizing effect size in meta-analyses | Making policy decisions ("increase X by 10") |
| Variables measured in arbitrary units | Natural interpretation exists |
Example: In predicting salary from years of education and years of experience, the raw coefficients answer "how many dollars does each additional year add?", while the standardized coefficients answer "which predictor has the larger effect in standard-deviation terms?"
Both are correct; they answer different questions.
| Aspect | Raw Coefficient | Standardized Coefficient |
|---|---|---|
| Formula | $\hat{\beta}_1 = S_{xy}/S_{xx}$ | $\hat{\beta}_1^{\text{std}} = r_{xy}$ |
| Units | Units of $y$ per unit of $x$ | Dimensionless (std. dev. per std. dev.) |
| Range | $(-\infty, \infty)$ | $[-1, 1]$ |
| Interpretation | Absolute change in $y$ per unit $x$ | Relative change in std. dev. units |
| Use case | Practical predictions | Effect size comparison |
Perhaps no topic in regression is more important—or more frequently misunderstood—than the distinction between association and causation.
What Regression Measures
Ordinary regression measures association: the pattern of covariation between $x$ and $y$ in observed data. The slope $\hat{\beta}_1$ tells us how $y$ tends to differ when $x$ differs.
What Regression Does NOT Automatically Measure
Regression does not, by itself, tell us what would happen if we intervened to change $x$. Intervention is a causal concept; covariation is an associational concept.
Correlation (association) does not imply causation. A regression slope of 5 tells us 'observations with higher x tend to have higher y.' It does NOT tell us 'increasing x will cause y to increase.' These are different claims with different implications.
Why Might Association ≠ Causation?
Three main threats to causal interpretation:
Confounding: A third variable $z$ causes both $x$ and $y$. Even with no direct $x → y$ effect, we observe an $x$-$y$ association.
Reverse causation: $y$ causes $x$, not vice versa. Example: Successful companies advertise more (success → advertising), not advertising → success.
Selection bias: The sample itself is selected based on $x$ or $y$ in ways that distort relationships.
| Observed Association | Confounding Variable | Reality |
|---|---|---|
| Ice cream sales ↔ drowning deaths | Temperature/season | Hot weather causes both (no direct link) |
| Firefighters present ↔ fire damage | Fire severity | Bigger fires bring more firefighters AND cause more damage |
| Hospital care ↔ death rate | Patient severity | Sicker patients seek hospitals AND die more often |
| Shoe size ↔ reading ability (in children) | Age | Older children have bigger feet AND read better |
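Confounding is easy to reproduce in simulation. In the sketch below (synthetic data; the variable names echo the ice cream example but nothing is estimated from real data), a confounder $z$ drives both $x$ and $y$, while $x$ has no direct effect on $y$ at all, yet the regression finds a strong slope:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 5000

# Confounder z causes both x and y; x has NO direct effect on y
z = rng.normal(size=n)              # e.g. temperature
x = 2 * z + rng.normal(size=n)      # e.g. ice cream sales
y = 3 * z + rng.normal(size=n)      # e.g. drowning deaths

slope_xy = np.polyfit(x, y, 1)[0]
print(round(slope_xy, 2))   # clearly positive despite no causal x → y link
```

The slope honestly reports the association in the data; it is the causal reading of that slope, not the estimate itself, that goes wrong.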
When Can We Make Causal Claims?
Causal interpretation is justified when:
Randomized experiment: $x$ is randomly assigned, breaking confounding. (Gold standard)
Natural experiment: Some external event randomly varies $x$. (Approximates randomization)
Careful observational design: All confounders measured and controlled for, with strong theory. (Requires strong assumptions)
Safe Language for Observational Data
Use associational language: "is associated with," "predicts," "observations with higher $x$ tend to have higher $y$."
Avoid causal language: "causes," "leads to," "increases $y$ by," "the effect of $x$ on $y$."
Regression coefficients are estimated from observed data. Using them to predict far outside the observed range—extrapolation—is treacherous.
What is Extrapolation?
Given data with $x$ ranging from 10 to 50, predicting at $x = 45$ is interpolation (within range). Predicting at $x = 100$ is extrapolation (outside range).
Why is Extrapolation Dangerous?
The linear relationship is only confirmed within the observed range. Beyond it:
- The true relationship may curve, plateau, or reverse
- Physical or logical constraints (e.g., quantities that cannot be negative) may bind
- No data exists to detect the breakdown
A linear model for Olympic 100m sprint times vs. year might fit well from 1900-2020. Extrapolating backward predicts ancient Greeks ran the 100m in 3 seconds. Extrapolating forward predicts zero or negative times by 2300. Both are absurd—the linear model breaks down outside the observed range.
Quantifying Extrapolation Risk
Recall the variance of the estimated mean response at a new $x_0$ (a prediction interval adds the irreducible $\sigma^2$ on top of this):
$$\text{Var}(\hat{y}_0) = \sigma^2 \left[ \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}} \right]$$
The term $(x_0 - \bar{x})^2/S_{xx}$ grows as $x_0$ moves away from the mean. Prediction intervals widen rapidly during extrapolation—the model honestly admits uncertainty.
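The growth of that term can be computed directly. A quick sketch with synthetic data (observed $x$ roughly 10 to 50 by construction) compares the variance multiplier at an in-range point with one far outside:

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.uniform(10, 50, size=100)   # observed range: roughly 10-50
y = 5 + 0.8 * x + rng.normal(0, 2, size=100)

n = len(x)
xbar = x.mean()
Sxx = ((x - xbar) ** 2).sum()

def leverage(x0):
    """The bracketed term 1/n + (x0 - x̄)²/Sxx from the variance formula."""
    return 1 / n + (x0 - xbar) ** 2 / Sxx

# Near the mean of the data vs far outside the observed range
print(round(leverage(30), 4))    # near x̄: small
print(round(leverage(100), 4))   # extrapolation: many times larger
```

The multiplier grows quadratically in the distance from $\bar{x}$, which is exactly why intervals balloon under extrapolation.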
Best Practices
- Report the range of $x$ in your data alongside the fitted model
- Flag any prediction outside that range explicitly as an extrapolation
- Show prediction intervals, not just point predictions, and note how they widen away from $\bar{x}$
Regression results must be communicated to various audiences: fellow data scientists, business stakeholders, policymakers, and the general public. Tailoring your interpretation is essential.
For Technical Audiences
Provide complete details:
"The OLS estimate is $\hat{\beta}_1 = 4.23$ (SE = 0.45, p < 0.001, 95% CI: 3.35–5.11). For each additional 100 sqft, expected price increases by $4,230, holding other factors constant. The model explains 78% of variance (R² = 0.78)."
For Business Stakeholders
Focus on actionable insights with uncertainty:
"Larger homes sell for more—roughly $4,000 more per 100 additional square feet. This relationship is statistically reliable. However, this doesn't mean adding 100 sqft will increase any particular home's value by exactly $4,000."
For General Audiences
Simplify without distorting:
"We found that bigger homes tend to sell for higher prices. On average, a home 100 square feet larger sells for about $4,000 more."
In simple regression, we cannot hold other factors constant—we only have one predictor. The phrase 'all else being equal' or 'ceteris paribus' technically applies only to multiple regression. In simple regression, correctly say 'on average' or 'among observations in our data.'
Even experienced practitioners make interpretation mistakes. Here are the most common and how to avoid them:
| Error | What's Wrong | Correction |
|---|---|---|
| "β = 4 means x causes 4 units change in y" | Conflates association with causation | "Associated with" or "predicts," not "causes" |
| "If we increase x by 1, y increases by β" | Implies intervention when only observing | "Observations with 1 higher x have β higher y on average" |
| Ignoring units | Slope is meaningless without units | Always specify units: "$4.2K per 100 sqft" |
| Over-interpreting small R² | "Regression failed" if R² < 0.5 | Low R² may still yield useful/significant predictors |
| Interpreting intercept literally when x=0 is impossible | "At 0 years old, salary is $X" | Note when intercept has no real-world meaning |
| Extrapolating confidently | Predicting far outside data range | Acknowledge as extrapolation with high uncertainty |
| Confusing statistical with practical significance | "p < 0.05 so β is important" | A tiny but precise effect may not matter practically |
A slope of β₁ = 0.001 might be statistically significant (p < 0.001) with enough data. But if x is 'advertising dollars' and y is 'revenue,' a $0.001 increase per advertising dollar is practically useless. Always consider: Is the effect size large enough to matter?
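The statistical-versus-practical gap can be simulated. In this sketch (entirely synthetic; the true slope is set to 0.001 by assumption, mirroring the advertising example), a huge sample makes a negligible effect wildly significant:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 1_000_000

# Tiny true effect: $0.001 revenue per advertising dollar (assumed for illustration)
ad = rng.uniform(0, 1000, size=n)
revenue = 0.001 * ad + rng.normal(0, 1, size=n)

b1, b0 = np.polyfit(ad, revenue, 1)

# Standard error of the slope: sqrt(sigma^2 / Sxx), sigma^2 from the residuals
resid = revenue - (b0 + b1 * ad)
sigma2 = (resid ** 2).sum() / (n - 2)
se = np.sqrt(sigma2 / ((ad - ad.mean()) ** 2).sum())

t = b1 / se
print(round(t, 1))   # enormous t-statistic: statistically significant
print(b1)            # ≈ 0.001: practically negligible
```

With a million observations the standard error shrinks toward zero, so even a trivial slope is "significant"; the effect size itself is what decides whether the result matters.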
Defensive Practices
- Always state units and the observed range of the data
- Default to associational language unless the study design justifies causal claims
- Report uncertainty (standard errors or intervals) alongside point estimates
- Ask whether the effect size matters practically, not just statistically
Interpreting regression coefficients correctly is where technical analysis meets real-world communication. Let's consolidate: the slope is an expected change in $y$ per unit of $x$, always stated with units; the intercept is meaningful only when $x = 0$ is sensible and near the data; standardized coefficients equal the correlation and enable cross-variable comparisons; association is not causation; and extrapolation beyond the observed range is hazardous.
What's Next
We've covered model formulation, derivation, geometry, and interpretation. The final piece is understanding the assumptions that underpin all of this theory. Under what conditions are OLS estimates unbiased, consistent, and efficient? When do our confidence intervals and p-values mean what we think they mean? The next page addresses these critical questions.
You can now interpret regression coefficients correctly, communicate results to diverse audiences, and avoid common interpretation errors. Next, we'll examine the assumptions required for valid statistical inference—the fine print that makes everything work.