Fitting a regression model yields numbers: $\hat{\beta}_0 = 23.5$, $\hat{\beta}_1 = 4.2$. But what do these numbers mean? How do we communicate them to stakeholders? What claims can we legitimately make?
Coefficient interpretation is where regression meets the real world. A technically correct model is worthless if we interpret it incorrectly or overclaim its implications. This page develops the skills to interpret coefficients rigorously, honestly, and usefully.
By the end of this page, you will interpret slope and intercept correctly with units, understand standardized coefficients, distinguish correlation from causation, recognize common interpretation errors, and communicate regression results to both technical and non-technical audiences.
The slope $\hat{\beta}_1$ is the heart of simple linear regression. It quantifies the relationship between $x$ and $y$.
Marginal Effect Interpretation
The slope represents the expected change in $y$ for a one-unit increase in $x$:
$$\hat{\beta}_1 = \frac{\Delta \hat{y}}{\Delta x}$$
If $\hat{\beta}_1 = 4.2$, then increasing $x$ by 1 unit is associated with an expected increase of 4.2 units in $y$.
Example: In a regression of house price ($y$, in thousands of dollars) on square footage ($x$, in hundreds of square feet):
$$\widehat{\text{Price}} = 50 + 4.2 \times \text{SqFt}$$
Interpretation: "For each additional 100 square feet, the expected price increases by $4,200."
The slope's numerical value depends entirely on the units of measurement. If we measured square footage in thousands (not hundreds), the slope would be 42, not 4.2. Always report units: 'β₁ = 4.2 thousand dollars per hundred square feet' is complete; 'β₁ = 4.2' is meaningless without context.
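The unit-dependence of the slope is easy to demonstrate. The sketch below (synthetic data, numpy only; the numbers mirror the house-price example but are simulated, not real) fits the same data twice with the predictor in different units:

```python
import numpy as np

rng = np.random.default_rng(0)
sqft_hundreds = rng.uniform(10, 30, size=200)                    # 1,000-3,000 sqft, in hundreds
price = 50 + 4.2 * sqft_hundreds + rng.normal(0, 10, size=200)   # price in $1000s (simulated)

# Slope when x is measured in hundreds of square feet
b1_hundreds, b0 = np.polyfit(sqft_hundreds, price, 1)

# Same data with x in thousands of square feet: the slope scales by 10
sqft_thousands = sqft_hundreds / 10
b1_thousands, _ = np.polyfit(sqft_thousands, price, 1)

print(round(b1_hundreds, 2))    # ≈ 4.2  ($1000s per 100 sqft)
print(round(b1_thousands, 2))   # ≈ 42   ($1000s per 1000 sqft)
```

The fitted line is identical either way; only the numeric label on the slope changes, which is why the units must always travel with the number.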
| Response (y) | Predictor (x) | Slope | Interpretation |
|---|---|---|---|
| Exam score (points) | Study hours | 5.3 | Each additional study hour → 5.3 points higher expected score |
| Salary ($1000s) | Years experience | 2.1 | Each additional year → $2,100 higher expected salary |
| Blood pressure (mmHg) | Weight (kg) | 0.4 | Each additional kg → 0.4 mmHg higher expected BP |
| Click-through rate (%) | Ad position | -0.8 | Each one-position drop in placement → 0.8 percentage points lower expected CTR |
Sign of the Slope
The sign of $\hat{\beta}_1$ gives the direction of the association: positive means $y$ tends to increase with $x$, negative means $y$ tends to decrease with $x$, and a slope near zero suggests no linear association.
Magnitude of the Slope
The magnitude $|\hat{\beta}_1|$ indicates the steepness of the relationship. But be careful: a "large" slope in one context (prices) may be "small" in another (probabilities). Practical significance depends on the domain.
The intercept $\hat{\beta}_0$ is often overlooked, but understanding it correctly prevents common errors.
Mathematical Definition
The intercept is the predicted value of $y$ when $x = 0$:
$$\hat{y}|_{x=0} = \hat{\beta}_0$$
It's where the regression line crosses the vertical axis.
When the Intercept is Meaningful
The intercept has a natural interpretation only when $x = 0$ is:
- A possible, physically meaningful value of the predictor
- Within (or at least near) the range of the observed data
Example of meaningful intercept: In a regression of test score on hours studied, $\hat{\beta}_0$ represents the expected score with zero study time (the baseline before any studying).
In many cases, x = 0 is outside the observed range or nonsensical. Weight = 0 kg? Year = 0 AD? Temperature = 0°F? In such cases, the intercept is a mathematical anchor that enables accurate predictions in the observed range—don't interpret it literally.
The Intercept's Role in Prediction
Even when not directly interpretable, the intercept is essential for accurate predictions. It positions the line vertically so that predictions are correct on average. Removing or constraining the intercept to zero (when inappropriate) biases predictions.
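The cost of dropping the intercept can be shown directly. A minimal sketch with simulated data (a true intercept of 100 is assumed for illustration) compares residuals from a model with an intercept against one forced through the origin:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(10, 50, size=300)
y = 100 + 2.0 * x + rng.normal(0, 5, size=300)   # true intercept is 100

# With intercept: OLS on the design matrix [1, x]
X = np.column_stack([np.ones_like(x), x])
b0, b1 = np.linalg.lstsq(X, y, rcond=None)[0]

# Forced through the origin: OLS on [x] alone
b1_no_int = np.linalg.lstsq(x[:, None], y, rcond=None)[0][0]

resid_with = y - (b0 + b1 * x)
resid_without = y - b1_no_int * x
print(round(resid_with.mean(), 3))     # ~0: residuals centered, predictions correct on average
print(round(resid_without.mean(), 3))  # clearly nonzero: predictions systematically biased
```

With an intercept, OLS residuals sum to exactly zero; without one, the line tilts to compensate and the predictions are biased throughout the range.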
Centering to Create Meaningful Intercepts
A useful technique: center $x$ by subtracting its mean. If we regress $y$ on $(x - \bar{x})$:
$$\hat{y} = \hat{\beta}_0^* + \hat{\beta}_1(x - \bar{x})$$
Now $\hat{\beta}_0^* = \bar{y}$ (the mean of $y$), and the intercept represents the predicted value at the average $x$. This is often more interpretable.
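This identity is easy to verify numerically. A short sketch with synthetic data fits the regression on centered $x$ and compares the intercept to $\bar{y}$:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(0, 10, size=100)
y = 3 + 1.5 * x + rng.normal(0, 1, size=100)

# Regress y on centered x: the intercept becomes the mean of y
xc = x - x.mean()
b1, b0_star = np.polyfit(xc, y, 1)

print(round(b0_star, 4), round(y.mean(), 4))   # the two values coincide
```

Because the centered predictor has mean zero, the OLS identity $\hat{\beta}_0^* = \bar{y} - \hat{\beta}_1 \cdot 0 = \bar{y}$ holds exactly.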
The raw slope $\hat{\beta}_1$ depends on the scales of $x$ and $y$. To compare effects across different variables or studies, we standardize.
Standardized (Beta) Coefficients
If we standardize both variables to have mean 0 and standard deviation 1:
$$z_x = \frac{x - \bar{x}}{s_x}, \quad z_y = \frac{y - \bar{y}}{s_y}$$
And regress $z_y$ on $z_x$, the slope is:
$$\hat{\beta}_1^{\text{std}} = r_{xy}$$
The standardized slope equals the correlation coefficient! This is unit-free.
You can convert without re-running the regression: β₁ˢᵗᵈ = β₁ × (sₓ/sᵧ). The standardized slope tells you: 'A one standard deviation increase in x is associated with a β₁ˢᵗᵈ standard deviation change in y.'
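The conversion formula and the identity with the correlation coefficient can both be checked in a few lines (synthetic data, numpy only):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(50, 8, size=500)
y = 10 + 0.7 * x + rng.normal(0, 5, size=500)

# Raw slope from the usual regression of y on x
b1 = np.polyfit(x, y, 1)[0]

# Convert without re-running: beta_std = b1 * (s_x / s_y)
beta_std = b1 * x.std(ddof=1) / y.std(ddof=1)

# The standardized slope equals the correlation coefficient
r = np.corrcoef(x, y)[0, 1]
print(round(beta_std, 6), round(r, 6))   # identical up to rounding
```

The agreement is exact, not approximate: algebraically, $\hat{\beta}_1 \cdot s_x / s_y = (S_{xy}/S_{xx}) \cdot s_x/s_y = r_{xy}$.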
When to Use Standardized Coefficients
| Use Standardized When | Use Raw When |
|---|---|
| Comparing effects of variables on different scales | Units have practical meaning |
| Communicating to audiences unfamiliar with units | Stakeholders need specific predictions |
| Summarizing effect size in meta-analyses | Making policy decisions ("increase X by 10") |
| Variables measured in arbitrary units | Natural interpretation exists |
Example: In predicting salary from years of education and years of experience, the raw coefficients answer "how many dollars does each additional year add?", while the standardized coefficients answer "which predictor has the larger effect in standard-deviation terms?"
Both are correct; they answer different questions.
| Aspect | Raw Coefficient | Standardized Coefficient |
|---|---|---|
| Formula | $\hat{\beta}_1 = S_{xy}/S_{xx}$ | $\hat{\beta}_1^{\text{std}} = r_{xy}$ |
| Units | Units of $y$ per unit of $x$ | Dimensionless (std. dev. per std. dev.) |
| Range | $(-\infty, \infty)$ | $[-1, 1]$ |
| Interpretation | Absolute change in $y$ per unit $x$ | Relative change in std. dev. units |
| Use case | Practical predictions | Effect size comparison |
Perhaps no topic in regression is more important—or more frequently misunderstood—than the distinction between association and causation.
What Regression Measures
Ordinary regression measures association: the pattern of covariation between $x$ and $y$ in observed data. The slope $\hat{\beta}_1$ tells us how $y$ tends to differ when $x$ differs.
What Regression Does NOT Automatically Measure
Regression does not, by itself, tell us what would happen if we intervened to change $x$. Intervention is a causal concept; covariation is an associational concept.
Correlation (association) does not imply causation. A regression slope of 5 tells us 'observations with higher x tend to have higher y.' It does NOT tell us 'increasing x will cause y to increase.' These are different claims with different implications.
Why Might Association ≠ Causation?
Three main threats to causal interpretation:
Confounding: A third variable $z$ causes both $x$ and $y$. Even with no direct $x → y$ effect, we observe an $x$-$y$ association.
Reverse causation: $y$ causes $x$, not vice versa. Example: Successful companies advertise more (success → advertising), not advertising → success.
Selection bias: The sample itself is selected based on $x$ or $y$ in ways that distort relationships.
| Observed Association | Confounding Variable | Reality |
|---|---|---|
| Ice cream sales ↔ drowning deaths | Temperature/season | Hot weather causes both (no direct link) |
| Firefighters present ↔ fire damage | Fire severity | Bigger fires bring more firefighters AND cause more damage |
| Hospital care ↔ death rate | Patient severity | Sicker patients seek hospitals AND die more often |
| Shoe size ↔ reading ability (in children) | Age | Older children have bigger feet AND read better |
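Confounding is easy to reproduce in simulation. In the sketch below (synthetic data; the variable names echo the ice cream example but nothing is estimated from real data), a confounder $z$ drives both $x$ and $y$, while $x$ has no direct effect on $y$ at all, yet the regression finds a strong slope:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 5000

# Confounder z causes both x and y; x has NO direct effect on y
z = rng.normal(size=n)              # e.g. temperature
x = 2 * z + rng.normal(size=n)      # e.g. ice cream sales
y = 3 * z + rng.normal(size=n)      # e.g. drowning deaths

slope_xy = np.polyfit(x, y, 1)[0]
print(round(slope_xy, 2))   # clearly positive despite no causal x → y link
```

The slope honestly reports the association in the data; it is the causal reading of that slope, not the estimate itself, that goes wrong.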
When Can We Make Causal Claims?
Causal interpretation is justified when:
Randomized experiment: $x$ is randomly assigned, breaking confounding. (Gold standard)
Natural experiment: Some external event randomly varies $x$. (Approximates randomization)
Careful observational design: All confounders measured and controlled for, with strong theory. (Requires strong assumptions)
Safe Language for Observational Data
Use associational language: "is associated with," "predicts," "observations with higher $x$ tend to have higher $y$."
Avoid causal language: "causes," "leads to," "increases $y$ by," "the effect of $x$ on $y$."
Regression coefficients are estimated from observed data. Using them to predict far outside the observed range—extrapolation—is treacherous.
What is Extrapolation?
Given data with $x$ ranging from 10 to 50, predicting at $x = 45$ is interpolation (within range). Predicting at $x = 100$ is extrapolation (outside range).
Why is Extrapolation Dangerous?
The linear relationship is only confirmed within the observed range. Beyond it:
- The true relationship may curve, plateau, or reverse
- Physical or logical constraints (e.g., quantities that cannot be negative) may bind
- No data exists to detect the breakdown
A linear model for Olympic 100m sprint times vs. year might fit well from 1900-2020. Extrapolating backward predicts ancient Greeks ran the 100m in 3 seconds. Extrapolating forward predicts zero or negative times by 2300. Both are absurd—the linear model breaks down outside the observed range.
Quantifying Extrapolation Risk
Recall the variance of the estimated mean response at a new $x_0$ (a prediction interval adds the irreducible $\sigma^2$ on top of this):
$$\text{Var}(\hat{y}_0) = \sigma^2 \left[ \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}} \right]$$
The term $(x_0 - \bar{x})^2/S_{xx}$ grows as $x_0$ moves away from the mean. Prediction intervals widen rapidly during extrapolation—the model honestly admits uncertainty.
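The growth of that term can be computed directly. A quick sketch with synthetic data (observed $x$ roughly 10 to 50 by construction) compares the variance multiplier at an in-range point with one far outside:

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.uniform(10, 50, size=100)   # observed range: roughly 10-50
y = 5 + 0.8 * x + rng.normal(0, 2, size=100)

n = len(x)
xbar = x.mean()
Sxx = ((x - xbar) ** 2).sum()

def leverage(x0):
    """The bracketed term 1/n + (x0 - x̄)²/Sxx from the variance formula."""
    return 1 / n + (x0 - xbar) ** 2 / Sxx

# Near the mean of the data vs far outside the observed range
print(round(leverage(30), 4))    # near x̄: small
print(round(leverage(100), 4))   # extrapolation: many times larger
```

The multiplier grows quadratically in the distance from $\bar{x}$, which is exactly why intervals balloon under extrapolation.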
Best Practices
- Report the range of $x$ in your data alongside the fitted model
- Flag any prediction outside that range explicitly as an extrapolation
- Show prediction intervals, not just point predictions, and note how they widen away from $\bar{x}$
Regression results must be communicated to various audiences: fellow data scientists, business stakeholders, policymakers, and the general public. Tailoring your interpretation is essential.
For Technical Audiences
Provide complete details:
"The OLS estimate is $\hat{\beta}_1 = 4.23$ (SE = 0.45, p < 0.001, 95% CI: 3.35–5.11). For each additional 100 sqft, expected price increases by $4,230, holding other factors constant. The model explains 78% of variance (R² = 0.78)."
For Business Stakeholders
Focus on actionable insights with uncertainty:
"Larger homes sell for more—roughly $4,000 more per 100 additional square feet. This relationship is statistically reliable. However, this doesn't mean adding 100 sqft will increase any particular home's value by exactly $4,000."
For General Audiences
Simplify without distorting:
"We found that bigger homes tend to sell for higher prices. On average, a home 100 square feet larger sells for about $4,000 more."
In simple regression, we cannot hold other factors constant—we only have one predictor. The phrase 'all else being equal' or 'ceteris paribus' technically applies only to multiple regression. In simple regression, correctly say 'on average' or 'among observations in our data.'
Even experienced practitioners make interpretation mistakes. Here are the most common and how to avoid them:
| Error | What's Wrong | Correction |
|---|---|---|
| "β = 4 means x causes 4 units change in y" | Conflates association with causation | "Associated with" or "predicts," not "causes" |
| "If we increase x by 1, y increases by β" | Implies intervention when only observing | "Observations with 1 higher x have β higher y on average" |
| Ignoring units | Slope is meaningless without units | Always specify units: "$4.2K per 100 sqft" |
| Over-interpreting small R² | "Regression failed" if R² < 0.5 | Low R² may still yield useful/significant predictors |
| Interpreting intercept literally when x=0 is impossible | "At 0 years old, salary is $X" | Note when intercept has no real-world meaning |
| Extrapolating confidently | Predicting far outside data range | Acknowledge as extrapolation with high uncertainty |
| Confusing statistical with practical significance | "p < 0.05 so β is important" | A tiny but precise effect may not matter practically |
A slope of β₁ = 0.001 might be statistically significant (p < 0.001) with enough data. But if x is 'advertising dollars' and y is 'revenue,' a $0.001 increase per advertising dollar is practically useless. Always consider: Is the effect size large enough to matter?
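The statistical-versus-practical gap can be simulated. In this sketch (entirely synthetic; the true slope is set to 0.001 by assumption, mirroring the advertising example), a huge sample makes a negligible effect wildly significant:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 1_000_000

# Tiny true effect: $0.001 revenue per advertising dollar (assumed for illustration)
ad = rng.uniform(0, 1000, size=n)
revenue = 0.001 * ad + rng.normal(0, 1, size=n)

b1, b0 = np.polyfit(ad, revenue, 1)

# Standard error of the slope: sqrt(sigma^2 / Sxx), sigma^2 from the residuals
resid = revenue - (b0 + b1 * ad)
sigma2 = (resid ** 2).sum() / (n - 2)
se = np.sqrt(sigma2 / ((ad - ad.mean()) ** 2).sum())

t = b1 / se
print(round(t, 1))   # enormous t-statistic: statistically significant
print(b1)            # ≈ 0.001: practically negligible
```

With a million observations the standard error shrinks toward zero, so even a trivial slope is "significant"; the effect size itself is what decides whether the result matters.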
Defensive Practices
- Always state units and the observed range of the data
- Default to associational language unless the study design justifies causal claims
- Report uncertainty (standard errors or intervals) alongside point estimates
- Ask whether the effect size matters practically, not just statistically
Interpreting regression coefficients correctly is where technical analysis meets real-world communication. Let's consolidate: the slope is an expected change in $y$ per unit of $x$, always stated with units; the intercept is meaningful only when $x = 0$ is sensible and near the data; standardized coefficients equal the correlation and enable cross-variable comparisons; association is not causation; and extrapolation beyond the observed range is hazardous.
What's Next
We've covered model formulation, derivation, geometry, and interpretation. The final piece is understanding the assumptions that underpin all of this theory. Under what conditions are OLS estimates unbiased, consistent, and efficient? When do our confidence intervals and p-values mean what we think they mean? The next page addresses these critical questions.
You can now interpret regression coefficients correctly, communicate results to diverse audiences, and avoid common interpretation errors. Next, we'll examine the assumptions required for valid statistical inference—the fine print that makes everything work.