MSE tells you the average squared error. MAE tells you the average absolute error. But neither answers a fundamental question: How much of the variation in outcomes is my model actually explaining?
Enter R-squared (R²), also called the coefficient of determination. R² provides a scale-independent measure that answers: 'Of all the variance in y, what fraction does my model capture?'
An R² of 0.85 means your model explains 85% of the variance in outcomes—a statement that's meaningful regardless of whether you're predicting house prices in dollars or temperatures in Celsius.
By the end of this page, you will understand:

- the mathematical definition and derivation of R²
- the geometric interpretation as projection
- why R² ranges from 0 to 1 (usually), and the edge cases where it can be negative
- the overfitting problem and why adjusted R² exists
- how to compute and use adjusted R²
- the limitations and appropriate use of R²
R-squared is defined as the proportion of variance in the dependent variable that is explained by the model:
$$R^2 = 1 - \frac{SS_{res}}{SS_{tot}}$$
Where $SS_{res}$ is the residual sum of squares (the squared error of the model's predictions) and $SS_{tot}$ is the total sum of squares (the squared deviation of $y$ from its mean).
Let's unpack this formula to build intuition.
The Three Sum of Squares
We can decompose the total variation in y into two components:
$$SS_{tot} = SS_{reg} + SS_{res}$$
$SS_{tot} = \sum_i (y_i - \bar{y})^2$ — Total Sum of Squares: the variation of $y$ around its mean
$SS_{res} = \sum_i (y_i - \hat{y}_i)^2$ — Residual Sum of Squares: the variation the model fails to explain
$SS_{reg} = \sum_i (\hat{y}_i - \bar{y})^2$ — Regression (Explained) Sum of Squares: the variation captured by the model
```python
import numpy as np

def r_squared_from_scratch(y_true: np.ndarray, y_pred: np.ndarray) -> dict:
    """
    Compute R² and its components from scratch.
    Returns detailed breakdown of sum of squares.
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    n = len(y_true)

    # Mean of true values (baseline prediction)
    y_mean = np.mean(y_true)

    # Sum of Squares components
    ss_tot = np.sum((y_true - y_mean) ** 2)   # Total variance
    ss_res = np.sum((y_true - y_pred) ** 2)   # Residual (unexplained)
    ss_reg = np.sum((y_pred - y_mean) ** 2)   # Explained by model

    # R-squared
    r_squared = 1 - (ss_res / ss_tot)

    # Alternative formulation (same result for OLS)
    r_squared_alt = ss_reg / ss_tot

    return {
        'ss_tot': ss_tot,
        'ss_res': ss_res,
        'ss_reg': ss_reg,
        'r_squared': r_squared,
        'r_squared_alt': r_squared_alt,
        'variance_explained_pct': r_squared * 100,
        'mse': ss_res / n
    }

# Example: Predicting test scores
y_true = np.array([65, 72, 80, 85, 90, 75, 70, 88, 95, 78])
y_pred = np.array([68, 70, 82, 84, 88, 77, 72, 85, 92, 80])

# Calculate R²
result = r_squared_from_scratch(y_true, y_pred)

print("=== R² Breakdown ===")
print(f"Mean of y: {np.mean(y_true):.2f}")
print(f"\nSum of Squares:")
print(f"  SS_total (variance around mean): {result['ss_tot']:.2f}")
print(f"  SS_residual (unexplained): {result['ss_res']:.2f}")
print(f"  SS_regression (explained): {result['ss_reg']:.2f}")
print(f"\nR² = 1 - (SS_res / SS_tot)")
print(f"R² = 1 - ({result['ss_res']:.2f} / {result['ss_tot']:.2f})")
print(f"R² = {result['r_squared']:.4f}")
print(f"\n→ Model explains {result['variance_explained_pct']:.1f}% of variance in test scores")
```

Notice that $SS_{res} = n \times MSE$, so R² can be written as $R^2 = 1 - \frac{n \times MSE}{SS_{tot}} = 1 - \frac{MSE}{\text{Var}(y)}$. R² normalizes MSE by the total variance, producing a scale-free metric.
R² has an intuitive interpretation: the fraction of variance in y that is 'explained' by the model.
Baseline Comparison
R² implicitly compares your model to a baseline model that predicts the mean for all samples:
$$R^2 = 1 - \frac{\text{Model MSE}}{\text{Baseline MSE}}$$
So R² measures: How much better is my model than just predicting the mean?
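To see this equivalence numerically, here is a minimal sketch (the toy arrays are made up for illustration) that computes R² as one minus the ratio of the model's MSE to the mean-only baseline's MSE and checks it against scikit-learn's `r2_score`:

```python
import numpy as np
from sklearn.metrics import r2_score

# Illustrative values (made up for this sketch)
y_true = np.array([3.0, 5.0, 7.0, 9.0, 11.0])
y_pred = np.array([2.5, 5.5, 6.5, 9.5, 10.5])

# Model MSE vs. baseline MSE (always predicting the mean of y_true)
model_mse = np.mean((y_true - y_pred) ** 2)
baseline_mse = np.mean((y_true - y_true.mean()) ** 2)

r2_from_ratio = 1 - model_mse / baseline_mse
print(f"R² from MSE ratio: {r2_from_ratio:.4f}")
print(f"R² from sklearn:   {r2_score(y_true, y_pred):.4f}")  # same value
```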
| R² Value | Interpretation | Model Quality |
|---|---|---|
| 1.0 | Perfect predictions (all variance explained) | Perfect (or overfitting) |
| 0.9-1.0 | Explains 90%+ of variance | Excellent |
| 0.7-0.9 | Explains 70-90% of variance | Good |
| 0.5-0.7 | Explains 50-70% of variance | Moderate |
| 0.3-0.5 | Explains 30-50% of variance | Weak |
| 0.0-0.3 | Explains little variance | Poor (but may still be useful) |
| 0.0 | Same as predicting the mean | No predictive power |
| < 0 | Worse than predicting the mean | Actively harmful |
Important Caveats
R² values must be interpreted in context:
Domain-dependent expectations: In physics, R² > 0.99 is common; in social sciences, R² = 0.3 might be excellent.
Doesn't mean 'accurate': R² = 0.9 doesn't mean predictions are within 10% of true values. It means 90% of variance is explained.
Can be misleading for non-linear patterns: with a linear model, R² only captures the linear component of the relationship, so a fundamentally wrong model can still post a respectable R².
Sensitive to outcome variance: If y has very high variance, even a good model might have moderate R². If y has low variance, even a mediocre model might have high R².
```python
import numpy as np

def r_squared_vs_accuracy():
    """
    Demonstrate that high R² doesn't mean predictions are 'close'.
    """
    np.random.seed(42)

    # Scenario 1: High R² with large absolute errors
    y_true_1 = np.array([100, 200, 300, 400, 500, 600, 700, 800, 900, 1000])
    y_pred_1 = y_true_1 + np.array([-30, 20, -40, 35, -25, 30, -35, 40, -20, 25])

    ss_tot_1 = np.sum((y_true_1 - np.mean(y_true_1)) ** 2)
    ss_res_1 = np.sum((y_true_1 - y_pred_1) ** 2)
    r2_1 = 1 - ss_res_1 / ss_tot_1
    mae_1 = np.mean(np.abs(y_true_1 - y_pred_1))

    # Scenario 2: Lower R² with smaller absolute errors
    y_true_2 = np.array([100, 102, 104, 106, 108, 110, 112, 114, 116, 118])
    y_pred_2 = y_true_2 + np.array([-3, 2, -4, 3, -2, 3, -3, 4, -2, 2])

    ss_tot_2 = np.sum((y_true_2 - np.mean(y_true_2)) ** 2)
    ss_res_2 = np.sum((y_true_2 - y_pred_2) ** 2)
    r2_2 = 1 - ss_res_2 / ss_tot_2
    mae_2 = np.mean(np.abs(y_true_2 - y_pred_2))

    print("=== R² vs Prediction Accuracy ===")
    print("\nScenario 1: High variance target (100 to 1000)")
    print(f"  R²:  {r2_1:.4f}")
    print(f"  MAE: {mae_1:.1f}")
    print(f"  → High R² but predictions off by ~{mae_1:.0f} on average!")
    print("\nScenario 2: Low variance target (100 to 118)")
    print(f"  R²:  {r2_2:.4f}")
    print(f"  MAE: {mae_2:.1f}")
    print(f"  → Lower R² but predictions off by only ~{mae_2:.0f}!")
    print("\n*** Key Insight ***")
    print("R² depends on target variance. Same MAE gives different R²")
    print("depending on how spread out the true values are.")

r_squared_vs_accuracy()
```

Never evaluate a model solely on R². Always report R² alongside MAE/RMSE for absolute error magnitude, residual plots for systematic patterns, and domain-specific metrics for business relevance.
R² has an elegant geometric interpretation in the space of observations.
Vectors in n-Dimensional Space
Consider each quantity as a vector in $\mathbb{R}^n$ (one dimension per data point): the centered observations $\mathbf{y} - \bar{\mathbf{y}}$, the centered predictions $\hat{\mathbf{y}} - \bar{\mathbf{y}}$, and the residuals $\mathbf{y} - \hat{\mathbf{y}}$.
R² as Cosine of Angle
For linear regression (OLS), R² equals the squared correlation between y and $\hat{y}$:
$$R^2 = \text{Corr}(y, \hat{y})^2 = \cos^2(\theta)$$
Where $\theta$ is the angle between the centered vectors $(\mathbf{y} - \bar{\mathbf{y}})$ and $(\hat{\mathbf{y}} - \bar{\mathbf{y}})$.
```python
import numpy as np

def geometric_r_squared(y_true: np.ndarray, y_pred: np.ndarray):
    """
    Demonstrate the geometric interpretation of R².
    R² = cos²(θ) between centered y and y_hat vectors.
    """
    # Center the vectors
    y_centered = y_true - np.mean(y_true)
    y_hat_centered = y_pred - np.mean(y_pred)

    # Compute angle using dot product
    # cos(θ) = (a · b) / (||a|| × ||b||)
    dot_product = np.dot(y_centered, y_hat_centered)
    norm_y = np.linalg.norm(y_centered)
    norm_y_hat = np.linalg.norm(y_hat_centered)

    cos_theta = dot_product / (norm_y * norm_y_hat)
    theta_radians = np.arccos(np.clip(cos_theta, -1, 1))
    theta_degrees = np.degrees(theta_radians)

    # R² from geometry
    r_squared_geometric = cos_theta ** 2

    # R² from standard formula
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    ss_res = np.sum((y_true - y_pred) ** 2)
    r_squared_standard = 1 - ss_res / ss_tot

    # Correlation coefficient
    correlation = np.corrcoef(y_true, y_pred)[0, 1]

    print("=== Geometric R² Interpretation ===")
    print(f"\nVector norms:")
    print(f"  ||y - ȳ|| = {norm_y:.4f}")
    print(f"  ||ŷ - ȳ|| = {norm_y_hat:.4f}")
    print(f"\nAngle between centered vectors:")
    print(f"  cos(θ) = {cos_theta:.4f}")
    print(f"  θ = {theta_degrees:.2f}°")
    print(f"\nR² calculations:")
    print(f"  cos²(θ) = {r_squared_geometric:.4f}")
    print(f"  1 - SS_res/SS_tot = {r_squared_standard:.4f}")
    print(f"  Corr(y, ŷ)² = {correlation**2:.4f}")
    print(f"\n→ cos²(θ) always equals Corr(y, ŷ)²; for an OLS fit it also equals 1 - SS_res/SS_tot")

    return r_squared_geometric

# Example (hand-picked predictions that track y closely; not an exact OLS fit,
# so the first two R² values agree only approximately)
y_true = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
y_pred = np.array([1.2, 2.1, 2.8, 4.2, 5.1, 5.9, 7.2, 7.8, 9.1, 10.2])

geometric_r_squared(y_true, y_pred)
```

Projection Interpretation
In linear regression, the predictions $\hat{\mathbf{y}}$ are the orthogonal projection of $\mathbf{y}$ onto the column space of the design matrix $\mathbf{X}$.
By the Pythagorean theorem (since residuals are orthogonal to predictions):
$$||\mathbf{y} - \bar{\mathbf{y}}||^2 = ||\hat{\mathbf{y}} - \bar{\mathbf{y}}||^2 + ||\mathbf{y} - \hat{\mathbf{y}}||^2$$ $$SS_{tot} = SS_{reg} + SS_{res}$$
This decomposition holds exactly only for OLS linear regression with an intercept, which is why $SS_{reg}/SS_{tot} = R^2$ only in that setting.
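A small numerical check of this decomposition, using synthetic data and an OLS fit with an intercept via `np.linalg.lstsq` (the data and coefficients are arbitrary, chosen just for this sketch):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=50)
y = 1.5 * x + 2.0 + rng.normal(scale=0.5, size=50)

# OLS with an intercept: design matrix [1, x]
X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta

ss_tot = np.sum((y - y.mean()) ** 2)
ss_reg = np.sum((y_hat - y.mean()) ** 2)
ss_res = np.sum((y - y_hat) ** 2)

# Residuals are orthogonal to the fitted values, so the squared lengths add up
print(f"SS_tot          = {ss_tot:.4f}")
print(f"SS_reg + SS_res = {ss_reg + ss_res:.4f}")  # matches SS_tot
print(f"R² = SS_reg / SS_tot     = {ss_reg / ss_tot:.4f}")
print(f"R² = 1 - SS_res / SS_tot = {1 - ss_res / ss_tot:.4f}")
```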
A common misconception is that R² must be between 0 and 1. In fact, R² can be negative when the model performs worse than the baseline mean prediction.
$$R^2 = 1 - \frac{SS_{res}}{SS_{tot}}$$
If $SS_{res} > SS_{tot}$, then $R^2 < 0$.
This means your model's predictions are further from the true values than the mean is!
```python
import numpy as np

def demonstrate_negative_r_squared():
    """
    Show scenarios where R² becomes negative.
    """
    # True values
    y_true = np.array([10, 20, 30, 40, 50])
    y_mean = np.mean(y_true)  # = 30

    print("True values:", y_true)
    print(f"Mean: {y_mean}")
    print(f"\nBaseline (mean) predictions: [{y_mean}] × 5")

    ss_tot = np.sum((y_true - y_mean) ** 2)
    print(f"SS_tot (variance around mean): {ss_tot}")

    # Scenario 1: Good predictions
    y_pred_good = np.array([12, 22, 28, 38, 52])
    ss_res_good = np.sum((y_true - y_pred_good) ** 2)
    r2_good = 1 - ss_res_good / ss_tot
    print(f"\n--- Good Model ---")
    print(f"Predictions: {y_pred_good}")
    print(f"SS_res: {ss_res_good}")
    print(f"R²: {r2_good:.4f} (positive)")

    # Scenario 2: Bad predictions (worse than mean)
    y_pred_bad = np.array([50, 10, 50, 10, 50])  # Wrong direction!
    ss_res_bad = np.sum((y_true - y_pred_bad) ** 2)
    r2_bad = 1 - ss_res_bad / ss_tot
    print(f"\n--- Bad Model ---")
    print(f"Predictions: {y_pred_bad}")
    print(f"SS_res: {ss_res_bad}")
    print(f"R²: {r2_bad:.4f} (NEGATIVE!)")
    print(f"\n→ Model is worse than just predicting the mean!")

    # Scenario 3: Predicting exactly the mean
    y_pred_mean = np.full_like(y_true, y_mean, dtype=float)
    ss_res_mean = np.sum((y_true - y_pred_mean) ** 2)
    r2_mean = 1 - ss_res_mean / ss_tot
    print(f"\n--- Baseline Model (predict mean) ---")
    print(f"Predictions: {y_pred_mean}")
    print(f"SS_res: {ss_res_mean} (equals SS_tot)")
    print(f"R²: {r2_mean:.4f} (exactly zero)")

demonstrate_negative_r_squared()
```

When Does Negative R² Happen?
Model applied to wrong data: Training data distribution differs from test data
Incorrect model specification: Model structure fundamentally wrong for the problem
No intercept term: If linear regression has no intercept and the data doesn't pass through the origin (see the sketch below)
Adversarial or random predictions: Predictions uncorrelated or anti-correlated with truth
Data leakage during training: Model overfit to leaked information that's absent at test time
If you see negative R² on test data, something is seriously wrong. The absolute minimum a reasonable model should achieve is R² = 0 (predict the training mean). Negative R² indicates either a bug, data problem, or fundamental model mismatch.
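To make the no-intercept cause concrete, here is a short sketch (synthetic data, scikit-learn assumed) in which forcing the regression line through the origin yields a strongly negative R² even on the data it was fit to:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50).reshape(-1, 1)
y = 50 + 2 * x.ravel() + rng.normal(scale=1.0, size=50)  # large offset from the origin

# Forcing the fit through the origin when the data has a big intercept
no_intercept = LinearRegression(fit_intercept=False).fit(x, y)
with_intercept = LinearRegression().fit(x, y)

print(f"R² without intercept: {r2_score(y, no_intercept.predict(x)):.3f}")  # strongly negative here
print(f"R² with intercept:    {r2_score(y, with_intercept.predict(x)):.3f}")
```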
R² has a fundamental flaw: it never decreases when you add more predictors, even if those predictors have no real relationship with the outcome.
Mathematically, adding a variable (or increasing model complexity) can only reduce $SS_{res}$—the model can always use the extra degree of freedom to fit the training data better, even if just fitting noise.
This leads to a critical problem: R² on training data is inflated for complex models.
```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

def demonstrate_r_squared_inflation():
    """
    Show how R² increases with model complexity,
    even for useless random features.
    """
    np.random.seed(42)

    # True relationship: y = 2*x1 + 3*x2 + noise
    n_samples = 100
    X_true = np.random.randn(n_samples, 2)
    y = 2 * X_true[:, 0] + 3 * X_true[:, 1] + np.random.randn(n_samples) * 0.5

    print("True model: y = 2*x1 + 3*x2 + noise")
    print(f"Samples: {n_samples}\n")

    # Progressively add random (useless) features
    r_squared_values = []
    for n_noise_features in range(0, 80, 5):
        # Create dataset with true features + noise features
        X_noise = np.random.randn(n_samples, n_noise_features)
        X_full = np.hstack([X_true, X_noise]) if n_noise_features > 0 else X_true

        # Fit model and compute R²
        model = LinearRegression()
        model.fit(X_full, y)
        y_pred = model.predict(X_full)
        r2 = r2_score(y, y_pred)
        r_squared_values.append((2 + n_noise_features, r2))

        if n_noise_features in [0, 10, 30, 50, 70]:
            print(f"Features: {2 + n_noise_features:2d} ({n_noise_features} noise) → R² = {r2:.4f}")

    print(f"\n*** Key Insight ***")
    print("R² keeps increasing even though noise features add NOTHING!")
    print("With enough features, R² → 1.0 (perfect fit to training noise)")

demonstrate_r_squared_inflation()

# Example output:
# Features:  2 (0 noise) → R² = 0.9721
# Features: 12 (10 noise) → R² = 0.9841
# Features: 32 (30 noise) → R² = 0.9944
# Features: 52 (50 noise) → R² = 0.9982
# Features: 72 (70 noise) → R² = 0.9996
```

The Extreme Case
With $n$ data points and $n$ features, a linear model can fit the training data perfectly ($R^2 = 1$) regardless of the true relationship—it just memorizes each point.
This is why you should always evaluate R² on held-out test data, not training data. Test R² will drop (sometimes dramatically) for overfit models, revealing the true generalization performance, as the sketch below illustrates.
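A minimal sketch of this extreme case (purely random data, scikit-learn assumed): with as many features as training points, the model can memorize the training set, so training R² is essentially perfect while held-out R² collapses:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)

# 10 training points, 10 purely random features, and a target with NO relationship to them
X_train = rng.normal(size=(10, 10))
y_train = rng.normal(size=10)

# Fresh data from the same (unrelated) process
X_test = rng.normal(size=(100, 10))
y_test = rng.normal(size=100)

model = LinearRegression().fit(X_train, y_train)

print(f"Train R²: {r2_score(y_train, model.predict(X_train)):.4f}")  # ≈ 1.0 (memorized)
print(f"Test  R²: {r2_score(y_test, model.predict(X_test)):.4f}")    # ≈ 0 or negative
```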
Adjusted R² corrects for the inflation problem by penalizing model complexity:
$$R^2_{adj} = 1 - \frac{(1 - R^2)(n - 1)}{n - p - 1}$$
Where $R^2$ is the ordinary coefficient of determination, $n$ is the number of samples, and $p$ is the number of predictors (excluding the intercept).
How It Works
The adjustment factor $\frac{n-1}{n-p-1}$ is always ≥ 1 (since $p \geq 0$), so:
$$(1 - R^2_{adj}) = (1 - R^2) \times \frac{n-1}{n-p-1} \geq (1 - R^2)$$
Therefore $R^2_{adj} \leq R^2$ — adjusted R² is always less than or equal to regular R².
| Property | R² | Adjusted R² |
|---|---|---|
| Never decreases as features are added | Yes | No |
| Penalizes complexity | No | Yes |
| Can be negative | Only if worse than mean | Yes (if model overly complex) |
| Suitable for model selection | No (training data) | Yes (approximation) |
| Interpretation | Variance explained | Variance explained, adjusted for p |
```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

def adjusted_r_squared(r2: float, n: int, p: int) -> float:
    """
    Compute adjusted R² from R², sample size, and number of predictors.

    Parameters:
    -----------
    r2 : float - R-squared value
    n : int - Number of samples
    p : int - Number of predictors (excluding intercept)
    """
    if n - p - 1 <= 0:
        raise ValueError("n must be greater than p + 1")
    return 1 - ((1 - r2) * (n - 1) / (n - p - 1))

def compare_r2_adjusted():
    """
    Compare R² and Adjusted R² as features increase.
    """
    np.random.seed(42)
    n_samples = 100
    X_true = np.random.randn(n_samples, 2)
    y = 2 * X_true[:, 0] + 3 * X_true[:, 1] + np.random.randn(n_samples) * 0.5

    print("Effect of adding noise features on R² vs Adjusted R²")
    print(f"{'Features':^10} | {'R²':^10} | {'Adj R²':^10} | {'Difference':^10}")
    print("-" * 48)

    for n_noise in [0, 5, 10, 20, 30, 50, 70, 90]:
        X_noise = np.random.randn(n_samples, max(1, n_noise))
        X = np.hstack([X_true, X_noise]) if n_noise > 0 else X_true
        n_features = X.shape[1]

        model = LinearRegression()
        model.fit(X, y)
        r2 = r2_score(y, model.predict(X))
        adj_r2 = adjusted_r_squared(r2, n_samples, n_features)

        print(f"{n_features:^10} | {r2:^10.4f} | {adj_r2:^10.4f} | {r2 - adj_r2:^10.4f}")

    print("\n*** Key Insight ***")
    print("R² keeps increasing, but Adjusted R² DECREASES when")
    print("useless features are added—correctly penalizing overfitting.")

compare_r2_adjusted()
```

Adjusted R² for Model Selection
When comparing models with different numbers of features on the same dataset, prefer the model with the higher adjusted R²: it rewards an extra predictor only when that predictor improves the fit by more than chance alone would.
However, Adjusted $R^2$ is still computed on training data—it's an approximation, not a substitute for cross-validation on held-out data.
Use Adjusted R² when comparing models with different numbers of features on the same training data. For model evaluation on held-out data, regular R² is fine since you're not fitting to that data. For formal model selection, consider AIC/BIC or cross-validation.
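For reference, here is a rough sketch of the AIC/BIC route under the usual Gaussian-error assumption; `aic_bic_ols` is a hypothetical helper written for this example (the formulas are shown up to additive constants), not a library function:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def aic_bic_ols(y_true, y_pred, n_params):
    """Gaussian-likelihood AIC/BIC for an OLS fit (up to additive constants)."""
    n = len(y_true)
    ss_res = np.sum((y_true - y_pred) ** 2)
    aic = n * np.log(ss_res / n) + 2 * n_params
    bic = n * np.log(ss_res / n) + n_params * np.log(n)
    return aic, bic

# Hypothetical comparison: 2 real features vs. the same 2 plus 20 noise features
rng = np.random.default_rng(0)
n = 100
X_true = rng.normal(size=(n, 2))
y = 2 * X_true[:, 0] + 3 * X_true[:, 1] + rng.normal(scale=0.5, size=n)
X_big = np.hstack([X_true, rng.normal(size=(n, 20))])

for name, X in [("2 features", X_true), ("22 features", X_big)]:
    model = LinearRegression().fit(X, y)
    aic, bic = aic_bic_ols(y, model.predict(X), X.shape[1] + 1)  # +1 for the intercept
    print(f"{name:>12}: AIC = {aic:8.2f}, BIC = {bic:8.2f}")  # lower is better
```

Both criteria penalize parameter count explicitly, so the noise-padded model typically loses here despite its slightly smaller residual sum of squares.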
R² is widely used but often misused. Understanding its limitations prevents interpretation errors.
Major Limitations

- R² says nothing about whether the model's functional form is correct; a linear fit to a clearly non-linear relationship can still post a decent R².
- R² depends on the variance and range of y: restricting the range of the data changes R² dramatically even when model quality is unchanged.
- R² is blind to heteroscedasticity and other systematic patterns in the residuals.
- Training-set R² is inflated by added features, as shown in the overfitting section above.
```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

def demonstrate_pitfalls():
    """
    Show scenarios where R² is misleading.
    """
    np.random.seed(42)

    # Pitfall 1: High R² with non-linear relationship
    print("=== Pitfall 1: Non-linearity hidden by R² ===")
    x = np.linspace(0, 10, 100)
    y_nonlinear = np.sin(x) * x + np.random.randn(100) * 0.5

    # Fit linear model
    lr = LinearRegression()
    lr.fit(x.reshape(-1, 1), y_nonlinear)
    y_pred = lr.predict(x.reshape(-1, 1))
    r2 = r2_score(y_nonlinear, y_pred)
    print(f"True relationship: y = sin(x) * x")
    print(f"Linear model R²: {r2:.4f}")
    print("→ R² looks decent, but the model is fundamentally wrong!")

    # Pitfall 2: Range restriction
    print("\n=== Pitfall 2: Range restriction ===")
    x_full = np.linspace(0, 100, 1000)
    y_full = 2 * x_full + np.random.randn(1000) * 10

    # Full range
    lr.fit(x_full.reshape(-1, 1), y_full)
    r2_full = r2_score(y_full, lr.predict(x_full.reshape(-1, 1)))

    # Restricted range (50-60 only)
    mask = (x_full >= 50) & (x_full <= 60)
    x_restricted = x_full[mask]
    y_restricted = y_full[mask]
    lr.fit(x_restricted.reshape(-1, 1), y_restricted)
    r2_restricted = r2_score(y_restricted, lr.predict(x_restricted.reshape(-1, 1)))

    print(f"Full range (0-100): R² = {r2_full:.4f}")
    print(f"Restricted (50-60): R² = {r2_restricted:.4f}")
    print("→ Same model quality, but range affects R² dramatically!")

    # Pitfall 3: R² with non-proportional variance
    print("\n=== Pitfall 3: Heteroscedasticity ===")
    x = np.linspace(1, 100, 100)
    # Noise increases with x (heteroscedastic)
    y_hetero = 2 * x + np.random.randn(100) * x * 0.5
    lr.fit(x.reshape(-1, 1), y_hetero)
    r2 = r2_score(y_hetero, lr.predict(x.reshape(-1, 1)))
    print(f"Heteroscedastic data R²: {r2:.4f}")
    print("→ High R², but residuals aren't random—model assumptions violated!")

demonstrate_pitfalls()
```

Best Practices
Here's how to effectively compute and report R² in real-world workflows.
```python
import numpy as np
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import r2_score, mean_squared_error, mean_absolute_error
from scipy import stats

def comprehensive_regression_report(y_true, y_pred, n_features, model_name="Model"):
    """
    Generate comprehensive regression metrics report including R² analysis.
    """
    n = len(y_true)

    # Basic metrics
    mse = mean_squared_error(y_true, y_pred)
    rmse = np.sqrt(mse)
    mae = mean_absolute_error(y_true, y_pred)

    # R² and Adjusted R²
    r2 = r2_score(y_true, y_pred)
    adj_r2 = 1 - ((1 - r2) * (n - 1) / (n - n_features - 1))

    # Additional diagnostics
    residuals = y_true - y_pred
    residual_std = np.std(residuals)

    # Correlation between predictions and actuals
    correlation = np.corrcoef(y_true, y_pred)[0, 1]

    # 95% confidence interval for R² (approximate, using Fisher transformation)
    z = 0.5 * np.log((1 + correlation) / (1 - correlation))
    se = 1 / np.sqrt(n - 3)
    z_lower = z - 1.96 * se
    z_upper = z + 1.96 * se
    r_lower = (np.exp(2 * z_lower) - 1) / (np.exp(2 * z_lower) + 1)
    r_upper = (np.exp(2 * z_upper) - 1) / (np.exp(2 * z_upper) + 1)
    r2_ci = (r_lower**2, r_upper**2)

    print(f"╔══════ {model_name} Regression Report ══════╗")
    print(f"║ Samples: {n}, Features: {n_features}")
    print(f"╠════════════════════════════════════════════╣")
    print(f"║ CORE METRICS")
    print(f"║   MSE:  {mse:.4f}")
    print(f"║   RMSE: {rmse:.4f}")
    print(f"║   MAE:  {mae:.4f}")
    print(f"╠════════════════════════════════════════════╣")
    print(f"║ R² ANALYSIS")
    print(f"║   R²:          {r2:.4f} ({r2*100:.1f}% variance explained)")
    print(f"║   Adjusted R²: {adj_r2:.4f}")
    print(f"║   95% CI:      ({r2_ci[0]:.4f}, {r2_ci[1]:.4f})")
    print(f"╠════════════════════════════════════════════╣")
    print(f"║ QUALITY CHECKS")
    print(f"║   Residual Mean: {np.mean(residuals):.4f} (should be ~0)")
    print(f"║   Residual Std:  {residual_std:.4f}")
    print(f"╚════════════════════════════════════════════╝")

    if adj_r2 < r2 - 0.05:
        print("⚠️ Large gap between R² and Adjusted R² suggests overfitting")

    return {
        'r2': r2,
        'adj_r2': adj_r2,
        'rmse': rmse,
        'mae': mae,
        'r2_ci': r2_ci
    }

def cross_validated_r2(X, y, model, cv=5):
    """
    Compute R² with cross-validation for robust estimation.
    """
    scores = cross_val_score(model, X, y, cv=cv, scoring='r2')

    print(f"\nCross-Validated R² (k={cv}):")
    print(f"  Mean:  {scores.mean():.4f}")
    print(f"  Std:   {scores.std():.4f}")
    print(f"  Range: [{scores.min():.4f}, {scores.max():.4f}]")

    return scores

# Example usage
np.random.seed(42)
n_samples, n_features = 200, 5
X = np.random.randn(n_samples, n_features)
true_coef = np.array([3.0, -2.0, 1.5, 0, 0])  # Last two features are noise
y = X @ true_coef + np.random.randn(n_samples) * 2

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

model = LinearRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

comprehensive_regression_report(y_test, y_pred, n_features, "Linear Regression")
cross_validated_r2(X, y, LinearRegression())
```

R² provides a scale-independent measure of model explanatory power. Let's consolidate the key insights:

- $R^2 = 1 - SS_{res}/SS_{tot}$: the fraction of variance in y explained relative to a mean-only baseline.
- R² is scale-free, but it says nothing about absolute error size; report it alongside MAE/RMSE and residual plots.
- R² can be negative when a model predicts worse than the mean, and training-set R² never decreases as features are added.
- Adjusted R² penalizes complexity and helps compare models with different numbers of predictors; held-out or cross-validated R² remains the most reliable estimate of generalization.
What's Next
R² and RMSE are absolute metrics. But what if you need error as a percentage of the true value? MAPE (Mean Absolute Percentage Error) and SMAPE provide relative error metrics that are scale-invariant and often preferred in business contexts. We'll explore these next.
You now understand R² deeply—its computation, interpretation, limitations, and when to use adjusted R². You can communicate model quality effectively and avoid common interpretation pitfalls.