When a machine learning model makes a prediction, stakeholders across every domain—from healthcare to finance to criminal justice—increasingly demand answers to a simple question: Why?
Why did the model deny this loan? Why did it flag this transaction as fraud? Why did it predict this patient is high-risk? The answers to these questions have real consequences for individuals, organizations, and society.
SHAP (SHapley Additive exPlanations) has emerged as the gold standard for answering these questions. Introduced by Lundberg and Lee in 2017, SHAP provides a unified framework for feature attribution that is model-agnostic, locally accurate, and consistent, with guarantees grounded in cooperative game theory.
Today, SHAP is the most cited interpretability method in machine learning research and the de facto standard in industries requiring model explanations.
This page focuses on practical mastery of SHAP values: understanding the framework, computing explanations efficiently, visualizing results effectively, and avoiding common mistakes. The next page dives deeper into the underlying Shapley theory for those seeking mathematical rigor.
SHAP values answer a precise question for each prediction:
How much did each feature contribute to moving this prediction away from the baseline (average) prediction?
For any prediction $f(x)$, SHAP provides a decomposition:
$$f(x) = \phi_0 + \sum_{j=1}^{p} \phi_j(x)$$
Where $\phi_0$ is the base value (the model's average prediction over the background data) and $\phi_j(x)$ is the SHAP value of feature $j$ for instance $x$: its contribution to this particular prediction.
This decomposition is exact: the SHAP values sum precisely to the difference between the prediction and the average. No information is lost or unattributed.
Imagine a team project where the output is the prediction. SHAP values tell you exactly how much credit (positive or negative) each feature deserves for the final result. The allocation is 'fair' in a mathematically precise sense: it satisfies several desirable properties derived from game theory.
Consider a house price model predicting $450,000 for a specific house. The average house price in training data is $300,000. SHAP values might show:
| Feature | Value | SHAP Value | Interpretation |
|---|---|---|---|
| Size | 2,500 sq ft | +$80,000 | Large size increases price |
| Location | Downtown | +$50,000 | Premium location adds value |
| Age | 40 years | -$20,000 | Older age decreases value |
| Bedrooms | 3 | +$15,000 | More bedrooms add value |
| Condition | Good | +$25,000 | Good condition premium |
Verification: $300,000 + $80,000 + $50,000 - $20,000 + $15,000 + $25,000 = $450,000 ✓
The sum of SHAP values plus base value exactly equals the prediction. Every dollar is accounted for.
SHAP provides both local and global interpretability:
Local (per-prediction): SHAP values for a single instance explain why that specific prediction differs from the average. Each prediction has its own SHAP values.
Global (model-level): Aggregate SHAP values across many predictions to understand overall feature importance. Unlike permutation importance, SHAP-based global importance considers the magnitude and direction of effects across all predictions.
This duality is a key strength — you can zoom from individual explanations to model-wide patterns seamlessly.
SHAP values are unique in satisfying a set of desirable mathematical properties. These properties come from Shapley's theorem in cooperative game theory, which proves that only one allocation method satisfies all of them.
Local accuracy (efficiency): $$f(x) = \phi_0 + \sum_{j=1}^{p} \phi_j(x)$$
The SHAP values plus the base value exactly equal the model output. Nothing is left unexplained. This seems obvious, but many attribution methods don't satisfy it — they give attributions that don't sum to the actual prediction.
Missingness: If a feature doesn't contribute to any prediction (it's truly unused by the model), its SHAP value is zero:
$$x_j \text{ not used by } f \Rightarrow \phi_j(x) = 0$$
Features that don't matter get zero credit.
Consistency (monotonicity): If you change a model so that a feature's contribution increases (or stays the same) in all contexts, its SHAP value can only increase (or stay the same), never decrease.
Formally: if for all subsets $S \subseteq \{1, \dots, p\} \setminus \{j\}$: $$f'_{S \cup \{j\}}(x) - f'_S(x) \geq f_{S \cup \{j\}}(x) - f_S(x)$$
Then $\phi'_j(x) \geq \phi_j(x)$.
This prevents counterintuitive situations where making a feature more important in the model actually decreases its attribution.
Shapley (1953) proved that there is exactly ONE allocation method satisfying all these properties. This is profound: if you want your attributions to be fair, locally accurate, and consistent, you MUST use Shapley values. There's no alternative.
Additivity: SHAP values combine linearly across models. If you combine two models:
$$f(x) = g(x) + h(x)$$
Then:
$$\phi_j^f(x) = \phi_j^g(x) + \phi_j^h(x)$$
This means SHAP values of ensemble models can be understood by decomposing into component attributions.
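To make local accuracy and additivity concrete, here is a small hand-computed sketch. It does not use the shap library; the weights `w_g` and `w_h` and the synthetic data are purely illustrative. For a linear model with independent features, the SHAP value of feature $j$ is simply $w_j (x_j - \mathbb{E}[x_j])$, which lets us check both properties exactly:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
w_g = np.array([2.0, -1.0, 0.5])   # linear model g
w_h = np.array([0.0, 3.0, 1.0])    # linear model h

def linear_shap(w, X, x):
    # For a linear model f(x) = w @ x with independent features,
    # phi_j(x) = w_j * (x_j - E[x_j])
    return w * (x - X.mean(axis=0))

x = X[0]
phi_g = linear_shap(w_g, X, x)
phi_h = linear_shap(w_h, X, x)

# Local accuracy: base value + sum of attributions reproduces the prediction
base_g = X.mean(axis=0) @ w_g
assert np.isclose(base_g + phi_g.sum(), x @ w_g)

# Additivity: attributions for g + h equal the sum of component attributions
phi_sum = linear_shap(w_g + w_h, X, x)
assert np.allclose(phi_sum, phi_g + phi_h)

print("Local accuracy and additivity hold exactly for this linear model.")
```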
Symmetry: If two features contribute identically in all contexts (they're functionally interchangeable), they receive equal SHAP values. You can't have arbitrary favoritism toward one of two equivalent features.
| Property | Meaning | Why It Matters |
|---|---|---|
| Local Accuracy | Attributions sum to prediction | Complete explanation—no unexplained residual |
| Missingness | Unused features get zero | No credit for features the model ignores |
| Consistency | More important → higher attribution | Intuitive relationship between model behavior and explanation |
| Symmetry | Equivalent features get equal credit | No arbitrary favoritism |
| Additivity | Combine linearly for ensemble models | Decomposable explanations for complex models |
The exact SHAP value formula requires exponentially many model evaluations — $O(2^p)$ for $p$ features. This is computationally intractable for real-world problems. Several algorithms provide tractable approximations or exact solutions for specific model classes.
KernelSHAP treats SHAP value computation as a weighted linear regression problem. It's model-agnostic but approximate.
Algorithm intuition: (1) sample coalitions of "present" features; (2) evaluate the model with the absent features replaced by values drawn from a background dataset; (3) fit a weighted linear regression of the model outputs on the coalition indicator vectors, using the Shapley kernel weights; (4) read the regression coefficients as the estimated SHAP values.
The weighting scheme ensures the solution converges to true SHAP values as sample size increases.
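For reference, the Shapley kernel from Lundberg and Lee (2017) gives coalitions that are very small or very large the most weight. The helper below is a simple illustrative sketch, not part of the shap library:

```python
from math import comb

def shapley_kernel_weight(p: int, s: int) -> float:
    """Weight given to a coalition of size s out of p features."""
    if s == 0 or s == p:
        return float("inf")  # the empty and full coalitions are enforced as constraints
    return (p - 1) / (comb(p, s) * s * (p - s))

p = 8
for s in range(p + 1):
    print(f"coalition size {s}: weight {shapley_kernel_weight(p, s)}")
```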
```python
import shap
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

# Load data
data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.2, random_state=42
)

# Train any model
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Create KernelSHAP explainer
# Uses a background dataset to define "feature absent" values
background = shap.sample(X_train, 100)  # Subsample for efficiency
explainer = shap.KernelExplainer(
    model.predict_proba,  # Model prediction function
    background            # Background data
)

# Compute SHAP values for test instances
# This is slow for KernelSHAP — each instance requires many model calls
shap_values = explainer.shap_values(X_test[:10])  # Explain 10 instances

# shap_values[1] contains SHAP values for class 1 (malignant)
# Shape: (n_samples, n_features)
print("SHAP values shape:", shap_values[1].shape)
print("Base value (expected value):", explainer.expected_value[1])

# Verify local accuracy for first instance
prediction_proba = model.predict_proba(X_test[:1])[0, 1]
shap_sum = explainer.expected_value[1] + shap_values[1][0].sum()
print(f"Model prediction: {prediction_proba:.4f}")
print(f"Base + SHAP sum: {shap_sum:.4f}")
```

For tree-based models (Random Forest, XGBoost, LightGBM, CatBoost), TreeSHAP provides exact SHAP values in polynomial time — a remarkable breakthrough.
Algorithm insight: TreeSHAP exploits the recursive tree structure. By traversing the tree once per instance and tracking all possible feature subsets simultaneously, it computes exact SHAP values in $O(TLD^2)$ time where $T$ is number of trees, $L$ is maximum leaves, and $D$ is depth.
This is orders of magnitude faster than KernelSHAP and provides exact rather than approximate values.
```python
import shap
import xgboost as xgb
import numpy as np
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split

# Load data
housing = fetch_california_housing()
X_train, X_test, y_train, y_test = train_test_split(
    housing.data, housing.target, test_size=0.2, random_state=42
)

# Train XGBoost model
model = xgb.XGBRegressor(
    n_estimators=100,
    max_depth=6,
    learning_rate=0.1,
    random_state=42
)
model.fit(X_train, y_train)

# Create TreeExplainer — automatically uses TreeSHAP
explainer = shap.TreeExplainer(model)

# Compute SHAP values for entire test set — fast!
shap_values = explainer.shap_values(X_test)

# Results
print("SHAP values shape:", shap_values.shape)  # (n_samples, n_features)
print("Expected value:", explainer.expected_value)

# Verify local accuracy
idx = 0
prediction = model.predict(X_test[idx:idx+1])[0]
shap_sum = explainer.expected_value + shap_values[idx].sum()
print(f"\nInstance {idx}:")
print(f"  Prediction: {prediction:.4f}")
print(f"  Base + SHAP: {shap_sum:.4f}")
print(f"  Difference: {abs(prediction - shap_sum):.6f}")  # Should be ~0

# TreeSHAP is so fast we can explain thousands of instances
import time
start = time.time()
all_shap = explainer.shap_values(X_test)
elapsed = time.time() - start
print(f"\nExplained {len(X_test)} instances in {elapsed:.2f} seconds")
```

For deep learning models, DeepSHAP combines SHAP with DeepLIFT attributions. It propagates SHAP values through the network layers using efficient backpropagation-like updates.
Key insight: DeepSHAP approximates SHAP values by assuming features interact in specific ways through the network. It's not exact but is much faster than KernelSHAP for deep networks.
An alternative for neural networks that combines integrated gradients with SHAP theory. Uses gradient information for efficient computation.
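Below is a minimal sketch of both explainers on a small Keras network. The network `net`, its training code, and the background sample are illustrative, and exact framework support varies by shap and TensorFlow version, so treat this as a template rather than a guaranteed recipe:

```python
import numpy as np
import shap
import tensorflow as tf

# Small regression network on the same housing features (illustrative)
net = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation="relu", input_shape=(X_train.shape[1],)),
    tf.keras.layers.Dense(1),
])
net.compile(optimizer="adam", loss="mse")
net.fit(X_train, y_train, epochs=5, batch_size=256, verbose=0)

# Background sample defines the "feature absent" reference distribution
background = X_train[np.random.choice(len(X_train), 100, replace=False)]

# DeepSHAP: propagates attributions through the layers (DeepLIFT-style rules)
deep_explainer = shap.DeepExplainer(net, background)
deep_shap_values = deep_explainer.shap_values(X_test[:10])

# GradientSHAP: expected gradients, only requires the model to be differentiable
grad_explainer = shap.GradientExplainer(net, background)
grad_shap_values = grad_explainer.shap_values(X_test[:10])
```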
| Algorithm | Model Types | Exactness | Complexity | Use Case |
|---|---|---|---|---|
| KernelSHAP | Any model | Approximate | O(2^p) exact; reduced in practice by a coalition sampling budget | Any model, small # of explanations |
| TreeSHAP | Tree ensembles | Exact | O(T × L × D²) | XGBoost, LightGBM, RF, CatBoost |
| DeepSHAP | Neural networks | Approximate | O(forward + backward pass) | Deep learning models |
| GradientSHAP | Differentiable models | Approximate | O(n_samples × gradient) | Neural networks, smooth models |
| LinearSHAP | Linear models | Exact | O(p) | Linear/logistic regression |
Always prefer model-specific explainers when available: TreeExplainer for trees, LinearExplainer for linear models. Fall back to KernelExplainer only when no specialized explainer exists. The speed difference can be 100x or more.
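In recent versions of the shap library, the generic `shap.Explainer` entry point attempts this dispatch automatically. A short sketch (exact behaviour depends on your shap version):

```python
import shap

# For the XGBoost model above this dispatches to the fast TreeExplainer;
# for an unsupported model it falls back to a model-agnostic method.
auto_explainer = shap.Explainer(model, X_train)
explanation = auto_explainer(X_test[:100])   # returns a shap.Explanation object

print(type(auto_explainer))
print(explanation.values.shape)              # (100, n_features)
print(explanation.base_values[:3])
```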
The SHAP library provides powerful visualizations that have become standard in ML interpretability. Understanding these visualizations is essential for communicating model behavior.
The force plot shows how each feature pushes the prediction from the base value (average) to the final prediction.
```python
import shap
import matplotlib.pyplot as plt

# Assuming explainer and shap_values computed as above
# Force plot for a single instance
shap.initjs()  # Required for interactive plots in notebooks

# Single prediction explanation
idx = 0
shap.force_plot(
    explainer.expected_value,            # Base value
    shap_values[idx],                    # SHAP values for this instance
    X_test[idx],                         # Feature values for this instance
    feature_names=housing.feature_names  # Feature names
)

# Force plot as matplotlib figure (for saving)
shap.force_plot(
    explainer.expected_value,
    shap_values[idx],
    X_test[idx],
    feature_names=housing.feature_names,
    matplotlib=True,
    show=False
)
plt.tight_layout()
plt.savefig("force_plot.png", dpi=150, bbox_inches='tight')
```

Reading a force plot:
- The base value (the average prediction) is the starting point on the axis.
- Red segments are features pushing the prediction higher; blue segments push it lower.
- The length of each segment is the magnitude of that feature's SHAP value.
- The point where the two sides meet is the model's prediction for this instance.
The summary plot aggregates SHAP values across all instances to show global feature importance and effect direction.
```python
import shap
import matplotlib.pyplot as plt

# Summary plot — shows feature importance across dataset
# Each dot is one instance; x-axis is SHAP value, color is feature value

plt.figure(figsize=(10, 8))
shap.summary_plot(
    shap_values,
    X_test,
    feature_names=housing.feature_names,
    show=False
)
plt.tight_layout()
plt.savefig("summary_plot.png", dpi=150, bbox_inches='tight')
plt.show()

# Bar version — just shows mean absolute SHAP values (magnitude only)
plt.figure(figsize=(10, 6))
shap.summary_plot(
    shap_values,
    X_test,
    feature_names=housing.feature_names,
    plot_type="bar",
    show=False
)
plt.tight_layout()
plt.savefig("summary_bar.png", dpi=150, bbox_inches='tight')
plt.show()
```

Reading a summary plot (beeswarm):
- Each row is a feature, ordered top to bottom by global importance (mean |SHAP|).
- Each dot is one instance; its horizontal position is that instance's SHAP value for the feature.
- Color encodes the feature's value (red = high, blue = low).

Interpreting patterns:
- Red dots on the right and blue dots on the left mean high values of the feature increase predictions.
- A wide horizontal spread indicates a large and variable impact across instances.
- Mixed colors at the same SHAP value, or long one-sided tails, often signal interactions with other features.
The dependence plot shows how a single feature's SHAP values vary with its actual values, revealing nonlinear effects and interactions.
```python
import shap
import matplotlib.pyplot as plt

# Dependence plot for a specific feature
# Shows SHAP value vs feature value, colored by interacting feature

plt.figure(figsize=(10, 6))
shap.dependence_plot(
    "MedInc",                  # Feature to plot on x-axis
    shap_values,               # SHAP values
    X_test,                    # Feature values
    feature_names=housing.feature_names,
    interaction_index="auto",  # Automatically find interacting feature
    show=False
)
plt.tight_layout()
plt.savefig("dependence_income.png", dpi=150, bbox_inches='tight')
plt.show()

# Specify interaction feature manually
plt.figure(figsize=(10, 6))
shap.dependence_plot(
    "HouseAge",
    shap_values,
    X_test,
    feature_names=housing.feature_names,
    interaction_index="MedInc",  # Color by median income
    show=False
)
plt.tight_layout()
plt.show()
```

Reading a dependence plot:
- The x-axis is the feature's actual value; the y-axis is its SHAP value for each instance.
- The shape of the scatter reveals nonlinear effects such as thresholds or saturation.
- Vertical spread at a given x value, together with the interaction coloring, indicates interaction effects with the colored feature.
The waterfall plot shows the cumulative contribution of each feature to a prediction.
```python
import shap
import matplotlib.pyplot as plt

# Waterfall plot for single instance
# Shows step-by-step contribution from base to final prediction

idx = 0

# Create Explanation object for new SHAP API
explanation = shap.Explanation(
    values=shap_values[idx],
    base_values=explainer.expected_value,
    data=X_test[idx],
    feature_names=housing.feature_names
)

plt.figure(figsize=(10, 8))
shap.waterfall_plot(explanation, show=False)
plt.tight_layout()
plt.savefig("waterfall.png", dpi=150, bbox_inches='tight')
plt.show()
```

| Plot Type | Purpose | When to Use |
|---|---|---|
| Force Plot | Explain single prediction | Debugging individual cases, stakeholder explanations |
| Summary (beeswarm) | Global importance + direction | Overall model understanding, feature selection |
| Summary (bar) | Global importance ranking | Quick importance overview, presentations |
| Dependence | Feature effect + interactions | Understanding nonlinear relationships |
| Waterfall | Decomposed single prediction | Detailed case analysis, auditing |
| Heatmap | Compare explanations across instances | Cohort analysis, pattern detection |
A key advantage of SHAP is providing both local (per-prediction) and global (model-wide) importance from the same underlying attribution.
Global SHAP importance is computed by aggregating local SHAP values:
Mean Absolute SHAP Value (most common): $$I_j = \frac{1}{n} \sum_{i=1}^{n} |\phi_j(x_i)|$$
This measures the average magnitude of a feature's impact, regardless of direction.
Mean SHAP Value: $$I_j^{signed} = \frac{1}{n} \sum_{i=1}^{n} \phi_j(x_i)$$
This measures the average directional impact (positive means feature tends to increase predictions).
```python
import numpy as np
import pandas as pd

# Compute global importance from SHAP values
# shap_values shape: (n_samples, n_features)

# Mean absolute SHAP value — standard global importance
global_importance = np.abs(shap_values).mean(axis=0)

# Mean SHAP value — directional importance
directional_importance = shap_values.mean(axis=0)

# Standard deviation — shows variability of impact
importance_std = np.abs(shap_values).std(axis=0)

# Create summary dataframe
importance_df = pd.DataFrame({
    'feature': housing.feature_names,
    'mean_abs_shap': global_importance,
    'mean_shap': directional_importance,
    'std_abs_shap': importance_std
}).sort_values('mean_abs_shap', ascending=False)

print("Global SHAP Feature Importance")
print("=" * 60)
print(importance_df.to_string(index=False))

# Compare with permutation importance (they often agree but not always)
from sklearn.inspection import permutation_importance

perm_result = permutation_importance(
    model, X_test, y_test, n_repeats=30, random_state=42
)

comparison = pd.DataFrame({
    'feature': housing.feature_names,
    'shap_importance': global_importance,
    'permutation_importance': perm_result.importances_mean
})
comparison['shap_rank'] = comparison['shap_importance'].rank(ascending=False)
comparison['perm_rank'] = comparison['permutation_importance'].rank(ascending=False)
print("\nSHAP vs Permutation Importance Comparison:")
print(comparison.sort_values('shap_importance', ascending=False).to_string(index=False))
```

SHAP global importance and permutation importance measure different things:
| Aspect | SHAP | Permutation |
|---|---|---|
| What it measures | Average magnitude of feature contribution | Performance drop when feature is randomized |
| Correlated features | Both features get credit | Importance is diluted |
| Model reliance | How much a feature contributes to predictions | How much shuffling it hurts performance |
| Direction | Can be positive or negative | Always measures loss increase |
When they disagree: correlated features are the usual cause. SHAP spreads credit across a group of correlated features, while permuting any single one barely hurts performance because the others can stand in for it. A feature can therefore rank high in SHAP importance yet low in permutation importance without either method being wrong.
SHAP and permutation importance are complementary. SHAP answers 'What contributes to predictions?' while permutation answers 'What is necessary for performance?' Features can contribute without being necessary (if correlated alternatives exist), and vice versa in degenerate cases.
SHAP can be extended to capture interaction effects — how pairs of features jointly contribute beyond their individual effects.
For each pair of features $(i, j)$, the SHAP interaction value $\Phi_{i,j}$ captures their synergistic effect:
$$\phi_i(x) = \Phi_{i,i}(x) + \sum_{j \neq i} \Phi_{i,j}(x)$$
Note: $\Phi_{i,j} = \Phi_{j,i}$ (interactions are symmetric).
```python
import shap
import numpy as np
import matplotlib.pyplot as plt

# TreeSHAP can compute exact interaction values (expensive but tractable)
# Only available for TreeExplainer

explainer = shap.TreeExplainer(model)

# Compute interaction values (this is slower than regular SHAP)
# Shape: (n_samples, n_features, n_features)
shap_interaction = explainer.shap_interaction_values(X_test[:100])  # Subset for speed

print("Interaction values shape:", shap_interaction.shape)
# e.g., (100, 8, 8) for 100 samples and 8 features

# Verify: sum of interactions equals regular SHAP values
idx = 0
regular_shap = explainer.shap_values(X_test[idx:idx+1])[0]
reconstructed = shap_interaction[idx].sum(axis=1)

print("\nVerification for instance 0:")
for i, name in enumerate(housing.feature_names):
    print(f"  {name}: regular={regular_shap[i]:.4f}, from_interactions={reconstructed[i]:.4f}")

# Visualize main effects vs interactions
# Mean absolute interaction strength by feature pair
mean_abs_interaction = np.abs(shap_interaction).mean(axis=0)

# Off-diagonal elements are the true interactions
np.fill_diagonal(mean_abs_interaction, 0)

# Plot interaction heatmap
plt.figure(figsize=(10, 8))
plt.imshow(mean_abs_interaction, cmap='RdBu_r', aspect='auto')
plt.xticks(range(len(housing.feature_names)), housing.feature_names, rotation=45, ha='right')
plt.yticks(range(len(housing.feature_names)), housing.feature_names)
plt.colorbar(label='Mean |Interaction|')
plt.title('SHAP Feature Interaction Matrix')
plt.tight_layout()
plt.savefig("interaction_heatmap.png", dpi=150)
plt.show()

# Find top interactions
interactions = []
for i in range(len(housing.feature_names)):
    for j in range(i + 1, len(housing.feature_names)):
        interactions.append({
            'feature_1': housing.feature_names[i],
            'feature_2': housing.feature_names[j],
            'interaction_strength': mean_abs_interaction[i, j]
        })

import pandas as pd
interaction_df = pd.DataFrame(interactions).sort_values('interaction_strength', ascending=False)
print("\nTop Feature Interactions:")
print(interaction_df.head(10).to_string(index=False))
```

SHAP interaction values require O(p²) computation per instance compared to O(p) for regular SHAP values. For high-dimensional data (p > 100), this becomes very expensive. Use only when interaction understanding is specifically needed, and on a representative subsample.
SHAP is powerful but can be misused. Here are common mistakes and how to avoid them.
The mistake: 'SHAP shows income has high importance, so income causes the outcome.'
Reality: SHAP measures model reliance, not causation. The model uses income to predict, but this doesn't mean changing income would change the outcome. Income might be correlated with true causal factors.
Best practice: State 'The model relies heavily on income for predictions' rather than 'Income causes the outcome.'
For KernelSHAP and related methods, the background distribution (reference dataset) significantly affects SHAP values. Different backgrounds give different attributions.
```python
import shap
import numpy as np

# Different backgrounds give different SHAP values!

# Background 1: Random sample
background_random = shap.sample(X_train, 100)
explainer_random = shap.KernelExplainer(model.predict, background_random)
shap_random = explainer_random.shap_values(X_test[:5])

# Background 2: K-means summary (clusters the data)
background_kmeans = shap.kmeans(X_train, 10)
explainer_kmeans = shap.KernelExplainer(model.predict, background_kmeans)
shap_kmeans = explainer_kmeans.shap_values(X_test[:5])

# Compare
print("SHAP values vary with background choice:")
print(f"Random background: {shap_random[0]}")
print(f"K-means background: {shap_kmeans[0]}")

# Best practice: Use a representative background
# - For TreeExplainer: uses training data implicitly
# - For KernelExplainer: use training data or a thoughtful sample
```

Many SHAP implementations assume features are independent when computing "absent" feature values. This is called the interventional vs observational SHAP debate.
Problem: If features are correlated, sampling them independently creates unrealistic data points (e.g., pregnant males).
TreeSHAP option: `feature_perturbation='tree_path_dependent'` (used by default when no background dataset is supplied) approximates conditional expectations by following the training-data coverage of each tree path, avoiding unrealistic feature combinations.
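A short sketch of the two perturbation modes for TreeExplainer (parameter names as in recent shap versions; the background sample is illustrative):

```python
import shap

# Path-dependent: approximates conditional expectations using the training-data
# coverage stored in the trees; no explicit background dataset is needed.
explainer_pathdep = shap.TreeExplainer(
    model, feature_perturbation="tree_path_dependent"
)

# Interventional: breaks feature dependence by intervening with values drawn
# from an explicit background dataset (a more causal, "do"-style question).
background = shap.sample(X_train, 100)
explainer_interv = shap.TreeExplainer(
    model, data=background, feature_perturbation="interventional"
)

shap_pathdep = explainer_pathdep.shap_values(X_test[:5])
shap_interv = explainer_interv.shap_values(X_test[:5])
print("Path-dependent vs interventional attributions can differ:")
print(shap_pathdep[0])
print(shap_interv[0])
```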
Deploying SHAP in production systems requires careful engineering to balance explanation quality with latency and cost.
Pattern 1: Pre-computed Explanations For batch predictions, compute SHAP values alongside predictions and store them. Explanations are retrieved rather than computed on-demand.
Pattern 2: On-demand Lightweight For real-time, use fast explainers (TreeSHAP) and limit to top-k features. Compute full explanations asynchronously if needed.
Pattern 3: Surrogate Model Train a fast interpretable surrogate (e.g., linear) on top of complex model predictions. Use surrogate coefficients as approximate explanations.
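Pattern 3 can be sketched in a few lines. The Ridge surrogate and the reuse of the housing model and data from earlier are illustrative choices, and the surrogate's coefficients are only a coarse, global approximation of the black-box behaviour:

```python
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Train a fast, interpretable surrogate to mimic the complex model's predictions
surrogate = make_pipeline(StandardScaler(), Ridge(alpha=1.0))
surrogate.fit(X_train, model.predict(X_train))

# Fidelity check: how well the surrogate reproduces the black-box predictions
fidelity = surrogate.score(X_test, model.predict(X_test))
print(f"Surrogate fidelity (R^2 vs black-box predictions): {fidelity:.3f}")

# Standardized coefficients serve as cheap, approximate global explanations
for name, coef in zip(housing.feature_names, surrogate.named_steps["ridge"].coef_):
    print(f"{name}: {coef:+.3f}")
```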
```python
import shap
import numpy as np
import json
from typing import Dict, List, Any
from dataclasses import dataclass


@dataclass
class SHAPExplanation:
    """Structured SHAP explanation for API responses."""
    prediction: float
    base_value: float
    top_positive: List[Dict[str, Any]]
    top_negative: List[Dict[str, Any]]
    all_contributions: Dict[str, float]

    def to_dict(self) -> Dict:
        return {
            'prediction': self.prediction,
            'base_value': self.base_value,
            'explanation': {
                'positive_factors': self.top_positive,
                'negative_factors': self.top_negative
            },
            'full_attribution': self.all_contributions
        }


class ProductionSHAPExplainer:
    """Production-oriented SHAP explainer with top-k feature selection."""

    def __init__(self, model, feature_names: List[str], background_data: np.ndarray):
        self.model = model
        self.feature_names = feature_names
        self.background_data = background_data  # kept for explainers that need a background
        self.explainer = shap.TreeExplainer(model)  # Use TreeSHAP for speed
        self.base_value = self.explainer.expected_value

    def explain(self, x: np.ndarray, top_k: int = 5) -> SHAPExplanation:
        """
        Generate a SHAP explanation for a single instance.

        Parameters
        ----------
        x : ndarray of shape (n_features,)
        top_k : number of top features to highlight

        Returns
        -------
        SHAPExplanation with top positive/negative contributors
        """
        # Ensure correct shape
        if x.ndim == 1:
            x = x.reshape(1, -1)

        # Compute prediction and SHAP values
        prediction = float(self.model.predict(x)[0])
        shap_values = self.explainer.shap_values(x)[0]
        return self._build_explanation(x[0], prediction, shap_values, top_k)

    def explain_batch(self, X: np.ndarray, top_k: int = 5) -> List[SHAPExplanation]:
        """Efficiently explain a batch of instances."""
        predictions = self.model.predict(X)
        shap_values = self.explainer.shap_values(X)
        return [
            self._build_explanation(X[i], float(predictions[i]), shap_values[i], top_k)
            for i in range(len(X))
        ]

    def _build_explanation(self, x_row: np.ndarray, prediction: float,
                           shap_values: np.ndarray, top_k: int) -> SHAPExplanation:
        """Assemble the structured explanation from raw SHAP values."""
        # Full attribution dict
        all_contributions = {
            name: float(val) for name, val in zip(self.feature_names, shap_values)
        }

        # Sort by absolute value
        sorted_indices = np.argsort(np.abs(shap_values))[::-1]

        # Extract top positive and negative contributors
        top_positive: List[Dict[str, Any]] = []
        top_negative: List[Dict[str, Any]] = []
        for idx in sorted_indices:
            if len(top_positive) >= top_k and len(top_negative) >= top_k:
                break
            val = shap_values[idx]
            entry = {
                'feature': self.feature_names[idx],
                'value': float(x_row[idx]),
                'contribution': float(val)
            }
            if val > 0 and len(top_positive) < top_k:
                top_positive.append(entry)
            elif val < 0 and len(top_negative) < top_k:
                top_negative.append(entry)

        return SHAPExplanation(
            prediction=prediction,
            base_value=float(self.base_value),
            top_positive=top_positive,
            top_negative=top_negative,
            all_contributions=all_contributions
        )


# Usage example
if __name__ == "__main__":
    # Setup
    explainer = ProductionSHAPExplainer(model, housing.feature_names, X_train)

    # Single explanation
    explanation = explainer.explain(X_test[0])

    # Format for API response
    response = explanation.to_dict()
    print(json.dumps(response, indent=2))
```

For real-time APIs: TreeSHAP typically computes in 1-10ms per instance, while KernelSHAP takes 100ms-10s. For batch systems, precompute during ETL and store explanations alongside predictions.
SHAP values provide the most principled framework for feature attribution in machine learning. Let's consolidate the key insights:
- SHAP decomposes every prediction exactly into a base value plus per-feature contributions (local accuracy).
- The attributions are the unique allocation satisfying missingness, consistency, symmetry, and additivity.
- Prefer model-specific explainers: TreeSHAP is exact and fast for tree ensembles; KernelSHAP is a slower, model-agnostic fallback.
- Local explanations aggregate naturally into global importance, but SHAP measures model reliance, not causation.
- Watch the common pitfalls: background choice, correlated features, the cost of interaction values, and latency in production.
You now understand SHAP values in practice. The next page dives into the mathematical foundations: Shapley values from cooperative game theory, the formal proof of uniqueness, and why this framework is the 'correct' way to attribute contributions. This theoretical understanding will deepen your interpretation skills and prepare you for advanced applications.