When a machine learning model makes a prediction, stakeholders across every domain—from healthcare to finance to criminal justice—increasingly demand answers to a simple question: Why?
Why did the model deny this loan? Why did it flag this transaction as fraud? Why did it predict this patient is high-risk? The answers to these questions have real consequences for individuals, organizations, and society.
SHAP (SHapley Additive exPlanations) has emerged as the gold standard for answering these questions. Introduced by Lundberg and Lee in 2017, SHAP provides a unified framework for feature attribution that is model-agnostic, locally accurate, and consistent, with guarantees grounded in cooperative game theory.
Today, SHAP is the most cited interpretability method in machine learning research and the de facto standard in industries requiring model explanations.
This page focuses on practical mastery of SHAP values: understanding the framework, computing explanations efficiently, visualizing results effectively, and avoiding common mistakes. The next page dives deeper into the underlying Shapley theory for those seeking mathematical rigor.
SHAP values answer a precise question for each prediction:
How much did each feature contribute to moving this prediction away from the baseline (average) prediction?
For any prediction $f(x)$, SHAP provides a decomposition:
$$f(x) = \phi_0 + \sum_{j=1}^{p} \phi_j(x)$$
Where $\phi_0$ is the base value (the model's average prediction over the background data) and $\phi_j(x)$ is the SHAP value of feature $j$ for instance $x$: its contribution to this particular prediction.
This decomposition is exact: the SHAP values sum precisely to the difference between the prediction and the average. No information is lost or unattributed.
Imagine a team project where the output is the prediction. SHAP values tell you exactly how much credit (positive or negative) each feature deserves for the final result. The allocation is 'fair' in a mathematically precise sense: it satisfies several desirable properties derived from game theory.
Consider a house price model predicting $450,000 for a specific house. The average house price in training data is $300,000. SHAP values might show:
| Feature | Value | SHAP Value | Interpretation |
|---|---|---|---|
| Size | 2,500 sq ft | +$80,000 | Large size increases price |
| Location | Downtown | +$50,000 | Premium location adds value |
| Age | 40 years | -$20,000 | Older age decreases value |
| Bedrooms | 3 | +$15,000 | More bedrooms add value |
| Condition | Good | +$25,000 | Good condition premium |
Verification: $300,000 + $80,000 + $50,000 - $20,000 + $15,000 + $25,000 = $450,000 ✓
The sum of SHAP values plus base value exactly equals the prediction. Every dollar is accounted for.
SHAP provides both local and global interpretability:
Local (per-prediction): SHAP values for a single instance explain why that specific prediction differs from the average. Each prediction has its own SHAP values.
Global (model-level): Aggregate SHAP values across many predictions to understand overall feature importance. Unlike permutation importance, SHAP-based global importance considers the magnitude and direction of effects across all predictions.
This duality is a key strength — you can zoom from individual explanations to model-wide patterns seamlessly.
SHAP values are unique in satisfying a set of desirable mathematical properties. These properties come from Shapley's theorem in cooperative game theory, which proves that only one allocation method satisfies all of them.
Local accuracy (efficiency): $$f(x) = \phi_0 + \sum_{j=1}^{p} \phi_j(x)$$
The SHAP values plus the base value exactly equal the model output. Nothing is left unexplained. This seems obvious, but many attribution methods don't satisfy it — they give attributions that don't sum to the actual prediction.
Missingness: If a feature doesn't contribute to any prediction (it's truly unused by the model), its SHAP value is zero:
$$x_j \text{ not used by } f \Rightarrow \phi_j(x) = 0$$
Features that don't matter get zero credit.
Consistency (monotonicity): If you change a model so that a feature's contribution increases (or stays the same) in all contexts, its SHAP value can only increase (or stay the same), never decrease.
Formally: if for all subsets $S \subseteq \{1, \dots, p\} \setminus \{j\}$: $$f'_{S \cup \{j\}}(x) - f'_S(x) \geq f_{S \cup \{j\}}(x) - f_S(x)$$
Then $\phi'_j(x) \geq \phi_j(x)$.
This prevents counterintuitive situations where making a feature more important in the model actually decreases its attribution.
Shapley (1953) proved that there is exactly ONE allocation method satisfying all these properties. This is profound: if you want your attributions to be fair, locally accurate, and consistent, you MUST use Shapley values. There's no alternative.
Additivity: SHAP values combine linearly across models. If you combine two models:
$$f(x) = g(x) + h(x)$$
Then:
$$\phi_j^f(x) = \phi_j^g(x) + \phi_j^h(x)$$
This means SHAP values of ensemble models can be understood by decomposing into component attributions.
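To make local accuracy and additivity concrete, here is a small hand-computed sketch. It does not use the shap library; the weights `w_g` and `w_h` and the synthetic data are purely illustrative. For a linear model with independent features, the SHAP value of feature $j$ is simply $w_j (x_j - \mathbb{E}[x_j])$, which lets us check both properties exactly:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
w_g = np.array([2.0, -1.0, 0.5])   # linear model g
w_h = np.array([0.0, 3.0, 1.0])    # linear model h

def linear_shap(w, X, x):
    # For a linear model f(x) = w @ x with independent features,
    # phi_j(x) = w_j * (x_j - E[x_j])
    return w * (x - X.mean(axis=0))

x = X[0]
phi_g = linear_shap(w_g, X, x)
phi_h = linear_shap(w_h, X, x)

# Local accuracy: base value + sum of attributions reproduces the prediction
base_g = X.mean(axis=0) @ w_g
assert np.isclose(base_g + phi_g.sum(), x @ w_g)

# Additivity: attributions for g + h equal the sum of component attributions
phi_sum = linear_shap(w_g + w_h, X, x)
assert np.allclose(phi_sum, phi_g + phi_h)

print("Local accuracy and additivity hold exactly for this linear model.")
```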
Symmetry: If two features contribute identically in all contexts (they're functionally interchangeable), they receive equal SHAP values. You can't have arbitrary favoritism toward one of two equivalent features.
| Property | Meaning | Why It Matters |
|---|---|---|
| Local Accuracy | Attributions sum to prediction | Complete explanation—no unexplained residual |
| Missingness | Unused features get zero | No credit for features the model ignores |
| Consistency | More important → higher attribution | Intuitive relationship between model behavior and explanation |
| Symmetry | Equivalent features get equal credit | No arbitrary favoritism |
| Additivity | Combine linearly for ensemble models | Decomposable explanations for complex models |
The exact SHAP value formula requires exponentially many model evaluations — $O(2^p)$ for $p$ features. This is computationally intractable for real-world problems. Several algorithms provide tractable approximations or exact solutions for specific model classes.
KernelSHAP treats SHAP value computation as a weighted linear regression problem. It's model-agnostic but approximate.
Algorithm intuition: (1) sample coalitions of "present" features; (2) evaluate the model with the absent features replaced by values drawn from a background dataset; (3) fit a weighted linear regression of the model outputs on the coalition indicator vectors, using the Shapley kernel weights; (4) read the regression coefficients as the estimated SHAP values.
The weighting scheme ensures the solution converges to true SHAP values as sample size increases.
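For reference, the Shapley kernel from Lundberg and Lee (2017) gives coalitions that are very small or very large the most weight. The helper below is a simple illustrative sketch, not part of the shap library:

```python
from math import comb

def shapley_kernel_weight(p: int, s: int) -> float:
    """Weight given to a coalition of size s out of p features."""
    if s == 0 or s == p:
        return float("inf")  # the empty and full coalitions are enforced as constraints
    return (p - 1) / (comb(p, s) * s * (p - s))

p = 8
for s in range(p + 1):
    print(f"coalition size {s}: weight {shapley_kernel_weight(p, s)}")
```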
```python
import shap
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

# Load data
data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.2, random_state=42
)

# Train any model
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Create KernelSHAP explainer
# Uses a background dataset to define "feature absent" values
background = shap.sample(X_train, 100)  # Subsample for efficiency
explainer = shap.KernelExplainer(
    model.predict_proba,  # Model prediction function
    background            # Background data
)

# Compute SHAP values for test instances
# This is slow for KernelSHAP — each instance requires many model calls
shap_values = explainer.shap_values(X_test[:10])  # Explain 10 instances

# shap_values[1] contains SHAP values for class 1 (malignant)
# Shape: (n_samples, n_features)
print("SHAP values shape:", shap_values[1].shape)
print("Base value (expected value):", explainer.expected_value[1])

# Verify local accuracy for first instance
prediction_proba = model.predict_proba(X_test[:1])[0, 1]
shap_sum = explainer.expected_value[1] + shap_values[1][0].sum()
print(f"Model prediction: {prediction_proba:.4f}")
print(f"Base + SHAP sum: {shap_sum:.4f}")
```

For tree-based models (Random Forest, XGBoost, LightGBM, CatBoost), TreeSHAP provides exact SHAP values in polynomial time — a remarkable breakthrough.
Algorithm insight: TreeSHAP exploits the recursive tree structure. By traversing the tree once per instance and tracking all possible feature subsets simultaneously, it computes exact SHAP values in $O(TLD^2)$ time where $T$ is number of trees, $L$ is maximum leaves, and $D$ is depth.
This is orders of magnitude faster than KernelSHAP and provides exact rather than approximate values.
```python
import shap
import xgboost as xgb
import numpy as np
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split

# Load data
housing = fetch_california_housing()
X_train, X_test, y_train, y_test = train_test_split(
    housing.data, housing.target, test_size=0.2, random_state=42
)

# Train XGBoost model
model = xgb.XGBRegressor(
    n_estimators=100,
    max_depth=6,
    learning_rate=0.1,
    random_state=42
)
model.fit(X_train, y_train)

# Create TreeExplainer — automatically uses TreeSHAP
explainer = shap.TreeExplainer(model)

# Compute SHAP values for entire test set — fast!
shap_values = explainer.shap_values(X_test)

# Results
print("SHAP values shape:", shap_values.shape)  # (n_samples, n_features)
print("Expected value:", explainer.expected_value)

# Verify local accuracy
idx = 0
prediction = model.predict(X_test[idx:idx+1])[0]
shap_sum = explainer.expected_value + shap_values[idx].sum()
print(f"\nInstance {idx}:")
print(f"  Prediction: {prediction:.4f}")
print(f"  Base + SHAP: {shap_sum:.4f}")
print(f"  Difference: {abs(prediction - shap_sum):.6f}")  # Should be ~0

# TreeSHAP is so fast we can explain thousands of instances
import time
start = time.time()
all_shap = explainer.shap_values(X_test)
elapsed = time.time() - start
print(f"\nExplained {len(X_test)} instances in {elapsed:.2f} seconds")
```

For deep learning models, DeepSHAP combines SHAP with DeepLIFT attributions. It propagates SHAP values through the network layers using efficient backpropagation-like updates.
Key insight: DeepSHAP approximates SHAP values by assuming features interact in specific ways through the network. It's not exact but is much faster than KernelSHAP for deep networks.
An alternative for neural networks that combines integrated gradients with SHAP theory. Uses gradient information for efficient computation.
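Below is a minimal sketch of both explainers on a small Keras network. The network `net`, its training code, and the background sample are illustrative, and exact framework support varies by shap and TensorFlow version, so treat this as a template rather than a guaranteed recipe:

```python
import numpy as np
import shap
import tensorflow as tf

# Small regression network on the same housing features (illustrative)
net = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation="relu", input_shape=(X_train.shape[1],)),
    tf.keras.layers.Dense(1),
])
net.compile(optimizer="adam", loss="mse")
net.fit(X_train, y_train, epochs=5, batch_size=256, verbose=0)

# Background sample defines the "feature absent" reference distribution
background = X_train[np.random.choice(len(X_train), 100, replace=False)]

# DeepSHAP: propagates attributions through the layers (DeepLIFT-style rules)
deep_explainer = shap.DeepExplainer(net, background)
deep_shap_values = deep_explainer.shap_values(X_test[:10])

# GradientSHAP: expected gradients, only requires the model to be differentiable
grad_explainer = shap.GradientExplainer(net, background)
grad_shap_values = grad_explainer.shap_values(X_test[:10])
```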
| Algorithm | Model Types | Exactness | Complexity | Use Case |
|---|---|---|---|---|
| KernelSHAP | Any model | Approximate | O(2^p) exact; reduced in practice by a coalition sampling budget | Any model, small # of explanations |
| TreeSHAP | Tree ensembles | Exact | O(T × L × D²) | XGBoost, LightGBM, RF, CatBoost |
| DeepSHAP | Neural networks | Approximate | O(forward + backward pass) | Deep learning models |
| GradientSHAP | Differentiable models | Approximate | O(n_samples × gradient) | Neural networks, smooth models |
| LinearSHAP | Linear models | Exact | O(p) | Linear/logistic regression |
Always prefer model-specific explainers when available: TreeExplainer for trees, LinearExplainer for linear models. Fall back to KernelExplainer only when no specialized explainer exists. The speed difference can be 100x or more.
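In recent versions of the shap library, the generic `shap.Explainer` entry point attempts this dispatch automatically. A short sketch (exact behaviour depends on your shap version):

```python
import shap

# For the XGBoost model above this dispatches to the fast TreeExplainer;
# for an unsupported model it falls back to a model-agnostic method.
auto_explainer = shap.Explainer(model, X_train)
explanation = auto_explainer(X_test[:100])   # returns a shap.Explanation object

print(type(auto_explainer))
print(explanation.values.shape)              # (100, n_features)
print(explanation.base_values[:3])
```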
The SHAP library provides powerful visualizations that have become standard in ML interpretability. Understanding these visualizations is essential for communicating model behavior.
The force plot shows how each feature pushes the prediction from the base value (average) to the final prediction.
```python
import shap
import matplotlib.pyplot as plt

# Assuming explainer and shap_values computed as above
# Force plot for a single instance
shap.initjs()  # Required for interactive plots in notebooks

# Single prediction explanation
idx = 0
shap.force_plot(
    explainer.expected_value,            # Base value
    shap_values[idx],                    # SHAP values for this instance
    X_test[idx],                         # Feature values for this instance
    feature_names=housing.feature_names  # Feature names
)

# Force plot as matplotlib figure (for saving)
shap.force_plot(
    explainer.expected_value,
    shap_values[idx],
    X_test[idx],
    feature_names=housing.feature_names,
    matplotlib=True,
    show=False
)
plt.tight_layout()
plt.savefig("force_plot.png", dpi=150, bbox_inches='tight')
```

Reading a force plot:
- The base value (the average prediction) is the starting point on the axis.
- Red segments are features pushing the prediction higher; blue segments push it lower.
- The length of each segment is the magnitude of that feature's SHAP value.
- The point where the two sides meet is the model's prediction for this instance.
The summary plot aggregates SHAP values across all instances to show global feature importance and effect direction.
```python
import shap
import matplotlib.pyplot as plt

# Summary plot — shows feature importance across dataset
# Each dot is one instance; x-axis is SHAP value, color is feature value

plt.figure(figsize=(10, 8))
shap.summary_plot(
    shap_values,
    X_test,
    feature_names=housing.feature_names,
    show=False
)
plt.tight_layout()
plt.savefig("summary_plot.png", dpi=150, bbox_inches='tight')
plt.show()

# Bar version — just shows mean absolute SHAP values (magnitude only)
plt.figure(figsize=(10, 6))
shap.summary_plot(
    shap_values,
    X_test,
    feature_names=housing.feature_names,
    plot_type="bar",
    show=False
)
plt.tight_layout()
plt.savefig("summary_bar.png", dpi=150, bbox_inches='tight')
plt.show()
```

Reading a summary plot (beeswarm):
- Each row is a feature, ordered top to bottom by global importance (mean |SHAP|).
- Each dot is one instance; its horizontal position is that instance's SHAP value for the feature.
- Color encodes the feature's value (red = high, blue = low).

Interpreting patterns:
- Red dots on the right and blue dots on the left mean high values of the feature increase predictions.
- A wide horizontal spread indicates a large and variable impact across instances.
- Mixed colors at the same SHAP value, or long one-sided tails, often signal interactions with other features.
The dependence plot shows how a single feature's SHAP values vary with its actual values, revealing nonlinear effects and interactions.
```python
import shap
import matplotlib.pyplot as plt

# Dependence plot for a specific feature
# Shows SHAP value vs feature value, colored by interacting feature

plt.figure(figsize=(10, 6))
shap.dependence_plot(
    "MedInc",                  # Feature to plot on x-axis
    shap_values,               # SHAP values
    X_test,                    # Feature values
    feature_names=housing.feature_names,
    interaction_index="auto",  # Automatically find interacting feature
    show=False
)
plt.tight_layout()
plt.savefig("dependence_income.png", dpi=150, bbox_inches='tight')
plt.show()

# Specify interaction feature manually
plt.figure(figsize=(10, 6))
shap.dependence_plot(
    "HouseAge",
    shap_values,
    X_test,
    feature_names=housing.feature_names,
    interaction_index="MedInc",  # Color by median income
    show=False
)
plt.tight_layout()
plt.show()
```

Reading a dependence plot:
- The x-axis is the feature's actual value; the y-axis is its SHAP value for each instance.
- The shape of the scatter reveals nonlinear effects such as thresholds or saturation.
- Vertical spread at a given x value, together with the interaction coloring, indicates interaction effects with the colored feature.
The waterfall plot shows the cumulative contribution of each feature to a prediction.
```python
import shap
import matplotlib.pyplot as plt

# Waterfall plot for single instance
# Shows step-by-step contribution from base to final prediction

idx = 0

# Create Explanation object for new SHAP API
explanation = shap.Explanation(
    values=shap_values[idx],
    base_values=explainer.expected_value,
    data=X_test[idx],
    feature_names=housing.feature_names
)

plt.figure(figsize=(10, 8))
shap.waterfall_plot(explanation, show=False)
plt.tight_layout()
plt.savefig("waterfall.png", dpi=150, bbox_inches='tight')
plt.show()
```

| Plot Type | Purpose | When to Use |
|---|---|---|
| Force Plot | Explain single prediction | Debugging individual cases, stakeholder explanations |
| Summary (beeswarm) | Global importance + direction | Overall model understanding, feature selection |
| Summary (bar) | Global importance ranking | Quick importance overview, presentations |
| Dependence | Feature effect + interactions | Understanding nonlinear relationships |
| Waterfall | Decomposed single prediction | Detailed case analysis, auditing |
| Heatmap | Compare explanations across instances | Cohort analysis, pattern detection |
A key advantage of SHAP is providing both local (per-prediction) and global (model-wide) importance from the same underlying attribution.
Global SHAP importance is computed by aggregating local SHAP values:
Mean Absolute SHAP Value (most common): $$I_j = \frac{1}{n} \sum_{i=1}^{n} |\phi_j(x_i)|$$
This measures the average magnitude of a feature's impact, regardless of direction.
Mean SHAP Value: $$I_j^{signed} = \frac{1}{n} \sum_{i=1}^{n} \phi_j(x_i)$$
This measures the average directional impact (positive means feature tends to increase predictions).
```python
import numpy as np
import pandas as pd

# Compute global importance from SHAP values
# shap_values shape: (n_samples, n_features)

# Mean absolute SHAP value — standard global importance
global_importance = np.abs(shap_values).mean(axis=0)

# Mean SHAP value — directional importance
directional_importance = shap_values.mean(axis=0)

# Standard deviation — shows variability of impact
importance_std = np.abs(shap_values).std(axis=0)

# Create summary dataframe
importance_df = pd.DataFrame({
    'feature': housing.feature_names,
    'mean_abs_shap': global_importance,
    'mean_shap': directional_importance,
    'std_abs_shap': importance_std
}).sort_values('mean_abs_shap', ascending=False)

print("Global SHAP Feature Importance")
print("=" * 60)
print(importance_df.to_string(index=False))

# Compare with permutation importance (they often agree but not always)
from sklearn.inspection import permutation_importance

perm_result = permutation_importance(
    model, X_test, y_test, n_repeats=30, random_state=42
)

comparison = pd.DataFrame({
    'feature': housing.feature_names,
    'shap_importance': global_importance,
    'permutation_importance': perm_result.importances_mean
})
comparison['shap_rank'] = comparison['shap_importance'].rank(ascending=False)
comparison['perm_rank'] = comparison['permutation_importance'].rank(ascending=False)
print("\nSHAP vs Permutation Importance Comparison:")
print(comparison.sort_values('shap_importance', ascending=False).to_string(index=False))
```

SHAP global importance and permutation importance measure different things:
| Aspect | SHAP | Permutation |
|---|---|---|
| What it measures | Average magnitude of feature contribution | Performance drop when feature is randomized |
| Correlated features | Both features get credit | Importance is diluted |
| Model reliance | How much a feature contributes to predictions | How much shuffling it hurts performance |
| Direction | Can be positive or negative | Always measures loss increase |
When they disagree: correlated features are the usual cause. SHAP spreads credit across a group of correlated features, while permuting any single one barely hurts performance because the others can stand in for it. A feature can therefore rank high in SHAP importance yet low in permutation importance without either method being wrong.
SHAP and permutation importance are complementary. SHAP answers 'What contributes to predictions?' while permutation answers 'What is necessary for performance?' Features can contribute without being necessary (if correlated alternatives exist), and vice versa in degenerate cases.
SHAP can be extended to capture interaction effects — how pairs of features jointly contribute beyond their individual effects.
For each pair of features $(i, j)$, the SHAP interaction value $\Phi_{i,j}$ captures their synergistic effect:
$$\phi_i(x) = \Phi_{i,i}(x) + \sum_{j \neq i} \Phi_{i,j}(x)$$
Note: $\Phi_{i,j} = \Phi_{j,i}$ (interactions are symmetric).
```python
import shap
import numpy as np
import matplotlib.pyplot as plt

# TreeSHAP can compute exact interaction values (expensive but tractable)
# Only available for TreeExplainer

explainer = shap.TreeExplainer(model)

# Compute interaction values (this is slower than regular SHAP)
# Shape: (n_samples, n_features, n_features)
shap_interaction = explainer.shap_interaction_values(X_test[:100])  # Subset for speed

print("Interaction values shape:", shap_interaction.shape)
# e.g., (100, 8, 8) for 100 samples and 8 features

# Verify: sum of interactions equals regular SHAP values
idx = 0
regular_shap = explainer.shap_values(X_test[idx:idx+1])[0]
reconstructed = shap_interaction[idx].sum(axis=1)

print("\nVerification for instance 0:")
for i, name in enumerate(housing.feature_names):
    print(f"  {name}: regular={regular_shap[i]:.4f}, from_interactions={reconstructed[i]:.4f}")

# Visualize main effects vs interactions
# Mean absolute interaction strength by feature pair
mean_abs_interaction = np.abs(shap_interaction).mean(axis=0)

# Off-diagonal elements are the true interactions
np.fill_diagonal(mean_abs_interaction, 0)

# Plot interaction heatmap
plt.figure(figsize=(10, 8))
plt.imshow(mean_abs_interaction, cmap='RdBu_r', aspect='auto')
plt.xticks(range(len(housing.feature_names)), housing.feature_names, rotation=45, ha='right')
plt.yticks(range(len(housing.feature_names)), housing.feature_names)
plt.colorbar(label='Mean |Interaction|')
plt.title('SHAP Feature Interaction Matrix')
plt.tight_layout()
plt.savefig("interaction_heatmap.png", dpi=150)
plt.show()

# Find top interactions
interactions = []
for i in range(len(housing.feature_names)):
    for j in range(i + 1, len(housing.feature_names)):
        interactions.append({
            'feature_1': housing.feature_names[i],
            'feature_2': housing.feature_names[j],
            'interaction_strength': mean_abs_interaction[i, j]
        })

import pandas as pd
interaction_df = pd.DataFrame(interactions).sort_values('interaction_strength', ascending=False)
print("\nTop Feature Interactions:")
print(interaction_df.head(10).to_string(index=False))
```

SHAP interaction values require O(p²) computation per instance compared to O(p) for regular SHAP values. For high-dimensional data (p > 100), this becomes very expensive. Use only when interaction understanding is specifically needed, and on a representative subsample.
SHAP is powerful but can be misused. Here are common mistakes and how to avoid them.
The mistake: 'SHAP shows income has high importance, so income causes the outcome.'
Reality: SHAP measures model reliance, not causation. The model uses income to predict, but this doesn't mean changing income would change the outcome. Income might be correlated with true causal factors.
Best practice: State 'The model relies heavily on income for predictions' rather than 'Income causes the outcome.'
For KernelSHAP and related methods, the background distribution (reference dataset) significantly affects SHAP values. Different backgrounds give different attributions.
```python
import shap
import numpy as np

# Different backgrounds give different SHAP values!

# Background 1: Random sample
background_random = shap.sample(X_train, 100)
explainer_random = shap.KernelExplainer(model.predict, background_random)
shap_random = explainer_random.shap_values(X_test[:5])

# Background 2: K-means summary (clusters the data)
background_kmeans = shap.kmeans(X_train, 10)
explainer_kmeans = shap.KernelExplainer(model.predict, background_kmeans)
shap_kmeans = explainer_kmeans.shap_values(X_test[:5])

# Compare
print("SHAP values vary with background choice:")
print(f"Random background: {shap_random[0]}")
print(f"K-means background: {shap_kmeans[0]}")

# Best practice: Use a representative background
# - For TreeExplainer: uses training data implicitly
# - For KernelExplainer: use training data or a thoughtful sample
```

Many SHAP implementations assume features are independent when computing "absent" feature values. This is called the interventional vs observational SHAP debate.
Problem: If features are correlated, sampling them independently creates unrealistic data points (e.g., pregnant males).
TreeSHAP option: `feature_perturbation='tree_path_dependent'` (used by default when no background dataset is supplied) approximates conditional expectations by following the training-data coverage of each tree path, avoiding unrealistic feature combinations.
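A short sketch of the two perturbation modes for TreeExplainer (parameter names as in recent shap versions; the background sample is illustrative):

```python
import shap

# Path-dependent: approximates conditional expectations using the training-data
# coverage stored in the trees; no explicit background dataset is needed.
explainer_pathdep = shap.TreeExplainer(
    model, feature_perturbation="tree_path_dependent"
)

# Interventional: breaks feature dependence by intervening with values drawn
# from an explicit background dataset (a more causal, "do"-style question).
background = shap.sample(X_train, 100)
explainer_interv = shap.TreeExplainer(
    model, data=background, feature_perturbation="interventional"
)

shap_pathdep = explainer_pathdep.shap_values(X_test[:5])
shap_interv = explainer_interv.shap_values(X_test[:5])
print("Path-dependent vs interventional attributions can differ:")
print(shap_pathdep[0])
print(shap_interv[0])
```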
Deploying SHAP in production systems requires careful engineering to balance explanation quality with latency and cost.
Pattern 1: Pre-computed Explanations For batch predictions, compute SHAP values alongside predictions and store them. Explanations are retrieved rather than computed on-demand.
Pattern 2: On-demand Lightweight For real-time, use fast explainers (TreeSHAP) and limit to top-k features. Compute full explanations asynchronously if needed.
Pattern 3: Surrogate Model Train a fast interpretable surrogate (e.g., linear) on top of complex model predictions. Use surrogate coefficients as approximate explanations.
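Pattern 3 can be sketched in a few lines. The Ridge surrogate and the reuse of the housing model and data from earlier are illustrative choices, and the surrogate's coefficients are only a coarse, global approximation of the black-box behaviour:

```python
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Train a fast, interpretable surrogate to mimic the complex model's predictions
surrogate = make_pipeline(StandardScaler(), Ridge(alpha=1.0))
surrogate.fit(X_train, model.predict(X_train))

# Fidelity check: how well the surrogate reproduces the black-box predictions
fidelity = surrogate.score(X_test, model.predict(X_test))
print(f"Surrogate fidelity (R^2 vs black-box predictions): {fidelity:.3f}")

# Standardized coefficients serve as cheap, approximate global explanations
for name, coef in zip(housing.feature_names, surrogate.named_steps["ridge"].coef_):
    print(f"{name}: {coef:+.3f}")
```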
```python
import shap
import numpy as np
import json
from typing import Dict, List, Any
from dataclasses import dataclass


@dataclass
class SHAPExplanation:
    """Structured SHAP explanation for API responses."""
    prediction: float
    base_value: float
    top_positive: List[Dict[str, Any]]
    top_negative: List[Dict[str, Any]]
    all_contributions: Dict[str, float]

    def to_dict(self) -> Dict:
        return {
            'prediction': self.prediction,
            'base_value': self.base_value,
            'explanation': {
                'positive_factors': self.top_positive,
                'negative_factors': self.top_negative
            },
            'full_attribution': self.all_contributions
        }


class ProductionSHAPExplainer:
    """Production-oriented SHAP explainer with top-k feature selection."""

    def __init__(self, model, feature_names: List[str], background_data: np.ndarray):
        self.model = model
        self.feature_names = feature_names
        self.background_data = background_data  # kept for explainers that need a background
        self.explainer = shap.TreeExplainer(model)  # Use TreeSHAP for speed
        self.base_value = self.explainer.expected_value

    def explain(self, x: np.ndarray, top_k: int = 5) -> SHAPExplanation:
        """
        Generate a SHAP explanation for a single instance.

        Parameters
        ----------
        x : ndarray of shape (n_features,)
        top_k : number of top features to highlight

        Returns
        -------
        SHAPExplanation with top positive/negative contributors
        """
        # Ensure correct shape
        if x.ndim == 1:
            x = x.reshape(1, -1)

        # Compute prediction and SHAP values
        prediction = float(self.model.predict(x)[0])
        shap_values = self.explainer.shap_values(x)[0]
        return self._build_explanation(x[0], prediction, shap_values, top_k)

    def explain_batch(self, X: np.ndarray, top_k: int = 5) -> List[SHAPExplanation]:
        """Efficiently explain a batch of instances."""
        predictions = self.model.predict(X)
        shap_values = self.explainer.shap_values(X)
        return [
            self._build_explanation(X[i], float(predictions[i]), shap_values[i], top_k)
            for i in range(len(X))
        ]

    def _build_explanation(self, x_row: np.ndarray, prediction: float,
                           shap_values: np.ndarray, top_k: int) -> SHAPExplanation:
        """Assemble the structured explanation from raw SHAP values."""
        # Full attribution dict
        all_contributions = {
            name: float(val) for name, val in zip(self.feature_names, shap_values)
        }

        # Sort by absolute value
        sorted_indices = np.argsort(np.abs(shap_values))[::-1]

        # Extract top positive and negative contributors
        top_positive: List[Dict[str, Any]] = []
        top_negative: List[Dict[str, Any]] = []
        for idx in sorted_indices:
            if len(top_positive) >= top_k and len(top_negative) >= top_k:
                break
            val = shap_values[idx]
            entry = {
                'feature': self.feature_names[idx],
                'value': float(x_row[idx]),
                'contribution': float(val)
            }
            if val > 0 and len(top_positive) < top_k:
                top_positive.append(entry)
            elif val < 0 and len(top_negative) < top_k:
                top_negative.append(entry)

        return SHAPExplanation(
            prediction=prediction,
            base_value=float(self.base_value),
            top_positive=top_positive,
            top_negative=top_negative,
            all_contributions=all_contributions
        )


# Usage example
if __name__ == "__main__":
    # Setup
    explainer = ProductionSHAPExplainer(model, housing.feature_names, X_train)

    # Single explanation
    explanation = explainer.explain(X_test[0])

    # Format for API response
    response = explanation.to_dict()
    print(json.dumps(response, indent=2))
```

For real-time APIs: TreeSHAP typically computes in 1-10ms per instance, while KernelSHAP takes 100ms-10s. For batch systems, precompute during ETL and store explanations alongside predictions.
SHAP values provide the most principled framework for feature attribution in machine learning. Let's consolidate the key insights:
- SHAP decomposes every prediction exactly into a base value plus per-feature contributions (local accuracy).
- The attributions are the unique allocation satisfying missingness, consistency, symmetry, and additivity.
- Prefer model-specific explainers: TreeSHAP is exact and fast for tree ensembles; KernelSHAP is a slower, model-agnostic fallback.
- Local explanations aggregate naturally into global importance, but SHAP measures model reliance, not causation.
- Watch the common pitfalls: background choice, correlated features, the cost of interaction values, and latency in production.
You now understand SHAP values in practice. The next page dives into the mathematical foundations: Shapley values from cooperative game theory, the formal proof of uniqueness, and why this framework is the 'correct' way to attribute contributions. This theoretical understanding will deepen your interpretation skills and prepare you for advanced applications.