In 1953, mathematician and economist Lloyd Shapley tackled a seemingly simple question: How should the payoff from a cooperative game be fairly divided among the players?
Imagine three workers collaborate on a project. Worker A alone can produce $100. Worker B alone can produce $80. Worker C alone can produce $50. But working together, they produce $400—far more than the sum of their individual efforts due to synergies. How should the $400 be divided?
Shapley proved something remarkable: there is exactly one way to divide the payoff that satisfies a set of intuitive fairness criteria. This became known as the Shapley value, and Shapley was awarded the Nobel Prize in Economics in 2012 for this foundational contribution to game theory.
Six decades later, Lundberg and Lee recognized that feature attribution in machine learning is mathematically identical to this payoff division problem. The prediction is the "payoff"; the features are the "players." SHAP values are simply Shapley values applied to machine learning—and they inherit all of Shapley's theoretical guarantees.
Understanding Shapley theory isn't just academic. It reveals why SHAP values are 'correct' in a precise mathematical sense, what their limitations are, and how to extend them for new applications. This theoretical foundation distinguishes principled interpretability from ad-hoc heuristics.
Before diving into Shapley values, we need the basic vocabulary of cooperative game theory.
A coalitional game (also called a cooperative game) consists of:

- A finite set of players $N = \{1, 2, ..., n\}$
- A characteristic function $v: 2^N \to \mathbb{R}$ that assigns a value to every subset (coalition) of players

The function $v(S)$ represents the total payoff that coalition $S$ can achieve by cooperating. By convention, $v(\emptyset) = 0$.
Three workers can collaborate on projects. Working alone, A produces \$100, B produces \$80, and C produces \$50. Pairs do better: $v(\{A,B\}) = 250$, $v(\{A,C\}) = 200$, $v(\{B,C\}) = 160$. All three together produce $v(\{A,B,C\}) = 400$.

The characteristic function captures all possible coalitions and their values.
Given that the grand coalition $N$ produces value $v(N)$, how should this value be divided among players?
We seek a payoff vector $(\phi_1, \phi_2, ..., \phi_n)$ where $\phi_i$ is the payoff to player $i$.
Desirable properties: the full value $v(N)$ should be distributed (efficiency), interchangeable players should receive equal payoffs (symmetry), players who contribute nothing should receive nothing (null player), and payoffs should combine sensibly across games (additivity).
The brilliant insight is that a player's contribution depends on context — specifically, which other players are already present. Worker A's contribution when joining an empty group differs from their contribution when joining a group that already includes B and C.
Shapley's solution: Average a player's marginal contribution across all possible orderings of players.
In the ML context: Players = Features. The characteristic function v(S) = expected model output when only features in S are 'present' and others are 'absent.' The payoff to divide = difference between prediction and baseline.
The Shapley value for player $i$ is defined as:
$$\phi_i(v) = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,(n-|S|-1)!}{n!} \cdot [v(S \cup \{i\}) - v(S)]$$
Let's carefully unpack each component.
$$v(S \cup \{i\}) - v(S)$$
This is the marginal contribution of player $i$ to coalition $S$. It's how much value player $i$ adds when joining coalition $S$.
$$\frac{|S|!(n-|S|-1)!}{n!}$$
This weight appears mysterious, but it has a beautiful interpretation.
Consider all $n!$ possible orderings (permutations) of players. In any ordering, player $i$ arrives at some position, finding a specific coalition already present.

The weight counts how many orderings produce coalition $S$ before player $i$: there are $|S|!$ ways to order the players of $S$ before $i$, and $(n-|S|-1)!$ ways to order the remaining players after $i$, out of $n!$ orderings in total.
The Shapley value is the expected marginal contribution when players arrive in a random order. Imagine players entering a room one by one; each player's contribution is what they add to those already present. Average over all orderings = Shapley value.
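To make the counting concrete, this short standalone check (not from the original text; the choices $n = 4$, $i = 0$, $S = \{1, 2\}$ are arbitrary) enumerates all orderings and confirms that the fraction placing exactly coalition $S$ before player $i$ equals the Shapley weight:

```python
from itertools import permutations
from math import factorial

n = 4
players = list(range(n))
i = 0              # the player joining
S = {1, 2}         # coalition that should already be present when i arrives

# Count orderings in which exactly the players of S appear before player i
count = sum(
    1 for perm in permutations(players)
    if set(perm[:perm.index(i)]) == S
)

# The Shapley weight |S|! * (n-|S|-1)! / n! equals count / n!
weight = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
print(count / factorial(n), weight)  # both equal 1/12 here
```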
The permutation interpretation gives an equivalent formula:
$$\phi_i(v) = \frac{1}{n!} \sum_{\pi \in \Pi(N)} [v(S_i^\pi \cup \{i\}) - v(S_i^\pi)]$$
Where $\Pi(N)$ is the set of all $n!$ permutations of $N$, and $S_i^\pi$ is the set of players that precede player $i$ in permutation $\pi$.
This formula is conceptually clearer but computationally equivalent to the first formula.
For our three workers $N = \{A, B, C\}$ with $v(\{A,B,C\}) = 400$:
$$\phi_A = \frac{1}{6}[2(v(\{A\}) - v(\emptyset)) + (v(\{A,B\}) - v(\{B\})) + (v(\{A,C\}) - v(\{C\})) + 2(v(\{A,B,C\}) - v(\{B,C\}))]$$

The factors of 2 arise because A arrives first in two of the six orderings (ABC, ACB) and last in two (BCA, CBA). Equivalently, using the coalition formula:
$$\phi_A = \frac{0!2!}{3!}(100-0) + \frac{1!1!}{3!}(250-80) + \frac{1!1!}{3!}(200-50) + \frac{2!0!}{3!}(400-160)$$
$$\phi_A = \frac{2}{6}(100) + \frac{1}{6}(170) + \frac{1}{6}(150) + \frac{2}{6}(240)$$
$$\phi_A = 33.33 + 28.33 + 25 + 80 = 166.67$$
Similarly: $\phi_B = 136.67$, $\phi_C = 96.67$.

Verification: $166.67 + 136.67 + 96.67 \approx 400$ ✓ (exactly, $\tfrac{500}{3} + \tfrac{410}{3} + \tfrac{290}{3} = 400$).
```python
import numpy as np
from itertools import permutations
from math import factorial
from typing import Callable


def shapley_value_exact(n: int, v: Callable[[set], float]) -> np.ndarray:
    """
    Compute exact Shapley values using the permutation formula.

    Parameters
    ----------
    n : number of players (0, 1, ..., n-1)
    v : characteristic function, v(S) -> value for coalition S

    Returns
    -------
    ndarray of Shapley values for each player
    """
    players = list(range(n))
    shapley_values = np.zeros(n)

    # Enumerate all permutations
    for perm in permutations(players):
        coalition = set()
        for player in perm:
            # Marginal contribution of player to current coalition
            marginal = v(coalition | {player}) - v(coalition)
            shapley_values[player] += marginal
            coalition.add(player)

    # Average over all permutations
    shapley_values /= factorial(n)
    return shapley_values


def shapley_value_coalition_formula(n: int, v: Callable[[set], float]) -> np.ndarray:
    """
    Compute exact Shapley values using the coalition formula.

    More efficient than the permutation formula for small n.
    """
    from itertools import combinations

    shapley_values = np.zeros(n)

    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):  # |S| from 0 to n-1
            for S in combinations(others, size):
                S_set = set(S)
                # Weight: |S|! * (n-|S|-1)! / n!
                weight = (factorial(size) * factorial(n - size - 1)) / factorial(n)
                # Marginal contribution
                marginal = v(S_set | {i}) - v(S_set)
                shapley_values[i] += weight * marginal

    return shapley_values


# Example: Three workers game
def workers_game(S: set) -> float:
    """Characteristic function for the three-workers example."""
    values = {
        frozenset(): 0,
        frozenset({0}): 100,   # A alone
        frozenset({1}): 80,    # B alone
        frozenset({2}): 50,    # C alone
        frozenset({0, 1}): 250,
        frozenset({0, 2}): 200,
        frozenset({1, 2}): 160,
        frozenset({0, 1, 2}): 400
    }
    return values.get(frozenset(S), 0)


# Compute Shapley values
shapley = shapley_value_exact(3, workers_game)
print("Shapley Values (Three Workers):")
print(f"  Worker A: {shapley[0]:.2f}")
print(f"  Worker B: {shapley[1]:.2f}")
print(f"  Worker C: {shapley[2]:.2f}")
print(f"  Sum: {shapley.sum():.2f} (should equal 400)")
```

Shapley's profound contribution wasn't just the formula—it was proving that this is the only payoff division satisfying a set of reasonable fairness axioms.
$$\sum_{i \in N} \phi_i(v) = v(N)$$
The payoffs sum exactly to the total value of the grand coalition. Nothing is wasted, nothing is created out of thin air.
If players $i$ and $j$ are interchangeable in every coalition: $$v(S \cup \{i\}) = v(S \cup \{j\}) \text{ for all } S \subseteq N \setminus \{i,j\}$$
Then they receive equal payoffs: $$\phi_i(v) = \phi_j(v)$$
Functionally equivalent players get the same reward.
If player $i$ contributes nothing in every coalition: $$v(S \cup \{i\}) = v(S) \text{ for all } S \subseteq N$$
Then their payoff is zero: $$\phi_i(v) = 0$$
Players who don't contribute don't get paid.
For any two games $v$ and $w$ on the same player set: $$\phi_i(v + w) = \phi_i(v) + \phi_i(w)$$
where $(v+w)(S) = v(S) + w(S)$.
If you combine two games, the payoffs combine accordingly.
There exists exactly one value function $\phi$ satisfying Efficiency, Symmetry, Null Player, and Additivity—and it is the Shapley value. Any other "fair" payoff division that claims to satisfy these axioms is either the Shapley value or violates at least one axiom.
In the context of feature attribution:
Efficiency → The SHAP values plus base value exactly equal the prediction. No contribution is left unexplained.
Symmetry → Features that contribute equally in all contexts get equal attribution. No arbitrary favoritism.
Null Player → Features the model doesn't use get zero attribution. Credit only for actual contribution.
Additivity → For ensemble models, attributions can be computed per component and summed.
If you want your feature attribution to satisfy these properties (and you should), you must use SHAP values. There is no alternative.
The proof proceeds by showing that any function satisfying the four axioms must equal the Shapley formula. The standard argument uses unanimity games: they form a basis for the vector space of all games, the symmetry, efficiency, and null-player axioms force a unique value on each unanimity game, and additivity extends the result to arbitrary games.
| Axiom | Game Theory Meaning | ML Feature Attribution Meaning |
|---|---|---|
| Efficiency | Payoffs sum to v(N) | SHAP values + base = prediction |
| Symmetry | Equivalent players get equal pay | Equivalent features get equal credit |
| Null Player | Non-contributors get zero | Unused features get zero attribution |
| Additivity | Payoffs are linear in games | Ensemble attributions decompose |
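As a numerical sanity check, the four axioms can be verified on the three-workers game with a small self-contained script (a sketch; the game values are the ones from the example above, and the helper `shapley` is a compact reimplementation of the permutation formula):

```python
from itertools import permutations
from math import factorial, isclose


def shapley(n, v):
    """Exact Shapley values via permutation enumeration."""
    phi = [0.0] * n
    for perm in permutations(range(n)):
        coalition = set()
        for p in perm:
            phi[p] += v(coalition | {p}) - v(coalition)
            coalition.add(p)
    return [x / factorial(n) for x in phi]


# Three-workers game from the text
vals = {frozenset(): 0, frozenset({0}): 100, frozenset({1}): 80,
        frozenset({2}): 50, frozenset({0, 1}): 250, frozenset({0, 2}): 200,
        frozenset({1, 2}): 160, frozenset({0, 1, 2}): 400}
v = lambda S: vals[frozenset(S)]

phi = shapley(3, v)

# Efficiency: payoffs sum to v(N) = 400
assert isclose(sum(phi), 400)

# Symmetry: in the symmetric game v(S) = |S|, every player gets equal pay
phi_sym = shapley(3, lambda S: len(S))
assert all(isclose(x, 1.0) for x in phi_sym)

# Null player: a fourth player who never changes the value gets zero
v4 = lambda S: v(S - {3})
phi4 = shapley(4, v4)
assert abs(phi4[3]) < 1e-12

# Additivity: Shapley values of v + v are twice those of v
phi2 = shapley(3, lambda S: 2 * v(S))
assert all(isclose(a, 2 * b) for a, b in zip(phi2, phi))

print("Efficiency, symmetry, null player, and additivity all hold.")
```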
The original Shapley axioms aren't the only way to characterize the Shapley value. Alternative axiom sets provide different insights into why SHAP is 'correct.'
Young replaced Additivity with a different axiom:
Strong Monotonicity: If player $i$'s marginal contribution increases (or stays the same) in every coalition when moving from game $v$ to game $w$, then $\phi_i(w) \geq \phi_i(v)$.

Formally: If for all $S$: $$w(S \cup \{i\}) - w(S) \geq v(S \cup \{i\}) - v(S)$$
Then $\phi_i(w) \geq \phi_i(v)$.
Theorem (Young): Efficiency + Symmetry + Strong Monotonicity uniquely characterize the Shapley value.
This axiomatization is more intuitive for ML: if we change the model so a feature becomes more influential in all contexts, its attribution should increase.
Another approach uses consistency (or reduced game property):
If some players leave and take their Shapley payoffs, the Shapley values for remaining players in the "reduced game" should equal their original Shapley values.
This is philosophically appealing but technically complex.
The Shapley value can be derived from a potential function $P(v)$:
$$\phi_i(v) = P(v) - P(v_{-i})$$
where $v_{-i}$ is the game restricted to players other than $i$. This connects to marginal contributions in a more global sense.
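A minimal sketch of the potential approach, assuming the standard Hart–Mas-Colell recursion $P(S) = \big(v(S) + \sum_{i \in S} P(S \setminus \{i\})\big)/|S|$ with $P(\emptyset) = 0$, applied to the three-workers game; the differences $P(N) - P(N \setminus \{i\})$ recover the Shapley values:

```python
from itertools import combinations
from math import isclose

# Three-workers game from the text
vals = {frozenset(): 0, frozenset({0}): 100, frozenset({1}): 80,
        frozenset({2}): 50, frozenset({0, 1}): 250, frozenset({0, 2}): 200,
        frozenset({1, 2}): 160, frozenset({0, 1, 2}): 400}

# Build the potential bottom-up over coalition sizes
P = {frozenset(): 0.0}
players = [0, 1, 2]
for size in range(1, 4):
    for S in combinations(players, size):
        S = frozenset(S)
        P[S] = (vals[S] + sum(P[S - {i}] for i in S)) / len(S)

# Shapley value of i = drop in potential when i is removed
N = frozenset(players)
phi = [P[N] - P[N - {i}] for i in players]
print([round(x, 2) for x in phi])  # [166.67, 136.67, 96.67]
```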
The existence of multiple axiomatizations strengthens confidence in SHAP. No matter which set of 'fairness' properties you consider fundamental, you arrive at the same attribution method. This isn't just math—it's a strong argument that SHAP captures something deeply correct.
Translating Shapley values from game theory to machine learning requires defining the characteristic function appropriately.
For a trained model $f$, input $x = (x_1, ..., x_p)$, and baseline expectation $\mathbb{E}[f(X)]$:
$$v(S) = \mathbb{E}[f(X) | X_S = x_S] - \mathbb{E}[f(X)]$$
Here, $x_S$ denotes the observed values of the features in $S$, $X$ is the random feature vector, and the conditional expectation fixes the features in $S$ at their observed values while the remaining features vary.
Alternatively, the SHAP literature often uses: $$v(S) = \mathbb{E}[f(x_S, X_{\bar{S}})]$$
where $x_S$ are observed feature values and $X_{\bar{S}}$ are drawn from a background distribution.
What does "feature absent" mean?
This is the central philosophical issue in SHAP. Several interpretations exist:
1. Marginal/Interventional: Replace absent features with random samples from the marginal distribution. Features are treated as independent.
2. Conditional/Observational: Sample absent features from their conditional distribution given present features. Respects correlations.
3. Reference-based: Replace absent features with some fixed baseline value (e.g., mean, zero, or a chosen reference point).
| Approach | How Absent Features Are Sampled | Properties |
|---|---|---|
| Interventional | From marginal P(X_absent) | May create unrealistic combinations (pregnant males) |
| Observational | From P(X_absent \| X_present) | Computationally expensive, respects correlations |
| Tree Path-Dependent | Follows tree decision paths | Natural for trees, exact in TreeSHAP |
| Reference Point | Fixed baseline value (e.g., mean) | Simple but may bias attributions |
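To make the distinction concrete, here is a small illustrative comparison (the model, data, and function names are hypothetical) of the interventional and reference-point value functions on a toy linear model. For a linear model, the marginal-sampling and mean-reference definitions happen to coincide:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear model and background data (hypothetical, for illustration)
def f(X):
    return 3 * X[:, 0] + 2 * X[:, 1]

background = rng.normal(size=(5000, 2))   # background distribution
x = np.array([1.0, -0.5])                 # instance to explain


def v_interventional(S):
    """v(S): absent features drawn from the marginal background."""
    X = background.copy()
    for j in S:
        X[:, j] = x[j]                    # present features fixed at x
    return f(X).mean() - f(background).mean()


def v_reference(S, ref=background.mean(axis=0)):
    """v(S): absent features set to a fixed reference point (the mean)."""
    z = ref.copy()
    for j in S:
        z[j] = x[j]
    return f(z[None, :])[0] - f(ref[None, :])[0]


for S in [set(), {0}, {1}, {0, 1}]:
    print(S, round(v_interventional(S), 3), round(v_reference(S), 3))
```

For nonlinear models the two value functions generally disagree, which is exactly why the choice of "absence" matters.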
Applying the Shapley formula to the ML characteristic function:
$$\phi_j(x) = \sum_{S \subseteq \{1,...,p\} \setminus \{j\}} \frac{|S|!\,(p-|S|-1)!}{p!} [f_S(x_S \cup \{x_j\}) - f_S(x_S)]$$
where $f_S(x_S)$ is the model's expected output when only features in $S$ are observed.
Key properties inherited from game theory: local accuracy (the attributions plus the base value sum to the prediction), symmetry (equivalent features get equal credit), missingness (features with no effect receive zero attribution), and consistency (a feature's attribution never decreases when the model is changed to rely on it more).
The exact Shapley formula requires evaluating 2^p coalitions. For p=20 features, that's over a million model evaluations per prediction. For p=100, it's computationally intractable. This is why efficient approximations (TreeSHAP, KernelSHAP) are essential for practical use.
The exponential nature of exact Shapley computation is a fundamental barrier. Understanding the complexity landscape guides algorithm and approximation choices.
The naive algorithm evaluates all $2^p$ coalitions:
```
For each player i:
    For each coalition S not containing i:
        Compute v(S) and v(S ∪ {i})
        Accumulate weighted marginal contribution
```
This requires $O(p \cdot 2^{p-1})$ coalition evaluations. For p = 20 features: ~10 million. For p = 30: ~16 billion. Clearly intractable for even moderate dimensionality.
Enumerating permutations gives the same result:
```
For each of p! permutations:
    For each position in the permutation:
        Compute marginal contribution
        Add to running sum
```
This is $O(p!)$, which is even worse than $O(2^p)$ for large $p$ (since $p! > 2^p$ for all $p \geq 4$). Not useful for exact computation, but it inspires sampling-based approximations.
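A quick check of the two growth rates:

```python
from math import factorial

# p! first overtakes 2^p at p = 4: 3! = 6 < 8 = 2^3, but 4! = 24 > 16 = 2^4
for p in range(1, 8):
    print(f"p={p}: p! = {factorial(p):>5}, 2^p = {2**p:>4}")
```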
Instead of all permutations, sample $m$ random permutations and average:
$$\hat{\phi}_i = \frac{1}{m} \sum_{k=1}^{m} [v(S_i^{\pi_k} \cup \{i\}) - v(S_i^{\pi_k})]$$
where $\pi_k$ is the $k$-th sampled permutation.
Complexity: $O(m \cdot p \cdot T_{model})$ where $T_{model}$ is model inference time.
Error: By Hoeffding's inequality, the estimation error is $O(1/\sqrt{m})$. With $m = 1000$ samples, we get reasonable approximations.
```python
import numpy as np
from typing import Callable


def shapley_sampling(
    n: int,
    v: Callable[[set], float],
    num_samples: int = 1000,
    random_state: int = 42
) -> np.ndarray:
    """
    Approximate Shapley values via permutation sampling.

    Parameters
    ----------
    n : number of players
    v : characteristic function
    num_samples : number of permutations to sample

    Returns
    -------
    Approximate Shapley values
    """
    rng = np.random.RandomState(random_state)
    players = list(range(n))
    shapley_values = np.zeros(n)

    for _ in range(num_samples):
        # Random permutation
        perm = rng.permutation(players)
        coalition = set()
        for player in perm:
            # Marginal contribution
            marginal = v(coalition | {player}) - v(coalition)
            shapley_values[player] += marginal
            coalition.add(player)

    shapley_values /= num_samples
    return shapley_values


def shapley_with_ci(
    n: int,
    v: Callable[[set], float],
    num_samples: int = 1000,
    confidence: float = 0.95,
    random_state: int = 42
) -> dict:
    """
    Shapley values with confidence intervals.
    """
    import scipy.stats as stats

    rng = np.random.RandomState(random_state)
    players = list(range(n))

    # Store all marginal contributions
    marginals = [[] for _ in range(n)]

    for _ in range(num_samples):
        perm = rng.permutation(players)
        coalition = set()
        for player in perm:
            marginal = v(coalition | {player}) - v(coalition)
            marginals[player].append(marginal)
            coalition.add(player)

    # Compute means and confidence intervals
    results = {}
    z = stats.norm.ppf(0.5 + confidence / 2)
    for i in range(n):
        mean = np.mean(marginals[i])
        std = np.std(marginals[i], ddof=1)
        margin = z * std / np.sqrt(num_samples)
        results[i] = {
            'mean': mean,
            'std': std,
            'ci_lower': mean - margin,
            'ci_upper': mean + margin
        }

    return results


# Example usage with convergence analysis
def analyze_convergence(n, v, max_samples=10000):
    """Analyze how estimates converge with sample size."""
    sample_sizes = [10, 50, 100, 500, 1000, 5000, 10000]
    sample_sizes = [s for s in sample_sizes if s <= max_samples]

    # Exact values (uses shapley_value_exact from the earlier listing)
    if n <= 10:
        exact = shapley_value_exact(n, v)
        print(f"Exact Shapley values: {exact}")

    print("Convergence Analysis:")
    print(f"{'Samples':>8} | {'Approx Values':>40} | {'Max Error':>10}")
    print("-" * 65)

    for m in sample_sizes:
        approx = shapley_sampling(n, v, num_samples=m)
        if n <= 10:
            max_error = np.max(np.abs(approx - exact))
            print(f"{m:>8} | {str(approx.round(2)):>40} | {max_error:>10.4f}")
        else:
            print(f"{m:>8} | {str(approx.round(2)):>40}")
```

| Method | Complexity | Exact? | Use Case |
|---|---|---|---|
| Enumerate coalitions | O(2^p) | Yes | p ≤ 15 |
| Enumerate permutations | O(p!) | Yes | p ≤ 10 |
| Permutation sampling | O(m·p·T) | No | Any p, any model |
| KernelSHAP | O(m·p·T) + regression | No | Any model, optimized |
| TreeSHAP | O(T·L·D²) | Yes* | Tree ensembles only |

\*Exact with respect to its chosen value function (path-dependent or interventional).
Several theoretical extensions of Shapley values address specific ML challenges.
The standard Shapley value weights all permutations equally. Weighted Shapley values allow non-uniform weights on player orderings.
In ML, non-uniform weights might reflect prior beliefs about which features are more plausible contributors, or differing costs of observing particular features:
$$\phi_i^w(v) = \sum_{S \subseteq N \setminus \{i\}} w(S) [v(S \cup \{i\}) - v(S)]$$
where $w(S)$ is a weighting scheme different from the standard $\frac{|S|!(n-|S|-1)!}{n!}$.
When players are pre-grouped into unions (e.g., features grouped by type), Owen values provide attribution that respects this structure: value is first divided among the unions via a group-level Shapley computation, then divided within each union among its members.
This is useful when features have natural groupings (demographics, financials, behavioral signals).
Standard Shapley assumes no ordering among players. Asymmetric Shapley values allow for a partial ordering:
If feature A causally precedes feature B, we might want attributions that respect this: A gets credit before B in all orderings.
A more recent line of work computes Shapley values on a graph structure representing feature dependencies, providing better handling of correlation.
These extensions are active research topics. The standard SHAP implementation uses classic Shapley values, but domain-specific applications may benefit from these variations. Check recent literature for implementation details.
Despite its theoretical elegance, Shapley-based feature attribution has fundamental limitations.
Shapley assumes players are evaluated in isolation when forming coalitions. For correlated features, this creates issues:
Problem: When feature A is "absent," what value should it take? If A and B are correlated, randomly sampling A (interventional) creates unrealistic feature combinations.
Example: If education and income are correlated, conditioning on high income while randomizing education (to make it "absent") produces unlikely scenarios.
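A small simulation (hypothetical data) illustrates the problem: marginally sampling an "absent" feature destroys its correlation with the features that remain, producing implausible (income, education) pairs:

```python
import numpy as np

rng = np.random.default_rng(1)

# Two strongly correlated features (think income and education)
n = 10_000
income = rng.normal(size=n)
education = 0.9 * income + np.sqrt(1 - 0.9 ** 2) * rng.normal(size=n)
X = np.column_stack([income, education])
print("original corr:", round(np.corrcoef(X.T)[0, 1], 2))

# Interventional "absence": replace education with marginal samples
X_marginal = X.copy()
X_marginal[:, 1] = rng.permutation(X[:, 1])
print("after marginal sampling:", round(np.corrcoef(X_marginal.T)[0, 1], 2))
```

The correlation collapses to roughly zero, so the model is evaluated on rows unlike anything in the training distribution.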
Partial solutions: use the observational (conditional) value function, group highly correlated features and attribute credit to the group, or use extensions such as asymmetric Shapley values that encode known dependencies.
Shapley values measure contribution to model output, not causal effect on the real-world outcome.
Example: A model might use zip code as a predictor. SHAP shows zip code has high importance. But zip code doesn't cause the outcome—it's a proxy for other factors.
Implication: SHAP answers "What does the model use?" not "What truly matters for the outcome?"
Despite approximations, computation remains a practical burden: each explanation still requires many model evaluations, and the sampling error shrinks only as $O(1/\sqrt{m})$. For very high-dimensional data (p > 1000), even approximations become challenging.
Shapley theory provides the rigorous mathematical foundation for SHAP values. The key insights are: the Shapley value is the unique attribution satisfying the fairness axioms (and several alternative axiom sets lead to the same answer); exact computation is exponential in the number of features, so approximations like sampling, KernelSHAP, and TreeSHAP are essential; and the choice of value function—how "absent" features are handled—materially affects the resulting attributions.
You now understand the deep theoretical foundations of SHAP. Next, we'll explore LIME (Local Interpretable Model-agnostic Explanations)—an alternative approach that explains predictions by fitting local surrogate models. While LIME lacks SHAP's theoretical guarantees, it offers computational advantages and different interpretability trade-offs.