In 1953, mathematician and economist Lloyd Shapley tackled a seemingly simple question: How should the payoff from a cooperative game be fairly divided among the players?
Imagine three workers collaborate on a project. Worker A alone can produce $100. Worker B alone can produce $80. Worker C alone can produce $50. But working together, they produce $400—far more than the sum of their individual efforts due to synergies. How should the $400 be divided?
Shapley proved something remarkable: there is exactly one way to divide the payoff that satisfies a set of intuitive fairness criteria. This became known as the Shapley value, and Shapley was awarded the Nobel Prize in Economics in 2012 for this foundational contribution to game theory.
Six decades later, Lundberg and Lee recognized that feature attribution in machine learning is mathematically identical to this payoff division problem. The prediction is the "payoff"; the features are the "players." SHAP values are simply Shapley values applied to machine learning—and they inherit all of Shapley's theoretical guarantees.
Understanding Shapley theory isn't just academic. It reveals why SHAP values are 'correct' in a precise mathematical sense, what their limitations are, and how to extend them for new applications. This theoretical foundation distinguishes principled interpretability from ad-hoc heuristics.
Before diving into Shapley values, we need the basic vocabulary of cooperative game theory.
A coalitional game (also called a cooperative game) consists of:

- A finite set of players $N = \{1, 2, ..., n\}$
- A characteristic function $v: 2^N \to \mathbb{R}$ that assigns a value to every subset (coalition) of players

The function $v(S)$ represents the total payoff that coalition $S$ can achieve by cooperating. By convention, $v(\emptyset) = 0$.
Three workers can collaborate on projects. Working alone, A produces \$100, B produces \$80, and C produces \$50. Pairs do better: $v(\{A,B\}) = 250$, $v(\{A,C\}) = 200$, $v(\{B,C\}) = 160$. All three together produce $v(\{A,B,C\}) = 400$.

The characteristic function captures all possible coalitions and their values.
Given that the grand coalition $N$ produces value $v(N)$, how should this value be divided among players?
We seek a payoff vector $(\phi_1, \phi_2, ..., \phi_n)$ where $\phi_i$ is the payoff to player $i$.
Desirable properties: the full value $v(N)$ should be distributed (efficiency), interchangeable players should receive equal payoffs (symmetry), players who contribute nothing should receive nothing (null player), and payoffs should combine sensibly across games (additivity).
The brilliant insight is that a player's contribution depends on context — specifically, which other players are already present. Worker A's contribution when joining an empty group differs from their contribution when joining a group that already includes B and C.
Shapley's solution: Average a player's marginal contribution across all possible orderings of players.
In the ML context: Players = Features. The characteristic function v(S) = expected model output when only features in S are 'present' and others are 'absent.' The payoff to divide = difference between prediction and baseline.
The Shapley value for player $i$ is defined as:
$$\phi_i(v) = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,(n-|S|-1)!}{n!} \cdot [v(S \cup \{i\}) - v(S)]$$
Let's carefully unpack each component.
$$v(S \cup \{i\}) - v(S)$$
This is the marginal contribution of player $i$ to coalition $S$. It's how much value player $i$ adds when joining coalition $S$.
$$\frac{|S|!(n-|S|-1)!}{n!}$$
This weight appears mysterious, but it has a beautiful interpretation.
Consider all $n!$ possible orderings (permutations) of players. In any ordering, player $i$ arrives at some position, finding a specific coalition already present.

The weight counts how many orderings produce coalition $S$ before player $i$: there are $|S|!$ ways to order the players of $S$ before $i$, and $(n-|S|-1)!$ ways to order the remaining players after $i$, out of $n!$ orderings in total.
The Shapley value is the expected marginal contribution when players arrive in a random order. Imagine players entering a room one by one; each player's contribution is what they add to those already present. Average over all orderings = Shapley value.
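To make the counting concrete, this short standalone check (not from the original text; the choices $n = 4$, $i = 0$, $S = \{1, 2\}$ are arbitrary) enumerates all orderings and confirms that the fraction placing exactly coalition $S$ before player $i$ equals the Shapley weight:

```python
from itertools import permutations
from math import factorial

n = 4
players = list(range(n))
i = 0              # the player joining
S = {1, 2}         # coalition that should already be present when i arrives

# Count orderings in which exactly the players of S appear before player i
count = sum(
    1 for perm in permutations(players)
    if set(perm[:perm.index(i)]) == S
)

# The Shapley weight |S|! * (n-|S|-1)! / n! equals count / n!
weight = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
print(count / factorial(n), weight)  # both equal 1/12 here
```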
The permutation interpretation gives an equivalent formula:
$$\phi_i(v) = \frac{1}{n!} \sum_{\pi \in \Pi(N)} [v(S_i^\pi \cup \{i\}) - v(S_i^\pi)]$$
Where $\Pi(N)$ is the set of all $n!$ permutations of $N$, and $S_i^\pi$ is the set of players that precede player $i$ in permutation $\pi$.
This formula is conceptually clearer but computationally equivalent to the first formula.
For our three workers $N = \{A, B, C\}$ with $v(\{A,B,C\}) = 400$:
$$\phi_A = \frac{1}{6}[2(v(\{A\}) - v(\emptyset)) + (v(\{A,B\}) - v(\{B\})) + (v(\{A,C\}) - v(\{C\})) + 2(v(\{A,B,C\}) - v(\{B,C\}))]$$

The factors of 2 arise because A arrives first in two of the six orderings (ABC, ACB) and last in two (BCA, CBA). Equivalently, using the coalition formula:
$$\phi_A = \frac{0!2!}{3!}(100-0) + \frac{1!1!}{3!}(250-80) + \frac{1!1!}{3!}(200-50) + \frac{2!0!}{3!}(400-160)$$
$$\phi_A = \frac{2}{6}(100) + \frac{1}{6}(170) + \frac{1}{6}(150) + \frac{2}{6}(240)$$
$$\phi_A = 33.33 + 28.33 + 25 + 80 = 166.67$$
Similarly: $\phi_B = 136.67$, $\phi_C = 96.67$.

Verification: $166.67 + 136.67 + 96.67 \approx 400$ ✓ (exactly, $\tfrac{500}{3} + \tfrac{410}{3} + \tfrac{290}{3} = 400$).
```python
import numpy as np
from itertools import permutations
from math import factorial
from typing import Callable


def shapley_value_exact(n: int, v: Callable[[set], float]) -> np.ndarray:
    """
    Compute exact Shapley values using the permutation formula.

    Parameters
    ----------
    n : number of players (0, 1, ..., n-1)
    v : characteristic function, v(S) -> value for coalition S

    Returns
    -------
    ndarray of Shapley values for each player
    """
    players = list(range(n))
    shapley_values = np.zeros(n)

    # Enumerate all permutations
    for perm in permutations(players):
        coalition = set()
        for player in perm:
            # Marginal contribution of player to current coalition
            marginal = v(coalition | {player}) - v(coalition)
            shapley_values[player] += marginal
            coalition.add(player)

    # Average over all permutations
    shapley_values /= factorial(n)
    return shapley_values


def shapley_value_coalition_formula(n: int, v: Callable[[set], float]) -> np.ndarray:
    """
    Compute exact Shapley values using the coalition formula.

    More efficient than the permutation formula for small n.
    """
    from itertools import combinations

    shapley_values = np.zeros(n)

    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):  # |S| from 0 to n-1
            for S in combinations(others, size):
                S_set = set(S)
                # Weight: |S|! * (n-|S|-1)! / n!
                weight = (factorial(size) * factorial(n - size - 1)) / factorial(n)
                # Marginal contribution
                marginal = v(S_set | {i}) - v(S_set)
                shapley_values[i] += weight * marginal

    return shapley_values


# Example: Three workers game
def workers_game(S: set) -> float:
    """Characteristic function for the three-workers example."""
    values = {
        frozenset(): 0,
        frozenset({0}): 100,   # A alone
        frozenset({1}): 80,    # B alone
        frozenset({2}): 50,    # C alone
        frozenset({0, 1}): 250,
        frozenset({0, 2}): 200,
        frozenset({1, 2}): 160,
        frozenset({0, 1, 2}): 400
    }
    return values.get(frozenset(S), 0)


# Compute Shapley values
shapley = shapley_value_exact(3, workers_game)
print("Shapley Values (Three Workers):")
print(f"  Worker A: {shapley[0]:.2f}")
print(f"  Worker B: {shapley[1]:.2f}")
print(f"  Worker C: {shapley[2]:.2f}")
print(f"  Sum: {shapley.sum():.2f} (should equal 400)")
```

Shapley's profound contribution wasn't just the formula—it was proving that this is the only payoff division satisfying a set of reasonable fairness axioms.
$$\sum_{i \in N} \phi_i(v) = v(N)$$
The payoffs sum exactly to the total value of the grand coalition. Nothing is wasted, nothing is created out of thin air.
If players $i$ and $j$ are interchangeable in every coalition: $$v(S \cup \{i\}) = v(S \cup \{j\}) \text{ for all } S \subseteq N \setminus \{i,j\}$$
Then they receive equal payoffs: $$\phi_i(v) = \phi_j(v)$$
Functionally equivalent players get the same reward.
If player $i$ contributes nothing in every coalition: $$v(S \cup \{i\}) = v(S) \text{ for all } S \subseteq N$$
Then their payoff is zero: $$\phi_i(v) = 0$$
Players who don't contribute don't get paid.
For any two games $v$ and $w$ on the same player set: $$\phi_i(v + w) = \phi_i(v) + \phi_i(w)$$
where $(v+w)(S) = v(S) + w(S)$.
If you combine two games, the payoffs combine accordingly.
There exists exactly one value function $\phi$ satisfying Efficiency, Symmetry, Null Player, and Additivity—and it is the Shapley value. Any other "fair" payoff division that claims to satisfy these axioms is either the Shapley value or violates at least one axiom.
In the context of feature attribution:
Efficiency → The SHAP values plus base value exactly equal the prediction. No contribution is left unexplained.
Symmetry → Features that contribute equally in all contexts get equal attribution. No arbitrary favoritism.
Null Player → Features the model doesn't use get zero attribution. Credit only for actual contribution.
Additivity → For ensemble models, attributions can be computed per component and summed.
If you want your feature attribution to satisfy these properties (and you should), you must use SHAP values. There is no alternative.
The proof proceeds by showing that any function satisfying the four axioms must equal the Shapley formula. The standard argument uses unanimity games: they form a basis for the vector space of all games, the symmetry, efficiency, and null-player axioms force a unique value on each unanimity game, and additivity extends the result to arbitrary games.
| Axiom | Game Theory Meaning | ML Feature Attribution Meaning |
|---|---|---|
| Efficiency | Payoffs sum to v(N) | SHAP values + base = prediction |
| Symmetry | Equivalent players get equal pay | Equivalent features get equal credit |
| Null Player | Non-contributors get zero | Unused features get zero attribution |
| Additivity | Payoffs are linear in games | Ensemble attributions decompose |
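As a numerical sanity check, the four axioms can be verified on the three-workers game with a small self-contained script (a sketch; the game values are the ones from the example above, and the helper `shapley` is a compact reimplementation of the permutation formula):

```python
from itertools import permutations
from math import factorial, isclose


def shapley(n, v):
    """Exact Shapley values via permutation enumeration."""
    phi = [0.0] * n
    for perm in permutations(range(n)):
        coalition = set()
        for p in perm:
            phi[p] += v(coalition | {p}) - v(coalition)
            coalition.add(p)
    return [x / factorial(n) for x in phi]


# Three-workers game from the text
vals = {frozenset(): 0, frozenset({0}): 100, frozenset({1}): 80,
        frozenset({2}): 50, frozenset({0, 1}): 250, frozenset({0, 2}): 200,
        frozenset({1, 2}): 160, frozenset({0, 1, 2}): 400}
v = lambda S: vals[frozenset(S)]

phi = shapley(3, v)

# Efficiency: payoffs sum to v(N) = 400
assert isclose(sum(phi), 400)

# Symmetry: in the symmetric game v(S) = |S|, every player gets equal pay
phi_sym = shapley(3, lambda S: len(S))
assert all(isclose(x, 1.0) for x in phi_sym)

# Null player: a fourth player who never changes the value gets zero
v4 = lambda S: v(S - {3})
phi4 = shapley(4, v4)
assert abs(phi4[3]) < 1e-12

# Additivity: Shapley values of v + v are twice those of v
phi2 = shapley(3, lambda S: 2 * v(S))
assert all(isclose(a, 2 * b) for a, b in zip(phi2, phi))

print("Efficiency, symmetry, null player, and additivity all hold.")
```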
The original Shapley axioms aren't the only way to characterize the Shapley value. Alternative axiom sets provide different insights into why SHAP is 'correct.'
Young replaced Additivity with a different axiom:
Strong Monotonicity: If player $i$'s marginal contribution increases (or stays the same) in every coalition when moving from game $v$ to game $w$, then $\phi_i(w) \geq \phi_i(v)$.

Formally: If for all $S$: $$w(S \cup \{i\}) - w(S) \geq v(S \cup \{i\}) - v(S)$$
Then $\phi_i(w) \geq \phi_i(v)$.
Theorem (Young): Efficiency + Symmetry + Strong Monotonicity uniquely characterize the Shapley value.
This axiomatization is more intuitive for ML: if we change the model so a feature becomes more influential in all contexts, its attribution should increase.
Another approach uses consistency (or reduced game property):
If some players leave and take their Shapley payoffs, the Shapley values for remaining players in the "reduced game" should equal their original Shapley values.
This is philosophically appealing but technically complex.
The Shapley value can be derived from a potential function $P(v)$:
$$\phi_i(v) = P(v) - P(v_{-i})$$
where $v_{-i}$ is the game restricted to players other than $i$. This connects to marginal contributions in a more global sense.
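A minimal sketch of the potential approach, assuming the standard Hart–Mas-Colell recursion $P(S) = \big(v(S) + \sum_{i \in S} P(S \setminus \{i\})\big)/|S|$ with $P(\emptyset) = 0$, applied to the three-workers game; the differences $P(N) - P(N \setminus \{i\})$ recover the Shapley values:

```python
from itertools import combinations
from math import isclose

# Three-workers game from the text
vals = {frozenset(): 0, frozenset({0}): 100, frozenset({1}): 80,
        frozenset({2}): 50, frozenset({0, 1}): 250, frozenset({0, 2}): 200,
        frozenset({1, 2}): 160, frozenset({0, 1, 2}): 400}

# Build the potential bottom-up over coalition sizes
P = {frozenset(): 0.0}
players = [0, 1, 2]
for size in range(1, 4):
    for S in combinations(players, size):
        S = frozenset(S)
        P[S] = (vals[S] + sum(P[S - {i}] for i in S)) / len(S)

# Shapley value of i = drop in potential when i is removed
N = frozenset(players)
phi = [P[N] - P[N - {i}] for i in players]
print([round(x, 2) for x in phi])  # [166.67, 136.67, 96.67]
```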
The existence of multiple axiomatizations strengthens confidence in SHAP. No matter which set of 'fairness' properties you consider fundamental, you arrive at the same attribution method. This isn't just math—it's a strong argument that SHAP captures something deeply correct.
Translating Shapley values from game theory to machine learning requires defining the characteristic function appropriately.
For a trained model $f$, input $x = (x_1, ..., x_p)$, and baseline expectation $\mathbb{E}[f(X)]$:
$$v(S) = \mathbb{E}[f(X) | X_S = x_S] - \mathbb{E}[f(X)]$$
Here, $x_S$ denotes the observed values of the features in $S$, $X$ is the random feature vector, and the conditional expectation fixes the features in $S$ at their observed values while the remaining features vary.
Alternatively, the SHAP literature often uses: $$v(S) = \mathbb{E}[f(x_S, X_{\bar{S}})]$$
where $x_S$ are observed feature values and $X_{\bar{S}}$ are drawn from a background distribution.
What does "feature absent" mean?
This is the central philosophical issue in SHAP. Several interpretations exist:
1. Marginal/Interventional: Replace absent features with random samples from the marginal distribution. Features are treated as independent.
2. Conditional/Observational: Sample absent features from their conditional distribution given present features. Respects correlations.
3. Reference-based: Replace absent features with some fixed baseline value (e.g., mean, zero, or a chosen reference point).
| Approach | How Absent Features Are Sampled | Properties |
|---|---|---|
| Interventional | From marginal P(X_absent) | May create unrealistic combinations (pregnant males) |
| Observational | From P(X_absent \| X_present) | Computationally expensive, respects correlations |
| Tree Path-Dependent | Follows tree decision paths | Natural for trees, exact in TreeSHAP |
| Reference Point | Fixed baseline value (e.g., mean) | Simple but may bias attributions |
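To make the distinction concrete, here is a small illustrative comparison (the model, data, and function names are hypothetical) of the interventional and reference-point value functions on a toy linear model. For a linear model, the marginal-sampling and mean-reference definitions happen to coincide:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear model and background data (hypothetical, for illustration)
def f(X):
    return 3 * X[:, 0] + 2 * X[:, 1]

background = rng.normal(size=(5000, 2))   # background distribution
x = np.array([1.0, -0.5])                 # instance to explain


def v_interventional(S):
    """v(S): absent features drawn from the marginal background."""
    X = background.copy()
    for j in S:
        X[:, j] = x[j]                    # present features fixed at x
    return f(X).mean() - f(background).mean()


def v_reference(S, ref=background.mean(axis=0)):
    """v(S): absent features set to a fixed reference point (the mean)."""
    z = ref.copy()
    for j in S:
        z[j] = x[j]
    return f(z[None, :])[0] - f(ref[None, :])[0]


for S in [set(), {0}, {1}, {0, 1}]:
    print(S, round(v_interventional(S), 3), round(v_reference(S), 3))
```

For nonlinear models the two value functions generally disagree, which is exactly why the choice of "absence" matters.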
Applying the Shapley formula to the ML characteristic function:
$$\phi_j(x) = \sum_{S \subseteq \{1,...,p\} \setminus \{j\}} \frac{|S|!\,(p-|S|-1)!}{p!} [f_S(x_S \cup \{x_j\}) - f_S(x_S)]$$
where $f_S(x_S)$ is the model's expected output when only features in $S$ are observed.
Key properties inherited from game theory: local accuracy (the attributions plus the base value sum to the prediction), symmetry (equivalent features get equal credit), missingness (features with no effect receive zero attribution), and consistency (a feature's attribution never decreases when the model is changed to rely on it more).
The exact Shapley formula requires evaluating 2^p coalitions. For p=20 features, that's over a million model evaluations per prediction. For p=100, it's computationally intractable. This is why efficient approximations (TreeSHAP, KernelSHAP) are essential for practical use.
The exponential nature of exact Shapley computation is a fundamental barrier. Understanding the complexity landscape guides algorithm and approximation choices.
The naive algorithm evaluates all $2^p$ coalitions:
```
For each player i:
    For each coalition S not containing i:
        Compute v(S) and v(S ∪ {i})
        Accumulate weighted marginal contribution
```
This requires $O(p \cdot 2^{p-1})$ coalition evaluations. For p = 20 features: ~10 million. For p = 30: ~16 billion. Clearly intractable for even moderate dimensionality.
Enumerating permutations gives the same result:
```
For each of p! permutations:
    For each position in the permutation:
        Compute marginal contribution
        Add to running sum
```
This is $O(p!)$, which is even worse than $O(2^p)$ for large $p$ (since $p! > 2^p$ for all $p \geq 4$). Not useful for exact computation, but it inspires sampling-based approximations.
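A quick check of the two growth rates:

```python
from math import factorial

# p! first overtakes 2^p at p = 4: 3! = 6 < 8 = 2^3, but 4! = 24 > 16 = 2^4
for p in range(1, 8):
    print(f"p={p}: p! = {factorial(p):>5}, 2^p = {2**p:>4}")
```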
Instead of all permutations, sample $m$ random permutations and average:
$$\hat{\phi}_i = \frac{1}{m} \sum_{k=1}^{m} [v(S_i^{\pi_k} \cup \{i\}) - v(S_i^{\pi_k})]$$
where $\pi_k$ is the $k$-th sampled permutation.
Complexity: $O(m \cdot p \cdot T_{model})$ where $T_{model}$ is model inference time.
Error: By Hoeffding's inequality, the estimation error is $O(1/\sqrt{m})$. With $m = 1000$ samples, we get reasonable approximations.
```python
import numpy as np
from typing import Callable


def shapley_sampling(
    n: int,
    v: Callable[[set], float],
    num_samples: int = 1000,
    random_state: int = 42
) -> np.ndarray:
    """
    Approximate Shapley values via permutation sampling.

    Parameters
    ----------
    n : number of players
    v : characteristic function
    num_samples : number of permutations to sample

    Returns
    -------
    Approximate Shapley values
    """
    rng = np.random.RandomState(random_state)
    players = list(range(n))
    shapley_values = np.zeros(n)

    for _ in range(num_samples):
        # Random permutation
        perm = rng.permutation(players)
        coalition = set()
        for player in perm:
            # Marginal contribution
            marginal = v(coalition | {player}) - v(coalition)
            shapley_values[player] += marginal
            coalition.add(player)

    shapley_values /= num_samples
    return shapley_values


def shapley_with_ci(
    n: int,
    v: Callable[[set], float],
    num_samples: int = 1000,
    confidence: float = 0.95,
    random_state: int = 42
) -> dict:
    """
    Shapley values with confidence intervals.
    """
    import scipy.stats as stats

    rng = np.random.RandomState(random_state)
    players = list(range(n))

    # Store all marginal contributions
    marginals = [[] for _ in range(n)]

    for _ in range(num_samples):
        perm = rng.permutation(players)
        coalition = set()
        for player in perm:
            marginal = v(coalition | {player}) - v(coalition)
            marginals[player].append(marginal)
            coalition.add(player)

    # Compute means and confidence intervals
    results = {}
    z = stats.norm.ppf(0.5 + confidence / 2)
    for i in range(n):
        mean = np.mean(marginals[i])
        std = np.std(marginals[i], ddof=1)
        margin = z * std / np.sqrt(num_samples)
        results[i] = {
            'mean': mean,
            'std': std,
            'ci_lower': mean - margin,
            'ci_upper': mean + margin
        }

    return results


# Example usage with convergence analysis
def analyze_convergence(n, v, max_samples=10000):
    """Analyze how estimates converge with sample size."""
    sample_sizes = [10, 50, 100, 500, 1000, 5000, 10000]
    sample_sizes = [s for s in sample_sizes if s <= max_samples]

    # Exact values (uses shapley_value_exact from the earlier listing)
    if n <= 10:
        exact = shapley_value_exact(n, v)
        print(f"Exact Shapley values: {exact}")

    print("Convergence Analysis:")
    print(f"{'Samples':>8} | {'Approx Values':>40} | {'Max Error':>10}")
    print("-" * 65)

    for m in sample_sizes:
        approx = shapley_sampling(n, v, num_samples=m)
        if n <= 10:
            max_error = np.max(np.abs(approx - exact))
            print(f"{m:>8} | {str(approx.round(2)):>40} | {max_error:>10.4f}")
        else:
            print(f"{m:>8} | {str(approx.round(2)):>40}")
```

| Method | Complexity | Exact? | Use Case |
|---|---|---|---|
| Enumerate coalitions | O(2^p) | Yes | p ≤ 15 |
| Enumerate permutations | O(p!) | Yes | p ≤ 10 |
| Permutation sampling | O(m·p·T) | No | Any p, any model |
| KernelSHAP | O(m·p·T) + regression | No | Any model, optimized |
| TreeSHAP | O(T·L·D²) | Yes* | Tree ensembles only |

\*Exact with respect to its chosen value function (path-dependent or interventional).
Several theoretical extensions of Shapley values address specific ML challenges.
The standard Shapley value weights all permutations equally. Weighted Shapley values allow non-uniform weights on player orderings.
In ML, non-uniform weights might reflect prior beliefs about which features are more plausible contributors, or differing costs of observing particular features:
$$\phi_i^w(v) = \sum_{S \subseteq N \setminus \{i\}} w(S) [v(S \cup \{i\}) - v(S)]$$
where $w(S)$ is a weighting scheme different from the standard $\frac{|S|!(n-|S|-1)!}{n!}$.
When players are pre-grouped into unions (e.g., features grouped by type), Owen values provide attribution that respects this structure: value is first divided among the unions via a group-level Shapley computation, then divided within each union among its members.
This is useful when features have natural groupings (demographics, financials, behavioral signals).
Standard Shapley assumes no ordering among players. Asymmetric Shapley values allow for a partial ordering:
If feature A causally precedes feature B, we might want attributions that respect this: A gets credit before B in all orderings.
A more recent line of work computes Shapley values on a graph structure representing feature dependencies, providing better handling of correlation.
These extensions are active research topics. The standard SHAP implementation uses classic Shapley values, but domain-specific applications may benefit from these variations. Check recent literature for implementation details.
Despite its theoretical elegance, Shapley-based feature attribution has fundamental limitations.
Shapley assumes players are evaluated in isolation when forming coalitions. For correlated features, this creates issues:
Problem: When feature A is "absent," what value should it take? If A and B are correlated, randomly sampling A (interventional) creates unrealistic feature combinations.
Example: If education and income are correlated, conditioning on high income while randomizing education (to make it "absent") produces unlikely scenarios.
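A small simulation (hypothetical data) illustrates the problem: marginally sampling an "absent" feature destroys its correlation with the features that remain, producing implausible (income, education) pairs:

```python
import numpy as np

rng = np.random.default_rng(1)

# Two strongly correlated features (think income and education)
n = 10_000
income = rng.normal(size=n)
education = 0.9 * income + np.sqrt(1 - 0.9 ** 2) * rng.normal(size=n)
X = np.column_stack([income, education])
print("original corr:", round(np.corrcoef(X.T)[0, 1], 2))

# Interventional "absence": replace education with marginal samples
X_marginal = X.copy()
X_marginal[:, 1] = rng.permutation(X[:, 1])
print("after marginal sampling:", round(np.corrcoef(X_marginal.T)[0, 1], 2))
```

The correlation collapses to roughly zero, so the model is evaluated on rows unlike anything in the training distribution.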
Partial solutions: use the observational (conditional) value function, group highly correlated features and attribute credit to the group, or use extensions such as asymmetric Shapley values that encode known dependencies.
Shapley values measure contribution to model output, not causal effect on the real-world outcome.
Example: A model might use zip code as a predictor. SHAP shows zip code has high importance. But zip code doesn't cause the outcome—it's a proxy for other factors.
Implication: SHAP answers "What does the model use?" not "What truly matters for the outcome?"
Despite approximations, computation remains a practical burden: each explanation still requires many model evaluations, and the sampling error shrinks only as $O(1/\sqrt{m})$. For very high-dimensional data (p > 1000), even approximations become challenging.
Shapley theory provides the rigorous mathematical foundation for SHAP values. The key insights are: the Shapley value is the unique attribution satisfying the fairness axioms (and several alternative axiom sets lead to the same answer); exact computation is exponential in the number of features, so approximations like sampling, KernelSHAP, and TreeSHAP are essential; and the choice of value function—how "absent" features are handled—materially affects the resulting attributions.
You now understand the deep theoretical foundations of SHAP. Next, we'll explore LIME (Local Interpretable Model-agnostic Explanations)—an alternative approach that explains predictions by fitting local surrogate models. While LIME lacks SHAP's theoretical guarantees, it offers computational advantages and different interpretability trade-offs.