Impurity-based importance tells us which features the model uses internally—but does this translate to what actually matters for prediction? A feature might be heavily used during training splits yet contribute little to the model's ability to generalize. Conversely, a feature rarely selected during training might be the one keeping predictions accurate.
Permutation importance addresses this gap directly. Instead of looking at the model's internal mechanics, it asks a simple but profound question: "What happens to model performance if we break the relationship between this feature and the target?" By randomly shuffling each feature's values and measuring the resulting performance drop, we obtain a measure that reflects true predictive contribution.
By the end of this page, you will understand: (1) The theoretical motivation for permutation importance, (2) The complete algorithm and its implementation, (3) Statistical properties including variance and bias, (4) How to interpret negative importance values, and (5) When permutation importance outperforms impurity-based methods.
The brilliance of permutation importance lies in its simplicity. Consider what happens when you randomly shuffle the values of a single feature across samples:
If the model's performance drops significantly after shuffling, the feature must have been carrying information essential for prediction. If performance barely changes, the feature is either uninformative or redundant with other features.
Imagine you're a model trying to predict whether a customer will churn. If someone secretly shuffled the 'days_since_last_purchase' column, mixing up each customer's value with random other customers, you'd suddenly find that feature useless for prediction—even though it 'looks' the same statistically. Permutation importance quantifies exactly this scenario.
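To make the thought experiment concrete, here is a minimal, self-contained sketch (synthetic data rather than the churn example) that shuffles a single column and records the resulting score drop. For brevity it scores on the same data used to fit the model; later sections explain why held-out data is preferable.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Fit a small model on synthetic data
X, y = make_classification(n_samples=500, n_features=4, n_informative=2,
                           random_state=0)
model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

baseline = accuracy_score(y, model.predict(X))

# Break the link between feature 0 and the target by shuffling its values
rng = np.random.default_rng(0)
X_shuffled = X.copy()
X_shuffled[:, 0] = rng.permutation(X_shuffled[:, 0])

shuffled = accuracy_score(y, model.predict(X_shuffled))
print(f"Score drop after shuffling feature 0: {baseline - shuffled:.4f}")
```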
Why shuffling instead of removing?
One might ask: why not just remove the feature entirely and retrain? While theoretically cleaner, this approach has critical drawbacks:
| Approach | Pros | Cons |
|---|---|---|
| Shuffling (Permutation) | No retraining required; Fast; Measures marginal contribution | Doesn't account for model adaptation |
| Removing + Retraining | Accounts for feature interactions; Measures true absence | Expensive (retrain per feature); Model changes confound importance |
Permutation importance strikes an excellent balance: it measures the importance of a feature given the current model structure, which is usually what we care about when interpreting a trained model. The next page covers drop-column importance for when retraining is acceptable.
The algorithm for computing permutation importance is straightforward and elegant:
Algorithm: Permutation Importance
Input: Fitted model f, dataset (X, y), scoring function S, number of permutations K
Output: Importance score for each feature
1. Compute baseline score: score_baseline = S(y, f(X))
2. For each feature j in {1, 2, ..., p}:
a. Initialize importance_j = 0
b. For k = 1 to K:
i. Create X_permuted by randomly shuffling column j of X
ii. Compute permuted score: score_permuted = S(y, f(X_permuted))
iii. importance_j += (score_baseline - score_permuted) / K
c. Store importance_j
3. Return importances for all features
Key design choices:
We define importance as (baseline - permuted), so positive importance means performance DROPPED after permutation (the feature was helpful). If performance improves after permutation, importance is negative—a highly suspicious situation we'll discuss later.
```python
import numpy as np
from sklearn.metrics import accuracy_score, r2_score
from typing import Callable, Tuple


def permutation_importance(
    model,
    X: np.ndarray,
    y: np.ndarray,
    scoring_func: Callable,
    n_repeats: int = 10,
    random_state: int = 42
) -> Tuple[np.ndarray, np.ndarray]:
    """
    Compute permutation importance for all features.

    Args:
        model: Fitted model with predict method
        X: Feature matrix (n_samples, n_features)
        y: Target array (n_samples,)
        scoring_func: Function(y_true, y_pred) -> score (higher is better)
        n_repeats: Number of permutation iterations per feature
        random_state: Random seed for reproducibility

    Returns:
        importances_mean: Mean importance for each feature
        importances_std: Standard deviation of importance estimates
    """
    rng = np.random.RandomState(random_state)
    n_samples, n_features = X.shape

    # Compute baseline score
    baseline_score = scoring_func(y, model.predict(X))

    # Store all importance measurements
    importances = np.zeros((n_features, n_repeats))

    for feature_idx in range(n_features):
        for repeat_idx in range(n_repeats):
            # Create a copy to avoid modifying original data
            X_permuted = X.copy()

            # Randomly shuffle this feature's values
            X_permuted[:, feature_idx] = rng.permutation(X[:, feature_idx])

            # Score with permuted feature
            permuted_score = scoring_func(y, model.predict(X_permuted))

            # Importance = drop in performance
            importances[feature_idx, repeat_idx] = baseline_score - permuted_score

    importances_mean = importances.mean(axis=1)
    importances_std = importances.std(axis=1)

    return importances_mean, importances_std


# Example usage with sklearn's built-in implementation
from sklearn.inspection import permutation_importance as sklearn_perm_imp
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Generate data
X, y = make_classification(
    n_samples=1000, n_features=10, n_informative=5,
    n_redundant=2, random_state=42
)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Train model
rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X_train, y_train)

# Compute permutation importance on VALIDATION set
result = sklearn_perm_imp(
    rf, X_val, y_val,
    n_repeats=20,
    random_state=42,
    n_jobs=-1
)

# Display results
print("Permutation Importance (Validation Set)")
print("=" * 50)
for i in np.argsort(result.importances_mean)[::-1]:
    print(f"Feature {i}: {result.importances_mean[i]:.4f} "
          f"± {result.importances_std[i]:.4f}")
```

A critical decision when computing permutation importance is whether to use the training set or a held-out validation/test set. This choice has profound implications for what the importance scores actually measure.
Training set permutation importance: reflects how heavily the model relies on each feature to reproduce the data it was fit on, including any noise it memorized, so it can reward features that only enabled overfitting.
Validation/test set permutation importance: reflects how much each feature contributes to performance on unseen data, which is what matters for generalization.
Always compute permutation importance on held-out data (validation or test set) when evaluating feature importance for prediction. Training set importance can dramatically overstate the value of noise features and features that enabled overfitting.
```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Create dataset with informative AND pure noise features
np.random.seed(42)
n_samples = 1000
n_informative = 5
n_noise = 5

# Informative features
X_informative = np.random.randn(n_samples, n_informative)
y = (X_informative.sum(axis=1) > 0).astype(int)

# Pure noise features (no relationship to target)
X_noise = np.random.randn(n_samples, n_noise)

# Combine
X = np.hstack([X_informative, X_noise])
feature_names = ([f"informative_{i}" for i in range(n_informative)]
                 + [f"noise_{i}" for i in range(n_noise)])

# Split
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Train a deep tree (prone to overfitting)
rf = RandomForestClassifier(
    n_estimators=100,
    max_depth=None,  # Fully grown trees - can overfit
    random_state=42
)
rf.fit(X_train, y_train)

print(f"Training accuracy: {rf.score(X_train, y_train):.4f}")
print(f"Validation accuracy: {rf.score(X_val, y_val):.4f}")
print()

# Compute importance on TRAINING set
perm_train = permutation_importance(
    rf, X_train, y_train, n_repeats=20, random_state=42
)

# Compute importance on VALIDATION set
perm_val = permutation_importance(
    rf, X_val, y_val, n_repeats=20, random_state=42
)

# Compare results
results = pd.DataFrame({
    'feature': feature_names,
    'train_importance': perm_train.importances_mean,
    'val_importance': perm_val.importances_mean,
    'true_type': ['informative'] * n_informative + ['noise'] * n_noise
})

print("Training vs Validation Permutation Importance")
print("=" * 70)
print(results.sort_values('train_importance', ascending=False).to_string(index=False))
print()
print("🔍 Key Insight: Notice how noise features may show higher importance")
print("   on training data due to overfitting, but correctly show low importance")
print("   on validation data.")
```

Understanding the statistical behavior of permutation importance helps us interpret results correctly and design reliable analyses.
Variance in permutation importance:
The importance estimate is itself a random quantity: it depends on which random permutations happen to be drawn, on the size of the evaluation set, and on the number of repeats K.
By repeating the permutation process K times, we can estimate the variance in our importance estimate:
$$Var(\hat{I}_j) = \frac{1}{K-1} \sum_{k=1}^{K} \left(I_j^{(k)} - \bar{I}_j\right)^2$$
This variance estimate is crucial for determining whether observed importance differences are statistically significant.
With K ≥ 20 repetitions, you can construct an approximate 95% confidence interval for each feature's mean importance as mean ± 1.96 × std/√K (the standard error of the mean). Features whose confidence intervals overlap likely don't differ significantly in importance.
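A short, self-contained sketch of this normal-approximation interval on synthetic data (the significance-testing example later uses the more careful t-based version):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=800, n_features=6, n_informative=3, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=0)
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)

result = permutation_importance(model, X_val, y_val, n_repeats=20, random_state=0)

# Normal-approximation 95% CI for each feature's MEAN importance:
# std across repeats divided by sqrt(K) is the standard error of the mean.
K = result.importances.shape[1]
se = result.importances_std / np.sqrt(K)
ci_lower = result.importances_mean - 1.96 * se
ci_upper = result.importances_mean + 1.96 * se

for j in range(X.shape[1]):
    print(f"Feature {j}: mean={result.importances_mean[j]:.4f}, "
          f"95% CI=[{ci_lower[j]:.4f}, {ci_upper[j]:.4f}]")
```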
Bias considerations:
Permutation importance has several potential sources of bias:
Feature correlation bias: When features are correlated, permuting one feature can create unrealistic data points that the model has never seen. This can either inflate or deflate importance unpredictably.
Extrapolation bias: If permutation creates out-of-distribution inputs (e.g., a height of 2 meters paired with a weight of 30 kg), the model's predictions become unreliable, and the measured importance reflects extrapolation behavior rather than true importance.
Distribution shift bias: If the dataset's feature distribution is skewed, different samples contribute unequally to the importance calculation.
Handling correlated features:
For highly correlated features, consider these approaches:
| Approach | Description | When to Use |
|---|---|---|
| Group permutation | Shuffle correlated features together | When features are known to be dependent |
| Conditional permutation | Shuffle only within local neighborhoods | When maintaining realistic combinations matters |
| SAGE/SHAP values | Proper Shapley-based attribution | When rigorous causal attribution is needed |
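As an illustration of the first row, here is a minimal sketch of the group-permutation idea. It is not a built-in sklearn function; the helper name and the `groups` mapping are illustrative, and for brevity it scores on the same data used to fit the model (use held-out data in practice, as discussed above). The key point: every column in a group is permuted with the same row shuffle, so within-group correlations stay intact while the group's link to the target is broken.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

def group_permutation_importance(model, X, y, groups, n_repeats=10, seed=0):
    """Permute all columns in a group with the SAME row shuffle."""
    rng = np.random.default_rng(seed)
    baseline = accuracy_score(y, model.predict(X))
    out = {}
    for name, cols in groups.items():
        drops = []
        for _ in range(n_repeats):
            idx = rng.permutation(len(X))
            X_perm = X.copy()
            X_perm[:, cols] = X[idx][:, cols]  # one shuffle applied to the whole group
            drops.append(baseline - accuracy_score(y, model.predict(X_perm)))
        out[name] = (float(np.mean(drops)), float(np.std(drops)))
    return out

# Synthetic demo: feature 1 is a near-copy of feature 0, so they form a group.
X, y = make_classification(n_samples=600, n_features=5, n_informative=3, random_state=0)
X[:, 1] = X[:, 0] + 0.05 * np.random.default_rng(1).normal(size=len(X))
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

groups = {"correlated_pair": [0, 1], "feature_2": [2], "feature_3": [3], "feature_4": [4]}
print(group_permutation_importance(model, X, y, groups, n_repeats=10))
```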
```python
import numpy as np
from scipy import stats
from sklearn.inspection import permutation_importance


def compute_importance_with_significance(model, X, y, n_repeats=30,
                                          scoring='accuracy', alpha=0.05):
    """
    Compute permutation importance with statistical significance testing.

    Args:
        model: Fitted model
        X: Feature matrix
        y: Target vector
        n_repeats: Number of permutations (higher = more stable)
        scoring: Scoring metric name
        alpha: Significance level

    Returns:
        Dictionary with importance statistics and significance
    """
    result = permutation_importance(
        model, X, y,
        n_repeats=n_repeats,
        scoring=scoring,
        n_jobs=-1
    )

    n_features = X.shape[1]

    # Compute test statistic: is mean significantly different from 0?
    # Under null hypothesis (feature unimportant), mean importance = 0
    t_stats = []
    p_values = []
    significant = []

    for j in range(n_features):
        importances_j = result.importances[j]

        # One-sample t-test against 0
        t_stat, p_val = stats.ttest_1samp(importances_j, 0)
        t_stats.append(t_stat)
        p_values.append(p_val)

        # Is feature significantly important? (one-sided: importance > 0)
        # Using one-sided p-value for "is importance positive"
        p_one_sided = p_val / 2 if t_stat > 0 else 1 - p_val / 2
        significant.append(p_one_sided < alpha)

    # Compute confidence intervals
    n = n_repeats
    ci_factor = stats.t.ppf(1 - alpha/2, df=n-1)
    ci_half_width = ci_factor * result.importances_std / np.sqrt(n)

    return {
        'mean': result.importances_mean,
        'std': result.importances_std,
        'ci_lower': result.importances_mean - ci_half_width,
        'ci_upper': result.importances_mean + ci_half_width,
        't_statistic': np.array(t_stats),
        'p_value': np.array(p_values),
        'significant': np.array(significant)
    }


# Example: Significance testing for feature importance
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Create data with clearly informative and uninformative features
X, y = make_classification(
    n_samples=500,
    n_features=10,
    n_informative=4,
    n_redundant=0,
    n_clusters_per_class=1,
    random_state=42
)

X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, random_state=42
)

rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X_train, y_train)

# Compute importance with significance
results = compute_importance_with_significance(
    rf, X_val, y_val, n_repeats=50
)

print("Permutation Importance with Statistical Significance")
print("=" * 75)
print(f"{'Feature':<10} {'Mean':>10} {'95% CI':>20} {'p-value':>12} {'Sig?'}")
print("-" * 75)

for i in range(len(results['mean'])):
    ci_str = f"[{results['ci_lower'][i]:.4f}, {results['ci_upper'][i]:.4f}]"
    sig_str = "✓" if results['significant'][i] else "✗"
    p_val = results['p_value'][i]
    p_str = f"{p_val:.4f}" if p_val >= 0.0001 else "<0.0001"
    print(f"Feature {i:<2} {results['mean'][i]:>10.4f} {ci_str:>20} {p_str:>12} {sig_str:>5}")
```

One of the most perplexing results in permutation importance analysis is encountering negative importance values. This occurs when the model performs better after shuffling a feature—an apparently paradoxical result.
What negative importance means:
Negative importance indicates that the feature, as used by the model, is actually hurting prediction performance. When the feature's relationship to the target is broken through shuffling, the model makes better predictions. This can happen for several reasons: the model may have overfit to noise in that feature, the feature may carry spurious or leaked signal that does not generalize, or the apparent effect may simply be sampling noise on a small evaluation set.
Strongly negative importance (not just near-zero) is a serious red flag. It suggests: (1) severe overfitting, (2) data leakage during training, (3) feature engineering bugs, or (4) train/test distribution mismatch. Investigate thoroughly before deploying the model.
How to respond to negative importance:
| Scenario | Likely Cause | Recommended Action |
|---|---|---|
| Slightly negative (-0.001 to 0) | Random noise | Treat as zero importance; safe to ignore |
| Moderately negative (-0.01 to -0.001) | Mild overfitting or noise | Consider regularization; validate on more data |
| Strongly negative (< -0.01) | Serious overfitting or leakage | Investigate feature; check for data leakage; consider removal |
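To make the table operational, here is a tiny helper that applies those cutoffs. The function name is illustrative, and the thresholds are the rules of thumb from the table, not universal constants:

```python
def triage_negative_importance(mean_importance: float) -> str:
    """Map a mean permutation importance onto the triage buckets above."""
    if mean_importance >= 0:
        return "non-negative: interpret normally"
    if mean_importance > -0.001:
        return "slightly negative: treat as zero importance"
    if mean_importance > -0.01:
        return "moderately negative: consider regularization, validate on more data"
    return "strongly negative: check for leakage/overfitting, consider removing the feature"

print(triage_negative_importance(-0.0004))  # slightly negative
print(triage_negative_importance(-0.02))    # strongly negative
```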
A worked example:
Imagine a model predicting customer satisfaction. One feature is 'customer_service_email' (the exact email address customers used to contact support). During training, the model memorizes which email addresses correspond to satisfied customers. On held-out data, these memorized associations don't generalize—in fact, they're misleading. Shuffling this feature breaks the spurious associations, and prediction improves.
```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split


def analyze_suspicious_features(model, X_val, y_val, feature_names=None,
                                n_repeats=30, threshold=-0.005):
    """
    Identify and analyze features with suspiciously negative importance.

    Args:
        model: Fitted model
        X_val: Validation features
        y_val: Validation target
        feature_names: List of feature names
        n_repeats: Number of permutation repeats
        threshold: Importance below this is flagged as suspicious

    Returns:
        DataFrame of suspicious features with diagnostic info
    """
    if feature_names is None:
        feature_names = [f"feature_{i}" for i in range(X_val.shape[1])]

    result = permutation_importance(
        model, X_val, y_val,
        n_repeats=n_repeats,
        n_jobs=-1
    )

    # Find suspicious features
    suspicious_mask = result.importances_mean < threshold
    suspicious_indices = np.where(suspicious_mask)[0]

    if len(suspicious_indices) == 0:
        print("✅ No features with suspiciously negative importance found.")
        return None

    suspicious = []
    for idx in suspicious_indices:
        # Check how consistently negative the importance is
        neg_fraction = (result.importances[idx] < 0).mean()

        suspicious.append({
            'feature': feature_names[idx],
            'index': idx,
            'mean_importance': result.importances_mean[idx],
            'std_importance': result.importances_std[idx],
            'fraction_negative': neg_fraction,
            'min_importance': result.importances[idx].min(),
            'max_importance': result.importances[idx].max(),
        })

    df = pd.DataFrame(suspicious)
    df = df.sort_values('mean_importance')

    print("⚠️ Suspicious Features Detected!")
    print("=" * 80)
    for _, row in df.iterrows():
        print(f"\n{row['feature']} (index {row['index']}):")
        print(f"  Mean importance: {row['mean_importance']:.4f}")
        print(f"  Fraction of trials with negative importance: {row['fraction_negative']:.1%}")
        print(f"  Range: [{row['min_importance']:.4f}, {row['max_importance']:.4f}]")

        if row['fraction_negative'] > 0.95:
            print("  🔴 CRITICAL: Consistently harmful - investigate data leakage")
        elif row['fraction_negative'] > 0.7:
            print("  🟠 WARNING: Frequently harmful - likely overfitting")
        else:
            print("  🟡 CAUTION: Sometimes harmful - may be noise")

    return df


# Demonstration: Create a feature that causes overfitting
np.random.seed(42)
n_samples = 500

# Genuine features
X_good = np.random.randn(n_samples, 5)
y = (X_good[:, 0] + X_good[:, 1] > 0).astype(int)

# Overfitting-prone feature: random ID that happens to correlate with y
# in training but won't generalize
overfit_feature = np.random.randn(n_samples)

# Pure noise feature
noise_feature = np.random.randn(n_samples)

X = np.column_stack([X_good, overfit_feature, noise_feature])
feature_names = [f"good_{i}" for i in range(5)] + ["overfit_prone", "pure_noise"]

# Split data
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=42)

# Train deep tree that can overfit
rf = RandomForestClassifier(n_estimators=100, max_depth=20, random_state=42)
rf.fit(X_train, y_train)

# Analyze
print(f"Training accuracy: {rf.score(X_train, y_train):.4f}")
print(f"Validation accuracy: {rf.score(X_val, y_val):.4f}")
print()

suspicious_df = analyze_suspicious_features(
    rf, X_val, y_val,
    feature_names=feature_names,
    threshold=-0.001
)
```

Now that we understand both methods, let's directly compare their properties, strengths, and appropriate use cases.
Fundamental difference: impurity-based importance measures how much the model used each feature to reduce impurity during training, whereas permutation importance measures how much each feature contributes to predictive performance on evaluation data.
These can diverge significantly! A feature might be heavily used in splits but contribute little to generalization (overfitting), or rarely used but crucial when it is (high-value specialized feature).
| Aspect | Impurity-Based (MDI) | Permutation-Based |
|---|---|---|
| Computation cost | Free (computed during training) | O(n × p × K) — must run inference p × K times |
| Measures | Training-time usage patterns | Validation-time predictive contribution |
| Can detect overfitting? | No — can't distinguish helpful from harmful | Yes — negative importance reveals overfitting |
| Affected by cardinality? | Yes — strong bias toward high-cardinality | No — directly measures performance impact |
| Model-agnostic? | No — only for tree-based models | Yes — works for any model with predict() |
| Handles feature correlation | Poorly — splits importance arbitrarily | Poorly — can create unrealistic combinations |
| Reproducibility | Deterministic (given model) | Stochastic — varies with permutation seed |
| Sign of values | Always non-negative | Can be negative (feature hurts predictions) |
Use impurity-based when: you need quick feature screening, want to understand the model's internal logic, or are doing initial exploration. Use permutation-based when: you need reliable importance for feature selection, want to detect overfitting, or care about generalization performance.
```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split


def compare_importance_methods(X, y, feature_names=None, random_state=42):
    """
    Compare impurity-based and permutation importance for the same model.
    Demonstrates scenarios where they agree/disagree.
    """
    if feature_names is None:
        feature_names = [f"F{i}" for i in range(X.shape[1])]

    X_train, X_val, y_train, y_val = train_test_split(
        X, y, test_size=0.3, random_state=random_state
    )

    # Train model
    rf = RandomForestClassifier(n_estimators=200, max_depth=10,
                                random_state=random_state)
    rf.fit(X_train, y_train)

    # Impurity-based importance
    imp_impurity = rf.feature_importances_

    # Permutation importance (on validation set)
    perm_result = permutation_importance(
        rf, X_val, y_val, n_repeats=30,
        random_state=random_state, n_jobs=-1
    )
    imp_permutation = perm_result.importances_mean

    # Create comparison DataFrame
    comparison = pd.DataFrame({
        'feature': feature_names,
        'impurity': imp_impurity,
        'permutation': imp_permutation,
    })

    # Compute correlation and disagreement
    correlation = np.corrcoef(imp_impurity, imp_permutation)[0, 1]

    # Rank each feature by both methods
    comparison['rank_impurity'] = comparison['impurity'].rank(ascending=False)
    comparison['rank_permutation'] = comparison['permutation'].rank(ascending=False)
    comparison['rank_diff'] = abs(comparison['rank_impurity'] - comparison['rank_permutation'])

    print("Importance Method Comparison")
    print("=" * 70)
    print(f"Model accuracy: Train={rf.score(X_train, y_train):.4f}, "
          f"Val={rf.score(X_val, y_val):.4f}")
    print(f"Importance correlation: {correlation:.4f}")
    print()
    print(comparison.sort_values('rank_impurity').to_string(index=False))

    # Identify disagreements
    big_disagreements = comparison[comparison['rank_diff'] >= 3]
    if len(big_disagreements) > 0:
        print(f"\n⚠️ Features with rank difference >= 3:")
        for _, row in big_disagreements.iterrows():
            print(f"  {row['feature']}: "
                  f"Impurity rank={int(row['rank_impurity'])}, "
                  f"Permutation rank={int(row['rank_permutation'])}")

    return comparison


# Create a scenario where methods disagree
np.random.seed(42)
n_samples = 1000

# Feature 1: High cardinality, moderately predictive (impurity will overrate)
high_card = np.random.randn(n_samples)

# Feature 2: Binary, highly predictive (impurity may underrate)
binary_strong = (np.random.randn(n_samples) > 0).astype(float)

# Feature 3: Continuous, weak predictor
continuous_weak = np.random.randn(n_samples) * 0.3

# Target depends mostly on binary feature
y = (binary_strong * 2 + high_card * 0.5 + continuous_weak
     + np.random.randn(n_samples) * 0.5 > 1).astype(int)

X = np.column_stack([high_card, binary_strong, continuous_weak])
feature_names = ['high_cardinality', 'binary_strong', 'continuous_weak']

# Run comparison
comparison = compare_importance_methods(X, y, feature_names)

print("\n📊 Analysis:")
print("- 'high_cardinality' has many unique values → Impurity importance inflated")
print("- 'binary_strong' has only 2 values → Impurity importance deflated")
print("- Permutation importance correctly reflects true predictive value")
```

Permutation importance is more computationally expensive than impurity-based importance. Understanding the cost structure helps plan efficient analyses.
Time complexity:
For a model with inference time $T_{predict}$ on dataset of size $n$:
$$T_{permutation} = \mathcal{O}(p \times K \times T_{predict})$$
where $p$ is the number of features and $K$ is the number of permutation repeats.
Practical example (illustrative numbers): suppose a model with p = 100 features whose inference pass over the validation set takes about 1 second, evaluated with K = 20 repeats. Permutation importance then requires roughly 100 × 20 = 2,000 inference passes, on the order of half an hour of compute.
Optimization strategies:
- Parallelize across features and repeats (n_jobs=-1 in sklearn)
- Subsample the evaluation set for an initial screening pass
- Use a small number of repeats to screen all features, then run more repeats only for the top candidates
```python
import numpy as np
import time
from sklearn.inspection import permutation_importance
from sklearn.ensemble import RandomForestClassifier


def timed_permutation_importance(model, X, y, n_repeats, n_jobs=1):
    """Time permutation importance computation."""
    start = time.time()
    result = permutation_importance(
        model, X, y, n_repeats=n_repeats, n_jobs=n_jobs, random_state=42
    )
    elapsed = time.time() - start
    return result, elapsed


def efficient_importance_analysis(model, X, y, feature_names=None,
                                  initial_k=5, final_k=30,
                                  subsample_size=2000, top_n=10):
    """
    Two-stage efficient permutation importance analysis.

    Stage 1: Quick screening with subsampled data and few repeats
    Stage 2: Detailed analysis of top features

    Args:
        model: Fitted model
        X, y: Full validation data
        feature_names: Optional feature names
        initial_k: Repeats for initial screening
        final_k: Repeats for detailed analysis
        subsample_size: Samples for initial screening
        top_n: Number of top features for detailed analysis

    Returns:
        Dictionary mapping top feature names to mean/std importance
    """
    if feature_names is None:
        feature_names = [f"F{i}" for i in range(X.shape[1])]

    n_samples = X.shape[0]
    n_features = X.shape[1]

    # Stage 1: Quick screening
    print("Stage 1: Quick screening...")
    if n_samples > subsample_size:
        idx = np.random.choice(n_samples, subsample_size, replace=False)
        X_sub, y_sub = X[idx], y[idx]
    else:
        X_sub, y_sub = X, y

    result_quick, time_quick = timed_permutation_importance(
        model, X_sub, y_sub, n_repeats=initial_k, n_jobs=-1
    )
    print(f"  Completed in {time_quick:.2f}s")

    # Identify top features for detailed analysis
    top_indices = np.argsort(result_quick.importances_mean)[::-1][:top_n]
    print(f"  Top {top_n} features identified: {[feature_names[i] for i in top_indices]}")

    # Stage 2: Detailed analysis of top features only
    print(f"\nStage 2: Detailed analysis of top {top_n} features...")

    # For detailed analysis, we only permute the top features
    detailed_importances = {}
    start_detail = time.time()

    for idx in top_indices:
        # This is a simplified version - in production you'd modify
        # the sklearn implementation to only permute selected features
        X_work = X.copy()
        scores = []
        baseline = model.score(X, y)

        for k in range(final_k):
            X_work[:, idx] = np.random.permutation(X[:, idx])
            scores.append(model.score(X_work, y))

        detailed_importances[feature_names[idx]] = {
            'mean': baseline - np.mean(scores),
            'std': np.std([baseline - s for s in scores])
        }

    time_detail = time.time() - start_detail
    print(f"  Completed in {time_detail:.2f}s")

    # Combine results
    print(f"\nTotal time: {time_quick + time_detail:.2f}s")
    print(f"  (vs estimated full analysis: {n_features * final_k / initial_k * time_quick / 60:.1f} min)")

    return detailed_importances


# Example usage
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Large dataset
X, y = make_classification(
    n_samples=10000, n_features=100, n_informative=20,
    n_redundant=10, random_state=42
)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=42)

rf = RandomForestClassifier(n_estimators=100, random_state=42, n_jobs=-1)
rf.fit(X_train, y_train)

# Compare: Full analysis vs Two-stage
print("=" * 60)
print("Two-Stage Efficient Analysis")
print("=" * 60)
efficient_results = efficient_importance_analysis(
    rf, X_val, y_val,
    initial_k=5, final_k=30,
    subsample_size=2000, top_n=15
)
```

Permutation importance provides a powerful, model-agnostic approach to measuring feature significance based on actual predictive contribution rather than training-time usage patterns.
Let's consolidate the key insights: permutation importance measures predictive contribution by shuffling a feature and recording the performance drop; it should be computed on held-out data; repeating the shuffle K times yields variance estimates and confidence intervals; negative importance is a warning sign of overfitting or leakage; and the method is model-agnostic but costs roughly p × K extra inference passes.
What's next:
We've now covered the two most common feature importance methods: impurity-based (fast but biased) and permutation-based (reliable but slower). The next page explores Drop-Column Importance—a method that measures what happens when we completely retrain the model without each feature. While computationally expensive, it provides the cleanest measure of feature value when training adaptation matters.
You now understand how to compute, interpret, and apply permutation importance. You can distinguish it from impurity-based importance, recognize when to use each method, handle negative importance values appropriately, and design efficient analyses for large-scale feature importance studies.